Tuesday 20 June 2017

Syllable segmentation: showing an error message if input not understood in full

In my previous post about filling in the gap in Bewnans Ke at the Cornish language weekend, I noticed that the Cornish word 'vyajya' (to travel) is not understood in full by the syllable segmentation module of taklow-kernewek, if the reverse segmentation mode is used starting from the end and working backwards, since it assumes the penultimate syllable is 'yaj' starting with a semivocalic y rather than a vowel y. This leaves 'v' on its own which is not matched by the regular expression as a syllable.

This can be compounded if accented or non-alphabetic characters are included. A warning can now be given in cases where not all of the input word is matched, using the command-line syllabenn_ranna_kw.py with the --warn option, or by checkboxes in sylrannakwGUI.py and sylrannacyGUI.py

The window in my netbook. If the box is ticked, a warning is given. One of my next things to do is to make the output box a little cleverer to avoid splitting lines in the middle of words.
The interface language is a bit confused, since the explanatory text here remains in Cornish, although the warning message is in Welsh this was done in not a very rational way, since this is hard-coded at the moment to appear in Cornish only in short or line mode, and bilingually Cornish/English in long mode, except if the CYmode flag is set this gets overridden and it displays the Welsh version. It probably needs a bit of an overhaul to change the language in a similar way to the corpus statistics module.

No comments:

Post a Comment