Tuesday, 20 June 2017

Syllable segmentation: showing an error message if input not understood in full

In my previous post about filling in the gap in Bewnans Ke at the Cornish language weekend, I noticed that the Cornish word 'vyajya' (to travel) is not understood in full by the syllable segmentation module of taklow-kernewek, if the reverse segmentation mode is used starting from the end and working backwards, since it assumes the penultimate syllable is 'yaj' starting with a semivocalic y rather than a vowel y. This leaves 'v' on its own which is not matched by the regular expression as a syllable.

This can be compounded if accented or non-alphabetic characters are included. A warning can now be given in cases where not all of the input word is matched, using the command-line syllabenn_ranna_kw.py with the --warn option, or by checkboxes in sylrannakwGUI.py and sylrannacyGUI.py

The window in my netbook. If the box is ticked, a warning is given. One of my next things to do is to make the output box a little cleverer to avoid splitting lines in the middle of words.
The interface language is a bit confused, since the explanatory text here remains in Cornish, although the warning message is in Welsh this was done in not a very rational way, since this is hard-coded at the moment to appear in Cornish only in short or line mode, and bilingually Cornish/English in long mode, except if the CYmode flag is set this gets overridden and it displays the Welsh version. It probably needs a bit of an overhaul to change the language in a similar way to the corpus statistics module.

Pennseythun Gernewek 2017

Back in April I attended the Pennseythun Gernewek, the annual Cornish language weekend run by Kowethas an Yeth Kernewek where Cornish speakers from around Cornwall, and beyond gather to speak the language, and participate in lessons and activities. There is something for every level of previous knowledge of the language, from Dallethoryon Lan (complete beginners) to Bagas Freth (the fluent speakers group). I definitely recommend it.
A couple of things that Bagas Freth did this year was a talk by Ken George on translating song lyrics into Cornish, and a workshop on composing a part of the lost part of Bewnans Ke (events taking place during the missing part of the manuscript before the portion that was preserved).

Bewnans Ke an Blydhynnyow Kellys

We divided into groups and composed several stanzas covering Ke in his time in Brittany before his coming to Cornwall. I partly composed this:

7 My a'th pys Duw pyth yw res     A
4 dhymm gul ragos     B
7 A wre'ta ow dannvonn mes A
4 A'n benneglos B
4 Ny wrav govynn      C
7 bywnans fethus yn palas    D
7 Argheskop gans boes a vlas     D
7 gwell yw genev rewlys tynn     C

7 Vyajya a wre'ta'n skon A
4 Bys Rosewa B
7 War an mor y glywydh son A
4 Klogh y'n le na B
4 Vydh ow seni C
7 Grasek ov bos yn ayr fresk  D
7 Y'n le skrifa orth ow desk D
7 Dhe Gernow mos a wren ni C

The numbers at the left are the number of syllables, and the letters at the right mark the rhymes.
There are two formats used in Bewnans Ke that we followed in composing these stanzas. This had a structure of 7, 4, 7, 4, 4, 7, 7,7 syllables per line. The other possible structure was 7,7,7,7,4,7,7,7. With either of these structures either one person was speaking for all 8 lines, or person 1 for 5 lines and person 2 for 3. In the first of the stanzas above, it is all Ke speaking, and the second has 5 lines of God in reply and 3 with Ke again.
A broad translation is as follows:

I pray God what must I do for thee. Will you send me away from the cathedral?
I do not ask for a luxurious life in an archbishops palace with fine food. I prefer a strict rule.
You will voyage soon, to Rosewa (placename, meaning the Roseland?) On the sea you will hear a sound of a bell ringing there.
I am grateful to be in fresh air rather than writing at my desk. To Cornwall we will go.

Output of taklow-kernewek syllable segmentation tool.

Output of taklow-kernewek syllable segmentation tool. I changed to forward segmentation due to an issue with the word 'vyajya' see below.

Detailed output for 'Vyajya' - forward segmentation. The regular expression starts at the beginning of the word and works forwards.

Detailed output for 'Vyajya' - forward segmentation. The regular expression starts at the end of the word and works backwards. Here we see the '-ya' at the end is matched first, then 'yaj' is matched with a semi-vocalic y which leaves the 'V' unmatched
The issue with the word vyajya shows that perhaps more work is needed on the syllable segmentation program, perhaps to warn the user if not all of the input is consumed by the segmentation, and also to perhaps return multiple segmentations where they exist.

Song lyrics

In another of the sessions, Ken George covered some aspects of translating a song, including observing its metre and rhyme, and the different styles of rhyme in songs. After the talk, we divided into smaller groups and tried translating a song each.
One group got a song in French "Charles Trenet - La Mer" which I think was quite a difficult challenge for them. I don't have a copy here of what they came up with.
I was part of a group working on Moon River (Louis Armstrong).
With a bit of artistic licence and adaptation:

Original

Moon river, wider than a mile
I'm crossing you in style some day
Oh, dream maker, you heart breaker
Wherever you're goin', I'm goin' your way

Two drifters, off to see the world
There's such a lot of world to see
We're after the same rainbow's end, waitin' 'round the bend
My huckleberry friend, moon river, and me

Noting rhymes


mile, style

day, way

maker, breaker

world, world

see, me

end, bend, friend

Translation/Adaptation


Dowr Tamar, moy es mildir bras
Y treusyav dha dhowr glas neb jydh
Ty a wra hunros, mestres fell dell os
Pub le a wre'ta mos, ena my a vydh
Dew wandror mos a-dro dhe'n bys
Meur a'n norvys a welyn ni

Ni a helgh penn kammneves, ni a wort a-dro dhe'n kamm
Ha my a wra ri dhis amm
Dowr Tamar, ha my.

Saturday, 10 June 2017

General Election 2017 - and trying out the Python pandas library

I am happy to report that my computer is working again, after a reformat, Windows 7 installation, then a long delay while Ubuntu partition resizer stalled, then fixed that with a GParted iso, and then reinstalled Ubuntu 17.04 and I am part way through restoring data from backups.
QGIS is working with version 2.18.

Although I have used Python csv, matplotlib and numpy libraries to read data from files and plot I hadn't used the pandas library for anything much, so I thought I'd do so. I have in previous code often built up a list manually by setting data = [] and then using append to build up the list, which can be slow for large datasets.

First I need some data, and I will use the general election results for the 6 parliamentary constituencies for the House of Commons in Cornwall:


Constituency,Surname,Forenames,Description,Votes,Turnout
Camborne and Redruth,EUSTICE,Charles George,Conservative Party,23001,70.96
Camborne and Redruth,WINTER,Graham Robert,Labour Party,21424,70.96
Camborne and Redruth,WILLIAMS,Geoffrey,Liberal Democrats,2979,70.96
Camborne and Redruth,GARBETT,Geoffrey George,Green Party,1052,70.96
North Cornwall,MANN,Scott Leslie,Conservative Party,25835,74.2
North Cornwall,ROGERSON,Daniel John,Liberal Democrats,18635,74.2
North Cornwall,BASSETT,Joy,Labour Party,6151,74.2
North Cornwall,ALLMAN,John William,Christian Peoples Alliance,185,74.2
North Cornwall,HAWKINS,Robert James,Socialist Labour Party,138,74.2
South East Cornwall,MURRAY,Sheryll,Conservative Party,29493,74.2
South East Cornwall,DERRICK,Gareth Gwyn James,Labour Party,12050,74.2
South East Cornwall,HUTTY,Philip Andrew,Liberal Democrats,10346,74.2
South East Cornwall,CORNEY,Martin Charles Stewart,Green Party,1335,74.2
St Austell and Newquay,DOUBLE,Stephen Daniel,Conservative Party,26856,69.3
St Austell and Newquay,NEIL,Kevin Michael,Labour Party,15714,69.3
St Austell and Newquay,GILBERT ,Stephen David John ,Liberal Democrats,11642,69.3
St Ives,THOMAS,Derek,Conservative Party,22120,76.1
St Ives,GEORGE,Andrew Henry,Liberal Democrats,21808,76.1
St Ives,DREW,Christopher John,Labour Party,7298,76.1
Truro and Falmouth,NEWTON,Sarah Louise,Conservative Party,25123,75.9
Truro and Falmouth,KIRKHAM,Jayne Susannah,Labour Party,21331,75.9
Truro and Falmouth,NOLAN,Robert Anthony,Liberal Democrat,8465,75.9
Truro and Falmouth,ODGERS,Duncan Charles,UK Independence Party,897,75.9
Truro and Falmouth,PENNINGTON,Amanda Alice,Green Party,831,75.9

Here is the Python code, which expects the above data in a file called electionresults2017.csv which it reads using csv.DictReader which produces an iterator which I convert to a list and create a pandas data frame object.
The code is also available in the dataviz-sandbox repository at my Bitbucket account.


import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import csv

def chooseColour(desc):
    if desc == "Labour Party":
        return "red"
    elif "Liberal" in desc:
        return "yellow"
    elif "Green" in desc:
        return "green"
    elif "Conservative" in desc:
        return "blue"
    else:
        return "magenta"
    
with open('electionresults2017.csv', 'r') as spamreader:
    dframe = pd.DataFrame(list(csv.DictReader(spamreader)))

consts = set(dframe['Constituency'])
print(consts)
fig = plt.figure()
plt.suptitle("Distribution of Votes in Cornwall\nGeneral Election 8th June 2017")
plt.axis('equal')
plt.xticks([])
plt.yticks([])
for p, c in enumerate(consts):
    # print("Constituency of {}".format(c))
    surnames = dframe.loc[dframe.Constituency == c, ['Surname']].values
    forenames = dframe.loc[dframe.Constituency == c, ['Forenames']].values
    descs = [d[0] for d in dframe.loc[dframe.Constituency == c, ['Description']].values]
    plotcolours = [chooseColour(d) for d in descs]
    forenames = [f for f in forenames]
    forename = [f[0].split()[0] for f in forenames]
    names = [f+" "+s for f,s in zip(forename, surnames)]
    names = [n[0] for n in names]
    votes = dframe.loc[dframe.Constituency == c, ['Votes']].values
    votes1 = [v[0] for v in votes]
    namedescsvotes = [n+"\n"+d+"\n"+v for n,d,v in zip(names, descs, votes1)]

    totalvotes = np.sum(votes, dtype=np.int)
    # print(totalvotes)
    ax = fig.add_subplot(2, 3, p+1)
    ax.axis('equal')
    ax.set_title("{C}: {t} votes cast".format(C=c, t=totalvotes))
    ax.set_xticks([])
    ax.set_yticks([])
    ax.pie(votes, radius = np.sqrt(totalvotes/50000.0), labels=namedescsvotes, colors=plotcolours, autopct='%1.1f%%')

                          
fig2 = plt.figure()
plt.suptitle("Representation of Cornwall in the House of Commons")
plt.axis('equal')
plt.xticks([])
plt.yticks([])
for p, c in enumerate(consts):
    descs = dframe.loc[dframe.Constituency == c, ['Description']].values
    descs = descs[0]
    plotcolours = [chooseColour(d) for d in descs]
    print(descs)
    ax = fig2.add_subplot(2, 3, p+1)
    ax.axis('equal')
    ax.set_title(c)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.pie([1], labels=descs, colors=plotcolours)

plt.show()

And the results:

The votes cast for the various candidates and parties in each of the 6 constituencies covering Cornwall and the Isles of Scilly. George Eustice MP uses his middle name rather than his first name Charles.
In comparison to votes cast, here are the parties represented in the House of Commons for constituencies in Cornwall.

I have also tried out matplotlib_venn. It takes as arguments the keyword subsets, which for the function venn2 expects A and (not B), (not A) and B, and (A and B). Below, voteothers is the number who voted for non-elected candidates, the second is by definition an empty set (those who didn't vote and cast a vote for the winner), and votewinner is those who voted for the winner.

Since Cornwall is a one-party state, the colours can be hard-coded.

import matplotlib.pyplot as plt
import matplotlib_venn as venn
...
    # subsets = (Ab, aB, AB)
    v = venn.venn2(subsets=(voteothers, 0, votewinner), set_colors =('lightgray', 'navy'), set_labels=('Other candidates', winnername))
...


The function venn3 expects a 7 element tuple as below. In this case, A is the electorate, B is those who voted for the winner, and C is those who voted for other candidates.

import matplotlib.pyplot as plt
import matplotlib_venn as venn
...
    # subsets=(Abc, aBc, ABc, abC, AbC, aBC, ABC)
    v = venn.venn3(subsets=(novote, 0, votewinner, 0, voteothers, 0, 0), set_colors =('lightgray', 'blue', 'red'), set_labels=('electoral register', winnername,'Other candidates'))
...

It would be nice to make the zero sets disappear, maybe there is a way to do this in the documentation somewhere.

Sunday, 4 June 2017

Computer troubles and further updates to TaklowKernewek tools

My desktop computer is currently incapacitated, it has for the past day and a half been resizing a NTFS partition after I made the decision, after breaking my QGIS installation while attempting to upgrade it to 2.18, to go back to a dual-boot system rather than a Windows virtual machine running in Ubuntu. Meanwhile I had also broken my netbook because it crashed while trying to upgrade to Ubuntu 17.04. I fixed this by doing a reinstall from a bootable USB stick using the UbuntuMATE iso, formatting only the root partition leaving data in /home intact.

This shouldn't deter anyone from using QGIS, or Ubuntu, just that I was doing odd things with GDAL and kealib, making myself able to use multiband .kea files  in QGIS running with the system environment variables, but use conda to run RSGISLib in Python 3, and also conda to run the European Space Agency's SNAP toolbox to process Sentinel 2 images.
There was a resulting mismatch between different versions of GDAL which caused problems when I tried to update to QGIS 2.18, which failed due to conflicting dependencies, and then trying to revert to 2.14 also failed, so therefore QGIS no longer worked at all.
With my netbook, I had installed so much on it that the root partition was close to full and I think that was what started causing problems.

Fortunately all data is backed up on external drives.

However it is not so easy to do much mapping work on my netbook, although I do now have QGIS 2.18 installed on it, but I have done a bit more on TaklowKernewek, including making the netbook mode work for more of the apps, and developing the mathematics quiz app.

The netbook modes were modified to allow the corpus statistics app, and a couple of others to fit the screen of my netbook (EeePC 1005HA), by adjusting font sizes as well as some input/output box heights.
Inflecting the verb 'covhe' (in SWF Tradycyonal) - Cornish for 'remember'



Inflecting the verb 'covhe' (in SWF Tradycyonal) - Cornish for 'remember'


Basic translation memory using Skeul an Yeth 1 example sentences. The font size of three of the four buttons has been shrunk a little so that labels do not grow bigger than the box itself.

more of the output

some further output of the translation memory


word frequency table for words of 5 or more letters in 'Origo Mundi'. The font sizes are quite small to read to make sure that the boxes do not fall off the bottom of the screen.
The word frequency bar chart has been tweaked as to not use white as a colour, since matplotlib may not always outline the bars.

counting syllables


Transliterating from Kernewek Kemmyn to Standard Written Form. It is possible that goelann should instead go to goolan in SWF, it certainly did in 2008 SWF, and may still do so despite being a multisyllable word.

Mathematics Quiz app

A new difficulty level allow input numbers up to 100, and the GUI has been adjusted to work better on smaller screens (though further work on this may be needed).
The options radio buttons have been consolidated into one column in the GUI

The program now reports back the answer to the previous question that you gave if you were correct as well as if you were wrong. This still looks a little confusing since the question now in the upper box is the second question whereas the answer is for the first. If you get a question right, you get 1 point. The bonus for speed has been reduced compared to earlier versions, to get any speed bonus at all you need to answer within 10 seconds.

'Pur gales' allows the computer to choose numbers up to 100, in this version the addition and subtraction are shown as symbols, due to possible confusion with 'ha' internal to the number such as 'dew ha dew ugens' (42). 'tri ha dew ugens marnas dew ha dew ugens' might be interpreted as 43 - 2 + 40 = 81 rather than 43 - 42 = 1.

There is a need to make some further adjustments to the Tkinter GUI code since at present the widgets aren't filling the window after it is maximized.