UIT The arctic university of Norway > Giellatekno
 

Meeting_2007-12-10

Meeting setup

  • Date: 10.12.2007
  • Time: 09.30 Norw. time
  • Place: Internet
  • Tools: SubEthaEdit, iChat/Skype

Agenda

  1. Opening, agenda review
  2. Reviewing the task list from last week
  3. Documentation - divvun.no
  4. Corpus gathering
  5. Corpus infrastructure
  6. Infrastructure
  7. Linguistics
  8. name lexicon infrastructure
  9. Spellers
  10. Other issues
  11. Summary, task lists
  12. Closing

1. Opening, agenda review, participants

Opened at 09: 57.

Present: Børre, Ilona, Sjur, Thomas

Absent: Risten, Per-Eric, Trond, Tomi

Agenda accepted as is.

2. Updated task status since last meeting

Børre

  • fix bug 550
    • Finally done
  • finalise InDesign hyphenator
    • not done
  • update usage and installation documentation
    • read the present documentation carefully, and see if there are missing parts
      • done
    • documentation meeting on Wednesday with Sjur, Thomas
      • done
  • 100 CD covers should be picked up in Tromsø
    • done
  • new/updated front page (old front page to history page)
    • not done
  • press release
    • Thomas did it
  • fix bugs!

Ilona

  • lexicalise smj missing words.
  • Work with missing forms in sme typos.txt
    • Not done yet, but working on it.
  • Translation
    • Hopefully done.

Maaren

  • lexicalise actio compounds

Per-Eric

  • check some unusual words from the Olavi missing list which are still not lexicalised
  • derivations tests
  • fix bugs!

Risten

  • set up CD-printing printer
  • test printer
  • finish the cd cover and cd design
  • Print 50 CDs, take them to Oslo as backup

Saara

  • add new XSL/XML headers for proofing test docs
  • Set up ways of adding meta-information for proofing correct corpus docs (source info, used in testing or not, added to lexicon or not)
  • add nested error markup to xml conversion
  • discuss more parallel texts

Sjur

  • fix bug 550
    • done
  • document Windows CD installation work-around
    • not yet
  • finalise InDesign hyphenator
    • done a lot, but no documentation, and the download package isn't yet finished
  • update usage and installation documentation
    • done, not complete
    • documentation meeting on Wednesday with Thomas, Børre
      • done
  • analyse hyphenation test results
    • done, with feedback from Polderland
  • fix remaining bugs - golden master by end of this Monday
    • all serious bugs fixed, some smaller ones left - GM declared on Friday
  • new/updated front page (old front page to history page)
    • not yet done
  • press release
    • Thomas started on this one
  • fix bugs!

Thomas

  • sme->smj lexicon conversion to build bilingual lexicon resources
    • not any this week
  • test hyphenation
    • little done
  • analyse hyphenation test results
    • little done
  • look at test cases still not behaving properly
    • little done
  • update usage and installation documentation
    • done
    • search Bugzilla for documentation issues in Divvun => add to docu
    • done
    • documentation meeting on Wednesday with Sjur, Børre
    • done
  • fix bugs!
    • little done

Tomi

  • Hunspell lexicon conversion
  • fix remaining bugs - golden master by end of this Monday
  • fix bugs!

Trond

  • sme->smj lexicon conversion to build bilingual lexicon resources
  • analyse hyphenation test results
  • fix bugs!.

3. Documentation

Needs to be fixed by Wednesday.

TODO:

  • fix bug 550 ( Børre, Sjur)
    • investigation and fix scheduled for Tuesday

4. Corpus infrastructure

Nothing.

5. Infrastructure

TODO:

  • add Jabber account in iChat (all)

6. Linguistics

North Sámi

Hyphenation test results:

Correct:
--------
konseartaprográmma      kon-sear-ta-pro-grám-ma
konseartaeahkedis       kon-sear-ta-eah-ke-dis
Márkomeanu      Már-ko-mea-nu
lávvardateahkeda        láv-var-dat-eah-ke-da
konseartaeahkedis       kon-sear-ta-eah-ke-dis
lávvardateahkeda        láv-var-dat-eah-ke-da
servodatberošteaddji    ser-vo-dat-be-roš-tead-dji
sámegillii      sá-me-gil-lii
sátnegovat      sát-ne-go-vat
morašluohti     mo-raš-luoh-ti
Justislávdegoddi        Jus-tis-láv-de-god-di
Sámedikkiin     Sá-me-dik-kiin
olggosaddán     olg-gos-ad-dán
luitet  lui-tet
kristtalaš      krist-ta-laš
orrun   or-run
Lotnolasealáhusas       Lot-no-las-ea-lá-hu-sas
juoiganjuristan juoi-gan-ju-ris-tan
issoras is-so-ras
suinna  suin-na
dáinna  dáin-na
Háliidivččen    Há-lii-divč-čen
bisánivčče      bi-sá-nivč-če
duostan duos-tan
Gárdegobba      Gár-de-gob-ba
Čeakčačahca     Čeak-ča-čah-ca
Gobba   Gob-ba
Sáivagobba      Sái-va-gob-ba

Missing hyph points (/):
------------------------
olgobáikkis     ol-go/báik-kis
máilmmi má/ilm-mi
gaskaijabeaivváš        gas-ka/i-ja-beaiv-váš
dehálaš de-há#laš

Incorrect (#):
--------------
lotnolasealáhussan      lot-no-la#s/e#a/lá/hus-san
bearrašiin      be#ar-ra/ši#in


PLX XFST code:
--------------
ol^go^báik^kis  NIE

gas^kai^ja^beaiv^váš-   NBOX
gas^kai^ja^beaiv^váš    NIOE
gas^kai^ja^beaiv^váš-   NIE
gas^kai^ja^beaiv^váš-   GaIE
gas^kai^ja^beaiv^váš    NBO
gas^kai^ja^beaiv^váš    GaBO

bear^ra^šiin    NIE

lot^no^la^sea^lá^hus^san        NIE

TODO:

  • test latest hyphenator (Sjur)
  • analyse test results (Thomas, Sjur, Trond)

Lule Sámi

Hyphenation test results:

Correct:
--------
viessomguhkes   vies-som-guh-kes
viessomvuohkáj  vies-som-vuoh-káj
árvvovuodo      árv-vo-vuo-do
väráltárbbe     vä-rált-árb-be
åhpadusorganisásjåvnån  åh-pa-dus-or-ga-ni-sá-sjåv-nån
häjmmadáfo      häjm-ma-dá-fo
árbbedáhpe      árb-be-dáh-pe
barggovuogijt   barg-go-vuo-gijt
rijkadajva      rij-ka-daj-va
ássjedåbdde     ás-sje-dåbd-de
láhkaásadimesa  láh-ka-á-sa-di-me-sa
giellalágajn    giel-la-lá-gajn
sámegiellaj     sá-me-giel-laj
buorrelágásj    buor-re-lá-gásj
árbbedábálattjat        árb-be-dá-bá-lat-tjat
láhkatæksta     láh-ka-tæks-ta
guosski guoss-ki
rijkalattjat    rij-ka-lat-tjat
organisásjåvnån or-ga-ni-sá-sjåv-nån
hábbmidime      hább-mi-di-me
rábmakonvensjåvnå       ráb-ma-kon-ven-sjåv-nå
árvvalasstet    árv-va-lass-tet
ássjedåbddejuogos       ás-sje-dåbd-de-juo-gos
unneplågogielajt        un-nep-lå-go-gie-lajt
láhkaásadimesa  láh-ka-á-sa-di-me-sa
biejvveavijsav  biejv-ve-a-vij-sav
sámegiellaj     sá-me-giel-laj
unneplågogielajn        un-nep-lå-go-gie-lajn
ministarjuohkusis       mi-nis-tar-juoh-ku-sis
guosski guoss-ki
Dussnagiehtje   Duss-na-gieh-tje
Divtasvuodna    Div-tas-vuod-na
Gåhpejávrre     Gåh-pe-jávr-re
Gásluokta       Gás-luok-ta
Helmukvuodna    Hel-muk-vuod-na
Ibboluokta      Ib-bo-luok-ta
Julevædno       Ju-lev-æd-no
Jåhkmåhkke      Jåhk-måhk-ke
Jåkmåhkke       Jåk-måhk-ke
Jåhkåmåhkke     Jåh-kå-måhk-ke

Missing hyph points (/):
------------------------
Jienastim#njuolgadusá   Jie/nas/tim#njuol/ga/du/sá (# not removed from input)
sierralágásj    sier-ra/lá/gásj
orgánajs        or-gá/najs
Orgánajs        Or-gá/najs


Incorrect (#):
--------------
javllamáno      ja#vl-la/má/no
biejvveávijsav  bie#jv/ve/á-vij/s#av
suomagiella     su#o-ma-giel-la
sierraláhkáj    sier-ra#láh-káj

PLX XFST code:
--------------
javl^la#má^no   NIE
javl^la#má^no   GaIOE

or^gá^najs      NIE

suo^ma#giel^la- NIE
suo^ma#giel^la- NBOX
suo^ma#giel^la  NIOE
suo^ma#giel^la  NBO

TODO:

  • sme->smj lexicon conversion to build bilingual lexicon resources, and increase smj coverage (Trond, Thomas, Svenne). Add the words.
  • test hyphenation (Sjur, Thomas)

7. Name lexicon infrastructure

Delayed till Divvun2 (or after release of Divvun1).

Decisions made in Tromsø can be found in this meeting memo.

TODO:

  1. fix bugs in lexc2xml; add comments to the log element (Saara)
  2. finish first version of the editing (Sjur)
  3. test editing of the xml files. If ok, then: ( Sjur, Thomas, Trond)
  4. make terms-smX.xml <=== automatically from propernoun-sme-lex.xml (add nob as well) (the morphological section should be kept intact, in e.g. propernoun-sme-morph.txt) (Sjur, Saara)
  5. convert propernoun-($lang)-lex.txt to a derived file from common xml files ( Sjur, Tomi, Saara)
  6. implement data synchronisation between risten.no and the cvs repo, and possibly other servers (ie the G5 as an alternative server to the public risten.no - it might be faster and better suited than the official one; also local installations could be treated the same way)
  7. start to use the xml file as source file
  8. clean terms-sme.xml such that all names have the correct tag for their use (e.g. @type=secondary) (Thomas, Maaren, linguists)
  9. merge placenames which are errouneously in different entries: e.g. Helsinki, Helsingfors, Helsset (linguists)
  10. publish the name lexicon on risten.no (Sjur)
  11. add missing parallel names for placenames (linguists)
  12. add informative links between first names like Niillas and Nils ( linguists)

8. Proofing tools

Hunspell

Continuously improving.

TODO:

  • Hunspell lexicon conversion (Tomi, Børre)

Testing

Spelling Error Markup

This will wait till after the release.

TODO:

  • Set up ways of adding meta-information (source info, used in testing or not, added to lexicon or not) (Saara)
  • move Steinar's error markup in the xml files to (a copy of) the original ( Børre, Kimme)
  • add nested error markup to xml conversion (Saara)
  • test new and nested error markup (Sjur)

Automated testing

TODO:

  • improve hyphenation testing (Sjur)
    • not done yet

MS Office

Windows was done = ok, Ilona has tested PowerPoint and Word on Mac, and Trond tested Excel (you have to turn on correction, otherwise it will give you English spell checking, it is the same for all languages). Entourage is still untested.

TODO:

  • test the proofing tools with all MS Office applications (Børre, Thomas)
    • all but Entourage done (Kimme, Børre)
      • Entourage ok
  • add Excel behaviour to the documentation (Trond, Sjur)
    • done

Lexicon conversion to the PLX format

Open issues based on test results:

sámi-dáru - not accepted => Gen+hyph compound, is not allowed with hyphen. We can allow such compounds without too much overgeneration by adding the hyphen to the last part, ie -dáru in the PLX entry. => Bugzilla as feature request.

smj

  • 419 - flagged, 1. sugg same as input => PLD bug
  • 482 - fixed
  • 484 - fixed
  • 518 - regression - Fuoskok = pl+clitic as well as derivation = won't fix
    • stuorFuosskok, StuorFuosskok suggested, should not be (neither accepted)
  • 575 - name+name = double hyphens in sugg
  • 584 - input and sugg is the same
  • 589 - not fixed, sent to Polderland
  • Svierigadárogielan - fixed!

sme

  • 397 - fixed
  • 425 - roman number - will not be fixed in 1.0 release
  • 431 - fixed
  • 461 - ovda fixed
  • 489 - fixed
  • 518 - regression - plural same as derivation, won't fix
    • stuorGuovdageainnut suggested, should not be
    • Stuoraguovdageainnut flagged and suggested => PLD
  • 588 - regression - r. accepted as final part
  • 592 - cases of caritives on -heapmi
  • 397 - Guovdageaidnu-láđđi - fixed

TODO:

  • look at test cases still not behaving properly (Thomas, Tomi)
  • check that the smj R lexicon is identical to sme ( Thomas)
    • done

InDesign tools

Almost finished, found a bug (reported to Polderland).

TODO:

  • improve hyphenation testing (Sjur)
  • upload (Sjur)
  • documentation (Børre, Sjur)

Hyphenators

Testing! - Done, see above.

Final release

Schedule and tasks for the remaining weeks:

This week:

  • Monday - fix remaining speller bugs
  • Tuesday - get Polderland updates/fixes, fix bug 550, documentation fixed
  • Wednesday - finish documentation, start translation
  • Thursday - finish translation, spell check and QA, travel plans ready
  • Friday - ready to burn/release; press release

Next week:

  • Monday - regular meeting, sum up & check
  • Tuesday - we go to Oslo in the morning, meeting and preparations in the evening, our own celebration / dinner
  • Wednesday:
    • 10 AM: ceremony, official delivery to SD president, potentially also minister
    • 11 AM: official lunch, SD invites the minister for lunch
    • after lunch - finished, we go home

Hotel is booked, you have to arrange the travelling to Oslo yourself (but make sure the travel agency sends the bill to SD).

TODO:

  • set up CD-printing printer (Risten, Leif Åge)
    • done and tested
  • fix Windows CD installation bug (Sjur, Børre)
    • work-around should be documented (Sjur)
  • 100 covers will be picked up in Tromsø (Børre)
    • done
  • Print 50 CDs, take them to Oslo as backup (Risten, Julie)
    • not yet - today
  • fix remaining bugs - golden master by end of this Monday (Tomi, Sjur)
    • done - GM last Friday
  • finalise InDesign hyphenator (Sjur, Børre)
    • documentation
      • not yet finished - after 1.0 release
  • update usage and installation documentation (Børre, Thomas, Sjur)
    • search Bugzilla for documentation issues in Divvun => add to docu ( Thomas)
    • read the present documentation carefully, and see if there are missing parts => fill them in (Børre)
    • documentation meeting on Wednesday (Sjur, Børre, Thomas)
  • new/updated front page (old front page to history page) (Sjur, Børre)
  • press release (Sjur, Børre)
  • translate all new documentation (all)
  • QA all documentation (all)
  • do as much hunspell as possible (Børre, Tomi)

9. Other

Corpus contracts

Delayed till after final release.

TODO:

  • publish corpus contracts and project infra as open-source on NoDaLi-sta ( Sjur)

Bug fixing

When fixing bugs, record the version number containing the fix in the Bugzilla bug report, such that for each bug, we know exactly when it should have been fixed, in what file(s) and what version.

83 open Divvun/Disamb bugs (45 of these 83 are speller-related bugs, 38 are other bugs), and 23 risten.no bugs

10. Next meeting, closing

The next meeting is 10.12.2007, 09: 30 Norwegian time.

The meeting was closed at 10: 09.

Appendix - task lists for the next week

Boerre

iCal

  • fix bug 550
  • finalise InDesign hyphenator
  • new/updated front page (old front page to history page)
  • fix bugs!

Ilona

iCal

  • lexicalise smj missing words.
  • Help Trond with the smj dictionary.
  • Translation

Maaren

  • lexicalise actio compounds

Per-Eric

iCal

  • check some unusual words from the Olavi missing list which are still not lexicalised
  • derivations tests
  • fix bugs!

Risten

iCal

  • Print 50 CDs, take them to Oslo as backup

Saara

iCal

  • add new XSL/XML headers for proofing test docs
  • Set up ways of adding meta-information for proofing correct corpus docs (source info, used in testing or not, added to lexicon or not)
  • add nested error markup to xml conversion
  • discuss more parallel texts

Sjur

iCal

  • document Windows CD installation work-around
  • finalise InDesign hyphenator
  • update usage and installation documentation
  • new/updated front page (old front page to history page)
  • press release
  • fix bugs!

Thomas

iCal

  • sme->smj lexicon conversion to build bilingual lexicon resources
  • test hyphenation
  • analyse hyphenation test results
  • look at test cases still not behaving properly
  • fix bugs!

Tomi

iCal

Trond

iCal

  • sme->smj lexicon conversion to build bilingual lexicon resources
  • analyse hyphenation test results
  • fix bugs!.