Meeting_2007-02-05

Meeting setup

  • Date: 05.02.2007
  • Time: 09.00 Norwegian time
  • Place: Internet
  • Tools: SubEthaEdit, iChat

Agenda

  1. Opening, agenda review
  2. Reviewing the task list from last week
  3. Documentation - divvun.no
  4. Corpus gathering
  5. Corpus infrastructure
  6. Infrastructure
  7. Linguistics
  8. Name lexicon infrastructure
  9. Spellers
  10. Other issues
  11. Summary, task lists
  12. Closing

1. Opening, agenda review, participants

Opened at 10:12. Part two of the meeting opened at 14:45.

Present: Børre, Maaren, Sjur, Steinar, Thomas

Absent: Saara, Tomi, Trond

Agenda accepted as is.

2. Updated task status since last meeting

Børre

  • send smj translations to Polderland
    • Both smj and sme are sent.
  • write form to request corpus user account
    • Not done
  • document how to apply for access to closed corpus, and details on the corpus and its use in general
    • Not done
  • add short description on our front page on anonymous cvs and corpus access, with links to relevant documentation
    • Done
  • update and fix our documentation and infrastructure as Steinar finds problem areas
    • Not done
  • continue work on script for automatic testing of the spell checker in Word
    • Not done
  • fix sme texts in corpus this month
    • Not done
  • find missing nob parallel texts in corpus
    • Not done
  • translate Windows installer text to sme
    • Thomas helped out
  • work on the Polderland data generation (PLX format conversion)
    • Concentrate on compounding
      • Not done
  • go through other directories, fix parallelism information for other documents
    • Not done
  • add sma texts to the corpus repository
    • Not done
  • order Intel Macs
    • Received them last week. Both machines are in use. Will install Windows and MS Office on Thomas' machine this evening.
  • fix bugs!

Maaren

  • tasks according to Thomas
    • Not done

Saara

  • fix sme texts in corpus this month
  • continue aligning the rest of the parallel files
  • fix problems with xml2lexc if needed
  • fix bugs!

Sjur

  • name lexicon:
    • restructure interface code for easier maintenance, coding and use
      • done and working in all checked browsers except Opera
    • refactor the rest of the SD-terms editor code
      • major rewrite and simplification ahead, based on the new interface
    • implement missing propnouns editing functions
      • started looking at these again
    • implement improvements decided upon in Tromsø
  • hire linguist and programmer
    • nothing done
  • publish corpus contracts and project infra on NoDaLi-sta
  • fix stuorra-oslolaš lower case o
  • write form to request corpus user account
  • document how to apply for access to closed corpus, and details on the corpus and its use in general
  • fix bugs!

Steinar

  • test our infrastructure and documentation - follow the documentation exactly, and find problem areas - report problems to Børre. Start: At the front page.
    • started to read the documentation and try the anonymous cvs
  • Complete the semantic sets in sme-dis.rle
    • done some work
  • missing lists
  • report conversion errors to Saara
    • not done
  • Look at the actio compound issue when adding from missing lists
    • not done
  • lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji
  • Go through the Num bugs
    • not done
  • fix bugs!

Thomas

  • refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
    • not done
  • work with compounding
    • not done
  • lexicalise actio compounds
    • not done
  • Lack of lowering before hyphen: Twol rewrite.
    • not done
  • Go through the Num bugs
    • working with sme
  • fix stuorra-oslolaš lower case o
    • not done
  • implement discontinuous case inflection for numbers
    • working with sme
  • produce correct number base forms in the analyzer
    • working with sme
  • fix bugs!
    • not this week

Tomi

  • add compound stems to the PLX generation
    • done
  • include numerals in the speller
    • done
  • add prefixes to the PLX
    • not done
  • add smj to PLX conversion
    • done
  • add derivations to the PLX generation
    • not done
  • fix bugs!

Trond

  • update the smj proper noun lexicon, and refine the morphological analysis, cf. the propernoun-smj-lex.txt
    • no smj.
  • fix sme texts in corpus this month
    • Worked on this
  • find missing nob parallel texts in corpus, go through Saara's list
    • Not found
  • report conversion errors to Saara
    • Worked on this
  • Go through the Num bugs
    • Not done.
  • implement discontinuous case inflection for numbers
    • Did not participate
  • produce correct number base forms in the analyzer
    • Only wrote the testbed; did not participate otherwise
  • write form to request corpus user account
    • Not done
  • document how to apply for access to closed corpus, and details on the corpus and its use in general
    • Not done
  • Write project presentation
    • Mostly done; 3 presentations in the pipeline
  • fix bugs!

3. Documentation

Børre updated the front page, and Steinar started to test the quality of our documentation.

TODO:

  • write form to request corpus user account (Børre, Sjur, Trond)
  • document how to apply for access to closed corpus, and details on the corpus and its use in general (Børre, Sjur, Trond)
  • add short description on our front page on anonymous cvs and corpus access, with links to relevant documentation (Børre)
    • done

4. Corpus gathering

TODO:

  • sme texts: no new additions, fix corpus errors during this month (Børre, Trond, Saara)
  • missing nob parallel texts should be added if such holes are found (Børre, Trond)
  • Go through the list of missing or erroneous nob texts, based upon Saara's perfect list (Børre, Trond)
  • add sma texts to the corpus repository (Børre)

5. Corpus infrastructure

Alignment

TODO:

  • go through other directories (nob directories, sd directories), fix parallelism information for other documents (2 hours) (Børre)
    • nothing more yet

Conversion issues

TODO:

  • report conversion errors to Saara (Trond, Steinar)

6. Infrastructure

Steinar started the QA, cf. above.

TODO:

  • test our infrastructure and documentation - follow the documentation exactly, and find problem areas - report problems to Børre. Start: At the front page. (Steinar)
    • started
  • update and fix our documentation and infrastructure as Steinar finds problem areas (Børre)

7. Linguistics

North Sámi

Maaren is now working on lexicalising the actio compounds.

TODO:

  • lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji (Maaren)
  • fix stuorra-oslolaš lower case o (Sjur, Thomas, Trond)
    • nothing yet

Numbers:

TODO:

  • discontinuous case inflection (but only for maximally three-part compound numerals) (viđain/goalmmát/logiin and guvttiin/logiin/viđain) (Thomas)
    • working on sme
  • produce correct number base forms in the analyzer (Thomas)
    • working on sme
  • Go through the Num bugs (Thomas)
    • working on sme

Lule Sámi

Number inflection and analysis are now fixed and working, thanks to Thomas.

TODO:

  • refine smj proper noun lexica, cf. the propernoun-smj-lex.txt (Thomas, Trond)
  • Lack of lowering/fronting before hyphen: Twol rewrite. (Thomas, Trond)

8. Name lexicon infrastructure

Decisions made in Tromsø can be found in the meeting memo.

Postponed:

  • data synchronisation between risten.no and the cvs repo

TODO:

  1. restructure interface code for easier maintenance, coding and use
    1. done, except a glitch in Opera (won't fix for the time being)
  2. finish first version of the editing (Sjur)
  3. test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)
  4. generate terms-smX.xml automatically from propernoun-sme-lex.xml (add nob as well) (the morphological section should be kept intact, in e.g. propernoun-sme-morph.txt) (Sjur, Saara)
  5. convert propernoun-($lang)-lex.txt to a derived file from common xml files (Sjur, Tomi, Saara)
  6. start to use the xml file as source file
  7. clean terms-sme.xml such that all names have the correct tag for their use (e.g. @type=secondary) (Thomas, Maaren, linguists)
  8. merge placenames which are erroneously in different entries: e.g. Helsinki, Helsingfors, Helsset (linguists)
  9. publish the name lexicon on risten.no (Sjur)
  10. add missing parallel names for placenames (linguists)
  11. add informative links between first names like Niillas and Nils (linguists)

9. Spellers

Polderland data generation

Polderland has now given instructions for how to handle genitive compounding and proper noun prefixing. We'll need to decide how to specify nouns that should take genitive first parts when compounding.

TODO:

  1. add smj to PLX conversion (Børre, Tomi)
    1. done
  2. send smj PLX data to Polderland (Børre, Tomi)
  3. decide how to specify nouns requiring genitive first parts (Sjur, Thomas, Trond)
  4. Include numerals in the speller (Børre, Tomi)
    1. first version done, but needs more work
  5. add prefixes to the PLX (Børre, Tomi)
    1. still not done
  6. add derivations to the PLX generation (Børre, Tomi)
    1. next after numbers are fixed

OOo speller(s)

TODO when the major part of the PLX conversion is done:

  • add Aspell/Hunspell data generation to the lexc2xspell (Tomi - after the PLX data generation is finished)
  • study Hunspell, perhaps also Soikko (Børre, Sjur, Tomi)

Testing

TODO:

  • get an Intel Mac for Tomi (Sjur)
    • not yet

Localisation

TODO:

  • translate Windows installer text to sme (Børre, Thomas)
    • done
  • send smj translations to Polderland (Børre)
    • done

10. Other

Corpus contracts

TODO:

  • publish corpus contracts and project infra as open-source on NoDaLi-sta (Sjur)
    • nothing yet

Bug fixing

59 open Divvun/Disamb bugs, and 23 risten.no bugs

Moving G5

The G5 could now be moved out of Børre's office, to the basement where the present web server is located. That will reduce the noise level in the office quite substantially :-)

TODO:

  • move the G5 to the basement (Børre)

11. Next meeting, closing

The next meeting is 12.2.2007, 09:30 Norwegian time.

The meeting was closed at 10:43.

Appendix - task lists for the next week

Børre

  • write form to request corpus user account
  • document how to apply for access to closed corpus, and details on the corpus and its use in general
  • update and fix our documentation and infrastructure as Steinar finds problem areas
  • continue work on script for automatic testing of the spell checker in Word
  • fix sme texts in corpus this month
  • find missing nob parallel texts in corpus
  • work on the Polderland data generation (PLX format conversion)
    • Concentrate on compounding
  • go through other directories, fix parallelism information for other documents
  • add sma texts to the corpus repository
  • move the G5 to the basement (Børre)
  • fix bugs!

Maaren

  • lexicalise actio compounds

Saara

  • fix sme texts in corpus this month
  • continue aligning the rest of the parallel files
  • fix problems with xml2lexc if needed
  • fix bugs!

Sjur

  • name lexicon:
    • refactor the rest of the SD-terms editor code
    • implement missing propnouns editing functions
    • implement improvements decided upon in Tromsø
  • hire linguist and programmer
  • publish corpus contracts and project infra as open-source on NoDaLi-sta
  • fix stuorra-oslolaš lower case o
  • write form to request corpus user account
  • document how to apply for access to closed corpus, and details on the corpus and its use in general
  • get an Intel Mac for Tomi
  • fix bugs!

Steinar

  • test our infrastructure and documentation - follow the documentation exactly, and find problem areas - report problems to Børre. Start: At the front page.
  • Complete the semantic sets in sme-dis.rle
  • missing lists
  • report conversion errors to Saara
  • Look at the actio compound issue when adding from missing lists
  • lexicalise actio compounds. Example: vuolggasadji vs. vuolginsadji
  • Go through the Num bugs
  • fix bugs!

Thomas

  • refine smj proper noun lexica, cf. the propernoun-smj-lex.txt
  • work with compounding
  • Lack of lowering before hyphen: Twol rewrite.
  • Go through the sme Num bugs
  • fix stuorra-oslolaš lower case o
  • implement discontinuous case inflection for sme numbers
  • produce correct number base forms in the sme analyzer
  • fix bugs!

Tomi

  • improve numerals in the speller
  • add prefixes to the PLX
  • add derivations to the PLX generation
  • fix bugs!

Trond

  • update the smj proper noun lexicon, and refine the morphological analysis, cf. the propernoun-smj-lex.txt
  • fix sme texts in corpus this month
  • find missing nob parallel texts in corpus, go through Saara's list
  • report conversion errors to Saara
  • Go through the Num bugs
  • implement discontinuous case inflection for numbers
  • produce correct number base forms in the analyzer
  • Write project presentation
  • fix bugs!