Meeting_2006-02-20

Meeting setup

  • Date: 21.02.2006
  • Time: 09:30 Norwegian time
  • Place: Wherever we are :-)
  • Tools: iChat, SubEthaEdit

Agenda

  1. Opening, agenda review
  2. Reviewing the task list from two weeks ago
  3. Linguistics
  4. name lexicon infrastructure
  5. Spellers
  6. Other issues
  7. Summary, task lists
  8. Closing

1. Opening, agenda review, participants

Opened at 09:44.

Present: Børre, Maaren, Sjur, Thomas, Tomi

Absent: Maaren, Saara, Trond

Main secretary: Børre

Agenda shortened substantially due to time constraints.

2. Reviewing the task list from the last meeting

Børre

  • send out contracts with accompanying letter
    • Anders Kintel
  • Gather public texts, preferably also parallel ones
    • Gathered some from samediggi, will add them soonish
  • Continue converting text from input format to our xml
    • Not done
  • convert nob and nno bible texts to be used as part of a parallel corpus
    • Not done
  • review the paratext2xml converter
    • Not done
  • convert smj NT to paratext
    • Not done; we don't have the nno and nob NT paratexts
  • Call Ove Sæth and Olavi Korhonen
    • Not done
  • Correct Forrest integration for new projects and project ideas
    • Helped Sjur a bit
  • Move complex name lexicon issue to bugzilla
    • Not done
  • fix bugs!

Maaren

  • work with risten.no
    • Not done. Worked on the top-ten missing list
  • discuss with relevant people regarding seminar on proofing tools, normativity and SGL in February/March, including place.
    • Waiting for an answer from SGL on when and where it is best to hold the normativity seminar.

Saara

  • continue discussion on the new lexicon format
  • Move the issue "Refine language detection for Finnish" to Bugzilla
    • done.
  • Move the issue "Finish the review of the hyphenation detection" to Bugzilla
    • done.
  • Add version information of the tools to part of the corpus infra.
    • done.
  • Fix the preprocess script and optimize it.
  • finalize an improved working version of the CGI and command line scripts for corpus additions
    • done.
  • update conversion from lexc to xml (proper names) with the latest refinements
    • not done.
  • Try to add numeral treatment as part of the analyzer.
    • not done.
  • Look at crontab ga/ directory issue with Trond.
    • Not done. I will move the gt2ga.sh script to darwin this week, because it is far too slow on cochise. The more extensive test routines are postponed for now.
  • fix bugs!
    • Changed the status of some corpus infra bugs to fixed.

Sjur

  • Follow up the lawyer treatment of the contracts
  • Lule Sámi twol problems, with Thomas and Trond
  • project planning with Trond, continued
  • Follow up on place names from Norge Digitalt
  • Evaluate SFST as speller (and analyzer) lexicon
  • write a background document on the corpus contracts
  • public tender:
    • review draft tender document from Finnut
      • almost ready, to be published next Monday or Tuesday
  • smj G3 issue with Thomas and Trond
  • sme G3 issue with Thomas and Trond
  • call EDD/ Christian Emil Ore about national place name lexicon
  • risten.no/proper noun lexicon development: fix bugs, continue development
    • more work
  • fix bugs!

Thomas

  • work on North Sámi compounding and derivation
    • nothing due to sick leave
  • review corpus usage documentation
    • nothing due to sick leave
  • smj G3 issue with Sjur and Trond
    • nothing due to sick leave
  • sme G3 issue with Sjur and Trond
    • nothing due to sick leave

Tomi

  • move aspell UTF-8 suffix bug to Bugzilla
  • corpus infrastructure:
    • dtd location (both public and internal)
  • Document aspell infrastructure: finish doc/proof/spelling/X-spell/aspell.xml (it's almost done, but there are a couple of loose ends)
  • new proper name lexicon
    • discuss the new lexicon format and other issues in the newsgroup
    • Look into data synchronisation of proper nouns between risten.no and CVS
    • new version of xml2lexc (based on ccat), should handle complex names correctly: construct entries like we have now from the different parts of a complex name entry
  • fix bugs!
  • Other
    • On sick leave

Trond

  • Work on corpus texts with Børre.
  • Contact the Finnish and Swedish Bible societies to get Bible texts.
  • Look at ga/ directory issue with Saara.
  • News group discussion followup.
  • Do a bug report (if not done) on commandline (mis)behaviour in the Xerox tools
  • Ask IT guys for an e-mail address for corpus upload script: corpus@giellatekno.uit.no (forwarded to Børre)
  • fix bugs!

3. Linguistics

North Sámi

Maaren has been adding words from the top-ten missing list.

Lule Sámi

We need to get a working version of the Lule Sámi lexicon by Monday next week. The lexicon doesn't need to be "complete" in any way, but should approach 6-7000 lemmas.

Børre, Thomas and Trond will convert the material to lexc, in cooperation with Sjur.

4. Other

SGL Seminar

SGL has now been elected, with the following members:

  • Rolf Olsen (Else Turi)
  • Tor Magne Berg (Marit Breie Henriksen)
  • Elle Marja Vars (-)
  • Lena Kappfjell (Albert Jåma)
  • Heidi Andersen (-)

SGL/normativity seminar:

  • all members = potentially/likely all languages
    • not all languages, only North Sámi
  • date? As early as possible, end of February/beginning of March
  • place? Maaren will investigate
    • I am waiting for the answer from Laila. Place? Do you have ideas? Karasjok, Helsinki, Kautokeino? Date: end of March?

Leaves and vacations

  • Maaren will be away for three weeks from the end of March.
  • Sjur is on winter holidays this week, as is Maaren from Wednesday
  • Børre will be in Karesuando in two weeks, but will be working (some, at least :-)

5. Summary, task list

Børre

  • send out contracts with accompanying letter
  • Gather public texts, preferably also parallel ones
  • Continue converting text from input format to our xml
  • convert nob and nno bible texts to be used as part of a parallel corpus
  • review the paratext2xml converter
  • convert smj NT to paratext
  • Call Ove Sæth and Olavi Korhonen
  • Correct Forrest integration for new projects and project ideas
  • Move complex name lexicon issue to bugzilla
  • fix bugs!

Maaren

  • work with top-ten missing list
  • discuss with relevant people regarding seminar on proofing tools, normativity and SGL in March/April, including place.

Saara

  • continue discussion on the new lexicon format
  • Move the issue "Refine language detection for Finnish" to Bugzilla
  • Move the issue "Finish the review of the hyphenation detection" to Bugzilla
  • Add version information of the tools to part of the corpus infra.
  • Fix the preprocess script and optimize it.
  • finalize an improved working version of the CGI and command line scripts for corpus additions
  • update conversion from lexc to xml (proper names) with the latest refinements
  • Try to add numeral treatment as part of the analyzer.
  • Look at crontab ga/ directory issue with Trond.
  • Create a parallel corpus of the New Testaments.
  • Routine for adding new languages to the proper noun XML structure.
  • fix bugs!

Sjur

  • Follow up the lawyer treatment of the contracts
  • Lule Sámi twol problems, with Thomas and Trond
  • project planning with Trond, continued
  • Follow up on place names from Norge Digitalt
  • Evaluate SFST as speller (and analyzer) lexicon
  • write a background document on the corpus contracts
  • public tender:
    • review draft tender document from Finnut
  • smj G3 issue with Thomas and Trond
  • sme G3 issue with Thomas and Trond
  • call EDD/ Christian Emil Ore about national place name lexicon
  • risten.no/proper noun lexicon development: fix bugs, continue development
  • fix bugs!

Thomas

  • convert the Lule Sámi lexicon
  • write conversion list for Lule Sámi proper noun continuation lexicons and define the lexicons
  • work on North Sámi compounding and derivation
  • review corpus usage documentation
  • smj G3 issue with Sjur and Trond
  • sme G3 issue with Sjur and Trond

Tomi

  • move aspell UTF-8 suffix bug to Bugzilla
  • corpus infrastructure:
    • dtd location (both public and internal)
  • Document aspell infrastructure: finish doc/proof/spelling/X-spell/aspell.xml (it's almost done, but there are a couple of loose ends)
  • new proper name lexicon
    • discuss the new lexicon format and other issues in the newsgroup
    • Look into data synchronisation of proper nouns between risten.no and CVS
    • new version of xml2lexc (based on ccat), should handle complex names correctly: construct entries like we have now from the different parts of a complex name entry
  • fix bugs!

Trond

  • Work on corpus texts with Børre.
  • Contact the Finnish and Swedish Bible societies to get Bible texts.
  • Look at ga/ directory issue with Saara.
  • News group discussion followup.
  • Do a bug report (if not done) on commandline (mis)behaviour in the Xerox tools
  • Ask IT guys for an e-mail address for corpus upload script: corpus@giellatekno.uit.no (forwarded to Børre)
  • fix bugs!

6. Next meeting, closing

27.02.2006 09:30

Closed at 10:13