UIT The arctic university of Norway > Giellatekno
 

Meeting_2005-05-23

Meeting setup

  • Date: 23.05.2005
  • Time: 12.00 Norw. time
  • Place: Wherever we are : -)
  • Tools: Phone, iChat, SubEthaEdit

Agenda

  1. Opening, agenda review
  2. Reviewing the task list from a week ago
  3. Documentation - divvun.no
  4. Corpus gathering
  5. Corpus infrastructure
  6. Linguistics
  7. Term db
  8. Other issues
    1. Sjur: brief report from Joensuu
    2. The gathering
  9. Summary, task lists
  10. Closing

1. Opening, agenda review, participants

Opened at 12.20. Agenda accepted as is.

Present: Sjur, Thomas, Tomi, Trond, Børre

Main secretary: Børre

2. Reviewing the task list from the last meeting

  • Børre:
    • Contact Univ. of Oslo on the contract issue.
      • Got a contract from them
    • Finish the divvun e-mail alias
      • Leif Åge said he would fix it, haven't got any feedback
    • Contact people about Lule Sámí texts
      • Nothing done
    • Contact skolelinux and Knut Hofland about webcrawler.
    • Here is Knut's thoughts on the issue in 1999: http://gandalf.aksis.uib.no/knut/mons/kh-avis.htm
      • Nothing done
    • Finish the setup of divvun.no
      • Haven't had contact with Thor Øivind Johansen, Børre will try to contact him in person today
    • Look at how forrest handles i18n, try to get work done on that.
      • Done some brief work
    • Set up programme and other details about the gathering in Guovdageaidnu 25th-27th of May
      • Done
    • Contact Roy Dragseth about cvs-commit mailing
      • Sent an e-mail, phoned him a few days later. He is away untill the 26th of May.
    • Finish the setup of http://giellatekno.uit.no/
      • Done some work.
  • Tomi:
    • Template for manual xsl conversion script
      • Done, Sjur helped
    • Script for processing the corpus for preprocessor
      • Done (catxml)
    • Install "Tiger" (10.4), document all issues, together with steps to resolve them
      • Postponed to Kautokeino
    • Look at the twolc part of how to handle shortening of the middle part in three part compounds. Discuss the data / linguistic side with the linguists.
      • Nothing done
    • Look at decapitalisation of proper nouns when compounded or derived to a general noun
      • Nothing done
  • Sjur:
    • Meet with Kimmo:
      • Discuss text licensing, and get a copy of the Helsinki licensing model/text Got a copy
      • Bring up the kwic-snt issue with Kimmo, make it utf-8 compatible (now, each byte counts as one, instead of each character counting as one (what is needed is a counting mechanism that counts only bytes starting in 01 or 110, and ignoring bytes starting in 10.) Forgot that, kwic-snt issue still open.
    • Still some work on the Termdb
    • Write presentation for the Joensuu meeting
    • Decide upon the public opening of divvun.no with Anne Britt
    • Discuss with Thor Øivind Johansen about Forrest/divvun/Tomcat. Tel: +47 7764 6741
    • Ask Leif Åge to check the opening/forwarding of ports number needed for iChat voice and video conferencing
  • Maaren:
    • continue with proper nouns, work on the Bugzilla bug list
    • reserve rooms at Villmarkssenteret
    • Translate divvun.no front page to Finnish
  • Thomas:
    • continue with verb valency about 3000 verbs left now
    • Reply to Teemu Leskinen, thanking for the material but asking that we get a new version, now with the Finnish names coupled with the Sámi names, if at all possible. Teemu will send us a new version.
  • Trond:
    • Work with the bug list, the lexicon, disambiguation. Prepare Lule Sámi etc. for the meeting. Has worked on linguistic issues, but not on Lule Sámi. Have worked on bugs.
  • All:
    • Have a look at the bug list in Bugzilla (mainly Trond's bugs, but give me a hand, will you?) http://giellatekno.uit.no/bugzilla/ We are now down at 14 open bugs.
    • Prepare the Guovdageaidnu meeting
    • Order tickets to the Guovdageaidnu meeting

3. Documentation - divvun.no

Nothing has happened - no contact with Thor Øyvind yet.

4. Corpus gathering

Licenses received. Received two books from the Old Testament, was promised a new version of the New Testament.

5. Corpus infrastructure

Tomi has made templates for manual conversion scripts and ...

Sjur disc with LarsN and Trond about tagged corpus format in Joensuu

6. Linguistics

See above.

7. Term db

It was opened as an internal version for beta testing at the Sámediggi on the 19th of May.

8. Other issues

8.1 Report from Jooensuu:

- Welcome to give a system demo on the Finite State workshop in Helsinki - Tools for automatical creation of bilingual dictionarie4s on the basis of aligned parallel corpora.

8.2 This week + gathering

8.2.1 Gathering - Programme:

  • Wednesday before lunch
    • Common issues:
      • Evaluate first half year
      • Discuss issues/improvements
      • Bugzilla methodology
  • Wednesday after lunch
    • Programmers' session
      • Upgrade to 10.4, fix things
      • Bugzilla bugs?
    • Linguist session
      • Lule Sámi: Discussion on how to proceed, UTF-8 vs. Latin 1, make a task overview for the programmers
      • Linguistic issues in Northern Sámi
      • Work procedures for linguistics The missing list, the names, the loan words
      • Kárás1*joh*las1, case agreement within numerals, Vowel shortening of compounds with short 1st syllable (bivdo- pro bivdu-). The linguistic facts must be clearified.
      • Bugzilla bugs
  • Thursday before lunch
    • Common session
      • xml format for the lexicon
      • Lule Sámi common things
  • Thursday after lunch
    • Linguists' session
      • Linguistic issues cont. from Wednesday.
    • Programmers' session
      • Forrest i18n
      • Outsourcing: specification of the outsourced work
      • Make sure video and voice conferencing is working through the Sámediggi firewall
  • Friday before lunch
    • Computational linguist session (Tomi, Sjur, Trond)
      • Discussing things that has come up during the week, relevant for twolc, lexc, xfst
    • Session for the rest
      • Ask Børre about everything you were afraid to ask about (TM)
  • Friday after lunch
    • Programmers
      • xml format for the lexicon, the more technical details

8.2.2 Practical things:

  • We will stay at Villmarkssenteret
    • Sjur: 23.-27.5.
    • Trond 23.-27.5.
    • Tomi 23.-27.5.
    • Børre: Monday - Friday (Private accomodation)
  • Schedule. Wednesday, thursday, friday, full days (Trond to leave after lunch on Friday) and Maren about 11.00.
  • The meeting room has been reserved.

9. Summary, task list

TODO:

  • Børre:
    • Contact Thor Øivind
  • Thomas:
    • Change reservations at Villmarksenteret
  • Trond: Disamb documentation with Børre.
  • All:
    • The Guovdageaidnu gathering

10. Next meeting, closing

30.05.2005 10.00

Closed at 13.35