UIT The arctic university of Norway > Giellatekno
 

Meeting_2005-06-06

Meeting setup

  • Date: 06.06.2005
  • Time: 11.00 Norw. time
  • Place: Wherever we are : -)
  • Tools: Phone, iChat, SubEthaEdit

Agenda

  1. Opening, agenda review
  2. Reviewing the task list from a week ago
  3. Documentation - divvun.no
  4. Corpus gathering
  5. Corpus infrastructure
  6. Linguistics
  7. Term db
  8. Other issues
  9. Summary, task lists
  10. Closing

1. Opening, agenda review, participants

Opened at 11.00. Agenda accepted as is.

Present: Børre, Sjur, Thomas, Trond

Away: Tomi (Finland), Maaren (Iceland)

Main secretary: Trond

2. Reviewing the task list from the last meeting

Børre:

  • Disamb documentation within giellatekno.uit.no (with Trond)
    • The alpha version of the forrest-based giellatekno external site is up and running, but there are still open issues:
      • word-form generation not working
      • no link back to Divvun project page
  • Add corpus texts that are unproblematic from a juridical point of view
    • nothing done
  • Contact people about Lule Sámí texts
    • Contacted Kåre Tjihkkom, he said he didn't have any texts himself. He asked me to contact Árran. Talked to Bård Eriksen at Árran, who was interested in contributing with text and also would like to be a tester.
  • Contact skolelinux and Knut Hofland about webcrawler.
  • Finish the setup of divvun.no
    • Contact Thor Øivind:
      • Contacted him, he was going to update Tomcat (had an old version).
  • Contact Roy Dragseth about cvs-commit mailing
    • Sent an e-mail some time ago. He has not answered, will phone him.
  • continue Forrest i18n
    • Nothing done.
    • Please also add an issue in Forrest's own issue tracker (bug db) (Sjur)
  • continue http://giellatekno.uit.no, encoding problems
    • Solved, using decimal codes : -( (add to Forrest issue tracker)

Maaren:

  • continue missing list
  • find and prepare issues for the language board meeting
  • translate Helsinki contract into rough Norwegian; send it to Sjur, then lawyer

Sjur:

  • Scan and OCR the Helsinki contract; mail it to Maaren
    • Done (not OCR)
    • Resend to Trond (Maaren is away)
  • Ask Anne-Britt to update the contract with the University (we are getting 3, not 2, persons there from the fall)
    • not done yet
  • write the LIST option to our test tools, as requested in the discussion group.
    • not done
  • complete the action summary after our half-year evaluation
    • not done : -(
  • fix risten.no bugs
    • fixed some
    • added risten.no to bugzilla @ giellatekno.uit.no
    • updated the version on the server
    • discussed bugs in eXist, and feature requests
  • contact Kimmo Koskenniemi about kwic-snt and UTF-8 bug
    • not yet
  • voice group-chat not working to Sámediggi - should be fixed
    • tried, but no info from LÅH yet
  • Also done:
    • Tiger-updates (a.o. Aspell), getting Sámediggi intranet access to work

Thomas:

  • Find issues to giellastivra čoahkkin
    • Found many
  • help Maaren prepare those same issues
    • done, but should go to giellalávdegoddi instead of giellastivra
  • Continue with the verbs
    • A few hundred left, should finish tomorrow

Tomi:

  • Start looking at the smj infrastructure, turn it into utf-8, look at whether updates and improvements done for sme should be done for smj as well, in order to get a uniform infrastructure.
  • update catxml to remove sections in unwanted language(s)
  • Translate the texts in the sme/corp/correct/ directory to utf-8.
    • Done.
  • Work on the nonrec transducer: Get the nonrec files out of src/ and into int/, get the "make nonrec" process up and running.
    • Done (but headed into limits of the available xfst tools).
  • Look into shortening of three-part compounds, middle part
  • decapitalisation of proper nouns when (if?) compounded, and when derived (the oslolaš issue)

Trond:

  • Disamb documentation with Børre.
    • Alpha version up and running.
  • Systematize the linguistic issues ahead of the Language Board meeting
    • Nothing done.
  • Add cgi-bin errors to Bugzilla.
    • Done.
  • continue the office disc. with the faculty
    • Not done.
  • Linguistic work.
    • Work in continuos progress, worked on vocabulary, worked on disamb input from Pekka.
  • computational linguistics, with Tomi.
    • Not done (see Tomi's task list above).

All:

  • The www.divvun.no release.
  • The Bugzilla bugs, and a quip or two?
    • Today 15 open bugs.
  • continue the discussion on the language policy of divvun.no in the newsgroup
    • Trond has replied
    • Børre agrees to the policy
      • What about news? In all languages? Lule or North Sámi as default language.
    • Thomas: likes it.
  • Open questions about language policy:
    • All = English? Yes (the amount is small). Decided.
    • How much in 5 languages?
    • Size of general public info: Trond: only a few paragraphs - about the project, and the language policy of the site. More specifically, it should include:
      • what the divvun project is
      • why it is made, by whom, on whose decision
      • what we make, (why we do not make sma)
      • how we make it
      • how, when, and for whom it eventually will be used, and on what terms and price
      • an explanation of our lg policy, and a good set of links.
    • The user documentation will be written in the target language, in Finnish and Norwegian, and in English.

3. Documentation - divvun.no

Thor Øivind has to get Tomcat working. See our task list for other issues.

4. Corpus gathering

  • Lule Sámi:
    • Ådå Testamennta (Svenska bibelsällskapet). They must be contacted.
  • North Sámi:
  • The language consultant in Guovdageaidnu is actively working, gathering public documents from the municipality archives.
  • Otherwise Børre awaits progress on the contract work:
    • Sjur will check with Maaren about the translation from Finnish to Norwegian, and forward the translation to Trond if not done by somebody else.

5. Corpus infrastructure

Lars Nygård has made a searchable web interface as a beta version: Samisk online korpus. Cf. http://omilia.uio.no/CE/CE.pdf for documentation.

6. Linguistics

  1. NRK: as
  2. -eaddjiid vs. –eddjiid
  3. -eabbo/-abbo vs. –eabbu/-abbu
  4. -at/-ut vs. –et/-it
  5. -dettiin vs. –diin/-din
  6. -lit vs. –šit (kondišunal)
  7. Western -l- forms : soappulin vs. soppolin (soppošin vs. soappušin)
  8. kond alternativ form livččon vs. livččen
  9. kárášjoh nieida vs. kárášjohnieida
  10. hálede:
+Du1:e K ;                 =hálede
+Du1:edne K ;           =hálededne
+Du1:etne K ;            =háledetne
  1. háliidit+V+TV+Ind+Prs+Sg3
háliida
háliid
  1. makkárat imperativhámit galget leahkit? Ovdamearkkat bárrastávval vearbbain:
+Pl1:Q3t K ;             ! bohtot       ! West
! +Pl1:Y3t K ;            ! boahttot     ! East 
 +Pl2:Q2t K ;              ! bohtet       ! West
 +Pl2:Y3t K ;              ! boahttet     ! East

Bárahis stávvalat:

háliidit+V+TV+Imprt+Prs+Du1
háliideadnu
háliideahkku

háliidit+V+TV+Imprt+Prs+Pl1
háliideadnot
háliideahkkot
háliidehkot
háliidetnot

háliidit+V+TV+Imprt+Prs+Pl2
háliideahkket
háliidehket

Leahkit:

+Du1:eadnu K ;
+Du1:eahkku K ;

+Pl1:ehkot K ;
+Pl:eatnot K ;
  1. Noun/adj shortforms
beakkán+A+Sg+Gen
beakkána
beakkán

beakkán+A+Sg+Acc
beakkána
beakkán
  1. Jeagohin vs. Jeagoheapmi
  2. rieban vs. rievan
  3. Goallossánit: kulturlávdegoddi / kultuvrralávdegoddi (sosial-, privat-, ...) jf. *kultur, kultuvra (privatriekta)
  4. Loatnasánit

To be further discussed within the project, and brought forward to the normative bodies.

7. Term db

Still more bugfixing, adjustments based on user feedback.

8. Other issues

Nothing.

9. Summary, task list

All:

  • The www.divvun.no release.
  • The Bugzilla bugs, and a quip or two?
  • continue the discussion on the language policy of divvun.no in the newsgroup, esp. re the open questions:
    • How much in 5 languages?
    • Size of general public info: Trond: only a few paragraphs - about the project, and the language policy of the site. More specifically, it should include:
      • what the divvun project is
      • why it is made, by whom, on whose decision
      • what we make, (why we do not make sma)
      • how we make it
      • how, when, and for whom it eventually will be used, and on what terms and price
      • an explanation of our lg policy, and a good set of links.
    • should news be in all languages? What about RSS feeds from Forrest?

Børre

  • Contact Kevin Scannell of an Crúbadán fame, ask if we can use his north sami material.
  • Contact Roy Dragseth about cvs-mailing
  • Work on divvun.no
  • Continue http: //giellatekno.uit.no conversion to forrest ...
  • Contact Svenska bibelsällskapet
  • Add issue to forrest issue tracker about utf-8 ihtml documents.
    • Check for existing bug reports, it might already be in there
  • Continue forrest i18n research

Maaren:

  • continue missing list
  • find and prepare issues for the language board meeting
  • translate Helsinki contract into rough Norwegian; send it to Sjur, then lawyer

Sjur:

  • Check the translation of the Helsinki contract with Maaren; forward the task to Trond if not yet assigned
  • Ask Anne-Britt to update the contract with the University (we are getting 3, not 2, persons there from the fall)
  • write the LIST option to our test tools, as requested in the discussion group.
  • complete the action summary after our half-year evaluation
  • fix risten.no bugs
  • contact Kimmo Koskenniemi about kwic-snt and UTF-8 bug
  • voice group-chat not working to Sámediggi
  • add i18n bug to Forrest issue tracker
  • add i18n menu as feature request to Forrest issue tracker

Thomas:

  • Begin to prepare issues to giellalávdegotti čoahkkin
  • Present issues in dicussion forum
  • Finish the work on verb valency
  • Start to look at Lule Sámi

Tomi:

  • Start looking at the smj infrastructure, turn it into utf-8, look at whether updates and improvements done for sme should be done for smj as well, in order to get a uniform infrastructure.
    • High priority
  • update catxml to remove sections in unwanted language(s)
  • Look into shortening of three-part compounds, middle part
  • decapitalisation of proper nouns when (if?) compounded, and when derived (the oslolaš issue)

Trond

  • translate Helsinki contract into rough Norwegian; send it to Sjur, then to lawyer

10. Next meeting, closing

13.06.2005 10.00

Closed at 12.45