UIT The arctic university of Norway > Giellatekno
 

Meeting_2005-04-11

Meeting setup

  • Date: 11.04.2005
  • Time: 14.00 Norw. time
  • Place: Wherever we are : -)
  • Tools: Phone, iChat, SubEthaEdit

Agenda

  1. Opening, agenda review, participants
  2. Reviewing the task list from a week ago
  3. Documentation - divvun.no
  4. Corpus gathering
  5. Corpus infrastructure
  6. Linguistics
  7. Term db
  8. Other issues
  9. Summary, task lists
  10. Closing

Task list since last meeting:

  • Tomi: to check orig. doc. integrity in the OO file format
    • discussion postponed till the XML format is settled/defined
  • Tomi: Move the corpus discussion from newsgroup to forrest
  • Thomas: continue work with verbs / modals
  • Børre: correct image linking in termdb and divvun projects
    • will look into it this week
  • All: discuss our own corpus format in the newsgroup - to be continued.
    • continue
  • Tomi and Børre: identify the input encodings antiword can handle
    • to continue to look into this: it is still possible that char conv. can be done within antiword.
  • Sjur, Børre: terminology database
  • Børre: divvun.no:
    • to set up the documentation and integration with CVS
  • Børre: should make the How-To tab appear as intended:
    • still to be done
  • Børre: add automatic version-stamp: add $Author: boerre $ to some docs, replace e-mail with project e-mail address (Børre will ask Leif Åge).
  • Børre: wiki - will follow up on the UTF-8 problem
    • temporary workaround: use Latin 1-encoding for the memos
    • Børre will continue to look into the chaperon tools (chaperon is used for converting Wiki to XML)
  • Trond: forrest run in cochise:
    • Still awaiting latest report from Trond
  • Trond, Børre, all: fix the links
    • Børre will check if there are still more link problems

1. Opening, agenda review, participants

Opened at 14.08. Agenda accepted as is.

Present: Maaren, Sjur, Thomas, Tomi, Trond, Børre

2. Reviewing the task list from the last meeting

  • Tomi: to check orig. doc. integrity in the OO file format
    • discussion postponed till the XML format is settled/defined
  • Tomi: Move the corpus discussion from newsgroup to forrest
    • has been doing XSLT for docbook to our xml format (docbook2sme.xsl)
  • Thomas: continue work with verbs / modals
    • Finished.
    • Maaren: I have now time (at last) to work with this project, starting tomorrow. I`ll take contact with Trond
  • Børre: correct image linking in termdb and divvun projects
    • That's done.
  • All: discuss our own corpus format in the newsgroup - to be continued.
    • continue
  • Tomi and Børre: identify the input encodings antiword can handle
    • Børre looked at it before easter. antiword assumes everything is cp1252 or Unicode. It would be easiest to do the conversion of non-Unicode input with a perl-script operating on the (erroneous) output of antiword. The reason for that is that most of the encodings are "private" ones.
  • Sjur, Børre: terminology database
    • Still working with it...
  • Børre: divvun.no:
    • to set up the documentation and integration with CVS This must be integrated with the systems at the UIT, and Børre will contact the IT persons at UIT
  • Børre: should make the How-To tab appear as intended:
    • still to be done
  • Børre: add automatic version-stamp: add $Author: boerre $ to some docs, replace e-mail with project e-mail address (Børre will ask Leif Åge).
    • The issue concerning Author tag is solved.
    • Still missing a project e-mail.
  • Børre: wiki - will follow up on the UTF-8 problem
    • temporary workaround: use Latin 1-encoding for the memos
    • Børre will continue to look into the chaperon tools (chaperon is used for converting Wiki to XML)
  • Trond: forrest run in cochise:
    • Hasn't been testing more since easter
    • When forrest runs on cochise, you can start up a browser at your own local machine with the address http: //cochise.uit.no: <your portnumber>/, and look at the documentation that way.
    • Despite the warnings, most of the documentation is generated when using forrest to generate static HTML sites. The generated documentation is found in ./build/site/
    • We can now start moving the old documentation to the attic.
  • Trond, Børre, all: fix the links
    • Lot's of links to .txt files are broken. Trond and Børre takes a look at that issue.

3. Documentation - divvun.no

Børre is going to contact the main sysadm and discuss the technical details. Deadline: we return to it on the next meeting.

4. Corpus gathering

  • Min Áigi: Svein Nordsletta at Min Áigi has been in contact with Per Edvard Klemetsen. From the e-mail:

«Jeg hadde akkurat en telefonsamtale med Svein Nordsletta (tlf 78469709), rådgiver i Min Aigi, som var interessert i status for korrekturprogramarbeidet. Du bør snarest ta kontakt med han for orientering og bistand til tekster, feilkorpus m.m. Han var meget interessert!»

Would be interesting to get the articles before they are proof-read. And after that, the finished articles in InDesign(?) format, with hyphenation.

We need a contract. Trond contacts the people at the University in Oslo, so we have a template. We probably have to modify it for our use cases.

We have got some texts from Marit Einejord at Romssa sáirál (hospital) and Várdobáiki in Skánit.

Olavi Korhonen has offered us mainly Lule Sámi text, the NT, and he will also look for other texts. His Northern Sámi texts are more scattered, but he will look at that issue also.

Routines for corpus gathering: We need a dropbox where raw incoming text can be stored before it is processed. We'll return to this issue in the newsgroup.

Sámi placenames in Finland will become available as well.

5. Corpus infrastructure

  • deadline for corpus format
    • 25.4.
  • perl scrip for unicode conv.
    • Tomi: by next meeting

6. Linguistics

Maaren will start working 2 d/w, Thomas has been working with the transitivity of the verbs, continuing Lena's work.

Lule Sámi: Anders has promised us the dictionary. The exact status of it is not known. Olavi Korhonen also works on a Lule Sámi dictionary, to be finished in the autumn.

More serious linguistic work on Lule Sámi to be started at the beginning of the summer.

7. Term db

The beta version will be done within a week : -)

8. Other issues

None.

9. Summary, task lists

TODO:

  • Tomi: Move the corpus discussion from newsgroup to forrest
    • has been doing XSLT for docbook to our xml format (docbook2sme.xsl)
    • awaiting final XML format
  • Thomas: continue work with verb transitivity. Contact Olavi about his dictionary.
  • Maaren: Will work with this project, starting tomorrow.
    • I`ll take contact with Trond and Thomas (telephone meeting Tuesday 12.4.)
  • All: discuss our own corpus format in the newsgroup - to be continued.
    • deadlline: 25.4.
  • Tomi and Børre: identify the input encodings antiword can handle
    • Tomi makes a perl-script, Børre helps out on that one.
  • Børre: divvun.no:
    • to set up the documentation and integration with CVS This must be integrated with the systems at the UIT, and Børre will contact the IT persons at UIT
    • Coordinate with Sjur as well
    • Still missing a project e-mail address. Børre contacts Leif Åge.
  • Sjur, Børre: terminology database
    • Still working with it...
  • Børre: should make the How-To tab appear as intended:
    • still to be done
  • Børre: wiki - will follow up on the UTF-8 problem
    • temporary workaround: use 8-bit encoding for the memos (=MacRoman works correctly)
    • convert all memos to MacRoman
  • Trond, Børre, all: fix the links
    • Børre will check if there are still more link problems
    • Lot's of links to .txt files are broken. Trond and Børre takes a look at that issue.
  • Børre: contact Min Áigi, discussing possible practical arrangements for cooperation.

10. Next meeting, closing

Next meeting: 18.04.2005 10.00

Closed at 15.34