Meeting_2005-03-08

Meeting setup

  • Date: 08.03.2005
  • Time: 10.00 Norw. time
  • Place: Wherever we are :-)
  • Tools: Phone, iChat, SubEthaEdit

Agenda

  1. Opening, agenda review
  2. Reviewing the task list from a week ago
  3. Documentation - status
  4. Corpus gathering
  5. Corpus infrastructure
  6. Linguistics
  7. Term db
  8. Other issues?
  9. Summary, task lists
  10. Closing

Last week's task list:

  • Sjur, Børre, Tomi: terminology database
  • Børre: will set up a separate main tab How-To
  • All: we move from .profile to .bash_profile, Børre: docu to be updated
  • Børre: mpage & UTF-8 - file a bug in Bugzilla, with the solution postponed to Tiger (OS 10.4)
  • Børre:
    • update emacs and xml-mode: Børre has a solution, will document
    • check automatic version-stamp in cvs ($Id$)? As above.
  • Trond: will follow up the issue with binary postings in news
  • Børre: wiki support in forrest works, will follow up on the UTF-8 problem
  • Børre + Trond: set up computers; check forrest on cochise. The issue is still open: Trond has problems from home and other locations
  • Børre: divvun.no - to be continued now that the IT staff is back from Jokkmokk

1. Opening, agenda review

Opened at 10.12. Additions to the agenda:

  • PaNoLa-Plus (under "Other")

2. Reviewing the task list from a week ago

  • Sjur, Børre, Tomi: terminology database
    • Not completed yet.
  • Børre: will set up a separate main tab How-To
    • Not appearing as intended yet
  • All: we move from .profile to .bash_profile, Børre: docu to be updated
    • Document checked in
  • Børre: mpage & UTF-8 - file a bug in Bugzilla, with the solution postponed to Tiger (OS 10.4) - still to be done
    • PS contains the correct UTF-8 strings
  • Børre:
    • update emacs and xml-mode: Børre has a solution, will document. Done; docu in gt/doc/infra/
    • check automatic version-stamp in cvs ($Id$)? As above. Børre: add it somewhere in the docu, should perhaps set up a FAQ?
  • Trond: will follow up the issue with binary postings in news
    • called and e-mailed, no answer so far. Will continue
  • Børre: wiki support in forrest works, will follow up on the UTF-8 problem
    • still an issue
    • temporary workaround: use Latin 1-encoding for the memos
  • Børre + Trond: set up computers; check forrest on cochise. The issue is still open: Trond has problems from home and other locations
    • Should be solved, docu at gt/doc/infra/forrest-howto.xml
  • Børre: divvun.no - to be continued now that the IT staff is back from Jokkmokk
    • Forgot it... :-(

3. Documentation - status

We should aim at generating static HTML. "forrest run" serves the pages dynamically, whereas "forrest site" builds static HTML pages. The static build currently reports these broken links:

X [0] doc/lang/sms/src/verb-sms-lex.txt
      BROKEN: No pipeline matched request: doc/lang/sms/src/verb-sms-lex.txt
X [0] doc/ling/vislcg.html
      BROKEN: /Users/trondtro/xtdoc/sd/src/documentation/content/xdocs/doc/ling/vislcg.xml
      (No such file or directory)
X [0] index_smi.html
      BROKEN: /Users/trondtro/xtdoc/sd/src/documentation/content/xdocs/index_smi.xml
      (No such file or directory)
X [0] doc/lang/smj/docu-sme-flowchart.html
      BROKEN: /Users/trondtro/xtdoc/sd/src/documentation/content/xdocs/doc/lang/smj/docu-sme-flowchart.xml
      (No such file or directory)

Link problems:

  • link to other documentation (internal links) broken(?)
  • links to source files broken
  • links to image files broken

All internal links should be corrected, and links to original source files should be replaced with links to excerpts.
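
When fixing the links, it may help to sort the broken entries by cause. The following is a minimal sketch, assuming the report keeps the layout shown in the excerpt above (an "X [n] target" line followed by one or more indented reason lines). The names parse_broken_links and classify are made up for this illustration and are not part of forrest.

```python
import re

# Matches the first line of an entry, e.g. "X [0] doc/ling/vislcg.html".
ENTRY = re.compile(r"^X \[\d+\]\s+(\S+)")


def parse_broken_links(report):
    """Return {target: reason} for each entry in a broken-links report."""
    broken = {}
    current = None
    for line in report.splitlines():
        match = ENTRY.match(line)
        if match:
            current = match.group(1)
            broken[current] = ""
            continue
        text = line.strip()
        if current is None or not text:
            continue
        if text.startswith("BROKEN:"):
            text = text[len("BROKEN:"):].strip()
        # Reasons may wrap over several lines, so keep appending.
        broken[current] = (broken[current] + " " + text).strip()
    return broken


def classify(reason):
    """Rough grouping of the reasons seen so far (an assumption, not a full list)."""
    if "No such file or directory" in reason:
        return "missing source file"
    if "No pipeline matched" in reason:
        return "no matching pipeline"
    return "other"


SAMPLE_REPORT = """\
X [0] doc/lang/sms/src/verb-sms-lex.txt
      BROKEN: No pipeline matched request: doc/lang/sms/src/verb-sms-lex.txt
X [0] doc/ling/vislcg.html
      BROKEN: /Users/trondtro/xtdoc/sd/src/documentation/content/xdocs/doc/ling/vislcg.xml
      (No such file or directory)
"""

for target, reason in parse_broken_links(SAMPLE_REPORT).items():
    print(classify(reason), target, sep="\t")
```

Entries classified as "missing source file" are internal links to documents that do not exist under xdocs, while "no matching pipeline" entries are the links to original source files.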

Forrest on cochise is still problematic: running it gives "no such command".

4. Corpus gathering

Thomas did a lot of work last week and got many positive answers (see last week's meeting memo).

Also Info nuorra. The agriculture ministry in Sweden has sent some texts in Northern, Lule, Southern and Anár Sámi. Some private texts have come in too.

We need a contract to use when making agreements with people and organisations. Børre will look at the one used in Oslo.

5. Corpus infrastructure

The antiword issue: antiword assumes that everything in a Word file is Unicode, and converts it to whatever 8-bit character set we want (the -m option). Our problem is that we want a many-to-one machine, not a one-to-many machine.

Use antiword to convert from .doc to DocBook format. Perl then postprocesses the errors in the .xml files that contain Levi and the other private hacks.
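
The postprocessing step is planned in Perl. Purely as an illustration of one reading of "many-to-one", here is a minimal Python sketch in which several legacy characters are mapped to one Unicode letter. The mapping table is invented for the example and does not reproduce Levi or any other real font hack; the real tables would have to be established per source encoding.

```python
# HYPOTHETICAL table: several legacy stand-in characters map to ONE
# Unicode letter. These pairs are placeholders, not a real encoding.
LEGACY_TO_UNICODE = {
    "\u00b9": "\u010d",  # placeholder: one hack character standing in for "č"
    "\u00a2": "\u010d",  # placeholder: a second hack character for the same "č"
    "\u00bc": "\u0161",  # placeholder: a hack character standing in for "š"
}


def normalise(text, table=LEGACY_TO_UNICODE):
    """Replace every legacy stand-in character with its Unicode letter."""
    return text.translate(str.maketrans(table))


# Two different legacy spellings end up as the same Unicode string.
print(normalise("\u00b9alli") == normalise("\u00a2alli"))
```

The point of the sketch is the direction of the mapping: whatever the input looked like, the output uses a single representation per letter.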

6. Linguistics

Thomas: I have now gone through all 54 Northern Sámi adjectival sublexica in sme-lex.txt, and, with the exception of two bugs (reported to Bugzilla), all the paradigms now seem to be generated correctly. While testing I noticed, though, that some of the adjectives themselves are directed to the wrong sublexica. I have started to go through these adjectives. Other issues regarding the comparison of certain adjectives have to be dealt with and will be posted in the discussion group.

Maaren: no time last week. Maybe working from home would help?

Trond, Lena: Worked on transitivity in verbs. Checked so far: -ot verbs and -eret ^LOAN words; work has started on DIEHTI verbs. The standard versions of the n, a, v are now reverse-sorted in cvs, in order to facilitate systematic lexicon work.
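
Reverse sorting here means sorting the entries by their reversed spelling, so that words with the same ending become neighbours. A minimal sketch of the idea, with a word list chosen only for illustration and not taken from the lexicon files:

```python
def reverse_sorted(words):
    """Sort words by their reversed spelling, grouping identical endings."""
    return sorted(words, key=lambda word: word[::-1])


# Illustrative sample only; the real input would be the lexicon entries.
verbs = ["boahtit", "bargat", "studeret", "lohkat", "diehtit"]

for verb in reverse_sorted(verbs):
    print(verb)
```

With this sample the -at verbs come first, then the -eret verb, then the -it verbs. Note that sorted() compares code points, so a proper Sámi collation order would need an explicit key.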

7. Term db

Sjur and Børre will continue working on it until the official opening on Wednesday the 16th. After that there will be no one to maintain it, and it needs documentation, but the workload will (hopefully) be much smaller.

8. Other issues

PaNoLa-Plus

FIN1) Asiakas syö lounasta. (The client is eating lunch.) (FIN1a)
A1
STA:cl(fcl)
S:n(sg,nom)	Asiakas
P:v(fin,pres,ind,3sg)	syö
Od:n(sg,par)	lounasta

FIN82) Soittaessasi viulua unohdin ajan kuluvan.
 (While you were playing the violin, I forgot how the time passes.) (FIN9b)

A1
STA:cl(fcl)
A:cl(icl)
=P:v(inf2,act,ine+poss2sg)	Soittaessasi
=Od:n(sg,par)	viulua
P:v(fin,imperf,ind,1sg)	unohdin
Od:cl(icl)
=S:n(sg,gen)	ajan
=P:v(pcp1,act,sg,gen)	kuluvan

http://visl.sdu.dk/, cf. e.g. Finnish, under languages:

"The overall objective of the project is to develop and extend a unified language-description apparatus for the Nordic languages, based on the grammatical conventions and language technology tools used in the VISL system. The focus will be on teaching tools, parallel texts, and the spillover effect between the framework languages Danish and Norwegian on the one hand and the smaller Nordic languages on the other.

Sámi: e.g. resource inventory, current CG activities, specific grammatical problems in a contrastive perspective"

What we will need from the project is a view on syntactic analysis (S, Od, P, A, ...)
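
To make the notation in the examples above concrete, here is a minimal sketch that reads such lines into (depth, function, form, features, word) records. It assumes that each node is written as FUNCTION:form(features), that the word follows after a tab, and that leading "=" signs mark the embedding depth; these are assumptions drawn from the two examples, not a full description of the VISL format. Node and parse_visl are hypothetical names.

```python
import re
from typing import NamedTuple, Optional, Tuple

# e.g. "=Od:n(sg,par)<TAB>viulua" -> depth 1, function Od, form n, ...
NODE = re.compile(r"^(=*)(\w+):(\w+)(?:\(([^)]*)\))?(?:\s+(.+))?$")


class Node(NamedTuple):
    depth: int                  # number of leading "=" signs
    function: str               # S, P, Od, A, STA, ...
    form: str                   # n, v, cl, ...
    features: Tuple[str, ...]   # e.g. ("sg", "nom")
    word: Optional[str]         # None for clause-level nodes


def parse_visl(lines):
    """Parse analysis lines; lines without a FUNCTION:form part are skipped."""
    nodes = []
    for line in lines:
        match = NODE.match(line.strip())
        if not match:
            continue  # e.g. the "A1" line or an empty line
        marks, function, form, features, word = match.groups()
        feature_list = tuple(features.split(",")) if features else ()
        nodes.append(Node(len(marks), function, form, feature_list, word))
    return nodes


FIN1 = [
    "A1",
    "STA:cl(fcl)",
    "S:n(sg,nom)\tAsiakas",
    "P:v(fin,pres,ind,3sg)\tsyö",
    "Od:n(sg,par)\tlounasta",
]

for node in parse_visl(FIN1):
    print("  " * node.depth + node.function, node.word or "")
```

The function labels collected this way (STA, S, P, Od) are the syntactic view the memo refers to.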

Easter

Wednesday 1/2 day, Thursday and Friday off, as well as Monday.

9. Summary, task lists

  • Børre: mpage & UTF-8 - file a bug in Bugzilla, with the solution postponed to Tiger (OS 10.4) - still to be done
  • Trond: will follow up the issue with binary postings in news
    • called and e-mailed, no answer so far. Will continue
  • Børre: correct image linking in termdb and divvun projects
  • All: discuss our own corpus format in the newsgroup - what do we want and need?
  • Tomi: identify the input encodings antiword can handle
  • Sjur, Børre: terminology database
  • Børre: divvun.no - to be continued now that the IT staff is back from Jokkmokk
  • Børre: should make the How-To tab appear as intended
  • Børre: add automatic version-stamp in cvs ($Id$) somewhere in the docu, should perhaps set up a FAQ?
  • Børre: wiki support in forrest works, will follow up on the UTF-8 problem
    • temporary workaround: use Latin 1-encoding for the memos
  • Trond: check forrest in cochise, issue is still open.
    • Should be solved, docu at gt/doc/infra/forrest-howto.xml
  • Trond, Børre, all: fix the links

10. Closing

Closed at 12.10