UIT The arctic university of Norway > Giellatekno
 

Meeting_2005-08-08

Meeting setup

  • Date: 08.08.2005
  • Time: 10.00 Norw. time
  • Place: Wherever we are : -)
  • Tools: Phone, iChat, SubEthaEdit

Agenda

  1. Opening, agenda review
  2. Reviewing the task list from a week ago
  3. Documentation - divvun.no
  4. Corpus gathering
  5. Corpus infrastructure
  6. Speller infrastructure
  7. Other issues
  8. Summary, task lists
  9. Closing

1. Opening, agenda review, participants

Opened at 10: 08.

Present: Tomi, Sjur, Børre

Absent: Maaren, Thomas, Trond

Main secretary: Børre

2. Reviewing the task list from the last meeting

Tomi

  • three-part compounding
    • Nothing done
  • decapitalisation of proper nouns when (if?) compounded, and when derived (the oslolaš issue)
    • Got a tip from Trond about a Xerox example, but did not have a look at the book, and no time do anything with it
  • corpus infrastructure: dtd location (both public and internal)
    • Not done
  • corpus infrastructure: file and dir organisation
    • Nothing done
  • Aspell: A simple wordlist based speller
    • Working!!
    • ... but it's rather big:
      • word list (plain text) is 587 Mb, and after munch-list option it is 433 Mb
    • It is rather slow, because it has no affix-file, and the wordlist is huge.
    • Tomi will try to make an affix list, based on our lexicon files
    • New aspell in cochise!!
      • You can use it temporarily from /home/tomi/bin/aspell and /home/tomi/bin/prezip-bin
      • The new aspell should be installed in a regular location, replacing the old one, but that should be checked with sysadm (Roy Dragseth) to make sure we don't break anything
  • Follow up on CVS mailing:
    • check that all received forwarded mail
      • Børre and Sjur is receiving mail as intended now
      • Børre is suggesting that a cvs diff is attached to the commit mails, with a diff size limit of, say, 500 lines (excess parts will be skipped)
    • set up Thomas and Maaren when they're back from vacation
      • They're still on vacation

On vacation last week

Trond

  • On vacation, issues from last week repeated at the end of the memo

Maaren

  • On vacation, issues from last week repeated at the end of the memo

Thomas

  • On vacation, issues from last week repeated at the end of the memo

Sjur

  • On vacation, issues from last week repeated at the end of the memo

Børre

  • On vacation, issues from last week repeated at the end of the memo

3. Documentation - divvun.no

Forrest war crashes Tomcat

We have the same problem with risten.no. This blocks the addition of new functionality (like RSS feeds and jspwiki support), but won't stop us from updating the content.

Steps to fix this bug (add):

  1. Take a backup of the present war file (it can be found within the Tomcat webapps/ dir)
  2. Make sure TWICE that you have that backup in a safe place
  3. Test both 5.0.29 and 5.5.9 versions of Tomcat, to see if there is any difference (or the combination of Tomcat and Forrest)
  4. try to identify when the bug was introduced (somewhere around the middle of June)
  5. Report the findings to the Forrest user list

Automated update script

Set up the cron job on cochise. Permissions are set correct now, divvun.no can be manually updated.

4. Corpus gathering

The correct directory for corpus legal things is adm/legal/. The Oslo example should be added as well. Børre will start with that today.

Then we'll have to write our own contract, adapted to our own needs, and get it rephrased and checked by a lawyer.

THEN we can start to gather texts: -)

5. Corpus infrastructure

Do documentation.

6. Speller infrastructure

aSpell

Is working, see above. Needs further optimisation before it is useful, cf the size discussion. Size target: At least below 10 Mb, wish: 6 Mb or less.

Write documentation here as well.

MS Office spellers

To be specified in the outsourcing tech-spec. What about direct integration with MS Office? See this page from MS.

OpenOffice.org

Questions:

1) Is it possible to use Aspell with OOo?

No. From the site:

Background
Lingucomponent was started by Kevin Hendricks to integrate various open source spell
checkers into the OpenOffice.org build. With a little prodding from Kevin Atkinson,
the author of Pspell, a new spell checker called MySpell was written in C++ that supported
affix compression, based on Ispell. This code has been given to Pspell and will eventually
make it into a future Pspell release after it has been integrated.

Later note: Pspell is dead and Aspell is its replacement. I think Aspell is found at
aspell.sourceforge.net. There is no OOo interface to Aspell (no way to use Aspell with OOo).
kh.

MySpell, which builds on almost any system, is used to support spell checking in
OpenOffice.org.

Links to possibly useful stuff:

2) Is the Aspell source files compatible with MySpell?

To be investigated. Goal: to simplify maintenance as much as possible, ie to have as few source and target formats as possible.

7. Other

Headphones

The project will buy headphones, to help with online project meetings. Small and light!

8. Summary, task list

Børre

  • Add crontab specification for the cvs update/export script Tomi made
  • investigate Forrest war + Tomcat issue
  • reopen the jspwiki + UTF-8 issue
  • Contact Svenska bibelsällskapet
  • Add issue to forrest issue tracker about utf-8 ihtml documents.
    • Check for existing bug reports, it might already be in there
  • Continue forrest i18n research
    • Sometimes abides the =locale=se/smj/fi in the url (the document gets correctly selected). The menus and tabs are translated to the language that the browser asks for.
      • A lot is up for i18n within Forrest. Suggestion: we wait and see a while.
  • discuss with Anders Kintel about whether we could get the Sámi-Norwegian dictionary ahead of acceptance
    • wait till Thomas is back from vacation
  • add the Oslo contract to cvs, and begin writing a contract draft
  • Follow up on CVS mailing:
    • check that all received forwarded mail
    • set up Thomas and Maaren when they're back from vacation

Sjur

  • risten.no bugs and fixes
  • complete the action summary after our half-year evaluation
  • follow up on:
    • voice group-chat not working to Sámediggi
    • Maaren has problems with SubEthaEdit (can't connect)
  • To the board:
    • write proposal for permanent maintenance organisation
    • write draft specification for the outsourced tasks
    • write half-yearly project report with progress and bugdet status
    • write agenda
    • Deadline for the board tasks: 3 weeks ahead of meeting (date not decided, but in September)
  • add Divvun to the UNESCO Hall-of-Fame list

Tomi

  • Aspell: Create the affix file, work on automatic conversion from LEXC to affix file
  • investigate Aspell/MySpell/OOo issues
  • three-part compounding
  • decapitalisation of proper nouns when (if?) compounded, and when derived (the oslolaš issue)
  • corpus infrastructure: dtd location (both public and internal)
  • corpus infrastructure: file and dir organisation
  • Follow up on CVS mailing:
    • check that all received forwarded mail
    • set up Thomas and Maaren when they're back from vacation

On vacation

Tasks are transferred to the following weeks until they return from vacation, to keep the tasks alive and updated. New tasks may be added: -)

Maaren

  • The missing list, both the overall missing list from our xml corpus, and a file-for-file review, in order to get different terminology
  • Go through the missing list from risten.no
  • send new letter to SGL about Lule Sámi possessives

Thomas

  • write new letter to SGL about Lule Sámi possessives
  • work with Lule Sami substantives

Trond

  • Work on the bug list (Lule Sámi).
  • Work on compounds (three-part + oslolaš, with Tomi)
  • Work on the corpus interface (with Lars)
  • Corpus infrastructure: dtd location
  • Get the new version of the New Testament
  • Check Hans-Ragnars names.

8. Next meeting, closing

15.08.2005 10: 00

Closed at 12: 08