UIT The arctic university of Norway > Giellatekno

Corpus maintenance

This document keeps track of measures to improve the corpus collection and conversion process.

See also the sentence alignment page, which covers that specific part of corpus maintenance.

Corpus improvement project meetings

2017

3.3. // 25.4. // 6.9.

2016

26.10. // 02.11. // 16.11. // 25.11.

2014

12.3.

2012

12.1. // 19.1. // 25.1. // 1.2. // 7.2. // 13.2. // 17.2. // 29.2. // 12.3. // 22.3. // 31.8.

2011

7.4. // 11.4. // 3.5. // 27.6. // 12.9. // 21.9. // 12.10. // 7.11. // 11.11. // 25.11. // 28.11. // 8.12. // 14.12. // 20.12.

Tasks

Corpus conversion targets

As a reminder, this is what we aim at:

  1. Sentence-aligned bilingual corpus for CAT (computer-assisted translation)
  2. Analysed mono- and bilingual corpus
    1. Lemmatised and word-aligned for terminology
    2. Fully analysed and presented for linguistic work and terminology
  3. Among other things, a one-click corpus like Europarl

OCR and conversion errors left over from spring 2011