Pre- and postprocessing

Converting text between different formats, as done e.g. by the scripts preprocess, lookup2cg, etc. Corpus-related pre- and postprocessing tools (the aligner, language recognition) also belong here.


Component Default Assignee
abbr.txt Ciprian Gerstenberger
The abbreviation and idiom component
casing and spellrelax Trond Trosterud
The case.regex, spellrelax.regex and allcaps.regex xfst files.
catxml Børre Gaup
Bugs relating to our corpus extraction tool catxml.
ccat Børre Gaup
Bugs relating to our corpus extraction tool ccat (ccat has replaced xmlcat).
cg2visl Ciprian Gerstenberger
Converting text from vislcg output to the input required by the pedagogical visl program.
… Ciprian Gerstenberger
A perl script that takes the text content out of the tags of xml files, sends it to standard grammatical analysis, and puts it back again.

hfst-preprocess Sjur Nørstebø Moshagen
Language recognition with text_cat Ciprian Gerstenberger
The text_cat tool and its accompanying language files are used to identify the languages in mixed-language documents.
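text_cat is generally described as an implementation of the Cavnar and Trenkle character n-gram method. The Python sketch below illustrates that technique only; it is not text_cat's own code. The function names, the `_` word-boundary padding, the maximum n-gram length of 5 and the 300-entry profile cutoff are assumptions chosen for illustration.

```python
import re
from collections import Counter
from typing import Dict


def profile(text: str, max_n: int = 5, size: int = 300) -> Dict[str, int]:
    """Map the `size` most frequent character n-grams of `text` to their rank."""
    counts: Counter = Counter()
    # Letters only (Unicode-aware), so Sámi letters such as á and š are kept.
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Most frequent first; ties broken alphabetically for determinism.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))[:size]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc: Dict[str, int], lang: Dict[str, int]) -> int:
    """Sum of rank differences; n-grams missing from `lang` get the maximum penalty."""
    max_penalty = len(lang)
    return sum(abs(rank - lang[g]) if g in lang else max_penalty
               for g, rank in doc.items())


def classify(text: str, models: Dict[str, Dict[str, int]]) -> str:
    """Return the name of the language model whose profile is closest to `text`."""
    doc = profile(text)
    scores = {name: out_of_place(doc, model) for name, model in models.items()}
    return min(scores, key=scores.get)
```

For a mixed-language document, one would apply `classify` to each paragraph separately; real language models would be built from much larger samples than a single sentence.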
lookup2cg Ciprian Gerstenberger
Converting text from lookup output to vislcg input.
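As a rough illustration of this conversion, the sketch below assumes lookup output with one `surface<TAB>lemma+Tag+Tag` line per analysis and a blank line after each token, and turns it into VISL CG cohorts: a `"<surface>"` line followed by tab-indented `"lemma" Tag Tag` readings. The function name `lookup_to_cg` and the treatment of unknown words (assumed to arrive as `surface<TAB>+?`) are hypothetical; the real lookup2cg script may behave differently.

```python
from typing import Iterable, Iterator


def lookup_to_cg(lines: Iterable[str]) -> Iterator[str]:
    """Convert lookup-style analyses into VISL CG input (illustrative sketch).

    Assumed input: one "surface<TAB>lemma+Tag+Tag" line per analysis,
    with a blank line separating consecutive tokens.
    """
    current = None  # surface form of the cohort currently being written
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            current = None  # a blank line closes the current cohort
            continue
        surface, _, analysis = line.partition("\t")
        if surface != current:
            # First analysis of a new token: emit the cohort header.
            current = surface
            yield f'"<{surface}>"'
        lemma, *tags = analysis.split("+")
        # Hypothetical convention: an unknown word arrives as "surface<TAB>+?",
        # so the empty lemma is replaced by the surface form.
        if not lemma:
            lemma = surface
        # Reading line: tab, quoted lemma, then the tags separated by spaces.
        yield "\t" + " ".join([f'"{lemma}"', *tags])
```

Used as a filter, the generator could be driven by e.g. `for out in lookup_to_cg(sys.stdin): print(out)`.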
preprocess file Ciprian Gerstenberger
The preprocess perl script.
Text aligner Ciprian Gerstenberger
The parallel text aligner tool from Bergen and its behaviour.
The conversion scripts Ciprian Gerstenberger
The scripts for converting files, especially from external formats to our internal latin-1-and-digraphs format.