Pre- and postprocessing

Converting text between different formats, as done by scripts such as preprocess and lookup2cg. Corpus-related pre- and postprocessing tools (the aligner, language recognition) also belong here.


Component Default Assignee
abbr.txt Ciprian Gerstenberger
The abbreviation and idiom component
casing and spellrelax Trond Trosterud
The case.regex, spellrelax.regex and allcaps.regex xfst files.
catxml Tomi Pieski
Bugs relating to our corpus extraction tool catxml
ccat Tomi Pieski
Bugs connected to our corpus extraction tool ccat (ccat has replaced xmlcat)
cg2visl Ciprian Gerstenberger
Converting text from vislcg output to the input required by the pedagogical visl program
corpus-analyze.pl Ciprian Gerstenberger
corpus-analyze.pl is a Perl script that takes the content of tags in XML files out, sends it to standard grammatical analysis, and puts it back again.

hfst-preprocess Sjur Nørstebø Moshagen
hfst-preprocess
Language recognition with text_cat Ciprian Gerstenberger
The text_cat tool and its accompanying language files are used to detect the language in mixed-language documents.
lookup2cg Ciprian Gerstenberger
Converting text from lookup output to vislcg input.
preprocess file Ciprian Gerstenberger
The Perl script preprocess
Text aligner Ciprian Gerstenberger
The parallel text aligner tool from Bergen and its behaviour.
The conversion scripts Ciprian Gerstenberger
The scripts for converting files, especially from external formats to our internal Latin-1-and-digraphs format