Bug 2146

Summary: German compound nouns introduce problems with case handling
Product: Infrastructure
Reporter: Jack Rueter <rueter.jack>
Component: newinfra
Assignee: Sjur Nørstebø Moshagen <sjur.n.moshagen>
Status: ASSIGNED
Severity: enhancement
CC: lene.antonsen, sjur.n.moshagen, thomas.omma, trond.trosterud
Priority: P5 - Later    
Version: unspecified   
Hardware: Macintosh   
OS: Other   

Description Jack Rueter 2016-01-13 10:47:21 CET
Uppercasing in German nouns affects down-casing of non-initial nouns in German compound nouns, and also up-casing of derived nouns from verbs and adjectives.
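
To make the casing behaviour concrete, here is a rough Python sketch of the surface rule only. The function name `join_compound` is hypothetical, linking elements are ignored, and this is not how the lexc sources handle it; it just shows that a noun carries a single uppercase letter at its start, whether it is a compound or derived from a verb or adjective.

```python
def join_compound(parts):
    """Join word parts into one German noun: only the first letter is uppercase.

    Hypothetical helper for illustration; real compounds may also need
    linking elements, which are not modelled here.
    """
    if not parts:
        return ""
    # Down-case every part, so non-initial nouns lose their capital letter.
    joined = "".join(part.lower() for part in parts)
    # Up-case the first letter of the whole word.
    return joined[0].upper() + joined[1:]


print(join_compound(["Apfel", "Baum"]))  # Apfelbaum
print(join_compound(["laufen"]))         # Laufen
```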

One solution to uppercasing in German nouns could be flag diacritics; cf. main/langs/sme/src/morphology/root.lexc

Results vary, however, when we come to consider yaml tests:

Using the analyzer:
Äpfel	Apfel+N+Msc+Pl+Acc	0.000000
Äpfel	Apfel+N+Msc+Pl+Gen	0.000000
Äpfel	Apfel+N+Msc+Pl+Nom	0.000000

The generator, however, does not return the same surface form (note the lowercase initial letter):

Apfel+N+Msc+Pl+Acc	äpfel	0.000000
Apfel+N+Msc+Pl+Gen	äpfel
Apfel+N+Msc+Pl+Nom	äpfel	0.000000
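
The asymmetry above can be checked mechanically. The following is a minimal Python sketch, assuming the tab-separated output format shown here (input, output, optional weight). The helper names `parse_pairs` and `asymmetric` are hypothetical and are not part of the yaml test tooling.

```python
def parse_pairs(text):
    """Parse lines of 'input<TAB>output[<TAB>weight]' into (input, output) pairs."""
    pairs = []
    for line in text.splitlines():
        fields = line.strip().split("\t")
        if len(fields) >= 2:  # the weight column is optional
            pairs.append((fields[0], fields[1]))
    return pairs


def asymmetric(analyses, generations):
    """List analyses whose analysed form is not among the generated forms."""
    generated = {}
    for analysis, form in generations:
        generated.setdefault(analysis, set()).add(form)
    return [
        (form, analysis, sorted(generated.get(analysis, set())))
        for form, analysis in analyses
        if form not in generated.get(analysis, set())
    ]


ANALYSER_OUTPUT = (
    "Äpfel\tApfel+N+Msc+Pl+Acc\t0.000000\n"
    "Äpfel\tApfel+N+Msc+Pl+Gen\t0.000000\n"
    "Äpfel\tApfel+N+Msc+Pl+Nom\t0.000000\n"
)
GENERATOR_OUTPUT = (
    "Apfel+N+Msc+Pl+Acc\täpfel\t0.000000\n"
    "Apfel+N+Msc+Pl+Gen\täpfel\n"
    "Apfel+N+Msc+Pl+Nom\täpfel\t0.000000\n"
)

mismatches = asymmetric(parse_pairs(ANALYSER_OUTPUT), parse_pairs(GENERATOR_OUTPUT))
for form, analysis, generated_forms in mismatches:
    print(f"{analysis}: analysed {form!r}, generated {generated_forms}")
```

With the data from this report, all three analyses are flagged, since the generator yields äpfel where the analyzer accepted Äpfel.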

Optional sentence-initial upper-casing has worked for other languages, but German will need a little language-specific work.

This optional uppercasing is done as part of the regular compilation,
which means that we need a pre-tmp file for language-specific pre-processing before the language-independent compilation steps.

Thanks for the discussion, Sjur.
Comment 1 Sjur Nørstebø Moshagen 2016-01-29 09:54:23 CET
Changed subject line from "German compound nouns introduce problems with upcasing that are observed in yaml test analyses and generation asymmetry." to "German compound nouns introduce problems with case handling" - long subject lines tend to make the bug lists harder to read.