As we agreed, a dir in the corpus should contain either dirs only or files only. This is no longer the case, evidently since data from the Finnish side came in. The following dirs contain both dirs and files:

freecorpus/2015-06-24/sme/facta/klemetti.blogspot.com/2009/
freecorpus/2015-06-24/sme/facta/lundui.fi/
freecorpus/2015-06-24/sme/facta/lundui.fi/aigeguovdil/
freecorpus/2015-06-24/sme/facta/lundui.fi/vanddardancuozahagat/
freecorpus/2015-06-24/sme/facta/lundui.fi/vanddardancuozahagat/albmotmeahcit/
freecorpus/2015-06-24/sme/facta/lundui.fi/vanddardancuozahagat/albmotmeahcit/nuuksio/
freecorpus/2015-06-24/sme/facta/lundui.fi/vanddardancuozahagat/earaguovllut/
freecorpus/2015-06-24/sme/facta/lundui.fi/vanddardancuozahagat/historjacuozahagat/
freecorpus/2015-06-24/sme/facta/lundui.fi/vanddardancuozahagat/johtolagat/
freecorpus/2015-06-24/sme/facta/lundui.fi/vanddardancuozahagat/johtolagat/ceavetjavribuolbmatjavri/
freecorpus/2015-06-24/sme/facta/lundui.fi/vanddardancuozahagat/johtolagat/finnmarkkubalggis/
freecorpus/2015-06-24/sme/facta/lundui.fi/vanddardancuozahagat/johtolagat/geavujohtolat/
freecorpus/2015-06-24/sme/facta/lundui.fi/vanddardancuozahagat/johtolagat/heahttaballasjohtolat/
freecorpus/2015-06-24/sme/facta/lundui.fi/vanddardancuozahagat/johtolagat/kalohttageinnodat/
freecorpus/2015-06-24/sme/facta/lundui.fi/vanddardancuozahagat/johtolagat/njuohttejohka/
freecorpus/2015-06-24/sme/facta/lundui.fi/vanddardancuozahagat/johtolagat/piilolabalggis/
freecorpus/2015-06-24/sme/facta/lundui.fi/vanddardancuozahagat/luondduguovddazat/
freecorpus/2015-06-24/sme/facta/lundui.fi/vanddardancuozahagat/samimeahcceguovllut/
freecorpus/2015-06-24/sme/facta/lundui.fi/vanddardancuozahagat/stobut/
freecorpus/2015-06-24/sme/facta/lundui.fi/vanddardeamiabc/
freecorpus/2015-06-24/sme/laws/finland/
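For reference, a check for the "dirs only or files only" rule could look roughly like the sketch below. This is not the script that produced the list above. The function name find_mixed_dirs and the skipped metadata dirs (.svn, .git) are assumptions made for illustration. Hidden files such as .DS_Store would count as files here.

```python
import os


def find_mixed_dirs(root):
    """Return (path, n_dirs, n_files) for every dir under root holding both dirs and files."""
    mixed = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: the corpus is a version-controlled checkout, so
        # metadata dirs are pruned in place and never visited or counted.
        dirnames[:] = [d for d in dirnames if d not in {".svn", ".git"}]
        if dirnames and filenames:
            mixed.append((dirpath, len(dirnames), len(filenames)))
    return sorted(mixed)
```

Calling find_mixed_dirs("freecorpus/orig") and printing each tuple would give one line per offending dir, together with the number of subdirs and files found in it.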
Here is an updated list, this time with the test run in the freecorpus/orig directory; there are even more dirs with mixed content there.

orig/fin/facta/klemetti.blogspot.com/2009/
orig/fin/laws/finland/
orig/sme/facta/klemetti.blogspot.com/2009/
orig/sme/facta/lundui.fi/
orig/sme/facta/lundui.fi/aigeguovdil/
orig/sme/facta/lundui.fi/vanddardancuozahagat/
orig/sme/facta/lundui.fi/vanddardancuozahagat/albmotmeahcit/
orig/sme/facta/lundui.fi/vanddardancuozahagat/albmotmeahcit/nuuksio/
orig/sme/facta/lundui.fi/vanddardancuozahagat/earaguovllut/
orig/sme/facta/lundui.fi/vanddardancuozahagat/historjacuozahagat/
orig/sme/facta/lundui.fi/vanddardancuozahagat/johtolagat/
orig/sme/facta/lundui.fi/vanddardancuozahagat/johtolagat/ceavetjavribuolbmatjavri/
orig/sme/facta/lundui.fi/vanddardancuozahagat/johtolagat/finnmarkkubalggis/
orig/sme/facta/lundui.fi/vanddardancuozahagat/johtolagat/geavujohtolat/
orig/sme/facta/lundui.fi/vanddardancuozahagat/johtolagat/heahttaballasjohtolat/
orig/sme/facta/lundui.fi/vanddardancuozahagat/johtolagat/kalohttageinnodat/
orig/sme/facta/lundui.fi/vanddardancuozahagat/johtolagat/njuohttejohka/
orig/sme/facta/lundui.fi/vanddardancuozahagat/johtolagat/piilolabalggis/
orig/sme/facta/lundui.fi/vanddardancuozahagat/luondduguovddazat/
orig/sme/facta/lundui.fi/vanddardancuozahagat/samimeahcceguovllut/
orig/sme/facta/lundui.fi/vanddardancuozahagat/stobut/
orig/sme/facta/lundui.fi/vanddardeamiabc/
orig/sme/laws/finland/
orig/smj/admin/depts/regjeringen.no/
Fixed in commits freecorpus r4880-4487
The following directories have mixed content:

01_2017-06-22/fc/sme/admin/allaskuvla (dirs: 1, files: 22)
01_2017-06-22/fc/sme/blogs (dirs: 2, files: 6)
01_2017-06-22/fc/sme/facta/samediggi.fi (dirs: 1, files: 37)
01_2017-06-22/fc/smn/facta/samediggi.fi (dirs: 1, files: 4)

Moreover, the klemetti blog (klemetti.blogspot.com) should be moved from facta to the newly created domain "blogs".