12/09/10 22:27:33 (10 years ago)

GLI now has a gs.FilenameEncoding metadata field. It appears like all the others in GLI's EnrichPane, but is unique in that this metadata (once set, changed or removed) must be applied to the affected filenames in the Collection Tree.

More importantly, these changes allow GLI's Java code to interoperate with the recent changes to Perl, where strings were made Unicode-aware (for proper regex matching), which in turn required other changes elsewhere. To still support filenames in different encodings, Perl now uses URL-encoded versions of filenames, in which the characters' code point values are percent-escaped. GLI therefore has to write URL-encoded filenames to the metadata.xml files associated with each folder level of a collection, so that Perl can read them. In this way, both sides refer to the same filenames.

This only works on systems whose filesystem encoding is not UTF-8 (for example, Latin-1). The restriction exists because Java fixes the filesystem encoding at startup. If that encoding is UTF-8, bytes that do not form valid UTF-8 are replaced by the UTF-8 replacement character. Because this process is destructive, the original byte values of the filenames cannot be recovered.

The changes made to GLI will work on Windows, which is UTF-16 (Windows codepage 1252), presumably also on Macs (some kind of UTF-16), and on native Latin-1 Linux systems. UTF-8 Linux systems need to be reconfigured to a native Latin-1 locale; if that locale is not installed, an administrator can install it easily.
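A minimal Java sketch of the two points in the message above: percent-escaping a filename's byte values, and why decoding non-UTF-8 filename bytes as UTF-8 is lossy. This is not GLI's actual code. The class and method names are hypothetical, and the choice to leave only unreserved ASCII characters unescaped is an assumption, not necessarily what GLI or the Perl side does.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not part of GLI.
class FilenameEncodingSketch {

    // Assumption: only unreserved ASCII characters are left unescaped.
    static boolean isUnreserved(int v) {
        return (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
            || (v >= '0' && v <= '9')
            || v == '-' || v == '.' || v == '_' || v == '~';
    }

    // Percent-escape each byte of the filename as encoded in the given charset.
    static String percentEncode(String filename, Charset charset) {
        StringBuilder out = new StringBuilder();
        for (byte b : filename.getBytes(charset)) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (isUnreserved(v)) {
                out.append((char) v);
            } else {
                out.append(String.format("%%%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Bytes of "café.txt" as stored on a Latin-1 filesystem (0xE9 is é).
        byte[] raw = {'c', 'a', 'f', (byte) 0xE9, '.', 't', 'x', 't'};

        // Decoded as Latin-1, every byte maps to exactly one character,
        // so the byte values survive and can be escaped.
        String latin1 = new String(raw, StandardCharsets.ISO_8859_1);
        System.out.println(percentEncode(latin1, StandardCharsets.ISO_8859_1)); // caf%E9.txt

        // Decoded as UTF-8, the lone 0xE9 is malformed and becomes U+FFFD.
        String utf8 = new String(raw, StandardCharsets.UTF_8);
        System.out.println(utf8.indexOf('\uFFFD') >= 0); // true

        // Re-encoding does not give the original bytes back.
        System.out.println(Arrays.equals(utf8.getBytes(StandardCharsets.UTF_8), raw)); // false
    }
}
```

The Latin-1 round trip is lossless because each of the 256 byte values maps to a distinct character, which is why a single-byte locale lets the Java side recover the bytes that the Perl side expects to see escaped.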

1 edited


  • main/trunk/gli/metadata/greenstone.mds

    r23393  r23433
    43      43          <Language code="en">
    44      44            <Attribute name="label">Filename Encoding</Attribute>
    45                    <Attribute name="definition">TODO</Attribute>
            45            <Attribute name="definition">The encoding of the filename. If this is known, it can be manually set here.</Attribute>
            46            <Attribute name="comment">If not manually specified, Greenstone will try to guess the encoding of the filename upon building, which may or may not be correct.</Attribute>
    46      47          </Language>
    47      48        </Element>