Opened 13 years ago

Closed 13 years ago

#727 closed defect (fixed)

Filename encodings again

Reported by: ak19
Owned by: nobody
Priority: high
Milestone: 2.84 Release
Component: GLI
Severity: major
Keywords:
Cc:

Description

  1. Dr Bainbridge committed some important changes to Greenstone's Perl code that make Perl strings Unicode-aware for regex matching.
  2. The above required changes in GLI too:

GLI now has a gs.FilenameEncoding metadata field which appears like all the others in GLI's EnrichPane, but is unique in that this metadata (once set, changed or removed) must be applied to the affected filenames in the Collection Tree.

More importantly, these changes allow GLI's Java code to interact with the recent Perl changes, in which strings were made Unicode-aware (for proper regex matching) and which required further changes elsewhere. To keep supporting filenames in different encodings, Perl now uses URL-encoded versions of filenames, in which the characters' code point values are represented in URL encoding. GLI therefore has to write URL-encoded filenames to the metadata.xml files associated with each folder level of a collection, so that Perl can read them. In this way both sides refer to the same filenames.
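To illustrate what a URL-encoded filename looks like, here is a minimal sketch. It is not the code in FilenameEncoding.java: the class name, the `percentEncode` helper and the choice of which characters to leave unescaped are all assumptions made for the example. It only shows that the same visible filename yields different URL-encoded forms depending on the encoding its bytes are taken from.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not GLI's FilenameEncoding class.
class FilenameUrlEncodingSketch {

    // Percent-encodes every byte of the filename (in the given charset)
    // except ASCII letters, digits, '.', '-' and '_'.
    public static String percentEncode(String filename, Charset charset) {
        StringBuilder out = new StringBuilder();
        for (byte b : filename.getBytes(charset)) {
            int v = b & 0xFF; // treat the byte as unsigned
            boolean safe = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9') || v == '.' || v == '-' || v == '_';
            if (safe) {
                out.append((char) v);
            } else {
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String name = "caf\u00e9.txt"; // "café.txt"
        // Latin-1 stores é as the single byte 0xE9.
        System.out.println(percentEncode(name, StandardCharsets.ISO_8859_1)); // caf%E9.txt
        // UTF-8 stores é as the two bytes 0xC3 0xA9.
        System.out.println(percentEncode(name, StandardCharsets.UTF_8));      // caf%C3%A9.txt
    }
}
```

Because the escaped form is plain ASCII, it can be written to metadata.xml and read back unchanged regardless of the encoding either program is running under.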

Note that these changes to GLI only work on systems whose filesystem encoding is not UTF-8, such as Latin-1 systems. This is a requirement because Java takes the filesystem encoding at startup. If that encoding is UTF-8, any byte sequences it does not recognise are replaced by the UTF-8 replacement character. Since this process is destructive, we can't get the original filenames' byte values back. The changes made to GLI will work on Windows, which stores filenames as UTF-16 (the legacy codepage on Western systems being Windows-1252), presumably also on Macs (some kind of UTF-16), and on Native Latin-1 Linux systems. UTF-8 Linux systems need to be reconfigured to Native Latin-1; if that locale is not installed, an administrator can install it easily.
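The destructive replacement can be reproduced in isolation. The sketch below is only an illustration using the standard `String` and `Charset` APIs; the class and method names are made up for the example and it does not touch the filesystem. It decodes the Latin-1 bytes of a filename once as UTF-8 and once as ISO-8859-1, then checks whether the original bytes can be recovered.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; names are not taken from GLI.
class LossyDecodingSketch {

    // Decodes raw bytes with the given charset, re-encodes the result,
    // and reports whether the original bytes came back unchanged.
    public static boolean roundTrips(byte[] raw, Charset charset) {
        String decoded = new String(raw, charset);
        return Arrays.equals(raw, decoded.getBytes(charset));
    }

    public static void main(String[] args) {
        // "café" as stored on a Latin-1 filesystem: é is the single byte 0xE9.
        byte[] raw = { 'c', 'a', 'f', (byte) 0xE9 };

        // 0xE9 on its own is not valid UTF-8, so decoding substitutes U+FFFD.
        String asUtf8 = new String(raw, StandardCharsets.UTF_8);
        System.out.println(asUtf8.indexOf('\uFFFD') >= 0);                 // true
        System.out.println(roundTrips(raw, StandardCharsets.UTF_8));       // false

        // ISO-8859-1 maps every byte value to a character, so nothing is lost.
        System.out.println(roundTrips(raw, StandardCharsets.ISO_8859_1));  // true
    }
}
```

Once the replacement character has been substituted, the string no longer records which byte was there, which is why the original filename cannot be reconstructed under a UTF-8 startup encoding.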

Revision 23433 - GLI files changed:

+ ADDED: src\org\greenstone\gatherer\metadata\FilenameEncoding.java

+ classes\dictionary.properties
+ metadata\greenstone.mds
+ src\org\greenstone\gatherer\Gatherer.java
+ src\org\greenstone\gatherer\collection\CollectionManager.java
+ src\org\greenstone\gatherer\collection\CollectionTreeNode.java
+ src\org\greenstone\gatherer\file\FileNode.java
+ src\org\greenstone\gatherer\gui\EnrichPane.java
+ src\org\greenstone\gatherer\gui\GUIManager.java
+ src\org\greenstone\gatherer\metadata\MetadataValueTableModel.java
+ src\org\greenstone\gatherer\metadata\MetadataXMLFile.java
+ src\org\greenstone\gatherer\metadata\MetadataXMLFileManager.java
+ src\org\greenstone\gatherer\util\SynchronizedTreeModelTools.java

Change History (3)

comment:1 by ak19, 13 years ago

INSTALLING AND APPLYING A NEW FILESYSTEM ENCODING ON A LINUX MACHINE:

The instructions are from the thread of questions and answers at http://mail.openjdk.java.net/pipermail/i18n-dev/2008-September/000048.html and the page http://ubuntuforums.org/showthread.php?t=423039

A) Need admin rights: INSTALLATION OF A FILESYSTEM ENCODING: (May not be required, check if Native Latin-1 is already installed on the machine first)

  1. Need to open /var/lib/locales/supported.d/local as root and, at the bottom of the file, add

en_US.ISO-8859-1 ISO-8859-1

  2. Repeat step 1 with file /var/lib/locales/supported.d/en

(3. ONLY IF making this language encoding the system default would we need to open /etc/default/locale as root and change LANG="en_US.UTF-8" to LANG="en_US". Or possibly LANG="en_US.ISO-8859-1".)

  4. Then in an x-term we need to run:

sudo locale-gen --purge

  5. Then restart the machine.

The above steps need to be carried out only once for en_US.ISO-8859-1 to be supported on the machine. In contrast, the settings applied in the steps of part B below might or might not persist between sessions on Ubuntu. To make them persist, one may need to make ISO-8859-1 the default on the machine, as in step 3, which may not be what we want to advise users to do.

Our setup.bash script could perhaps do steps 6 and 7 below and then try running

locale -k LC_CTYPE | grep charmap

to work out whether it succeeded. If it failed, it could print out a message informing the user they need to install en_US.ISO8859-1 and point them to a wiki page to find more information.

B) Don't need admin rights: APPLYING Native Latin-1 (OR OTHER ENCODINGS) AS FILESYSTEM AND DISPLAY ENCODINGS:

  6. export LC_ALL=en_US.ISO8859-1
  7. export LANG=en_US.ISO8859-1

(8. And then we can check it worked by running:

locale

Or by running:

locale -k LC_CTYPE | grep charmap)

comment:2 by ak19, 13 years ago

Revision 23455: committed src\org\greenstone\gatherer\collection\CollectionTree.java

Dr Bainbridge fixed the problem of metadata sometimes getting mixed up at random when folders were moved in the CollectionTree. The metadata was getting mixed up in the MetadataValueTable because of a race condition/state inconsistency: the table tried to update itself while the collection tree was being updated. When nodes are selected in the tree, which happens on tree refresh (including file/directory move operations), the tree fires a valueChanged event that the table responds to. However, since the file nodes of the tree (and so their associated metadata) are in flux at that point, the table should hold off updating its metadata on such an occasion. The current fix is to clear the selection in the tree upon the drop event of a drag and drop in the tree: no nodes are selected, no selection-changed events are fired, and the table is not even displayed because nothing is selected.
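The idea behind the fix can be sketched with Swing's selection model alone. This is not the code committed in CollectionTree.java: the class below, its `select` and `onDrop` methods and the `metadataLoads` counter are hypothetical stand-ins for the tree, its drop handler and the table's update. The assumption is that the table only loads metadata when the selection is non-empty.

```java
import javax.swing.tree.DefaultTreeSelectionModel;
import javax.swing.tree.TreePath;

// Hypothetical stand-in for the tree/table interaction; not GLI code.
class ClearSelectionOnDropSketch {
    final DefaultTreeSelectionModel selection = new DefaultTreeSelectionModel();
    int metadataLoads = 0; // how often the "table" loaded metadata

    ClearSelectionOnDropSketch() {
        // Stand-in for the table's listener: it only loads metadata
        // when at least one node is selected.
        selection.addTreeSelectionListener(e -> {
            if (!selection.isSelectionEmpty()) {
                metadataLoads++;
            }
        });
    }

    // Simulates the user (or a tree refresh) selecting a node.
    void select(String nodeName) {
        selection.setSelectionPath(new TreePath(new Object[] { "collection", nodeName }));
    }

    // Simulates the drop handler: clear the selection before nodes are moved.
    void onDrop() {
        selection.clearSelection();
    }

    public static void main(String[] args) {
        ClearSelectionOnDropSketch sketch = new ClearSelectionOnDropSketch();
        sketch.select("folderA");
        System.out.println(sketch.metadataLoads);                // 1
        sketch.onDrop();
        System.out.println(sketch.selection.isSelectionEmpty()); // true
        System.out.println(sketch.metadataLoads);                // 1
    }
}
```

With nothing selected while the nodes are being moved, the listener has no node to read metadata from, so it cannot observe the tree in an inconsistent state.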

comment:3 by ak19, 13 years ago

Resolution: fixed
Status: new → closed