Opened 13 years ago
Closed 13 years ago
#727 closed defect (fixed)
Filename encodings again
Reported by: | ak19 | Owned by: | nobody |
---|---|---|---|
Priority: | high | Milestone: | 2.84 Release |
Component: | GLI | Severity: | major |
Keywords: | Cc: |
Description
- Dr Bainbridge committed some important changes to Greenstone's Perl code that make the strings in Perl unicode-aware for regex matching.
- The above required changes in GLI too:
GLI now has a gs.FilenameEncoding metadata field which appears like all the others in GLI's EnrichPane, but is unique in that this metadata (once set, changed or removed) must be applied to the affected filenames in the Collection Tree.
More importantly, these changes allow GLI's Java code to interoperate with the recent changes to Perl, where strings were made unicode-aware (for proper regex matching), which in turn required changes elsewhere. To keep supporting filenames in different encodings, Perl now uses URL-encoded versions of filenames, in which characters are represented by their code point values in URL encoding. GLI therefore has to write URL-encoded filenames to the metadata.xml files associated with each folder level of a collection, so that Perl can read them. In this way, both sides refer to the same filenames.
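As a rough illustration of the URL-encoded form described above, the following Java sketch percent-encodes the bytes of a filename that Java decoded as Latin-1. The class and method names are hypothetical and are not taken from FilenameEncoding.java, and the set of characters left unescaped is an assumption rather than Greenstone's actual rule.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not GLI's FilenameEncoding.java.
final class FilenamePercentEncoder {

    // Assumption: only ASCII letters, digits and "-_.~" are left unescaped.
    private static boolean isUnreserved(int b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
            || (b >= '0' && b <= '9')
            || b == '-' || b == '_' || b == '.' || b == '~';
    }

    // Percent-encodes a filename that was decoded as ISO-8859-1, so that
    // every char corresponds to exactly one byte of the on-disk name.
    static String encode(String filename) {
        byte[] raw = filename.getBytes(StandardCharsets.ISO_8859_1);
        StringBuilder out = new StringBuilder(raw.length * 3);
        for (byte signed : raw) {
            int b = signed & 0xFF; // treat the byte as unsigned
            if (isUnreserved(b)) {
                out.append((char) b);
            } else {
                out.append('%').append(String.format("%02X", b));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is the single byte 0xE9 in Latin-1.
        System.out.println(encode("caf\u00e9 menu.txt")); // caf%E9%20menu.txt
    }
}
```

Because the result is plain ASCII, it can be written to metadata.xml and read back by either side without depending on the filesystem encoding.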
Note that these changes to GLI only work on systems whose filesystem encoding is not UTF-8, such as Latin-1 systems. This is a requirement because Java takes the filesystem encoding at startup. If that encoding is UTF-8, byte sequences that are not valid UTF-8 are replaced by the Unicode replacement character (U+FFFD). Since this process is destructive, we cannot get the original filenames' byte values back. The changes made to GLI work on Windows (UTF-16 filenames, Windows codepage 1252), presumably also on Macs (some kind of UTF-16), and on Native Latin-1 Linux systems. UTF-8 Linux systems need to be reconfigured to Native Latin-1; if it is not installed, an administrator can install it easily.
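The destructive decoding described above can be reproduced in a few lines of Java. This is a minimal sketch, assuming a filename whose bytes are Latin-1; the class and method names are hypothetical and only serve the illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of GLI.
final class LossyDecodeDemo {

    // True if decoding the bytes with the charset and encoding them again
    // gives back exactly the original bytes.
    static boolean roundTrips(byte[] raw, Charset charset) {
        String decoded = new String(raw, charset);
        return Arrays.equals(raw, decoded.getBytes(charset));
    }

    public static void main(String[] args) {
        // "caf" + 0xE9 + ".txt": 0xE9 is e-acute in Latin-1, but on its own
        // it is not a valid UTF-8 sequence.
        byte[] raw = {'c', 'a', 'f', (byte) 0xE9, '.', 't', 'x', 't'};

        String asUtf8 = new String(raw, StandardCharsets.UTF_8);
        System.out.println(asUtf8.indexOf('\uFFFD') >= 0);              // replacement char present
        System.out.println(roundTrips(raw, StandardCharsets.UTF_8));      // original bytes lost
        System.out.println(roundTrips(raw, StandardCharsets.ISO_8859_1)); // original bytes kept
    }
}
```

Latin-1 maps every byte value to exactly one character, which is why the round trip succeeds there while the UTF-8 decode loses the original 0xE9 byte.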
Revision 23433 - GLI files changed :
+ ADDED: src\org\greenstone\gatherer\metadata\FilenameEncoding.java
+ classes\dictionary.properties
+ metadata\greenstone.mds
+ src\org\greenstone\gatherer\Gatherer.java
+ src\org\greenstone\gatherer\collection\CollectionManager.java
+ src\org\greenstone\gatherer\collection\CollectionTreeNode.java
+ src\org\greenstone\gatherer\file\FileNode.java
+ src\org\greenstone\gatherer\gui\EnrichPane.java
+ src\org\greenstone\gatherer\gui\GUIManager.java
+ src\org\greenstone\gatherer\metadata\MetadataValueTableModel.java
+ src\org\greenstone\gatherer\metadata\MetadataXMLFile.java
+ src\org\greenstone\gatherer\metadata\MetadataXMLFileManager.java
+ src\org\greenstone\gatherer\util\SynchronizedTreeModelTools.java
Change History (3)
comment:1 by , 13 years ago
comment:2 by , 13 years ago
Revision 23455: committed src\org\greenstone\gatherer\collection\CollectionTree.java
Dr Bainbridge fixed the problem of metadata sometimes getting mixed up at random when moving folders in the CollectionTree. The metadata was getting mixed up in the MetadataValueTable because of a race condition/state inconsistency: the table tried to update itself while the collection tree was being updated. When nodes are selected in the tree, which happens on Tree refresh (including file/directory move operations), the tree fires a valueChanged event which the Table responds to. However, since the filenodes of the tree (and so their associated metadata) are in flux at that point, the table should hold off updating its metadata on such an occasion. The current fix is to clear the selection in the tree upon the drop event of a drag and drop in the tree: no nodes are selected, no selection changed events are fired, and the table does not even get displayed because nothing is selected.
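The idea behind the fix can be sketched with a plain Swing JTree. The class below is a hypothetical stand-in, not GLI's CollectionTree or MetadataValueTableModel, and the onDrop hook is an assumed name for wherever the drop event is handled.

```java
import javax.swing.JTree;
import javax.swing.tree.TreePath;

// Hypothetical stand-in for a table that reloads metadata on tree selection.
final class SelectionDrivenTable {
    int rebuilds = 0; // how many times the table reloaded its metadata

    void listenTo(JTree tree) {
        tree.addTreeSelectionListener(event -> {
            // With nothing selected there is nothing to display, so the
            // table does not touch any (possibly in-flux) file nodes.
            if (tree.getSelectionCount() > 0) {
                rebuilds++;
            }
        });
    }

    // Assumed hook, called when a drag-and-drop move in the tree completes.
    // Note: clearSelection() can itself notify listeners if something was
    // selected, which is why the listener above checks the selection count.
    static void onDrop(JTree tree) {
        tree.clearSelection();
    }

    public static void main(String[] args) {
        JTree tree = new JTree(); // default sample model
        SelectionDrivenTable table = new SelectionDrivenTable();
        table.listenTo(tree);

        tree.setSelectionPath(new TreePath(tree.getModel().getRoot()));
        onDrop(tree);
        System.out.println(table.rebuilds + " rebuild(s), "
            + tree.getSelectionCount() + " node(s) selected");
    }
}
```

After the drop the tree has no selected nodes, so the table has no reason to reload metadata until the user selects something again.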
comment:3 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
INSTALLING AND APPLYING A NEW FILESYSTEM ENCODING ON A LINUX MACHINE:
The instructions are from the thread of questions and answers at http://mail.openjdk.java.net/pipermail/i18n-dev/2008-September/000048.html and the page http://ubuntuforums.org/showthread.php?t=423039
A) Need admin rights: INSTALLATION OF A FILESYSTEM ENCODING: (May not be required, check if Native Latin-1 is already installed on the machine first)
bottom of the file, add
en_US.ISO-8859-1 ISO-8859-1
(3. ONLY IF making this language encoding the system default would we need to open /etc/default/locale as root and change LANG="en_US.UTF-8" to LANG="en_US". Or possibly LANG="en_US.ISO-8859-1".)
sudo locale-gen --purge
The above steps need to be carried out once for en_US.ISO-8859-1 to be supported on the machine. In contrast, the settings applied in the following steps might or might not be saved between sessions on Ubuntu. To make them persistent, one may need to make ISO-8859-1 the default on the machine, as in step 3, which may not be what we want to advise users to do.
Our setup.bash script could perhaps do steps 6 and 7 below and then try running
locale -k LC_CTYPE | grep charmap
to work out whether it succeeded. If it failed, it could print out a message informing the user that they need to install en_US.ISO-8859-1 and point them to a wiki page to find more information.
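The same check could also be made from Java, for example by GLI at startup. The sketch below only parses text in the form that `locale -k LC_CTYPE | grep charmap` is assumed to print (a line such as charmap="ISO-8859-1"); the class and method names are hypothetical, and actually running the command is left out.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of GLI or setup.bash.
final class CharmapCheck {

    // Assumed line format: charmap="ISO-8859-1" (quotes optional).
    private static final Pattern CHARMAP =
        Pattern.compile("(?m)^charmap=\"?([^\"\\r\\n]+)\"?$");

    // Extracts the charmap name from the locale output, if present.
    static Optional<String> parseCharmap(String localeOutput) {
        Matcher m = CHARMAP.matcher(localeOutput);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // True only if the reported charmap is ISO-8859-1 (Native Latin-1).
    static boolean isLatin1(String localeOutput) {
        return parseCharmap(localeOutput)
            .map(name -> name.equalsIgnoreCase("ISO-8859-1"))
            .orElse(false);
    }

    public static void main(String[] args) {
        System.out.println(isLatin1("charmap=\"ISO-8859-1\"\n")); // true
        System.out.println(isLatin1("charmap=\"UTF-8\"\n"));      // false
    }
}
```

If isLatin1 returns false, the caller could print the message suggested above, telling the user to install en_US.ISO-8859-1.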
B) Don't need admin rights: APPLYING Native Latin-1 (OR OTHER ENCODINGS) AS FILESYSTEM AND DISPLAY ENCODINGS:
(8. And then we can check it worked by running:
locale
Or by running:
locale -k LC_CTYPE | grep charmap)