Dragging and dropping files with alien file encodings on Linux
|Reported by:||ak19||Owned by:||ak19|
|Priority:||moderate||Milestone:||Collection building wishlist|
|Keywords:||GLI, encoding, multilingual||Cc:|
The problem: On Linux, GLI does not detect files whose names are encoded in Latin-1 (ISO-8859-1); as far as GLI is concerned, they do not exist. Dragging and dropping such files therefore does not work. Perl and Linux itself are able to copy and delete them.
At present, anyone who wants to use GLI to build a collection containing such files has to put them into the import folder manually, using a Linux file manager.
The cause is that Java's File class stores file information (the pathname) as a String. On Linux, it therefore assumes that filenames are UTF-8. Instead of preserving the byte values or a URL-encoded URI of a filename, it replaces every byte sequence that is invalid in UTF-8 with the Unicode replacement character (U+FFFD). The same replacement character is used for every invalid sequence, so the original bytes cannot be recovered. The conversion from bytes to String is therefore destructive, and the filename stored in the File data structure is wrong.
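The loss of information can be reproduced without touching the file system. The following is a minimal sketch, not GLI code: the class name LossyDecodeDemo and the sample names are made up for illustration, and it assumes the Latin-1 bytes are decoded as UTF-8, as described above. Two different Latin-1 names decode to the same String, and re-encoding that String does not give the original bytes back.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of GLI.
class LossyDecodeDemo {

    // Decode raw filename bytes as UTF-8; malformed input becomes U+FFFD.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "café" and "cafè" as Latin-1 bytes: 0xE9 and 0xE8 are not valid UTF-8 here.
        byte[] nameA = {'c', 'a', 'f', (byte) 0xE9};
        byte[] nameB = {'c', 'a', 'f', (byte) 0xE8};

        String decodedA = decodeAsUtf8(nameA);
        String decodedB = decodeAsUtf8(nameB);

        // Both names collapse to the same String, so they can no longer be told apart.
        System.out.println(decodedA.equals(decodedB)); // prints "true"

        // Re-encoding yields the UTF-8 bytes of U+FFFD, not the original byte.
        byte[] roundTrip = decodedA.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(roundTrip, nameA)); // prints "false"
    }
}
```

A File built from such a String names a file that does not exist on disk, which is consistent with GLI not seeing these files at all.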
The proposed solution (Dr Bainbridge's idea):
- A replacement listFiles() should be implemented in Perl, returning an array of URL-encoded file and directory names. It should be called everywhere listFiles() was called before, instead of Java's default File.listFiles().
- The FileNode and FileJob/FileQueue classes of GLI will not only have to call the new custom listFiles, they will also have to call Perl code for copying, moving and deleting files (and checking whether they exist). All calls to these operations have to go through the Perl code rather than through Java's File class.
- Since invoking Perl will be more time-consuming than using Java's File class, we can provide an option in GLI called "Recognise alien file system encodings". That way, the extra processing that is only ever required for specially encoded filenames is skipped unless the GLI user is sure that they are working with such files.
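On the Java side, the URL-encoded names returned by the Perl code would have to be kept in a form that maps back to the exact original bytes. The following is a minimal sketch of such a codec, assuming plain percent-encoding of every byte outside the unreserved ASCII set. The class name RawNameCodec and its methods are hypothetical and not part of GLI, and the actual encoding used by the Perl side may differ.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical helper, not part of GLI: keeps filenames as percent-encoded bytes.
class RawNameCodec {

    // Unreserved characters (letters, digits, '-', '.', '_', '~') are left as-is.
    static boolean isUnreserved(int v) {
        return (v >= 'A' && v <= 'Z') || (v >= 'a' && v <= 'z')
            || (v >= '0' && v <= '9')
            || v == '-' || v == '.' || v == '_' || v == '~';
    }

    // Percent-encode raw filename bytes without interpreting their encoding.
    static String encode(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (isUnreserved(v)) {
                sb.append((char) v);
            } else {
                sb.append('%').append(String.format("%02X", v));
            }
        }
        return sb.toString();
    }

    // Recover the exact original bytes. Assumes well-formed input:
    // every '%' is followed by two hexadecimal digits.
    static byte[] decode(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%') {
                out.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] latin1Name = {'c', 'a', 'f', (byte) 0xE9}; // "café" in Latin-1
        String encoded = encode(latin1Name);
        System.out.println(encoded); // prints "caf%E9"
        System.out.println(Arrays.equals(decode(encoded), latin1Name)); // prints "true"
    }
}
```

Because the encoded form contains only ASCII, it survives being stored in a Java String, and the original bytes can be handed back to Perl unchanged for copy, move, delete and existence checks.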