DocXMLFileManager.java

Timestamp:

2020-10-22T01:48:03+13:00

Author:

ak19

Message:

Redoing the work of commit revision 34394: redoing Bugfix 1 for the GLI doc.xml metadata slowdown, which resulted from an earlier bugfix that helped GLI cope with filenames and assigned meta containing non-ASCII chars. The slowdown happened whenever gathered files were selected in GLI. Commit 34394 fixed it, but that fix was not ideal for 2 reasons:

1. It put a new form of filename encoding (hexed unicode) into doc.xml instead of using existing encodings like URL and base64, though those existing encodings weren't the right ones for my first solution.
2. The solution was specific to Windows, to cope with special chars in filenames, and relied on a new meta field, gsdlfullsourcepath, being written out to doc.xml by doc.pm. So a built collection moved from Linux to Windows won't show its doc.xml meta in GLI, as it won't have the new doc.xml meta field that Windows is expecting.

Have a better solution for 1 that doesn't require the new field. But still can't fix all of point 2, as the existing gsdlsourcefilename meta field in doc.xml can contain Windows Short filenames when the coll is built on Windows, and this won't be backwards compatible on Linux anyway. This problem existed before too, except I didn't realise it until now. But the new solution fixes more issues.

Second step: modified DocXMLFile to no longer use the new field gsdlfullsourcepath, but to return to using the gsdlsourcefilename field. This time, however, the code is optimised to detect a filename match between doc.xml and any file selected in GLI by storing gsdlsourcefilename in its Long filename form whenever doc.xml had stored it in Win 8.3 Short filename form. The Long filename can be obtained for any file that exists by calling getCanonicalPath(). Of course, the full filename was not stored in gsdlsourcefilename, only the filename from the import folder onwards. So, to ensure a file by that filename in long form has a chance of existing, first prefixed the current collection folder and then checked for existence before obtaining the canonical form for it. This is then stored in the hashmap in place of any Win Short filename. Now a match is more readily found without using any hex-encoded unicode filenames stored by doc.pm, and without using the older and inefficient method of making cmd calls to DOS to calculate the Win 8.3 Short filename for each selected file.
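The prefix, existence check and canonicalisation described in the second step could look roughly like the sketch below. It is an illustration only, not the DocXMLFile code from this changeset: the class name, the `toLookupKey` helper and the `baseDir` parameter (standing in for whichever folder the stored gsdlsourcefilename value is relative to) are all hypothetical. Only `File.exists()` and `File.getCanonicalPath()` are taken from the message.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;

public class LongFilenameKeySketch {

    /**
     * Hypothetical helper (not GLI's actual code): turns a relative filename
     * read from doc.xml into the key used for lookups. If the file exists
     * under baseDir, the key is its canonical form relative to baseDir, which
     * on Windows expands 8.3 short names to long names. Otherwise the stored
     * name is returned unchanged.
     */
    static String toLookupKey(File baseDir, String storedName) {
        File candidate = new File(baseDir, storedName);
        if (!candidate.exists()) {
            return storedName; // nothing on disk to canonicalise against
        }
        try {
            String base = baseDir.getCanonicalPath();
            String prefix = base.endsWith(File.separator) ? base : base + File.separator;
            String full = candidate.getCanonicalPath();
            if (full.startsWith(prefix)) {
                // Keep only the part below baseDir, as doc.xml stores relative names.
                return full.substring(prefix.length());
            }
        } catch (IOException e) {
            // Canonicalisation failed; fall back to the stored name below.
        }
        return storedName;
    }

    public static void main(String[] args) throws IOException {
        // Assumed setup: a temporary folder plays the role of the collection folder.
        File baseDir = Files.createTempDirectory("gli_sketch").toFile();
        new File(baseDir, "report.txt").createNewFile();

        // The map is keyed by the normalised name, so a later lookup with the
        // selected file's own relative path needs no per-file cmd call.
        Map<String, String> metadataByFilename = new HashMap<>();
        String storedName = "." + File.separator + "report.txt";
        metadataByFilename.put(toLookupKey(baseDir, storedName), "Title=Report");

        System.out.println(metadataByFilename.containsKey("report.txt"));
    }
}
```

Doing the normalisation once, when doc.xml is loaded, is what moves the cost away from each file selection in GLI. The fallback to the stored name reflects the message's caveat that a file by that name only has "a chance of existing".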

File:

1 edited

main/trunk/gli/src/org/greenstone/gatherer/metadata/DocXMLFileManager.java (modified) (2 diffs)

main/trunk/gli/src/org/greenstone/gatherer/metadata/DocXMLFileManager.java

- r34394
+ r34507
         file_relative_path = file_relative_path.substring(import_index + "import".length() + 1);
     }
-    String searchFileName = DocXMLFile.isWin ? Utility.stringToHex(file_relative_path) : file_relative_path;
     // Build up a list of metadata values extracted from this file
     ArrayList metadata_values = new ArrayList();
 …
         DocXMLFile doc_xml_file = (DocXMLFile) doc_xml_files.get(i);
         ///System.err.println("@@@@ Looking at doc.xml file: " + doc_xml_files.get(i));
-        metadata_values.addAll(doc_xml_file.getMetadataExtractedFromFile(file, searchFileName));
+        metadata_values.addAll(doc_xml_file.getMetadataExtractedFromFile(file, file_relative_path));
     }
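
The first context line of the diff trims the selected file's path down to the part after the import folder, and that trimmed path is what now gets passed to getMetadataExtractedFromFile. A minimal standalone sketch of the trimming follows. The class and method names are hypothetical, and finding the index with `indexOf` is an assumption, since the diff does not show how import_index is computed.

```java
public class ImportRelativePathSketch {

    /**
     * Hypothetical helper: returns the part of path that follows the
     * "import" folder and its separator, or path unchanged when it
     * contains no "import" segment.
     */
    static String relativeToImport(String path) {
        String marker = "import";
        int importIndex = path.indexOf(marker); // assumption: first occurrence
        if (importIndex == -1) {
            return path;
        }
        // Skip the marker plus the one separator character after it,
        // mirroring import_index + "import".length() + 1 in the diff.
        int start = importIndex + marker.length() + 1;
        return start <= path.length() ? path.substring(start) : "";
    }

    public static void main(String[] args) {
        System.out.println(relativeToImport("collect/demo/import/docs/a.txt"));
    }
}
```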

Changeset 34507 for main/trunk/gli/src/org/greenstone/gatherer/metadata/DocXMLFileManager.java
