source: main/trunk/gli/src/org/greenstone/gatherer/metadata/MetadataXMLFile.java@ 33727

Last change on this file since 33727 was 33727, checked in by ak19, 4 years ago

Experimental encoding-related bugfix to GLI. In GLI, meta assigned at file level to filenames with non-ASCII chars was not sticking to the file: repeated entries were written out to metadata.xml under 2 variants of the filename but never loaded back into GLI. This problem was not apparent with the old FilenameEncodings test set of docs or Kathy's complex test case of Russian filenames gathered into a folder structure. In the latter case, meta was assigned at folder level, so the regex to match was .* which is just ASCII. Neither test document set was tested with meta assigned at file level. I can't now remember whether we tested today whether assigning file-level meta to docs in the FilenameEncodings test set worked, but if it did, maybe that was because the special characters were not too complex and just Latin-1 or Win codepage 850 (like 1252) for the docs where meta was assigned. In any case, with test docs where filenames had A-macrons in them, the problem showed up, and it also showed up in the Russian test set if meta got assigned at doc level. GLI was correctly saving filenames that had meta into metadata.xml as hex-encoded filenames the first time around. It just wasn't comparing them to hex values on subsequent occasions, and thus was not finding a match. Method FilenameEncoding.fileNameToHex() introduced to fix this (experimental, need to run some questions by Dr Bainbridge). For all current tests, this appears to have fixed it. However, there must be somewhere else that ex.meta is being loaded in, as that is still not appearing for specially named files.
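The mismatch described in the log message can be illustrated with a small standalone sketch. It is not the GLI code: the helper toHex() below is a hypothetical stand-in for FilenameEncoding.fileNameToHex(), assumed here to percent-encode non-ASCII UTF-8 bytes as uppercase hex, and the sketch further assumes that the FileName value read back from metadata.xml holds the raw character while the file's relative path is already hex-encoded.

```java
import java.nio.charset.StandardCharsets;

class HexFilenameMatchSketch {
    // Hypothetical stand-in for FilenameEncoding.fileNameToHex():
    // ASCII bytes are left untouched, every other UTF-8 byte becomes %XX (uppercase hex).
    static String toHex(String name) {
        StringBuilder sb = new StringBuilder();
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80) {
                sb.append((char) v);
            } else {
                sb.append(String.format("%%%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed FileName regex as read back from metadata.xml: A-macron, then "bc\.txt"
        String storedRegex = "\u0100bc\\.txt";
        // Assumed hex-encoded relative path of the same file in the collection tree
        String relativePath = toHex("\u0100bc.txt");

        // Comparing the hex-encoded path against the raw regex finds no match
        System.out.println(relativePath.matches(storedRegex));
        // Hex-encoding the stored regex first puts both sides in the same form
        System.out.println(relativePath.matches(toHex(storedRegex)));
    }
}
```

Under these assumptions the first comparison fails and the second succeeds, which is the shape of the change made in getMetadataAssignedToFile(). A folder-level pattern such as .* is pure ASCII and is unaffected by the encoding step, which is consistent with the folder-level test cases not showing the problem.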

  • Property svn:keywords set to Author Date Id Revision
File size: 36.3 KB
/**
 *############################################################################
 * A component of the Greenstone Librarian Interface, part of the Greenstone
 * digital library suite from the New Zealand Digital Library Project at the
 * University of Waikato, New Zealand.
 *
 * Author: Michael Dewsnip, NZDL Project, University of Waikato, NZ
 *
 * Copyright (C) 2004 New Zealand Digital Library Project
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 *############################################################################
 */

package org.greenstone.gatherer.metadata;


import java.io.*;
import java.util.*;
import org.greenstone.gatherer.DebugStream;
import org.greenstone.gatherer.collection.CollectionTreeNode;
import org.greenstone.gatherer.util.XMLTools;
import org.w3c.dom.*;


/** This class represents one metadata.xml file */
public class MetadataXMLFile
    extends File
{
    static final private String DESCRIPTION_ELEMENT = "Description";
    static final private String DIRECTORY_FILENAME = ".*";
    static final private String FILENAME_ELEMENT = "FileName";
    static final private String FILESET_ELEMENT = "FileSet";
    static final private String METADATA_ELEMENT = "Metadata";
    static final private String[] nonEscapingElements = new String[]{FILENAME_ELEMENT};

    /** Special metadata field: the filename encoding is a unique sort of metadata in
     * that it is not just information stored with a collection file, but also needs to
     * be applied in real-time to the collection file (to its filename) for display. */
    static final public String FILENAME_ENCODING_METADATA = "gs.filenameEncoding";

    // To speed things up a bit we keep the last accessed metadata.xml file in memory
    static private File loaded_file = null;
    static private Document loaded_file_document = null;
    static private boolean loaded_file_changed = false;


    public MetadataXMLFile(String metadata_xml_file_path)
    {
        super(metadata_xml_file_path);
    }


    public void addMetadata(CollectionTreeNode file_node, ArrayList metadata_values)
    {
        // If this metadata.xml file isn't the one currently loaded, load it now
        if (loaded_file != this) {
            // First we must save out the currently loaded file
            saveLoadedFile();

            // Parse the metadata.xml file
            Document document = XMLTools.parseXMLFile(this);
            if (document == null) {
                System.err.println("Error: Could not parse metadata.xml file " + getAbsolutePath());
                return;
            }

            loaded_file = this;
            loaded_file_document = document;
        }

        // Determine the file's path relative to the location of the metadata.xml file
        String metadata_xml_file_directory_path = FilenameEncoding.fileToURLEncoding(getParentFile());
        String file_relative_path = file_node.getURLEncodedFilePath().substring(metadata_xml_file_directory_path.length());
        if (file_relative_path.startsWith(FilenameEncoding.URL_FILE_SEPARATOR)) {
            file_relative_path = file_relative_path.substring(FilenameEncoding.URL_FILE_SEPARATOR.length());
        }

        // Form a regular expression that specifies the scope of the metadata
        String file_path_regexp;
        if (file_relative_path.equals("")) {
            // Special case for matching all files in the directory
            file_path_regexp = DIRECTORY_FILENAME;
        }
        else {
            // Convert the file path into a regular expression that will match it
            file_path_regexp = MetadataTools.getRegularExpressionThatMatchesFilePath(file_relative_path);
        }

        //System.err.println("MetadataXMLFile.addMetadata() Adding meta for file regexp: "
        // + file_path_regexp + " - " + org.greenstone.gatherer.util.Utility.debugUnicodeString(file_path_regexp));

        // Find the appropriate FileSet element for this file
        Element appropriate_fileset_element = null;

        // Read all the FileSet elements in the file
        NodeList fileset_elements_nodelist = loaded_file_document.getElementsByTagName(FILESET_ELEMENT);
        for (int i = 0; i < fileset_elements_nodelist.getLength(); i++) {
            Element current_fileset_element = (Element) fileset_elements_nodelist.item(i);

            // Check the FileName elements of the FileSet to see if we have a match
            NodeList filename_elements_nodelist = current_fileset_element.getElementsByTagName(FILENAME_ELEMENT);
            for (int j = 0; j < filename_elements_nodelist.getLength(); j++) {
                Element current_filename_element = (Element) filename_elements_nodelist.item(j);
                String current_filename_element_value = XMLTools.getElementTextValue(current_filename_element);

                // Only exact matches can be extended with new metadata
                if (current_filename_element_value.equals(file_path_regexp)) {
                    appropriate_fileset_element = current_fileset_element;
                    break;
                }
            }
        }

        // If no appropriate FileSet element exists create a new one for this file
        if (appropriate_fileset_element == null) {
            DebugStream.println("Creating new FileSet element for file since none exists..."+file_path_regexp);
            appropriate_fileset_element = loaded_file_document.createElement(FILESET_ELEMENT);

            Element new_filename_element = loaded_file_document.createElement(FILENAME_ELEMENT);
            new_filename_element.appendChild(loaded_file_document.createTextNode(file_path_regexp));
            appropriate_fileset_element.appendChild(new_filename_element);

            Element new_description_element = loaded_file_document.createElement(DESCRIPTION_ELEMENT);
            appropriate_fileset_element.appendChild(new_description_element);

            // add the fileset element for .* at the top: especially important for
            // non-accumulating (and override mode) meta. Other type fileset elements can be appended
            if(file_path_regexp.equals(DIRECTORY_FILENAME)) {
                loaded_file_document.getDocumentElement().insertBefore(appropriate_fileset_element,
                    loaded_file_document.getDocumentElement().getFirstChild());
            } else {
                loaded_file_document.getDocumentElement().appendChild(appropriate_fileset_element);
            }
        }

        // Add each of the metadata values to the FileSet's Description element
        Element description_element = (Element) appropriate_fileset_element.getElementsByTagName(DESCRIPTION_ELEMENT).item(0);
        for (int i = 0; i < metadata_values.size(); i++) {
            MetadataValue metadata_value = (MetadataValue) metadata_values.get(i);
            String metadata_element_name_full = metadata_value.getMetadataElement().getFullName();

            // Remove any characters that are invalid in XML
            String metadata_value_string = XMLTools.removeInvalidCharacters(metadata_value.getFullValue());

            // Square brackets need to be escaped because they are a special character in Greenstone
            metadata_value_string = metadata_value_string.replaceAll("\\[", "&#091;");
            metadata_value_string = metadata_value_string.replaceAll("\\]", "&#093;");

            // the gs.filenameEncoding metadata is unique in that, when added, removed or
            // changed, it must be applied on the file(name) whose metadata has been adjusted
            if(metadata_element_name_full.equals(FILENAME_ENCODING_METADATA)) {
                metadata_value_string = processFilenameEncoding(file_path_regexp,
                    file_node, metadata_value_string, false);
                // true only if removing meta
            }

            // Check if this piece of metadata has already been assigned to this FileSet element
            boolean metadata_already_assigned = false;
            NodeList metadata_elements_nodelist = description_element.getElementsByTagName(METADATA_ELEMENT);
            for (int k = 0; k < metadata_elements_nodelist.getLength(); k++) {
                Element current_metadata_element = (Element) metadata_elements_nodelist.item(k);

                // Check if the metadata element name matches
                String current_metadata_element_name_full = current_metadata_element.getAttribute("name");
                if (current_metadata_element_name_full.equals(metadata_element_name_full)) {
                    // if the metadata must not accumulate, then edit the current value
                    if (!metadata_value.isAccumulatingMetadata()) {
                        XMLTools.setNodeText(current_metadata_element, metadata_value_string);
                        metadata_already_assigned = true;
                        break;
                    }
                    // Check if the metadata element value matches
                    String current_metadata_value_string = XMLTools.getElementTextValue(current_metadata_element);
                    if (current_metadata_value_string.equals(metadata_value_string)) {
                        // Metadata already assigned
                        metadata_already_assigned = true;
                        break;
                    }
                }
            }

            // If the piece of metadata hasn't already been assigned, add it now
            if (!metadata_already_assigned) {
                // Create a new Metadata element to record this metadata
                Element new_metadata_element = loaded_file_document.createElement(METADATA_ELEMENT);
                new_metadata_element.setAttribute("name", metadata_value.getMetadataElement().getFullName());
                new_metadata_element.setAttribute("mode", (metadata_value.isAccumulatingMetadata() ? "accumulate" : "override"));
                new_metadata_element.appendChild(loaded_file_document.createTextNode(metadata_value_string));

                // Accumulating metadata: add at the end
                if (metadata_value.isAccumulatingMetadata()) {
                    description_element.appendChild(new_metadata_element);
                }
                // Override metadata: add at the start (so it overrides inherited metadata without affecting other assigned metadata)
                else {
                    description_element.insertBefore(new_metadata_element, description_element.getFirstChild());
                }
            }
        }

        // Remember that we've changed the file so it gets saved when a new one is loaded
        loaded_file_changed = true;
    }


    public ArrayList getMetadataAssignedToFile(File file, boolean fileEncodingOnly)
    {
        // If this metadata.xml file isn't the one currently loaded, load it now
        if (loaded_file != this) {
            // First we must save out the currently loaded file
            saveLoadedFile();

            // Parse the metadata.xml file
            Document document = XMLTools.parseXMLFile(this);
            if (document == null) {
                System.err.println("Error: Could not parse metadata.xml file " + getAbsolutePath());
                return new ArrayList();
            }

            loaded_file = this;
            loaded_file_document = document;
        }

        // Determine the file's path relative to the location of the metadata.xml file
        String file_relative_path = FilenameEncoding.fileToURLEncoding(file);
        File metadata_xml_file_directory = getParentFile();
        String metadata_xml_file_directory_path = FilenameEncoding.fileToURLEncoding(metadata_xml_file_directory);
        file_relative_path = file_relative_path.substring(metadata_xml_file_directory_path.length());

        if (file_relative_path.startsWith(FilenameEncoding.URL_FILE_SEPARATOR)) {
            file_relative_path = file_relative_path.substring(FilenameEncoding.URL_FILE_SEPARATOR.length());
        }

        // Build up a list of metadata assigned to this file
        ArrayList metadata_values = new ArrayList();

        // Read all the FileSet elements in the file
        NodeList fileset_elements_nodelist = loaded_file_document.getElementsByTagName(FILESET_ELEMENT);
        for (int i = 0; i < fileset_elements_nodelist.getLength(); i++) {
            Element current_fileset_element = (Element) fileset_elements_nodelist.item(i);
            boolean current_fileset_matches = false;
            boolean is_one_file_only_metadata = true;
            File folder_metadata_inherited_from = null;

            // Check the FileName elements of the FileSet to see if we have a match
            NodeList filename_elements_nodelist = current_fileset_element.getElementsByTagName(FILENAME_ELEMENT);
            for (int j = 0; j < filename_elements_nodelist.getLength(); j++) {
                Element current_filename_element = (Element) filename_elements_nodelist.item(j);
                String current_filename_element_value = XMLTools.getElementTextValue(current_filename_element);

                String regexed_file_relative_path = MetadataTools.getRegularExpressionThatMatchesFilePath(file_relative_path);
                //System.err.println("Looking in meta.xml for regexed version of filename: " + regexed_file_relative_path);

                // Does this fileset specify metadata for one file only?
                is_one_file_only_metadata = true;
                if (current_filename_element_value.indexOf("*") != -1 && !current_filename_element_value.equals(DIRECTORY_FILENAME)) {
                    // No, it specifies metadata for multiple files (but not all the files in the directory)
                    is_one_file_only_metadata = false;
                }

                String current_filename_element_value_hex = FilenameEncoding.fileNameToHex(current_filename_element_value);

                // This fileset specifies metadata for the file
                // MetadataXMLFile.addMetadata(CollectionTreeNode, ArrayList) stored filename in uppercase hex, so need to compare with the same
                if (file_relative_path.matches(current_filename_element_value_hex)) { //if (file_relative_path.matches(current_filename_element_value)) {
                    //System.err.println("Found a match in meta.xml for file name: " + regexed_file_relative_path);
                    current_fileset_matches = true;
                    if (!file_relative_path.equals("") && current_filename_element_value.equals(DIRECTORY_FILENAME)) {
                        folder_metadata_inherited_from = metadata_xml_file_directory;
                    }
                    break;
                }

                // This fileset specifies metadata for the folder the file is in
                if (regexed_file_relative_path.startsWith(current_filename_element_value + FilenameEncoding.URL_FILE_SEPARATOR)) {
                    current_fileset_matches = true;
                    folder_metadata_inherited_from = new File(metadata_xml_file_directory, current_filename_element_value);
                    break;
                }
            }

            // The FileSet doesn't apply, so move onto the next one
            if (current_fileset_matches == false) {
                continue;
            }

            // Read all the Metadata elements in the fileset
            NodeList metadata_elements_nodelist = current_fileset_element.getElementsByTagName(METADATA_ELEMENT);
            for (int k = 0; k < metadata_elements_nodelist.getLength(); k++) {
                Element current_metadata_element = (Element) metadata_elements_nodelist.item(k);
                String metadata_element_name_full = current_metadata_element.getAttribute("name");
                // if we're only looking for fileEncoding metadata and this isn't it, skip to the next
                if(fileEncodingOnly && !metadata_element_name_full.equals(FILENAME_ENCODING_METADATA)) {
                    continue;
                }
                String metadata_set_namespace = MetadataTools.getMetadataSetNamespace(metadata_element_name_full);

                // Ignore legacy crap
                if (metadata_set_namespace.equals("hidden")) {
                    continue;
                }

                MetadataSet metadata_set = MetadataSetManager.getMetadataSet(metadata_set_namespace);
                if (metadata_set == null) {
                    // The metadata set isn't loaded, so give the option of mapping the element into a loaded set
                    String target_metadata_element_name_full = MetadataSetManager.mapUnloadedMetadataElement(metadata_element_name_full);
                    if (target_metadata_element_name_full == null || target_metadata_element_name_full.equals("")) {
                        // Skip this element if we still don't have a loaded element for it
                        continue;
                    }

                    metadata_element_name_full = target_metadata_element_name_full;
                    metadata_set_namespace = MetadataTools.getMetadataSetNamespace(metadata_element_name_full);
                    metadata_set = MetadataSetManager.getMetadataSet(metadata_set_namespace);
                }

                MetadataElement metadata_element = MetadataTools.getMetadataElementWithName(metadata_element_name_full);

                String metadata_element_name = MetadataTools.getMetadataElementName(metadata_element_name_full);
                // If the element doesn't exist in the metadata set, we're not interested
                //Shaoqun modified. It needs to be added to metadata_set because the user might disable skim file
                if (metadata_element == null) {
                    metadata_element = metadata_set.addMetadataElementForThisSession(metadata_element_name);
                    // continue;
                }

                // Square brackets need to be escaped because they are a special character in Greenstone
                String metadata_value_string = XMLTools.getElementTextValue(current_metadata_element);
                metadata_value_string = metadata_value_string.replaceAll("&#091;", "[");
                metadata_value_string = metadata_value_string.replaceAll("&#093;", "]");

                MetadataValueTreeNode metadata_value_tree_node = metadata_element.getMetadataValueTreeNode(metadata_value_string);

                // If there is no metadata value tree node for this value, create it
                if (metadata_value_tree_node == null) {
                    DebugStream.println("Note: No value tree node for metadata value \"" + metadata_value_string + "\"");
                    metadata_element.addMetadataValue(metadata_value_string);
                    metadata_value_tree_node = metadata_element.getMetadataValueTreeNode(metadata_value_string);
                }

                MetadataValue metadata_value = new MetadataValue(metadata_element, metadata_value_tree_node);
                metadata_value.inheritsMetadataFromFolder(folder_metadata_inherited_from);
                metadata_value.setIsOneFileOnlyMetadata(is_one_file_only_metadata);

                // Is this accumulating metadata?
                if (current_metadata_element.getAttribute("mode").equals("accumulate")) {
                    metadata_value.setIsAccumulatingMetadata(true);
                }

                // Add the new metadata value to the list
                metadata_values.add(metadata_value);
            }
        }

        return metadata_values;
    }


    public void removeMetadata(CollectionTreeNode file_node, ArrayList metadata_values)
    {
        // If this metadata.xml file isn't the one currently loaded, load it now
        if (loaded_file != this) {
            // First we must save out the currently loaded file
            saveLoadedFile();

            // Parse the metadata.xml file
            Document document = XMLTools.parseXMLFile(this);
            if (document == null) {
                System.err.println("Error: Could not parse metadata.xml file " + getAbsolutePath());
                return;
            }

            loaded_file = this;
            loaded_file_document = document;
        }

        // Determine the file's path relative to the location of the metadata.xml file
        String metadata_xml_file_directory_path = FilenameEncoding.fileToURLEncoding(getParentFile());
        String file_relative_path = file_node.getURLEncodedFilePath().substring(metadata_xml_file_directory_path.length());
        if (file_relative_path.startsWith(FilenameEncoding.URL_FILE_SEPARATOR)) {
            file_relative_path = file_relative_path.substring(FilenameEncoding.URL_FILE_SEPARATOR.length());
        }

        // Form a regular expression that specifies the scope of the metadata
        String file_path_regexp;
        if (file_relative_path.equals("")) {
            // Special case for matching all files in the directory
            file_path_regexp = DIRECTORY_FILENAME;
        }
        else {
            // Convert the file path into a regular expression that will match it
            file_path_regexp = MetadataTools.getRegularExpressionThatMatchesFilePath(file_relative_path);
        }

        // Find the appropriate FileSet element for this file
        Element appropriate_fileset_element = null;

        // Read all the FileSet elements in the file
        NodeList fileset_elements_nodelist = loaded_file_document.getElementsByTagName(FILESET_ELEMENT);
        for (int i = 0; i < fileset_elements_nodelist.getLength(); i++) {
            Element current_fileset_element = (Element) fileset_elements_nodelist.item(i);

            // Check the FileName elements of the FileSet to see if we have a match
            NodeList filename_elements_nodelist = current_fileset_element.getElementsByTagName(FILENAME_ELEMENT);
            for (int j = 0; j < filename_elements_nodelist.getLength(); j++) {
                Element current_filename_element = (Element) filename_elements_nodelist.item(j);
                String current_filename_element_value = XMLTools.getElementTextValue(current_filename_element);

                // Only exact matches can be extended with new metadata
                if (current_filename_element_value.equals(file_path_regexp)) {
                    appropriate_fileset_element = current_fileset_element;
                    break;
                }
            }
        }

        // If no appropriate FileSet element exists the metadata isn't assigned in this metadata.xml file
        if (appropriate_fileset_element == null) {
            DebugStream.println("Note: No appropriate FileSet element found when removing metadata from " + this);
            return;
        }

        // Remove each of the metadata values from the FileSet's Description element
        for (int i = 0; i < metadata_values.size(); i++) {
            MetadataValue metadata_value = (MetadataValue) metadata_values.get(i);

            // Remove any characters that are invalid in XML
            String metadata_value_string = XMLTools.removeInvalidCharacters(metadata_value.getFullValue());

            // Square brackets need to be escaped because they are a special character in Greenstone
            metadata_value_string = metadata_value_string.replaceAll("\\[", "&#091;");
            metadata_value_string = metadata_value_string.replaceAll("\\]", "&#093;");

            // Find the Metadata element to delete from the fileset
            String metadata_element_name_full = metadata_value.getMetadataElement().getFullName();
            NodeList metadata_elements_nodelist = appropriate_fileset_element.getElementsByTagName(METADATA_ELEMENT);
            for (int k = 0; k < metadata_elements_nodelist.getLength(); k++) {
                Element current_metadata_element = (Element) metadata_elements_nodelist.item(k);

                // Check the metadata element name matches
                String current_metadata_element_name_full = current_metadata_element.getAttribute("name");
                if (current_metadata_element_name_full.equals(metadata_element_name_full)) {
                    // Check the metadata element value matches
                    String current_metadata_value_string = XMLTools.getElementTextValue(current_metadata_element);
                    if (current_metadata_value_string.equals(metadata_value_string)) {

                        // Remove this Metadata element
                        current_metadata_element.getParentNode().removeChild(current_metadata_element);

                        // the gs.filenameEncoding metadata is unique in that, when added, removed or
                        // changed, it must be applied on the file(name) whose metadata has been adjusted
                        if(current_metadata_element_name_full.equals(FILENAME_ENCODING_METADATA)) {

                            // metadata_value_string will hereafter be the inherited gs.FilenameEncoding
                            // metadata (if any), now that the value at this level has been removed
                            metadata_value_string = processFilenameEncoding(file_path_regexp,
                                file_node, "", true); // true only if *removing* this meta
                        }

                        // If there are no Metadata elements left now, remove the (empty) FileSet element
                        if (metadata_elements_nodelist.getLength() == 0) {
                            appropriate_fileset_element.getParentNode().removeChild(appropriate_fileset_element);
                        }

                        break;
                    }
                }
            }
        }

        // Remember that we've changed the file so it gets saved when a new one is loaded
        loaded_file_changed = true;
    }


    public void replaceMetadata(CollectionTreeNode file_node, MetadataValue old_metadata_value, MetadataValue new_metadata_value)
    {
        // If this metadata.xml file isn't the one currently loaded, load it now
        if (loaded_file != this) {
            // First we must save out the currently loaded file
            saveLoadedFile();

            // Parse the metadata.xml file
            Document document = XMLTools.parseXMLFile(this);
            if (document == null) {
                System.err.println("Error: Could not parse metadata.xml file " + getAbsolutePath());
                return;
            }

            loaded_file = this;
            loaded_file_document = document;
        }

        // Determine the file's path relative to the location of the metadata.xml file
        String metadata_xml_file_directory_path = FilenameEncoding.fileToURLEncoding(getParentFile());
        String file_relative_path = file_node.getURLEncodedFilePath().substring(metadata_xml_file_directory_path.length());
        if (file_relative_path.startsWith(FilenameEncoding.URL_FILE_SEPARATOR)) {
            file_relative_path = file_relative_path.substring(FilenameEncoding.URL_FILE_SEPARATOR.length());
        }

        // Form a regular expression that specifies the scope of the metadata
        String file_path_regexp;
        if (file_relative_path.equals("")) {
            // Special case for matching all files in the directory
            file_path_regexp = DIRECTORY_FILENAME;
        }
        else {
            // Convert the file path into a regular expression that will match it
            file_path_regexp = MetadataTools.getRegularExpressionThatMatchesFilePath(file_relative_path);
        }

        // Remove any characters that are invalid in XML
        String old_metadata_value_string = XMLTools.removeInvalidCharacters(old_metadata_value.getFullValue());
        String new_metadata_value_string = XMLTools.removeInvalidCharacters(new_metadata_value.getFullValue());

        // Square brackets need to be escaped because they are a special character in Greenstone
        old_metadata_value_string = old_metadata_value_string.replaceAll("\\[", "&#091;");
        old_metadata_value_string = old_metadata_value_string.replaceAll("\\]", "&#093;");
        new_metadata_value_string = new_metadata_value_string.replaceAll("\\[", "&#091;");
        new_metadata_value_string = new_metadata_value_string.replaceAll("\\]", "&#093;");

        // Read all the FileSet elements in the file
        NodeList fileset_elements_nodelist = loaded_file_document.getElementsByTagName(FILESET_ELEMENT);
        for (int i = 0; i < fileset_elements_nodelist.getLength(); i++) {
            Element current_fileset_element = (Element) fileset_elements_nodelist.item(i);
            boolean current_fileset_matches = false;

            // Check the FileName elements of the FileSet to see if we have a match
            NodeList filename_elements_nodelist = current_fileset_element.getElementsByTagName(FILENAME_ELEMENT);
            for (int j = 0; j < filename_elements_nodelist.getLength(); j++) {
                Element current_filename_element = (Element) filename_elements_nodelist.item(j);
                String current_filename_element_value = XMLTools.getElementTextValue(current_filename_element);

                // Only exact matches can be edited
                if (current_filename_element_value.equals(file_path_regexp)) {
                    current_fileset_matches = true;
                    break;
                }
            }

            // The FileSet doesn't apply, so move onto the next one
            if (current_fileset_matches == false) {
                continue;
            }

            // Each metadata value is only allowed to be assigned once
            boolean new_metadata_value_already_exists = false;
            Element metadata_element_to_edit = null;

            // Find the Metadata element to replace in the fileset
            String metadata_element_name_full = old_metadata_value.getMetadataElement().getFullName();
            NodeList metadata_elements_nodelist = current_fileset_element.getElementsByTagName(METADATA_ELEMENT);
            for (int k = 0; k < metadata_elements_nodelist.getLength(); k++) {
                Element current_metadata_element = (Element) metadata_elements_nodelist.item(k);

                // Check the metadata element name matches
                String current_metadata_element_name_full = current_metadata_element.getAttribute("name");
                if (!current_metadata_element_name_full.equals(metadata_element_name_full)) {
                    continue;
                }

                // Check the new metadata value doesn't already exist
                String current_metadata_value_string = XMLTools.getElementTextValue(current_metadata_element);
                if (current_metadata_value_string.equals(new_metadata_value_string)) {
                    new_metadata_value_already_exists = true;
                }

                // Check the metadata element value matches
                if (current_metadata_value_string.equals(old_metadata_value_string)) {
                    metadata_element_to_edit = current_metadata_element;
                }
            }

            // If the new metadata value already existed, remove the original value
            if (new_metadata_value_already_exists) {
                if(metadata_element_to_edit != null) { //?????????
                    metadata_element_to_edit.getParentNode().removeChild(metadata_element_to_edit);
                } else {
                    System.err.println("ERROR MetadataXMLFile: metadata_element_to_edit is null");
                }
            }
            // Otherwise replace the old value with the new value
            // Ensure metadata_element_to_edit isn't null (may occur when multiple files are selected)
            else if (metadata_element_to_edit != null) {

                // the gs.filenameEncoding metadata is unique in that, when added, removed or
                // changed, it must be applied on the file(name) whose metadata has been adjusted
                if(metadata_element_name_full.equals(FILENAME_ENCODING_METADATA)) {
                    new_metadata_value_string = processFilenameEncoding(file_path_regexp, file_node, new_metadata_value_string, false);
                    // true only if removing meta
                }
                XMLTools.setElementTextValue(metadata_element_to_edit, new_metadata_value_string);
            }
        }

        // Remember that we've changed the file so it gets saved when a new one is loaded
        loaded_file_changed = true;
    }


    static public void saveLoadedFile()
    {
        // If we have a file loaded into memory and it has been modified, save it now
        if (loaded_file != null && loaded_file_changed == true) {
            XMLTools.writeXMLFile(loaded_file, loaded_file_document, nonEscapingElements);


            loaded_file_changed = false;
        }
    }


    /**
     * Every metadata.xml file must be skimmed when a collection is opened, for three very important reasons:
     * - To handle any non-namespaced metadata in the metadata.xml files (this is mapped and the files rewritten)
     * - To get a complete list of the metadata elements in the collection (used in Design and Format panes)
     * - To build complete and accurate metadata value trees (used in the Enrich pane)
     */
    public void skimFile()
633 {
634 boolean file_changed = false;
635
636 // Parse the metadata.xml file
637 DebugStream.println("Skimming metadata.xml file " + this + "...");
638
639 Document document = XMLTools.parseXMLFile(this);
640 if (document == null) {
641 System.err.println("Error: Could not parse metadata.xml file " + getAbsolutePath());
642 return;
643 }
644
645 // Read all the Metadata elements in the file
646 HashMap target_metadata_element_name_attrs_cache = new HashMap();
647 NodeList metadata_elements_nodelist = document.getElementsByTagName(METADATA_ELEMENT);
648 for (int i = 0; i < metadata_elements_nodelist.getLength(); i++) {
649 Element current_metadata_element = (Element) metadata_elements_nodelist.item(i);
650 String metadata_element_name_full = current_metadata_element.getAttribute("name");
651 String metadata_set_namespace = MetadataTools.getMetadataSetNamespace(metadata_element_name_full);
652
653 // Ignore legacy metadata in the "hidden" namespace
654 if (metadata_set_namespace.equals("hidden")) {
655 continue;
656 }
657
658 MetadataSet metadata_set = MetadataSetManager.getMetadataSet(metadata_set_namespace);
659 if (metadata_set == null) {
660 // The metadata set isn't loaded, so give the option of mapping the element into a loaded set
661 String target_metadata_element_name_full = MetadataSetManager.mapUnloadedMetadataElement(metadata_element_name_full);
662 if (target_metadata_element_name_full == null || target_metadata_element_name_full.equals("")) {
663 // Skip this element if we still don't have a loaded element for it
664 continue;
665 }
666
667 // Update the metadata.xml file to have the new (namespaced) element name
668 // Instead of using current_metadata_element.setAttribute("name", target_metadata_element_name_full)
669 // we create an Attr object for each target metadata element name, and cache them
670 // This makes a *huge* difference (namespacing a metadata.xml file with 45000 metadata entries now
671 // takes 45 seconds instead of 30 minutes!) -- why is setting the value of a Node so slow?
672 Attr target_metadata_element_name_attr = (Attr) target_metadata_element_name_attrs_cache.get(target_metadata_element_name_full);
673 if (target_metadata_element_name_attr == null) {
674 target_metadata_element_name_attr = document.createAttribute("name");
675 target_metadata_element_name_attr.setValue(target_metadata_element_name_full);
676 target_metadata_element_name_attrs_cache.put(target_metadata_element_name_full, target_metadata_element_name_attr);
677 }
678
679 // Remove the old name attribute and add the new (namespaced) one
680 current_metadata_element.removeAttribute("name");
681 current_metadata_element.setAttributeNode((Attr) target_metadata_element_name_attr.cloneNode(false));
682 file_changed = true;
683
684 metadata_element_name_full = target_metadata_element_name_full;
685 metadata_set_namespace = MetadataTools.getMetadataSetNamespace(metadata_element_name_full);
686 metadata_set = MetadataSetManager.getMetadataSet(metadata_set_namespace);
687 }
688
689 String metadata_element_name = MetadataTools.getMetadataElementName(metadata_element_name_full);
690 MetadataElement metadata_element = metadata_set.getMetadataElementWithName(metadata_element_name);
691
692 // If the element doesn't exist in the metadata set, add it
693 if (metadata_element == null) {
694 metadata_element = metadata_set.addMetadataElementForThisSession(metadata_element_name);
695 }
696
697 // Square brackets are special characters in Greenstone, so they are stored escaped in metadata.xml; unescape them here
698 String metadata_value_string = XMLTools.getElementTextValue(current_metadata_element);
699 metadata_value_string = metadata_value_string.replaceAll("&#091;", "[");
700 metadata_value_string = metadata_value_string.replaceAll("&#093;", "]");
701
702 metadata_element.addMetadataValue(metadata_value_string);
703 }
704
705 // Rewrite the metadata.xml file if it has changed
706 if (file_changed) {
707 XMLTools.writeXMLFile(this, document);
708 }
709 }
710
711 /**
712 * The gs.filenameEncoding metadata is unique in that, when added, removed or
713 * replaced, it must be applied on the file(name) whose metadata has been
714 * adjusted.
715 * This method handles all that, given the regular expression or file path
716 * to match on (.* matches subdirectories), the affected file node, the new
717 * encoding value, and a flag saying whether the encoding metadata is being
718 * removed (true) or added/replaced (false).
719 * The adjusted value for the encoding metadata is returned.
720 *
721 * MetadataXMLFileManager maintains a hashmap of (URL-encoded filepaths, encoding)
722 * to allow fast access to previously assigned gs.filenameEncoding metadata (if
723 * any) for each file. This hashmap also needs to be updated, but this update
724 * is complicated by the fact that it concerns regular expressions that could
725 * affect multiple filenames.
726 */
727 public String processFilenameEncoding(String file_path_regexp, CollectionTreeNode file_node,
728 String encoding_metadata_value, boolean removingMetadata)
729 {
730 if(!FilenameEncoding.MULTIPLE_FILENAME_ENCODINGS_SUPPORTED) {
731 return encoding_metadata_value;
732 }
733
734 // Work out this filenode's new encoding and apply it:
735
736 if(removingMetadata) { // encoding_metadata_value = ""
737 // gs.filenameEncoding metadata being removed, work out
738 // any inherited metadata to replace it with in the meta-table
739 encoding_metadata_value = FilenameEncoding.getInheritedFilenameEncoding(
740 file_node.getURLEncodedFilePath(), file_node.getFile());
741 // should be canonical encoding already
742 }
743 else if(!encoding_metadata_value.equals("")) {
744 // if adding or replacing filename encoding,
745 // get the canonical encoding name for this alias
746 encoding_metadata_value = FilenameEncoding.canonicalEncodingName(encoding_metadata_value);
747 }
748 // Re-encode the display name of this file node only, since any affected
749 // child nodes will be re-encoded in FileNode.refreshDescendantEncodings()
750 file_node.reencodeDisplayName(encoding_metadata_value);
751
752
753 // Update the file-to-encoding map, which gives fast access to the
754 // gs.filenameEncoding value of each file. When adding or replacing, the
755 // canonical encoding is stored. When removing, the entry is deleted rather
756 // than overwritten (see the comment in the branch below).
757
758 String urlpath = file_node.getURLEncodedFilePath();
759 if(removingMetadata) {
760 // remove it from the map instead of inserting "", so that when folders in the collection tree
761 // are being deleted or moved, the removeMetadata (and addMetadata) calls that get fired
762 // for each affected file node do not cause multiple "" values to be
763 // entered into the filename-to-encoding map for file paths that no longer exist.
764 FilenameEncoding.map.remove(urlpath);
765 } else { // for adding and replacing, put the encoding into the map (also replaces any existing encoding for it)
766 FilenameEncoding.map.put(urlpath, encoding_metadata_value);
767 }
768
769 // If new folder-level metadata (or metadata for a set of files fitting a pattern) has been
770 // assigned, the file_to_encodings map will be cleared for all descendant folders and files,
771 // so that these can be re-calculated upon refreshing the visible parts of the CollectionTree.
772 // Mark the state as requiring a refresh of the CollectionTree.
773 // This next step also serves to prevent the MetadataValueTableModel from trying to update
774 // itself while a refresh (involving re-encoding of filenames of visible nodes) is in progress.
775 FilenameEncoding.setRefreshRequired(true);
776
777 return encoding_metadata_value;
778 }
779}
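
The comment in skimFile() about caching one Attr per target element name and cloning it for each element can be illustrated outside GLI. The following is a minimal, standalone sketch of that pattern using only the JDK's DOM classes. It is not the GLI code: the class AttrCacheSketch, its methods, the sample document, and the plain Map standing in for MetadataSetManager.mapUnloadedMetadataElement() are all hypothetical, and the element name "Metadata" is assumed to be the value of METADATA_ELEMENT.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration class; nothing here is part of GLI.
final class AttrCacheSketch {

    /**
     * Renames the "name" attribute of every <Metadata> element that has an
     * entry in the mapping, and returns how many elements were renamed.
     * The mapping is an assumed stand-in for the interactive mapping step.
     */
    static int renameAll(Document doc, Map<String, String> mapping) {
        // One template Attr per distinct target name, created on first use
        Map<String, Attr> templates = new HashMap<>();
        NodeList nodes = doc.getElementsByTagName("Metadata");
        int renamed = 0;

        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            String target = mapping.get(element.getAttribute("name"));
            if (target == null || target.isEmpty()) {
                continue; // no mapping for this element, leave it alone
            }

            Attr template = templates.get(target);
            if (template == null) {
                template = doc.createAttribute("name");
                template.setValue(target);
                templates.put(target, template);
            }

            // An Attr can belong to only one element, so attach a clone.
            // Cloning an Attr copies its value even for a shallow clone.
            element.removeAttribute("name");
            element.setAttributeNode((Attr) template.cloneNode(false));
            renamed++;
        }
        return renamed;
    }

    /** Builds a small in-memory document with one <Metadata> element per name. */
    static Document sampleDocument(String... names) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("DirectoryMetadata");
            doc.appendChild(root);
            for (String name : names) {
                Element element = doc.createElement("Metadata");
                element.setAttribute("name", name);
                root.appendChild(element);
            }
            return doc;
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Document doc = sampleDocument("Title", "Title", "dc.Creator");
        Map<String, String> mapping = new HashMap<>();
        mapping.put("Title", "dc.Title");
        System.out.println(renameAll(doc, mapping) + " element(s) renamed");
    }
}
```

The clone is needed because setAttributeNode() rejects an Attr that is already attached to another element. Whether this pattern is faster than calling setAttribute() directly depends on the DOM implementation in use; the timings quoted in the source comment are the original author's observation and are not reproduced by this sketch.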