Changeset 24291

Timestamp: 19.07.2011 14:15:29 (9 years ago)
Author: sjm84
Message: More changes to do with the way PDF files are parsed
Location: main/trunk/greenstone2/perllib
Files: 2 modified

  • main/trunk/greenstone2/perllib/strings.properties

    r23946  r24291
    274     274     import.OIDtype:The method to use when generating unique identifiers for each document.
    275     275     import.OIDtype.hash:Hash the contents of the file. Document identifiers will be the same every time the collection is imported.
            276     import.OIDtype.hash_on_ga_xml:Hash the contents of the Greenstone Archive XML file. Document identifiers will be the same every time the collection is imported as long as the metadata does not change.
    276     277
    277     278     import.OIDtype.incremental:Use a simple document count. Significantly faster than "hash", but does not necessarily assign the same identifier to the same document content if the collection is reimported.

    862     863
    863     864     EmbeddedMetadataPlugin.desc:Plugin that extracts embedded metadata from a variety of file types. It is based on the CPAN module 'ExifTool which includes support for over 70 file formats and 20 metadata formats.  Highlights include: video formats such as AVI, ASF, FLV, MPEG, OGG Vorbis, and WMV; image formats such as BMP, GIF, JPEG, JPEG 2000 and  PNG; audio formats such as AIFF, RealAudio, FLAC, MP3, and WAV; Office document formats such as Encapsulated PostScript, HTML, PDF, and Word.  More details are available at the ExifTool home page http://www.sno.phy.queensu.ca/~phil/exiftool/
            865     EmbeddedMetadataPlugin.join_before_split:Join fields with multiple entries (e.g. Authors or Keywords) before they are (optionally) split using the specified separator.
            866     EmbeddedMetadataPlugin.join_character:The character to use with join_before_split (default is a single space).
            867     EmbeddedMetadataPlugin.trim_whitespace:Trim whitespace from start and end of any extracted metadata values (Note: this also applies to any values generated through joining with join_before_split or splitting through metadata_field_separator).
    864     868
    865     869     ExcelPlugin.desc:A plugin for importing Microsoft Excel files (versions 95 and 97).
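
The three new EmbeddedMetadataPlugin option strings describe a join, then split, then trim sequence. The following is a minimal sketch of that sequence, not the plugin's actual code: the helper name `process_values` and its argument layout are hypothetical, and treating the separator as a literal string (rather than a regular expression) is an assumption made here for simplicity.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the documented option semantics.
# $values is an array ref of raw values extracted for one metadata field.
sub process_values {
    my ($values, %opt) = @_;
    my @vals = @$values;

    # join_before_split: collapse multiple entries into a single string,
    # using join_character (documented default: a single space).
    if ($opt{join_before_split}) {
        my $jc = defined $opt{join_character} ? $opt{join_character} : ' ';
        @vals = (join($jc, @vals));
    }

    # metadata_field_separator: optionally split each value.
    # Assumption: the separator is matched literally (quotemeta).
    my $sep = $opt{metadata_field_separator};
    if (defined $sep && $sep ne '') {
        my $re = quotemeta $sep;
        @vals = map { split /$re/, $_ } @vals;
    }

    # trim_whitespace: applies to joined and split values as well.
    if ($opt{trim_whitespace}) {
        s/^\s+|\s+$//g for @vals;
    }

    return @vals;
}
```

Under these assumptions, calling `process_values(['Smith, J.; Jones, A.', ' Brown, B. '], join_before_split => 1, join_character => ';', metadata_field_separator => ';', trim_whitespace => 1)` yields the three values `Smith, J.`, `Jones, A.` and `Brown, B.`.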
  • main/trunk/greenstone2/perllib/util.pm

    r23561  r24291
    83      83          my ($files,$file_accept_re,$file_reject_re) = @_;
    84      84
            85      #   my ($cpackage,$cfilename,$cline,$csubr,$chas_args,$cwantarray) = caller(2);
            86      #   my ($lcfilename) = ($cfilename =~ m/([^\\\/]*)$/);
            87      #   print STDERR "** Calling method (2): $lcfilename:$cline $cpackage->$csubr\n";
            88
    85      89          my @files_array = (ref $files eq "ARRAY") ? @$files : ($files);
    86      90
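
The util.pm hunk shows two small Perl idioms: accepting either a scalar or an array reference as one argument, and (in the commented-out lines) using `caller` to report who invoked a routine. A minimal standalone sketch of both follows; the sub names `to_list` and `describe_caller` are hypothetical and do not appear in util.pm.

```perl
use strict;
use warnings;

# Accept a single value or an array ref and always return a flat list,
# mirroring the @files_array line in the hunk above.
sub to_list {
    my ($files) = @_;
    return (ref $files eq "ARRAY") ? @$files : ($files);
}

# Build a "file:line package->sub" string for the stack frame $depth
# levels above this sub, in the style of the commented-out debug lines.
# Returns nothing if there is no such frame.
sub describe_caller {
    my ($depth) = @_;
    my ($cpackage, $cfilename, $cline, $csubr) = caller($depth);
    return unless defined $cfilename;

    # Keep only the last path component (handles both / and \ separators).
    my ($lcfilename) = ($cfilename =~ m/([^\\\/]*)$/);
    return "$lcfilename:$cline $cpackage->$csubr";
}
```

The depth passed to `caller` is relative to the sub that calls it, so a helper like `describe_caller` needs a depth one smaller than an inline `caller(2)` would to report the same frame.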