Timestamp:
2011-01-11T14:53:10+13:00
Author:
kjdon
Message:

Added a bit extra to removesuffix for titles so that [sound recording] is removed. Greenstone makes [ an entity, so the regex uses the entity form of [ rather than an escaped literal bracket.

File:
1 edited

  • documentation/trunk/tutorial_sample_files/beatles/advbeat_large/etc/collect.cfg

    --- r22947
    +++ r23547
    @@ -27,5 +27,6 @@
     plugin  DirectoryPlugin
     
    -classify    AZCompactList -mingroup 1 -metadata dc.Title,Title -minnesting 20 -firstvalueonly -removesuffix "(?i)(\\s+\\d+)|(\\s*[[:punct:]]\\s+.*)|(\\s*by the beatles\\s*)" -buttonname Title -removeprefix (?i)\\s*beatles\\s+\\-\\s+
    +# (\\s+[.*) in removesuffix is to remove eg [sound recording] from the Title. Greenstone escapes [] as they are used to represent metadata format elements, hence the use of [ instead of \\[ in the regex.
    +classify    AZCompactList -mingroup 1 -metadata dc.Title,Title -minnesting 20 -firstvalueonly -removesuffix "(?i)(\\s+\\d+)|(\\s*[[:punct:]]\\s+.*)|(\\s+[.*)|(\\s*by the beatles\\s*)" -buttonname Title -removeprefix (?i)\\s*beatles\\s+\\-\\s+
     classify    AZCompactList -metadata dc.Format -buttonname Browse -sort Title
     # classify  Phind
    @@ -65,5 +66,4 @@
     collectionmeta  .document:Source [l=en] "filenames"
     collectionmeta  collectionname [l=en] "Advanced Beatles -- large"
    -collectionmeta  collectionextra [l=en] "Demonstration collection illustrating the use of heterogeneous documents. Source document are about
    -The Beatles pop group in the following formats: HTML, TXT, JPEG, Word, PDF, MIDI, MP3, and MARC file formats."
    +collectionmeta  collectionextra [l=en] "Demonstration collection illustrating the use of heterogeneous documents. Source documents are about The Beatles pop group in the following formats: HTML, TXT, JPEG, Word, PDF, MIDI, MP3, and MARC file formats."
     collectionmeta  iconcollection [l=en] "_httpprefix_/collect/advbeat_large/images/beatlesmm.png"
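
The effect of the extra removesuffix alternative can be sketched outside Greenstone. The Python below is only an illustration and rests on several assumptions: that the classifier strips the suffix by matching the pattern anchored at the end of the title, that the doubled backslashes in collect.cfg reduce to single ones, and that a literal `\[` behaves like the escaped bracket form used in the config. Python's `re` module has no POSIX `[[:punct:]]` class, so `string.punctuation` stands in for it. The names `OLD_SUFFIX`, `NEW_SUFFIX`, and `remove_suffix` are hypothetical.

```python
import re
import string

# Stand-in for the POSIX [[:punct:]] class, which Python's re lacks.
PUNCT = "[" + re.escape(string.punctuation) + "]"

# Suffix pattern before the change (r22947), anchored at end of title.
OLD_SUFFIX = re.compile(
    r"(?:(\s+\d+)|(\s*" + PUNCT + r"\s+.*)|(\s*by the beatles\s*))$",
    re.IGNORECASE,
)

# Suffix pattern after the change (r23547): adds the (\s+\[.*) alternative.
NEW_SUFFIX = re.compile(
    r"(?:(\s+\d+)|(\s*" + PUNCT + r"\s+.*)|(\s+\[.*)|(\s*by the beatles\s*))$",
    re.IGNORECASE,
)


def remove_suffix(pattern: re.Pattern, title: str) -> str:
    """Strip the first suffix match from the title (hypothetical helper)."""
    return pattern.sub("", title, count=1)


if __name__ == "__main__":
    for title in ["Yesterday [sound recording]", "Revolution 9", "Let it be by the Beatles"]:
        print(repr(title), "->", repr(remove_suffix(NEW_SUFFIX, title)))
```

Under these assumptions the old pattern leaves "Yesterday [sound recording]" untouched, because the punctuation alternative requires whitespace straight after the punctuation character, while the new alternative matches the whitespace followed by the opening bracket and everything after it.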