Changeset 27976


Timestamp:
2013-08-05T20:28:01+12:00
Author:
ak19
Message:

Updating the Enhanced-PDF collection now that extra_meta is sorted and the images generated from a PDF are sorted in doc.xml's gsdlassocfile metadata section.

Location:
other-projects/nightly-tasks/diffcol/trunk/model-collect/Enhanced-PDF
Files:
6 added
26 deleted
19 edited
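The message says that extra_meta and the gsdlassocfile page images are now sorted. In the r27976 side of the diffs below, the ex.* metadata (which appears to be what extra_meta refers to) is listed in plain string order, and the page images are listed in numeric page order (-0, -1, ..., -9, -10, -11) instead of the earlier lexical order (-0, -10, -11, -1, ...). The Python sketch below reproduces both orderings for illustration only. It is not the code this commit relies on, and the function names and the filename pattern are hypothetical.

```python
import re

# Sketch only: assumes the "ex.*" metadata is ordered by a plain,
# case-sensitive comparison of metadata names. Python compares str by
# code point, so "ex.PDF.PDFVersion" sorts before "ex.PDF.PageCount"
# ('D' < 'a'), matching the r27976 doc.xml files.
def sort_ex_metadata(pairs):
    """Return (name, value) pairs ordered by metadata name."""
    return sorted(pairs, key=lambda pair: pair[0])


# Hypothetical pattern for page-image derivatives such as
# "pdf06-weirdchars-10.jpg", "pdf06-weirdchars-10_thumb.gif" and
# "pdf06-weirdchars-10_screen.jpeg"; group 1 captures the page number.
PAGE_IMAGE = re.compile(r"-(\d+)(?:_thumb|_screen)?\.[A-Za-z]+$")


def sort_assoc_files(values):
    """Order gsdlassocfile values ("file:mimetype:") numerically by page.

    Keying on the integer page number gives -0, -1, ..., -9, -10, -11.
    sorted() is stable, so the per-page order (.jpg, _thumb.gif,
    _screen.jpeg) is kept, and values without a page number (such as
    doc.pdf) go last in their original relative order.
    """
    def key(value):
        filename = value.split(":", 1)[0]
        match = PAGE_IMAGE.search(filename)
        return (0, int(match.group(1))) if match else (1, 0)

    return sorted(values, key=key)


if __name__ == "__main__":
    lexical = [
        "pdf06-weirdchars-0.jpg:image/jpeg:",
        "pdf06-weirdchars-0_thumb.gif:image/gif:",
        "pdf06-weirdchars-10.jpg:image/jpeg:",
        "pdf06-weirdchars-11.jpg:image/jpeg:",
        "pdf06-weirdchars-1.jpg:image/jpeg:",
        "doc.pdf:application/pdf:",
    ]
    for value in sort_assoc_files(lexical):
        print(value)
```

Keying on a tuple rather than parsing and rewriting the filenames keeps the sort stable and leaves entries that do not follow the page-image naming convention (for example pdf01-2_1.jpg or doc.pdf) untouched apart from moving them after the numbered pages.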

  • other-projects/nightly-tasks/diffcol/trunk/model-collect/Enhanced-PDF/archives/HASH019c5dca.dir/doc.xml

    r27958 r27976  
    77    <Metadata name="Language">en</Metadata>
    88    <Metadata name="Encoding">utf8</Metadata>
    9     <Metadata name="URL">http://research/ak19/GS2bin_1Aug2013/collect/Enhanced-PDF/tmp/1375336471/pdf03.html</Metadata>
    10     <Metadata name="UTF8URL">http://research/ak19/GS2bin_1Aug2013/collect/Enhanced-PDF/tmp/1375336471/pdf03.html</Metadata>
     9    <Metadata name="URL">http://research/ak19/GS2bin_5Aug2013/collect/Enhanced-PDF/tmp/1375690532/pdf03.html</Metadata>
     10    <Metadata name="UTF8URL">http://research/ak19/GS2bin_5Aug2013/collect/Enhanced-PDF/tmp/1375690532/pdf03.html</Metadata>
    1111    <Metadata name="Title">Applications for Bibliometric Research in the Emerging Digital Libraries Sally Jo Cunningham...</Metadata>
    1212    <Metadata name="gsdlsourcefilename">import/pdf03.pdf</Metadata>
    13     <Metadata name="gsdlconvertedfilename">tmp/1375336471/pdf03.html</Metadata>
     13    <Metadata name="gsdlconvertedfilename">tmp/1375690532/pdf03.html</Metadata>
    1414    <Metadata name="OrigSource">pdf03.html</Metadata>
    1515    <Metadata name="Source">pdf03.pdf</Metadata>
     
    2424    <Metadata name="NumPages">17</Metadata>
    2525    <Metadata name="gsdlthistype">Paged</Metadata>
    26     <Metadata name="ex.File.FileModifyDate">2013:08:01 17:46:54+12:00</Metadata>
    27     <Metadata name="ex.PDF.Author">Bronwyn</Metadata>
    28     <Metadata name="ex.PDF.PageCount">17</Metadata>
    29     <Metadata name="ex.File.FileType">PDF</Metadata>
    30     <Metadata name="ex.PDF.PDFVersion">1.1</Metadata>
    31     <Metadata name="ex.PDF.Producer">Acrobat PDFWriter 2.0 for Macintosh</Metadata>
     26    <Metadata name="ex.ExifTool.ExifToolVersion">8.57</Metadata>
     27    <Metadata name="ex.File.Directory">/research/ak19/GS2bin_5Aug2013/collect/Enhanced-PDF/import</Metadata>
     28    <Metadata name="ex.File.FileModifyDate">2013:08:05 20:14:39+12:00</Metadata>
    3229    <Metadata name="ex.File.FileName">pdf03.pdf</Metadata>
    3330    <Metadata name="ex.File.FilePermissions">644</Metadata>
     31    <Metadata name="ex.File.FileSize">35935</Metadata>
     32    <Metadata name="ex.File.FileType">PDF</Metadata>
     33    <Metadata name="ex.File.MIMEType">application/pdf</Metadata>
     34    <Metadata name="ex.PDF.Author">Bronwyn</Metadata>
    3435    <Metadata name="ex.PDF.CreateDate">1999:09:27 16:05:06</Metadata>
     36    <Metadata name="ex.PDF.Creator">Microsoft Word</Metadata>
    3537    <Metadata name="ex.PDF.Linearized">false</Metadata>
    36     <Metadata name="ex.File.Directory">/research/ak19/GS2bin_1Aug2013/collect/Enhanced-PDF/import</Metadata>
    37     <Metadata name="ex.PDF.Creator">Microsoft Word</Metadata>
     38    <Metadata name="ex.PDF.PDFVersion">1.1</Metadata>
     39    <Metadata name="ex.PDF.PageCount">17</Metadata>
     40    <Metadata name="ex.PDF.Producer">Acrobat PDFWriter 2.0 for Macintosh</Metadata>
    3841    <Metadata name="ex.PDF.Title">biblio_for_dl_scientometrics.do</Metadata>
    39     <Metadata name="ex.File.FileSize">35935</Metadata>
    40     <Metadata name="ex.ExifTool.ExifToolVersion">8.57</Metadata>
    41     <Metadata name="ex.File.MIMEType">application/pdf</Metadata>
    4242    <Metadata name="Identifier">HASH019c5dca7f5bb781460a6b9c</Metadata>
    43     <Metadata name="lastmodified">1375336014</Metadata>
    44     <Metadata name="lastmodifieddate">20130801</Metadata>
    45     <Metadata name="oailastmodified">1375336471</Metadata>
    46     <Metadata name="oailastmodifieddate">20130801</Metadata>
     43    <Metadata name="lastmodified">1375690479</Metadata>
     44    <Metadata name="lastmodifieddate">20130805</Metadata>
     45    <Metadata name="oailastmodified">1375690532</Metadata>
     46    <Metadata name="oailastmodifieddate">20130805</Metadata>
    4747    <Metadata name="assocfilepath">HASH019c5dca.dir</Metadata>
    4848    <Metadata name="gsdlassocfile">doc.pdf:application/pdf:</Metadata>
  • other-projects/nightly-tasks/diffcol/trunk/model-collect/Enhanced-PDF/archives/HASH1a9cea0f.dir/doc.xml

    r27958 r27976  
    77    <Metadata name="Language">en</Metadata>
    88    <Metadata name="Encoding">utf8</Metadata>
    9     <Metadata name="URL">http://research/ak19/GS2bin_1Aug2013/collect/Enhanced-PDF/tmp/1375336471/pdf01.html</Metadata>
    10     <Metadata name="UTF8URL">http://research/ak19/GS2bin_1Aug2013/collect/Enhanced-PDF/tmp/1375336471/pdf01.html</Metadata>
     9    <Metadata name="URL">http://research/ak19/GS2bin_5Aug2013/collect/Enhanced-PDF/tmp/1375690531/pdf01.html</Metadata>
     10    <Metadata name="UTF8URL">http://research/ak19/GS2bin_5Aug2013/collect/Enhanced-PDF/tmp/1375690531/pdf01.html</Metadata>
    1111    <Metadata name="Title">Greenstone: A Comprehensive Open-Source Digital Library Software System Ian H. Witten,* Rodger J....</Metadata>
    1212    <Metadata name="gsdlsourcefilename">import/pdf01.pdf</Metadata>
    13     <Metadata name="gsdlconvertedfilename">tmp/1375336471/pdf01.html</Metadata>
     13    <Metadata name="gsdlconvertedfilename">tmp/1375690531/pdf01.html</Metadata>
    1414    <Metadata name="OrigSource">pdf01.html</Metadata>
    1515    <Metadata name="Source">pdf01.pdf</Metadata>
     
    2424    <Metadata name="NumPages">9</Metadata>
    2525    <Metadata name="gsdlthistype">Paged</Metadata>
    26     <Metadata name="ex.File.FileModifyDate">2013:08:01 17:46:54+12:00</Metadata>
    27     <Metadata name="ex.PDF.Author">Bronwyn</Metadata>
    28     <Metadata name="ex.PDF.PageCount">9</Metadata>
    29     <Metadata name="ex.File.FileType">PDF</Metadata>
    30     <Metadata name="ex.PDF.PDFVersion">1.2</Metadata>
    31     <Metadata name="ex.PDF.Producer">Acrobat PDFWriter 4.0 for Power Macintosh</Metadata>
     26    <Metadata name="ex.ExifTool.ExifToolVersion">8.57</Metadata>
     27    <Metadata name="ex.File.Directory">/research/ak19/GS2bin_5Aug2013/collect/Enhanced-PDF/import</Metadata>
     28    <Metadata name="ex.File.FileModifyDate">2013:08:05 20:14:39+12:00</Metadata>
    3229    <Metadata name="ex.File.FileName">pdf01.pdf</Metadata>
    3330    <Metadata name="ex.File.FilePermissions">644</Metadata>
     31    <Metadata name="ex.File.FileSize">269487</Metadata>
     32    <Metadata name="ex.File.FileType">PDF</Metadata>
     33    <Metadata name="ex.File.MIMEType">application/pdf</Metadata>
     34    <Metadata name="ex.PDF.Author">Bronwyn</Metadata>
    3435    <Metadata name="ex.PDF.CreateDate">2000:03:02 15:21:24</Metadata>
     36    <Metadata name="ex.PDF.Creator">Microsoft Word</Metadata>
    3537    <Metadata name="ex.PDF.Linearized">false</Metadata>
    36     <Metadata name="ex.File.Directory">/research/ak19/GS2bin_1Aug2013/collect/Enhanced-PDF/import</Metadata>
    37     <Metadata name="ex.PDF.Creator">Microsoft Word</Metadata>
    38     <Metadata name="ex.File.FileSize">269487</Metadata>
    39     <Metadata name="ex.ExifTool.ExifToolVersion">8.57</Metadata>
    40     <Metadata name="ex.File.MIMEType">application/pdf</Metadata>
     38    <Metadata name="ex.PDF.PDFVersion">1.2</Metadata>
     39    <Metadata name="ex.PDF.PageCount">9</Metadata>
     40    <Metadata name="ex.PDF.Producer">Acrobat PDFWriter 4.0 for Power Macintosh</Metadata>
    4141    <Metadata name="Identifier">HASH1a9cea0f239f754007681b</Metadata>
    42     <Metadata name="lastmodified">1375336014</Metadata>
    43     <Metadata name="lastmodifieddate">20130801</Metadata>
    44     <Metadata name="oailastmodified">1375336471</Metadata>
    45     <Metadata name="oailastmodifieddate">20130801</Metadata>
     42    <Metadata name="lastmodified">1375690479</Metadata>
     43    <Metadata name="lastmodifieddate">20130805</Metadata>
     44    <Metadata name="oailastmodified">1375690532</Metadata>
     45    <Metadata name="oailastmodifieddate">20130805</Metadata>
    4646    <Metadata name="assocfilepath">HASH1a9cea0f.dir</Metadata>
    4747    <Metadata name="gsdlassocfile">pdf01-2_1.jpg:image/jpeg:</Metadata>
  • other-projects/nightly-tasks/diffcol/trunk/model-collect/Enhanced-PDF/archives/HASH2bdf3b19.dir/doc.xml

    r27958 r27976  
    1111    <Metadata name="Title">pdf05-notext</Metadata>
    1212    <Metadata name="gsdlsourcefilename">import/notext/pdf05-notext.pdf</Metadata>
    13     <Metadata name="gsdlconvertedfilename">tmp/1375336456/pdf05-notext/pdf05-notext.item</Metadata>
     13    <Metadata name="gsdlconvertedfilename">tmp/1375690517/pdf05-notext/pdf05-notext.item</Metadata>
    1414    <Metadata name="OrigSource">pdf05-notext.item</Metadata>
    1515    <Metadata name="Source">pdf05-notext.pdf</Metadata>
     
    2323    <Metadata name="srclinkFile">doc.pdf</Metadata>
    2424    <Metadata name="NumPages">0</Metadata>
    25     <Metadata name="ex.XMP.ModifyDate">2007:06:13 12:29:51+12:00</Metadata>
    26     <Metadata name="ex.XMP.Format">application/pdf</Metadata>
    27     <Metadata name="ex.XMP.MetadataDate">2007:06:13 12:29:51+12:00</Metadata>
    28     <Metadata name="ex.XMP.Company">University of Waikato</Metadata>
    29     <Metadata name="ex.XMP.Creator">Acrobat PDFMaker 7.0.7 for Word</Metadata>
     25    <Metadata name="ex.ExifTool.ExifToolVersion">8.57</Metadata>
     26    <Metadata name="ex.File.Directory">/research/ak19/GS2bin_5Aug2013/collect/Enhanced-PDF/import/notext</Metadata>
     27    <Metadata name="ex.File.FileModifyDate">2013:08:05 20:14:39+12:00</Metadata>
     28    <Metadata name="ex.File.FileName">pdf05-notext.pdf</Metadata>
     29    <Metadata name="ex.File.FilePermissions">644</Metadata>
     30    <Metadata name="ex.File.FileSize">748503</Metadata>
     31    <Metadata name="ex.File.FileType">PDF</Metadata>
     32    <Metadata name="ex.File.MIMEType">application/pdf</Metadata>
     33    <Metadata name="ex.PDF.Author">Administrator</Metadata>
     34    <Metadata name="ex.PDF.Company">University of Waikato</Metadata>
     35    <Metadata name="ex.PDF.CreateDate">2007:06:13 12:28:29+12:00</Metadata>
     36    <Metadata name="ex.PDF.Creator">Acrobat PDFMaker 7.0.7 for Word</Metadata>
     37    <Metadata name="ex.PDF.Language">EN-US</Metadata>
     38    <Metadata name="ex.PDF.Linearized">true</Metadata>
     39    <Metadata name="ex.PDF.ModifyDate">2007:06:13 12:29:51+12:00</Metadata>
     40    <Metadata name="ex.PDF.PDFVersion">1.4</Metadata>
     41    <Metadata name="ex.PDF.PageCount">9</Metadata>
     42    <Metadata name="ex.PDF.PageLayout">OneColumn</Metadata>
    3043    <Metadata name="ex.PDF.Producer">Acrobat Distiller 7.0.5 &amp;#40;Windows&amp;#41;</Metadata>
    31     <Metadata name="ex.File.FilePermissions">644</Metadata>
    32     <Metadata name="ex.File.Directory">/research/ak19/GS2bin_1Aug2013/collect/Enhanced-PDF/import/notext</Metadata>
    33     <Metadata name="ex.PDF.Creator">Acrobat PDFMaker 7.0.7 for Word</Metadata>
    34     <Metadata name="ex.PDF.PageLayout">OneColumn</Metadata>
    35     <Metadata name="ex.File.FileModifyDate">2013:08:01 17:46:54+12:00</Metadata>
    36     <Metadata name="ex.PDF.Author">Administrator</Metadata>
    37     <Metadata name="ex.File.FileType">PDF</Metadata>
    3844    <Metadata name="ex.PDF.SourceModified">D:20070613002201</Metadata>
    3945    <Metadata name="ex.PDF.TaggedPDF">true</Metadata>
    40     <Metadata name="ex.PDF.Linearized">true</Metadata>
    41     <Metadata name="ex.PDF.CreateDate">2007:06:13 12:28:29+12:00</Metadata>
    42     <Metadata name="ex.PDF.ModifyDate">2007:06:13 12:29:51+12:00</Metadata>
    43     <Metadata name="ex.PDF.PDFVersion">1.4</Metadata>
     46    <Metadata name="ex.XMP.Company">University of Waikato</Metadata>
     47    <Metadata name="ex.XMP.CreateDate">2007:06:13 12:28:29+12:00</Metadata>
     48    <Metadata name="ex.XMP.Creator">Acrobat PDFMaker 7.0.7 for Word</Metadata>
     49    <Metadata name="ex.XMP.CreatorTool">Acrobat PDFMaker 7.0.7 for Word</Metadata>
     50    <Metadata name="ex.XMP.DocumentID">uuid:5915f718-0b63-4b63-ae6e-1efee5151379</Metadata>
     51    <Metadata name="ex.XMP.Format">application/pdf</Metadata>
    4452    <Metadata name="ex.XMP.InstanceID">uuid:2518e5c5-f724-4ea7-8dfc-e024661fc8c5</Metadata>
    45     <Metadata name="ex.File.FileName">pdf05-notext.pdf</Metadata>
    46     <Metadata name="ex.PDF.Company">University of Waikato</Metadata>
    47     <Metadata name="ex.PDF.Language">EN-US</Metadata>
    48     <Metadata name="ex.File.FileSize">748503</Metadata>
     53    <Metadata name="ex.XMP.MetadataDate">2007:06:13 12:29:51+12:00</Metadata>
     54    <Metadata name="ex.XMP.ModifyDate">2007:06:13 12:29:51+12:00</Metadata>
     55    <Metadata name="ex.XMP.Producer">Acrobat Distiller 7.0.5 &amp;#40;Windows&amp;#41;</Metadata>
     56    <Metadata name="ex.XMP.SourceModified">D:20070613002201</Metadata>
     57    <Metadata name="ex.XMP.VersionID">14</Metadata>
    4958    <Metadata name="ex.XMP.XMPToolkit">3.1-702</Metadata>
    50     <Metadata name="ex.XMP.SourceModified">D:20070613002201</Metadata>
    51     <Metadata name="ex.PDF.PageCount">9</Metadata>
    52     <Metadata name="ex.XMP.VersionID">14</Metadata>
    53     <Metadata name="ex.XMP.Producer">Acrobat Distiller 7.0.5 &amp;#40;Windows&amp;#41;</Metadata>
    54     <Metadata name="ex.XMP.DocumentID">uuid:5915f718-0b63-4b63-ae6e-1efee5151379</Metadata>
    55     <Metadata name="ex.XMP.CreateDate">2007:06:13 12:28:29+12:00</Metadata>
    56     <Metadata name="ex.ExifTool.ExifToolVersion">8.57</Metadata>
    57     <Metadata name="ex.File.MIMEType">application/pdf</Metadata>
    58     <Metadata name="ex.XMP.CreatorTool">Acrobat PDFMaker 7.0.7 for Word</Metadata>
    5959    <Metadata name="Identifier">HASH2bdf3b19cf094fd01ac7ad</Metadata>
    60     <Metadata name="lastmodified">1375336014</Metadata>
    61     <Metadata name="lastmodifieddate">20130801</Metadata>
    62     <Metadata name="oailastmodified">1375336462</Metadata>
    63     <Metadata name="oailastmodifieddate">20130801</Metadata>
     60    <Metadata name="lastmodified">1375690479</Metadata>
     61    <Metadata name="lastmodifieddate">20130805</Metadata>
     62    <Metadata name="oailastmodified">1375690523</Metadata>
     63    <Metadata name="oailastmodifieddate">20130805</Metadata>
    6464    <Metadata name="assocfilepath">HASH2bdf3b19.dir</Metadata>
    6565    <Metadata name="gsdlassocfile">pdf05-notext-0.jpg:image/jpeg:</Metadata>
  • other-projects/nightly-tasks/diffcol/trunk/model-collect/Enhanced-PDF/archives/HASHbb6e0c8f.dir/doc.xml

    r27958 r27976  
    1111    <Metadata name="Title">pdf06-weirdchars</Metadata>
    1212    <Metadata name="gsdlsourcefilename">import/notext/pdf06-weirdchars.pdf</Metadata>
    13     <Metadata name="gsdlconvertedfilename">tmp/1375336462/pdf06-weirdchars/pdf06-weirdchars.item</Metadata>
     13    <Metadata name="gsdlconvertedfilename">tmp/1375690523/pdf06-weirdchars/pdf06-weirdchars.item</Metadata>
    1414    <Metadata name="OrigSource">pdf06-weirdchars.item</Metadata>
    1515    <Metadata name="Source">pdf06-weirdchars.pdf</Metadata>
     
    2323    <Metadata name="srclinkFile">doc.pdf</Metadata>
    2424    <Metadata name="NumPages">0</Metadata>
    25     <Metadata name="ex.File.FileModifyDate">2013:08:01 17:46:54+12:00</Metadata>
    26     <Metadata name="ex.PDF.Author">rg</Metadata>
    27     <Metadata name="ex.PDF.PageCount">12</Metadata>
    28     <Metadata name="ex.File.FileType">PDF</Metadata>
    29     <Metadata name="ex.PDF.PDFVersion">1.2</Metadata>
    30     <Metadata name="ex.PDF.Producer">AFPL Ghostscript 7.04</Metadata>
     25    <Metadata name="ex.ExifTool.ExifToolVersion">8.57</Metadata>
     26    <Metadata name="ex.File.Directory">/research/ak19/GS2bin_5Aug2013/collect/Enhanced-PDF/import/notext</Metadata>
     27    <Metadata name="ex.File.FileModifyDate">2013:08:05 20:14:39+12:00</Metadata>
    3128    <Metadata name="ex.File.FileName">pdf06-weirdchars.pdf</Metadata>
    3229    <Metadata name="ex.File.FilePermissions">644</Metadata>
     30    <Metadata name="ex.File.FileSize">846134</Metadata>
     31    <Metadata name="ex.File.FileType">PDF</Metadata>
     32    <Metadata name="ex.File.MIMEType">application/pdf</Metadata>
     33    <Metadata name="ex.PDF.Author">rg</Metadata>
    3334    <Metadata name="ex.PDF.CreateDate">10/7/2002 16:9:30</Metadata>
     35    <Metadata name="ex.PDF.Creator">Pscript.dll Version 5.0</Metadata>
    3436    <Metadata name="ex.PDF.Linearized">false</Metadata>
    35     <Metadata name="ex.File.Directory">/research/ak19/GS2bin_1Aug2013/collect/Enhanced-PDF/import/notext</Metadata>
    36     <Metadata name="ex.PDF.Creator">Pscript.dll Version 5.0</Metadata>
     37    <Metadata name="ex.PDF.PDFVersion">1.2</Metadata>
     38    <Metadata name="ex.PDF.PageCount">12</Metadata>
     39    <Metadata name="ex.PDF.Producer">AFPL Ghostscript 7.04</Metadata>
    3740    <Metadata name="ex.PDF.Title">metsreportfinal.doc</Metadata>
    38     <Metadata name="ex.File.FileSize">846134</Metadata>
    39     <Metadata name="ex.ExifTool.ExifToolVersion">8.57</Metadata>
    40     <Metadata name="ex.File.MIMEType">application/pdf</Metadata>
    4141    <Metadata name="Identifier">HASHbb6e0c8f087c9c5b2e301a</Metadata>
    42     <Metadata name="lastmodified">1375336014</Metadata>
    43     <Metadata name="lastmodifieddate">20130801</Metadata>
    44     <Metadata name="oailastmodified">1375336470</Metadata>
    45     <Metadata name="oailastmodifieddate">20130801</Metadata>
     42    <Metadata name="lastmodified">1375690479</Metadata>
     43    <Metadata name="lastmodifieddate">20130805</Metadata>
     44    <Metadata name="oailastmodified">1375690531</Metadata>
     45    <Metadata name="oailastmodifieddate">20130805</Metadata>
    4646    <Metadata name="assocfilepath">HASHbb6e0c8f.dir</Metadata>
    4747    <Metadata name="gsdlassocfile">pdf06-weirdchars-0.jpg:image/jpeg:</Metadata>
    4848    <Metadata name="gsdlassocfile">pdf06-weirdchars-0_thumb.gif:image/gif:</Metadata>
    4949    <Metadata name="gsdlassocfile">pdf06-weirdchars-0_screen.jpeg:image/jpeg:</Metadata>
    50     <Metadata name="gsdlassocfile">pdf06-weirdchars-10.jpg:image/jpeg:</Metadata>
    51     <Metadata name="gsdlassocfile">pdf06-weirdchars-10_thumb.gif:image/gif:</Metadata>
    52     <Metadata name="gsdlassocfile">pdf06-weirdchars-10_screen.jpeg:image/jpeg:</Metadata>
    53     <Metadata name="gsdlassocfile">pdf06-weirdchars-11.jpg:image/jpeg:</Metadata>
    54     <Metadata name="gsdlassocfile">pdf06-weirdchars-11_thumb.gif:image/gif:</Metadata>
    55     <Metadata name="gsdlassocfile">pdf06-weirdchars-11_screen.jpeg:image/jpeg:</Metadata>
    5650    <Metadata name="gsdlassocfile">pdf06-weirdchars-1.jpg:image/jpeg:</Metadata>
    5751    <Metadata name="gsdlassocfile">pdf06-weirdchars-1_thumb.gif:image/gif:</Metadata>
     
    8175    <Metadata name="gsdlassocfile">pdf06-weirdchars-9_thumb.gif:image/gif:</Metadata>
    8276    <Metadata name="gsdlassocfile">pdf06-weirdchars-9_screen.jpeg:image/jpeg:</Metadata>
     77    <Metadata name="gsdlassocfile">pdf06-weirdchars-10.jpg:image/jpeg:</Metadata>
     78    <Metadata name="gsdlassocfile">pdf06-weirdchars-10_thumb.gif:image/gif:</Metadata>
     79    <Metadata name="gsdlassocfile">pdf06-weirdchars-10_screen.jpeg:image/jpeg:</Metadata>
     80    <Metadata name="gsdlassocfile">pdf06-weirdchars-11.jpg:image/jpeg:</Metadata>
     81    <Metadata name="gsdlassocfile">pdf06-weirdchars-11_thumb.gif:image/gif:</Metadata>
     82    <Metadata name="gsdlassocfile">pdf06-weirdchars-11_screen.jpeg:image/jpeg:</Metadata>
    8383    <Metadata name="gsdlassocfile">doc.pdf:application/pdf:</Metadata>
    8484  </Description>
     
    116116<Section>
    117117  <Description>
    118     <Metadata name="PageNum">1</Metadata>
    119     <Metadata name="Image">pdf06-weirdchars-10.jpg</Metadata>
    120     <Metadata name="Source">pdf06-weirdchars-10.jpg</Metadata>
    121     <Metadata name="SourceFile">pdf06-weirdchars-10.jpg</Metadata>
    122     <Metadata name="FileSize">89333</Metadata>
    123     <Metadata name="ImageType">JPEG</Metadata>
    124     <Metadata name="ImageWidth">595</Metadata>
    125     <Metadata name="ImageHeight">842</Metadata>
    126     <Metadata name="ImageSize">89.3KB</Metadata>
    127     <Metadata name="srclink_file">pdf06-weirdchars-10.jpg</Metadata>
    128     <Metadata name="srclinkFile">pdf06-weirdchars-10.jpg</Metadata>
    129     <Metadata name="srcicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[srclinkFile]&quot; width=&quot;[ImageWidth]&quot; height=&quot;[ImageHeight]&quot;&gt;</Metadata>
    130     <Metadata name="ThumbType">gif</Metadata>
    131     <Metadata name="Thumb">pdf06-weirdchars-10_thumb.gif</Metadata>
    132     <Metadata name="thumbicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[Thumb]&quot; alt=&quot;[Thumb]&quot; width=&quot;[ThumbWidth]&quot; height=&quot;[ThumbHeight]&quot;&gt;</Metadata>
    133     <Metadata name="ThumbWidth">71</Metadata>
    134     <Metadata name="ThumbHeight">100</Metadata>
    135     <Metadata name="ScreenType">jpeg</Metadata>
    136     <Metadata name="Screen">pdf06-weirdchars-10_screen.jpeg</Metadata>
    137     <Metadata name="screenicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[Screen]&quot; width=[ScreenWidth] height=[ScreenHeight]&gt;</Metadata>
    138     <Metadata name="ScreenWidth">707</Metadata>
    139     <Metadata name="ScreenHeight">1000</Metadata>
    140     <Metadata name="FileFormat">PagedImage</Metadata>
    141     <Metadata name="NoText">1</Metadata>
    142     <Metadata name="Title">1</Metadata>
    143   </Description>
    144   <Content>This document has no text.</Content>
    145 </Section>
    146 <Section>
    147   <Description>
    148     <Metadata name="PageNum">2</Metadata>
    149     <Metadata name="Image">pdf06-weirdchars-11.jpg</Metadata>
    150     <Metadata name="Source">pdf06-weirdchars-11.jpg</Metadata>
    151     <Metadata name="SourceFile">pdf06-weirdchars-11.jpg</Metadata>
    152     <Metadata name="FileSize">43777</Metadata>
    153     <Metadata name="ImageType">JPEG</Metadata>
    154     <Metadata name="ImageWidth">595</Metadata>
    155     <Metadata name="ImageHeight">842</Metadata>
    156     <Metadata name="ImageSize">43.8KB</Metadata>
    157     <Metadata name="srclink_file">pdf06-weirdchars-11.jpg</Metadata>
    158     <Metadata name="srclinkFile">pdf06-weirdchars-11.jpg</Metadata>
    159     <Metadata name="srcicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[srclinkFile]&quot; width=&quot;[ImageWidth]&quot; height=&quot;[ImageHeight]&quot;&gt;</Metadata>
    160     <Metadata name="ThumbType">gif</Metadata>
    161     <Metadata name="Thumb">pdf06-weirdchars-11_thumb.gif</Metadata>
    162     <Metadata name="thumbicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[Thumb]&quot; alt=&quot;[Thumb]&quot; width=&quot;[ThumbWidth]&quot; height=&quot;[ThumbHeight]&quot;&gt;</Metadata>
    163     <Metadata name="ThumbWidth">71</Metadata>
    164     <Metadata name="ThumbHeight">100</Metadata>
    165     <Metadata name="ScreenType">jpeg</Metadata>
    166     <Metadata name="Screen">pdf06-weirdchars-11_screen.jpeg</Metadata>
    167     <Metadata name="screenicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[Screen]&quot; width=[ScreenWidth] height=[ScreenHeight]&gt;</Metadata>
    168     <Metadata name="ScreenWidth">707</Metadata>
    169     <Metadata name="ScreenHeight">1000</Metadata>
    170     <Metadata name="FileFormat">PagedImage</Metadata>
    171     <Metadata name="NoText">1</Metadata>
    172     <Metadata name="Title">2</Metadata>
    173   </Description>
    174   <Content>This document has no text.</Content>
    175 </Section>
    176 <Section>
    177   <Description>
    178118    <Metadata name="PageNum">2</Metadata>
    179119    <Metadata name="Image">pdf06-weirdchars-1.jpg</Metadata>
     
    444384  <Content>This document has no text.</Content>
    445385</Section>
     386<Section>
     387  <Description>
     388    <Metadata name="PageNum">11</Metadata>
     389    <Metadata name="Image">pdf06-weirdchars-10.jpg</Metadata>
     390    <Metadata name="Source">pdf06-weirdchars-10.jpg</Metadata>
     391    <Metadata name="SourceFile">pdf06-weirdchars-10.jpg</Metadata>
     392    <Metadata name="FileSize">89333</Metadata>
     393    <Metadata name="ImageType">JPEG</Metadata>
     394    <Metadata name="ImageWidth">595</Metadata>
     395    <Metadata name="ImageHeight">842</Metadata>
     396    <Metadata name="ImageSize">89.3KB</Metadata>
     397    <Metadata name="srclink_file">pdf06-weirdchars-10.jpg</Metadata>
     398    <Metadata name="srclinkFile">pdf06-weirdchars-10.jpg</Metadata>
     399    <Metadata name="srcicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[srclinkFile]&quot; width=&quot;[ImageWidth]&quot; height=&quot;[ImageHeight]&quot;&gt;</Metadata>
     400    <Metadata name="ThumbType">gif</Metadata>
     401    <Metadata name="Thumb">pdf06-weirdchars-10_thumb.gif</Metadata>
     402    <Metadata name="thumbicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[Thumb]&quot; alt=&quot;[Thumb]&quot; width=&quot;[ThumbWidth]&quot; height=&quot;[ThumbHeight]&quot;&gt;</Metadata>
     403    <Metadata name="ThumbWidth">71</Metadata>
     404    <Metadata name="ThumbHeight">100</Metadata>
     405    <Metadata name="ScreenType">jpeg</Metadata>
     406    <Metadata name="Screen">pdf06-weirdchars-10_screen.jpeg</Metadata>
     407    <Metadata name="screenicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[Screen]&quot; width=[ScreenWidth] height=[ScreenHeight]&gt;</Metadata>
     408    <Metadata name="ScreenWidth">707</Metadata>
     409    <Metadata name="ScreenHeight">1000</Metadata>
     410    <Metadata name="FileFormat">PagedImage</Metadata>
     411    <Metadata name="NoText">1</Metadata>
     412    <Metadata name="Title">11</Metadata>
     413  </Description>
     414  <Content>This document has no text.</Content>
     415</Section>
     416<Section>
     417  <Description>
     418    <Metadata name="PageNum">12</Metadata>
     419    <Metadata name="Image">pdf06-weirdchars-11.jpg</Metadata>
     420    <Metadata name="Source">pdf06-weirdchars-11.jpg</Metadata>
     421    <Metadata name="SourceFile">pdf06-weirdchars-11.jpg</Metadata>
     422    <Metadata name="FileSize">43777</Metadata>
     423    <Metadata name="ImageType">JPEG</Metadata>
     424    <Metadata name="ImageWidth">595</Metadata>
     425    <Metadata name="ImageHeight">842</Metadata>
     426    <Metadata name="ImageSize">43.8KB</Metadata>
     427    <Metadata name="srclink_file">pdf06-weirdchars-11.jpg</Metadata>
     428    <Metadata name="srclinkFile">pdf06-weirdchars-11.jpg</Metadata>
     429    <Metadata name="srcicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[srclinkFile]&quot; width=&quot;[ImageWidth]&quot; height=&quot;[ImageHeight]&quot;&gt;</Metadata>
     430    <Metadata name="ThumbType">gif</Metadata>
     431    <Metadata name="Thumb">pdf06-weirdchars-11_thumb.gif</Metadata>
     432    <Metadata name="thumbicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[Thumb]&quot; alt=&quot;[Thumb]&quot; width=&quot;[ThumbWidth]&quot; height=&quot;[ThumbHeight]&quot;&gt;</Metadata>
     433    <Metadata name="ThumbWidth">71</Metadata>
     434    <Metadata name="ThumbHeight">100</Metadata>
     435    <Metadata name="ScreenType">jpeg</Metadata>
     436    <Metadata name="Screen">pdf06-weirdchars-11_screen.jpeg</Metadata>
     437    <Metadata name="screenicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[Screen]&quot; width=[ScreenWidth] height=[ScreenHeight]&gt;</Metadata>
     438    <Metadata name="ScreenWidth">707</Metadata>
     439    <Metadata name="ScreenHeight">1000</Metadata>
     440    <Metadata name="FileFormat">PagedImage</Metadata>
     441    <Metadata name="NoText">1</Metadata>
     442    <Metadata name="Title">12</Metadata>
     443  </Description>
     444  <Content>This document has no text.</Content>
     445</Section>
    446446</Section>
    447447</Archive>
  • other-projects/nightly-tasks/diffcol/trunk/model-collect/Enhanced-PDF/archives/earliestDatestamp

    r27958 r27976  
    1 1375336454
     11375690515
  • other-projects/nightly-tasks/diffcol/trunk/model-collect/Enhanced-PDF/etc/collect.cfg

    r27958 r27976  
    2626plugin  EmailPlugin
    2727plugin  PDFPlugin -process_exp notext.*\.pdf -convert_to pagedimg_jpg
     28plugin  PDFPlugin -use_sections -convert_to html
    2829plugin  RTFPlugin
    2930plugin  WordPlugin
     
    3536plugin  NulPlugin
    3637plugin  EmbeddedMetadataPlugin
    37 plugin  PDFPlugin -use_sections -convert_to html
    3838plugin  MetadataXMLPlugin
    3939plugin  ArchivesInfPlugin -sort
  • other-projects/nightly-tasks/diffcol/trunk/model-collect/Enhanced-PDF/gli.col

    r27958 r27976  
    4242            <Argument enabled="true" name="verbosity">5</Argument>
    4343        </Import>
    44         <Schedule/>
     44        <Schedule>
     45            <Argument enabled="false" name="frequency">daily</Argument>
     46            <Argument enabled="false" name="action">add</Argument>
     47            <Argument enabled="false" name="toaddr"/>
     48            <Argument enabled="false" name="fromaddr"/>
     49            <Argument enabled="false" name="smtp"/>
     50        </Schedule>
    4551    </BuildConfig>
    4652</GathererCollection>
  • other-projects/nightly-tasks/diffcol/trunk/model-collect/Enhanced-PDF/index/build.cfg

    r27958 r27976  
    1 builddate   1375336472
     1builddate   1375690533
    22buildtype   mgpp
    3 earliestdatestamp   1375336454
     3earliestdatestamp   1375690515
    44indexfieldmap   text->TX    dc.Title,ex.dc.Title,Title->TI  Source->SO
    55indexfields text    dc.Title,ex.dc.Title,Title  Source