Changeset 27976

Timestamp:
05.08.2013 20:28:01
Author:
ak19
Message:

Updating the Enhanced-PDF model collection now that extra_meta is sorted and the images generated from a PDF are sorted in the gsdlassocfile metadata section of doc.xml.

Location:
other-projects/nightly-tasks/diffcol/trunk/model-collect/Enhanced-PDF
Files:
6 added
26 removed
19 modified

  • other-projects/nightly-tasks/diffcol/trunk/model-collect/Enhanced-PDF/archives/HASH019c5dca.dir/doc.xml

    r27958 r27976  
    77    <Metadata name="Language">en</Metadata> 
    88    <Metadata name="Encoding">utf8</Metadata> 
    9     <Metadata name="URL">http://research/ak19/GS2bin_1Aug2013/collect/Enhanced-PDF/tmp/1375336471/pdf03.html</Metadata> 
    10     <Metadata name="UTF8URL">http://research/ak19/GS2bin_1Aug2013/collect/Enhanced-PDF/tmp/1375336471/pdf03.html</Metadata> 
     9    <Metadata name="URL">http://research/ak19/GS2bin_5Aug2013/collect/Enhanced-PDF/tmp/1375690532/pdf03.html</Metadata> 
     10    <Metadata name="UTF8URL">http://research/ak19/GS2bin_5Aug2013/collect/Enhanced-PDF/tmp/1375690532/pdf03.html</Metadata> 
    1111    <Metadata name="Title">Applications for Bibliometric Research in the Emerging Digital Libraries Sally Jo Cunningham...</Metadata> 
    1212    <Metadata name="gsdlsourcefilename">import/pdf03.pdf</Metadata> 
    13     <Metadata name="gsdlconvertedfilename">tmp/1375336471/pdf03.html</Metadata> 
     13    <Metadata name="gsdlconvertedfilename">tmp/1375690532/pdf03.html</Metadata> 
    1414    <Metadata name="OrigSource">pdf03.html</Metadata> 
    1515    <Metadata name="Source">pdf03.pdf</Metadata> 
     
    2424    <Metadata name="NumPages">17</Metadata> 
    2525    <Metadata name="gsdlthistype">Paged</Metadata> 
    26     <Metadata name="ex.File.FileModifyDate">2013:08:01 17:46:54+12:00</Metadata> 
    27     <Metadata name="ex.PDF.Author">Bronwyn</Metadata> 
    28     <Metadata name="ex.PDF.PageCount">17</Metadata> 
    29     <Metadata name="ex.File.FileType">PDF</Metadata> 
    30     <Metadata name="ex.PDF.PDFVersion">1.1</Metadata> 
    31     <Metadata name="ex.PDF.Producer">Acrobat PDFWriter 2.0 for Macintosh</Metadata> 
     26    <Metadata name="ex.ExifTool.ExifToolVersion">8.57</Metadata> 
     27    <Metadata name="ex.File.Directory">/research/ak19/GS2bin_5Aug2013/collect/Enhanced-PDF/import</Metadata> 
     28    <Metadata name="ex.File.FileModifyDate">2013:08:05 20:14:39+12:00</Metadata> 
    3229    <Metadata name="ex.File.FileName">pdf03.pdf</Metadata> 
    3330    <Metadata name="ex.File.FilePermissions">644</Metadata> 
     31    <Metadata name="ex.File.FileSize">35935</Metadata> 
     32    <Metadata name="ex.File.FileType">PDF</Metadata> 
     33    <Metadata name="ex.File.MIMEType">application/pdf</Metadata> 
     34    <Metadata name="ex.PDF.Author">Bronwyn</Metadata> 
    3435    <Metadata name="ex.PDF.CreateDate">1999:09:27 16:05:06</Metadata> 
     36    <Metadata name="ex.PDF.Creator">Microsoft Word</Metadata> 
    3537    <Metadata name="ex.PDF.Linearized">false</Metadata> 
    36     <Metadata name="ex.File.Directory">/research/ak19/GS2bin_1Aug2013/collect/Enhanced-PDF/import</Metadata> 
    37     <Metadata name="ex.PDF.Creator">Microsoft Word</Metadata> 
     38    <Metadata name="ex.PDF.PDFVersion">1.1</Metadata> 
     39    <Metadata name="ex.PDF.PageCount">17</Metadata> 
     40    <Metadata name="ex.PDF.Producer">Acrobat PDFWriter 2.0 for Macintosh</Metadata> 
    3841    <Metadata name="ex.PDF.Title">biblio_for_dl_scientometrics.do</Metadata> 
    39     <Metadata name="ex.File.FileSize">35935</Metadata> 
    40     <Metadata name="ex.ExifTool.ExifToolVersion">8.57</Metadata> 
    41     <Metadata name="ex.File.MIMEType">application/pdf</Metadata> 
    4242    <Metadata name="Identifier">HASH019c5dca7f5bb781460a6b9c</Metadata> 
    43     <Metadata name="lastmodified">1375336014</Metadata> 
    44     <Metadata name="lastmodifieddate">20130801</Metadata> 
    45     <Metadata name="oailastmodified">1375336471</Metadata> 
    46     <Metadata name="oailastmodifieddate">20130801</Metadata> 
     43    <Metadata name="lastmodified">1375690479</Metadata> 
     44    <Metadata name="lastmodifieddate">20130805</Metadata> 
     45    <Metadata name="oailastmodified">1375690532</Metadata> 
     46    <Metadata name="oailastmodifieddate">20130805</Metadata> 
    4747    <Metadata name="assocfilepath">HASH019c5dca.dir</Metadata> 
    4848    <Metadata name="gsdlassocfile">doc.pdf:application/pdf:</Metadata> 
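In the r27976 column the `ex.*` entries (metadata extracted with ExifTool) are listed in plain code-point order of their `name` attribute. That is why `ex.PDF.PDFVersion` now comes before `ex.PDF.PageCount`: upper-case `D` sorts before lower-case `a`. The Python sketch below reproduces that ordering on a parsed `<Description>` element. It only illustrates the observed output and is not the code that produced it. The function name `sort_ex_metadata` is hypothetical, and the sketch assumes the `ex.*` entries may be reordered among the slots they already occupy while every other entry stays in place.

```python
import xml.etree.ElementTree as ET


def sort_ex_metadata(description):
    """Reorder the <Metadata name="ex.*"> children of a <Description> in place.

    Only the slots that already hold ex.* entries are rewritten, so entries
    such as gsdlthistype or Identifier keep their positions. sorted() compares
    names by code point, so upper-case letters sort before lower-case ones.
    """
    children = list(description)
    slots = [
        i
        for i, el in enumerate(children)
        if el.tag == "Metadata" and el.get("name", "").startswith("ex.")
    ]
    ordered = sorted((children[i] for i in slots), key=lambda el: el.get("name"))
    for i, el in zip(slots, ordered):
        description[i] = el
    return description


# Usage: sort a small fragment and list the resulting names.
fragment = ET.fromstring(
    "<Description>"
    '<Metadata name="gsdlthistype">Paged</Metadata>'
    '<Metadata name="ex.PDF.PageCount">17</Metadata>'
    '<Metadata name="ex.PDF.PDFVersion">1.1</Metadata>'
    '<Metadata name="ex.File.FileType">PDF</Metadata>'
    '<Metadata name="Identifier">HASH019c5dca7f5bb781460a6b9c</Metadata>'
    "</Description>"
)
sort_ex_metadata(fragment)
print([el.get("name") for el in fragment])
# ['gsdlthistype', 'ex.File.FileType', 'ex.PDF.PDFVersion', 'ex.PDF.PageCount', 'Identifier']
```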
  • other-projects/nightly-tasks/diffcol/trunk/model-collect/Enhanced-PDF/archives/HASH1a9cea0f.dir/doc.xml

    r27958 r27976  
    77    <Metadata name="Language">en</Metadata> 
    88    <Metadata name="Encoding">utf8</Metadata> 
    9     <Metadata name="URL">http://research/ak19/GS2bin_1Aug2013/collect/Enhanced-PDF/tmp/1375336471/pdf01.html</Metadata> 
    10     <Metadata name="UTF8URL">http://research/ak19/GS2bin_1Aug2013/collect/Enhanced-PDF/tmp/1375336471/pdf01.html</Metadata> 
     9    <Metadata name="URL">http://research/ak19/GS2bin_5Aug2013/collect/Enhanced-PDF/tmp/1375690531/pdf01.html</Metadata> 
     10    <Metadata name="UTF8URL">http://research/ak19/GS2bin_5Aug2013/collect/Enhanced-PDF/tmp/1375690531/pdf01.html</Metadata> 
    1111    <Metadata name="Title">Greenstone: A Comprehensive Open-Source Digital Library Software System Ian H. Witten,* Rodger J....</Metadata> 
    1212    <Metadata name="gsdlsourcefilename">import/pdf01.pdf</Metadata> 
    13     <Metadata name="gsdlconvertedfilename">tmp/1375336471/pdf01.html</Metadata> 
     13    <Metadata name="gsdlconvertedfilename">tmp/1375690531/pdf01.html</Metadata> 
    1414    <Metadata name="OrigSource">pdf01.html</Metadata> 
    1515    <Metadata name="Source">pdf01.pdf</Metadata> 
     
    2424    <Metadata name="NumPages">9</Metadata> 
    2525    <Metadata name="gsdlthistype">Paged</Metadata> 
    26     <Metadata name="ex.File.FileModifyDate">2013:08:01 17:46:54+12:00</Metadata> 
    27     <Metadata name="ex.PDF.Author">Bronwyn</Metadata> 
    28     <Metadata name="ex.PDF.PageCount">9</Metadata> 
    29     <Metadata name="ex.File.FileType">PDF</Metadata> 
    30     <Metadata name="ex.PDF.PDFVersion">1.2</Metadata> 
    31     <Metadata name="ex.PDF.Producer">Acrobat PDFWriter 4.0 for Power Macintosh</Metadata> 
     26    <Metadata name="ex.ExifTool.ExifToolVersion">8.57</Metadata> 
     27    <Metadata name="ex.File.Directory">/research/ak19/GS2bin_5Aug2013/collect/Enhanced-PDF/import</Metadata> 
     28    <Metadata name="ex.File.FileModifyDate">2013:08:05 20:14:39+12:00</Metadata> 
    3229    <Metadata name="ex.File.FileName">pdf01.pdf</Metadata> 
    3330    <Metadata name="ex.File.FilePermissions">644</Metadata> 
     31    <Metadata name="ex.File.FileSize">269487</Metadata> 
     32    <Metadata name="ex.File.FileType">PDF</Metadata> 
     33    <Metadata name="ex.File.MIMEType">application/pdf</Metadata> 
     34    <Metadata name="ex.PDF.Author">Bronwyn</Metadata> 
    3435    <Metadata name="ex.PDF.CreateDate">2000:03:02 15:21:24</Metadata> 
     36    <Metadata name="ex.PDF.Creator">Microsoft Word</Metadata> 
    3537    <Metadata name="ex.PDF.Linearized">false</Metadata> 
    36     <Metadata name="ex.File.Directory">/research/ak19/GS2bin_1Aug2013/collect/Enhanced-PDF/import</Metadata> 
    37     <Metadata name="ex.PDF.Creator">Microsoft Word</Metadata> 
    38     <Metadata name="ex.File.FileSize">269487</Metadata> 
    39     <Metadata name="ex.ExifTool.ExifToolVersion">8.57</Metadata> 
    40     <Metadata name="ex.File.MIMEType">application/pdf</Metadata> 
     38    <Metadata name="ex.PDF.PDFVersion">1.2</Metadata> 
     39    <Metadata name="ex.PDF.PageCount">9</Metadata> 
     40    <Metadata name="ex.PDF.Producer">Acrobat PDFWriter 4.0 for Power Macintosh</Metadata> 
    4141    <Metadata name="Identifier">HASH1a9cea0f239f754007681b</Metadata> 
    42     <Metadata name="lastmodified">1375336014</Metadata> 
    43     <Metadata name="lastmodifieddate">20130801</Metadata> 
    44     <Metadata name="oailastmodified">1375336471</Metadata> 
    45     <Metadata name="oailastmodifieddate">20130801</Metadata> 
     42    <Metadata name="lastmodified">1375690479</Metadata> 
     43    <Metadata name="lastmodifieddate">20130805</Metadata> 
     44    <Metadata name="oailastmodified">1375690532</Metadata> 
     45    <Metadata name="oailastmodifieddate">20130805</Metadata> 
    4646    <Metadata name="assocfilepath">HASH1a9cea0f.dir</Metadata> 
    4747    <Metadata name="gsdlassocfile">pdf01-2_1.jpg:image/jpeg:</Metadata> 
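The values that change between the two revisions are internally consistent. `lastmodified` is a Unix timestamp, and rendering it in the +12:00 offset used by `ex.File.FileModifyDate` reproduces both that value and the `lastmodifieddate` string. The sketch below checks this for the r27958 and r27976 values. Reading `lastmodified` as the import file's modification time is an inference from these numbers, not something the changeset states, and `describe_timestamp` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Offset taken from the ex.File.FileModifyDate values in this diff (+12:00).
NZ_OFFSET = timezone(timedelta(hours=12))


def describe_timestamp(epoch_seconds):
    """Return (lastmodifieddate-style, ExifTool-style) strings for a Unix time."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=NZ_OFFSET)
    return moment.strftime("%Y%m%d"), moment.strftime("%Y:%m:%d %H:%M:%S")


for stamp in (1375336014, 1375690479):  # r27958 and r27976 lastmodified
    print(stamp, *describe_timestamp(stamp))
# 1375336014 20130801 2013:08:01 17:46:54
# 1375690479 20130805 2013:08:05 20:14:39
```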
  • other-projects/nightly-tasks/diffcol/trunk/model-collect/Enhanced-PDF/archives/HASH2bdf3b19.dir/doc.xml

    r27958 r27976  
    1111    <Metadata name="Title">pdf05-notext</Metadata> 
    1212    <Metadata name="gsdlsourcefilename">import/notext/pdf05-notext.pdf</Metadata> 
    13     <Metadata name="gsdlconvertedfilename">tmp/1375336456/pdf05-notext/pdf05-notext.item</Metadata> 
     13    <Metadata name="gsdlconvertedfilename">tmp/1375690517/pdf05-notext/pdf05-notext.item</Metadata> 
    1414    <Metadata name="OrigSource">pdf05-notext.item</Metadata> 
    1515    <Metadata name="Source">pdf05-notext.pdf</Metadata> 
     
    2323    <Metadata name="srclinkFile">doc.pdf</Metadata> 
    2424    <Metadata name="NumPages">0</Metadata> 
    25     <Metadata name="ex.XMP.ModifyDate">2007:06:13 12:29:51+12:00</Metadata> 
    26     <Metadata name="ex.XMP.Format">application/pdf</Metadata> 
    27     <Metadata name="ex.XMP.MetadataDate">2007:06:13 12:29:51+12:00</Metadata> 
    28     <Metadata name="ex.XMP.Company">University of Waikato</Metadata> 
    29     <Metadata name="ex.XMP.Creator">Acrobat PDFMaker 7.0.7 for Word</Metadata> 
     25    <Metadata name="ex.ExifTool.ExifToolVersion">8.57</Metadata> 
     26    <Metadata name="ex.File.Directory">/research/ak19/GS2bin_5Aug2013/collect/Enhanced-PDF/import/notext</Metadata> 
     27    <Metadata name="ex.File.FileModifyDate">2013:08:05 20:14:39+12:00</Metadata> 
     28    <Metadata name="ex.File.FileName">pdf05-notext.pdf</Metadata> 
     29    <Metadata name="ex.File.FilePermissions">644</Metadata> 
     30    <Metadata name="ex.File.FileSize">748503</Metadata> 
     31    <Metadata name="ex.File.FileType">PDF</Metadata> 
     32    <Metadata name="ex.File.MIMEType">application/pdf</Metadata> 
     33    <Metadata name="ex.PDF.Author">Administrator</Metadata> 
     34    <Metadata name="ex.PDF.Company">University of Waikato</Metadata> 
     35    <Metadata name="ex.PDF.CreateDate">2007:06:13 12:28:29+12:00</Metadata> 
     36    <Metadata name="ex.PDF.Creator">Acrobat PDFMaker 7.0.7 for Word</Metadata> 
     37    <Metadata name="ex.PDF.Language">EN-US</Metadata> 
     38    <Metadata name="ex.PDF.Linearized">true</Metadata> 
     39    <Metadata name="ex.PDF.ModifyDate">2007:06:13 12:29:51+12:00</Metadata> 
     40    <Metadata name="ex.PDF.PDFVersion">1.4</Metadata> 
     41    <Metadata name="ex.PDF.PageCount">9</Metadata> 
     42    <Metadata name="ex.PDF.PageLayout">OneColumn</Metadata> 
    3043    <Metadata name="ex.PDF.Producer">Acrobat Distiller 7.0.5 &amp;#40;Windows&amp;#41;</Metadata> 
    31     <Metadata name="ex.File.FilePermissions">644</Metadata> 
    32     <Metadata name="ex.File.Directory">/research/ak19/GS2bin_1Aug2013/collect/Enhanced-PDF/import/notext</Metadata> 
    33     <Metadata name="ex.PDF.Creator">Acrobat PDFMaker 7.0.7 for Word</Metadata> 
    34     <Metadata name="ex.PDF.PageLayout">OneColumn</Metadata> 
    35     <Metadata name="ex.File.FileModifyDate">2013:08:01 17:46:54+12:00</Metadata> 
    36     <Metadata name="ex.PDF.Author">Administrator</Metadata> 
    37     <Metadata name="ex.File.FileType">PDF</Metadata> 
    3844    <Metadata name="ex.PDF.SourceModified">D:20070613002201</Metadata> 
    3945    <Metadata name="ex.PDF.TaggedPDF">true</Metadata> 
    40     <Metadata name="ex.PDF.Linearized">true</Metadata> 
    41     <Metadata name="ex.PDF.CreateDate">2007:06:13 12:28:29+12:00</Metadata> 
    42     <Metadata name="ex.PDF.ModifyDate">2007:06:13 12:29:51+12:00</Metadata> 
    43     <Metadata name="ex.PDF.PDFVersion">1.4</Metadata> 
     46    <Metadata name="ex.XMP.Company">University of Waikato</Metadata> 
     47    <Metadata name="ex.XMP.CreateDate">2007:06:13 12:28:29+12:00</Metadata> 
     48    <Metadata name="ex.XMP.Creator">Acrobat PDFMaker 7.0.7 for Word</Metadata> 
     49    <Metadata name="ex.XMP.CreatorTool">Acrobat PDFMaker 7.0.7 for Word</Metadata> 
     50    <Metadata name="ex.XMP.DocumentID">uuid:5915f718-0b63-4b63-ae6e-1efee5151379</Metadata> 
     51    <Metadata name="ex.XMP.Format">application/pdf</Metadata> 
    4452    <Metadata name="ex.XMP.InstanceID">uuid:2518e5c5-f724-4ea7-8dfc-e024661fc8c5</Metadata> 
    45     <Metadata name="ex.File.FileName">pdf05-notext.pdf</Metadata> 
    46     <Metadata name="ex.PDF.Company">University of Waikato</Metadata> 
    47     <Metadata name="ex.PDF.Language">EN-US</Metadata> 
    48     <Metadata name="ex.File.FileSize">748503</Metadata> 
     53    <Metadata name="ex.XMP.MetadataDate">2007:06:13 12:29:51+12:00</Metadata> 
     54    <Metadata name="ex.XMP.ModifyDate">2007:06:13 12:29:51+12:00</Metadata> 
     55    <Metadata name="ex.XMP.Producer">Acrobat Distiller 7.0.5 &amp;#40;Windows&amp;#41;</Metadata> 
     56    <Metadata name="ex.XMP.SourceModified">D:20070613002201</Metadata> 
     57    <Metadata name="ex.XMP.VersionID">14</Metadata> 
    4958    <Metadata name="ex.XMP.XMPToolkit">3.1-702</Metadata> 
    50     <Metadata name="ex.XMP.SourceModified">D:20070613002201</Metadata> 
    51     <Metadata name="ex.PDF.PageCount">9</Metadata> 
    52     <Metadata name="ex.XMP.VersionID">14</Metadata> 
    53     <Metadata name="ex.XMP.Producer">Acrobat Distiller 7.0.5 &amp;#40;Windows&amp;#41;</Metadata> 
    54     <Metadata name="ex.XMP.DocumentID">uuid:5915f718-0b63-4b63-ae6e-1efee5151379</Metadata> 
    55     <Metadata name="ex.XMP.CreateDate">2007:06:13 12:28:29+12:00</Metadata> 
    56     <Metadata name="ex.ExifTool.ExifToolVersion">8.57</Metadata> 
    57     <Metadata name="ex.File.MIMEType">application/pdf</Metadata> 
    58     <Metadata name="ex.XMP.CreatorTool">Acrobat PDFMaker 7.0.7 for Word</Metadata> 
    5959    <Metadata name="Identifier">HASH2bdf3b19cf094fd01ac7ad</Metadata> 
    60     <Metadata name="lastmodified">1375336014</Metadata> 
    61     <Metadata name="lastmodifieddate">20130801</Metadata> 
    62     <Metadata name="oailastmodified">1375336462</Metadata> 
    63     <Metadata name="oailastmodifieddate">20130801</Metadata> 
     60    <Metadata name="lastmodified">1375690479</Metadata> 
     61    <Metadata name="lastmodifieddate">20130805</Metadata> 
     62    <Metadata name="oailastmodified">1375690523</Metadata> 
     63    <Metadata name="oailastmodifieddate">20130805</Metadata> 
    6464    <Metadata name="assocfilepath">HASH2bdf3b19.dir</Metadata> 
    6565    <Metadata name="gsdlassocfile">pdf05-notext-0.jpg:image/jpeg:</Metadata> 
  • other-projects/nightly-tasks/diffcol/trunk/model-collect/Enhanced-PDF/archives/HASHbb6e0c8f.dir/doc.xml

    r27958 r27976  
    1111    <Metadata name="Title">pdf06-weirdchars</Metadata> 
    1212    <Metadata name="gsdlsourcefilename">import/notext/pdf06-weirdchars.pdf</Metadata> 
    13     <Metadata name="gsdlconvertedfilename">tmp/1375336462/pdf06-weirdchars/pdf06-weirdchars.item</Metadata> 
     13    <Metadata name="gsdlconvertedfilename">tmp/1375690523/pdf06-weirdchars/pdf06-weirdchars.item</Metadata> 
    1414    <Metadata name="OrigSource">pdf06-weirdchars.item</Metadata> 
    1515    <Metadata name="Source">pdf06-weirdchars.pdf</Metadata> 
     
    2323    <Metadata name="srclinkFile">doc.pdf</Metadata> 
    2424    <Metadata name="NumPages">0</Metadata> 
    25     <Metadata name="ex.File.FileModifyDate">2013:08:01 17:46:54+12:00</Metadata> 
    26     <Metadata name="ex.PDF.Author">rg</Metadata> 
    27     <Metadata name="ex.PDF.PageCount">12</Metadata> 
    28     <Metadata name="ex.File.FileType">PDF</Metadata> 
    29     <Metadata name="ex.PDF.PDFVersion">1.2</Metadata> 
    30     <Metadata name="ex.PDF.Producer">AFPL Ghostscript 7.04</Metadata> 
     25    <Metadata name="ex.ExifTool.ExifToolVersion">8.57</Metadata> 
     26    <Metadata name="ex.File.Directory">/research/ak19/GS2bin_5Aug2013/collect/Enhanced-PDF/import/notext</Metadata> 
     27    <Metadata name="ex.File.FileModifyDate">2013:08:05 20:14:39+12:00</Metadata> 
    3128    <Metadata name="ex.File.FileName">pdf06-weirdchars.pdf</Metadata> 
    3229    <Metadata name="ex.File.FilePermissions">644</Metadata> 
     30    <Metadata name="ex.File.FileSize">846134</Metadata> 
     31    <Metadata name="ex.File.FileType">PDF</Metadata> 
     32    <Metadata name="ex.File.MIMEType">application/pdf</Metadata> 
     33    <Metadata name="ex.PDF.Author">rg</Metadata> 
    3334    <Metadata name="ex.PDF.CreateDate">10/7/2002 16:9:30</Metadata> 
     35    <Metadata name="ex.PDF.Creator">Pscript.dll Version 5.0</Metadata> 
    3436    <Metadata name="ex.PDF.Linearized">false</Metadata> 
    35     <Metadata name="ex.File.Directory">/research/ak19/GS2bin_1Aug2013/collect/Enhanced-PDF/import/notext</Metadata> 
    36     <Metadata name="ex.PDF.Creator">Pscript.dll Version 5.0</Metadata> 
     37    <Metadata name="ex.PDF.PDFVersion">1.2</Metadata> 
     38    <Metadata name="ex.PDF.PageCount">12</Metadata> 
     39    <Metadata name="ex.PDF.Producer">AFPL Ghostscript 7.04</Metadata> 
    3740    <Metadata name="ex.PDF.Title">metsreportfinal.doc</Metadata> 
    38     <Metadata name="ex.File.FileSize">846134</Metadata> 
    39     <Metadata name="ex.ExifTool.ExifToolVersion">8.57</Metadata> 
    40     <Metadata name="ex.File.MIMEType">application/pdf</Metadata> 
    4141    <Metadata name="Identifier">HASHbb6e0c8f087c9c5b2e301a</Metadata> 
    42     <Metadata name="lastmodified">1375336014</Metadata> 
    43     <Metadata name="lastmodifieddate">20130801</Metadata> 
    44     <Metadata name="oailastmodified">1375336470</Metadata> 
    45     <Metadata name="oailastmodifieddate">20130801</Metadata> 
     42    <Metadata name="lastmodified">1375690479</Metadata> 
     43    <Metadata name="lastmodifieddate">20130805</Metadata> 
     44    <Metadata name="oailastmodified">1375690531</Metadata> 
     45    <Metadata name="oailastmodifieddate">20130805</Metadata> 
    4646    <Metadata name="assocfilepath">HASHbb6e0c8f.dir</Metadata> 
    4747    <Metadata name="gsdlassocfile">pdf06-weirdchars-0.jpg:image/jpeg:</Metadata> 
    4848    <Metadata name="gsdlassocfile">pdf06-weirdchars-0_thumb.gif:image/gif:</Metadata> 
    4949    <Metadata name="gsdlassocfile">pdf06-weirdchars-0_screen.jpeg:image/jpeg:</Metadata> 
    50     <Metadata name="gsdlassocfile">pdf06-weirdchars-10.jpg:image/jpeg:</Metadata> 
    51     <Metadata name="gsdlassocfile">pdf06-weirdchars-10_thumb.gif:image/gif:</Metadata> 
    52     <Metadata name="gsdlassocfile">pdf06-weirdchars-10_screen.jpeg:image/jpeg:</Metadata> 
    53     <Metadata name="gsdlassocfile">pdf06-weirdchars-11.jpg:image/jpeg:</Metadata> 
    54     <Metadata name="gsdlassocfile">pdf06-weirdchars-11_thumb.gif:image/gif:</Metadata> 
    55     <Metadata name="gsdlassocfile">pdf06-weirdchars-11_screen.jpeg:image/jpeg:</Metadata> 
    5650    <Metadata name="gsdlassocfile">pdf06-weirdchars-1.jpg:image/jpeg:</Metadata> 
    5751    <Metadata name="gsdlassocfile">pdf06-weirdchars-1_thumb.gif:image/gif:</Metadata> 
     
    8175    <Metadata name="gsdlassocfile">pdf06-weirdchars-9_thumb.gif:image/gif:</Metadata> 
    8276    <Metadata name="gsdlassocfile">pdf06-weirdchars-9_screen.jpeg:image/jpeg:</Metadata> 
     77    <Metadata name="gsdlassocfile">pdf06-weirdchars-10.jpg:image/jpeg:</Metadata> 
     78    <Metadata name="gsdlassocfile">pdf06-weirdchars-10_thumb.gif:image/gif:</Metadata> 
     79    <Metadata name="gsdlassocfile">pdf06-weirdchars-10_screen.jpeg:image/jpeg:</Metadata> 
     80    <Metadata name="gsdlassocfile">pdf06-weirdchars-11.jpg:image/jpeg:</Metadata> 
     81    <Metadata name="gsdlassocfile">pdf06-weirdchars-11_thumb.gif:image/gif:</Metadata> 
     82    <Metadata name="gsdlassocfile">pdf06-weirdchars-11_screen.jpeg:image/jpeg:</Metadata> 
    8383    <Metadata name="gsdlassocfile">doc.pdf:application/pdf:</Metadata> 
    8484  </Description> 
     
    116116<Section> 
    117117  <Description> 
    118     <Metadata name="PageNum">1</Metadata> 
    119     <Metadata name="Image">pdf06-weirdchars-10.jpg</Metadata> 
    120     <Metadata name="Source">pdf06-weirdchars-10.jpg</Metadata> 
    121     <Metadata name="SourceFile">pdf06-weirdchars-10.jpg</Metadata> 
    122     <Metadata name="FileSize">89333</Metadata> 
    123     <Metadata name="ImageType">JPEG</Metadata> 
    124     <Metadata name="ImageWidth">595</Metadata> 
    125     <Metadata name="ImageHeight">842</Metadata> 
    126     <Metadata name="ImageSize">89.3KB</Metadata> 
    127     <Metadata name="srclink_file">pdf06-weirdchars-10.jpg</Metadata> 
    128     <Metadata name="srclinkFile">pdf06-weirdchars-10.jpg</Metadata> 
    129     <Metadata name="srcicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[srclinkFile]&quot; width=&quot;[ImageWidth]&quot; height=&quot;[ImageHeight]&quot;&gt;</Metadata> 
    130     <Metadata name="ThumbType">gif</Metadata> 
    131     <Metadata name="Thumb">pdf06-weirdchars-10_thumb.gif</Metadata> 
    132     <Metadata name="thumbicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[Thumb]&quot; alt=&quot;[Thumb]&quot; width=&quot;[ThumbWidth]&quot; height=&quot;[ThumbHeight]&quot;&gt;</Metadata> 
    133     <Metadata name="ThumbWidth">71</Metadata> 
    134     <Metadata name="ThumbHeight">100</Metadata> 
    135     <Metadata name="ScreenType">jpeg</Metadata> 
    136     <Metadata name="Screen">pdf06-weirdchars-10_screen.jpeg</Metadata> 
    137     <Metadata name="screenicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[Screen]&quot; width=[ScreenWidth] height=[ScreenHeight]&gt;</Metadata> 
    138     <Metadata name="ScreenWidth">707</Metadata> 
    139     <Metadata name="ScreenHeight">1000</Metadata> 
    140     <Metadata name="FileFormat">PagedImage</Metadata> 
    141     <Metadata name="NoText">1</Metadata> 
    142     <Metadata name="Title">1</Metadata> 
    143   </Description> 
    144   <Content>This document has no text.</Content> 
    145 </Section> 
    146 <Section> 
    147   <Description> 
    148     <Metadata name="PageNum">2</Metadata> 
    149     <Metadata name="Image">pdf06-weirdchars-11.jpg</Metadata> 
    150     <Metadata name="Source">pdf06-weirdchars-11.jpg</Metadata> 
    151     <Metadata name="SourceFile">pdf06-weirdchars-11.jpg</Metadata> 
    152     <Metadata name="FileSize">43777</Metadata> 
    153     <Metadata name="ImageType">JPEG</Metadata> 
    154     <Metadata name="ImageWidth">595</Metadata> 
    155     <Metadata name="ImageHeight">842</Metadata> 
    156     <Metadata name="ImageSize">43.8KB</Metadata> 
    157     <Metadata name="srclink_file">pdf06-weirdchars-11.jpg</Metadata> 
    158     <Metadata name="srclinkFile">pdf06-weirdchars-11.jpg</Metadata> 
    159     <Metadata name="srcicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[srclinkFile]&quot; width=&quot;[ImageWidth]&quot; height=&quot;[ImageHeight]&quot;&gt;</Metadata> 
    160     <Metadata name="ThumbType">gif</Metadata> 
    161     <Metadata name="Thumb">pdf06-weirdchars-11_thumb.gif</Metadata> 
    162     <Metadata name="thumbicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[Thumb]&quot; alt=&quot;[Thumb]&quot; width=&quot;[ThumbWidth]&quot; height=&quot;[ThumbHeight]&quot;&gt;</Metadata> 
    163     <Metadata name="ThumbWidth">71</Metadata> 
    164     <Metadata name="ThumbHeight">100</Metadata> 
    165     <Metadata name="ScreenType">jpeg</Metadata> 
    166     <Metadata name="Screen">pdf06-weirdchars-11_screen.jpeg</Metadata> 
    167     <Metadata name="screenicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[Screen]&quot; width=[ScreenWidth] height=[ScreenHeight]&gt;</Metadata> 
    168     <Metadata name="ScreenWidth">707</Metadata> 
    169     <Metadata name="ScreenHeight">1000</Metadata> 
    170     <Metadata name="FileFormat">PagedImage</Metadata> 
    171     <Metadata name="NoText">1</Metadata> 
    172     <Metadata name="Title">2</Metadata> 
    173   </Description> 
    174   <Content>This document has no text.</Content> 
    175 </Section> 
    176 <Section> 
    177   <Description> 
    178118    <Metadata name="PageNum">2</Metadata> 
    179119    <Metadata name="Image">pdf06-weirdchars-1.jpg</Metadata> 
     
    444384  <Content>This document has no text.</Content> 
    445385</Section> 
     386<Section> 
     387  <Description> 
     388    <Metadata name="PageNum">11</Metadata> 
     389    <Metadata name="Image">pdf06-weirdchars-10.jpg</Metadata> 
     390    <Metadata name="Source">pdf06-weirdchars-10.jpg</Metadata> 
     391    <Metadata name="SourceFile">pdf06-weirdchars-10.jpg</Metadata> 
     392    <Metadata name="FileSize">89333</Metadata> 
     393    <Metadata name="ImageType">JPEG</Metadata> 
     394    <Metadata name="ImageWidth">595</Metadata> 
     395    <Metadata name="ImageHeight">842</Metadata> 
     396    <Metadata name="ImageSize">89.3KB</Metadata> 
     397    <Metadata name="srclink_file">pdf06-weirdchars-10.jpg</Metadata> 
     398    <Metadata name="srclinkFile">pdf06-weirdchars-10.jpg</Metadata> 
     399    <Metadata name="srcicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[srclinkFile]&quot; width=&quot;[ImageWidth]&quot; height=&quot;[ImageHeight]&quot;&gt;</Metadata> 
     400    <Metadata name="ThumbType">gif</Metadata> 
     401    <Metadata name="Thumb">pdf06-weirdchars-10_thumb.gif</Metadata> 
     402    <Metadata name="thumbicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[Thumb]&quot; alt=&quot;[Thumb]&quot; width=&quot;[ThumbWidth]&quot; height=&quot;[ThumbHeight]&quot;&gt;</Metadata> 
     403    <Metadata name="ThumbWidth">71</Metadata> 
     404    <Metadata name="ThumbHeight">100</Metadata> 
     405    <Metadata name="ScreenType">jpeg</Metadata> 
     406    <Metadata name="Screen">pdf06-weirdchars-10_screen.jpeg</Metadata> 
     407    <Metadata name="screenicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[Screen]&quot; width=[ScreenWidth] height=[ScreenHeight]&gt;</Metadata> 
     408    <Metadata name="ScreenWidth">707</Metadata> 
     409    <Metadata name="ScreenHeight">1000</Metadata> 
     410    <Metadata name="FileFormat">PagedImage</Metadata> 
     411    <Metadata name="NoText">1</Metadata> 
     412    <Metadata name="Title">11</Metadata> 
     413  </Description> 
     414  <Content>This document has no text.</Content> 
     415</Section> 
     416<Section> 
     417  <Description> 
     418    <Metadata name="PageNum">12</Metadata> 
     419    <Metadata name="Image">pdf06-weirdchars-11.jpg</Metadata> 
     420    <Metadata name="Source">pdf06-weirdchars-11.jpg</Metadata> 
     421    <Metadata name="SourceFile">pdf06-weirdchars-11.jpg</Metadata> 
     422    <Metadata name="FileSize">43777</Metadata> 
     423    <Metadata name="ImageType">JPEG</Metadata> 
     424    <Metadata name="ImageWidth">595</Metadata> 
     425    <Metadata name="ImageHeight">842</Metadata> 
     426    <Metadata name="ImageSize">43.8KB</Metadata> 
     427    <Metadata name="srclink_file">pdf06-weirdchars-11.jpg</Metadata> 
     428    <Metadata name="srclinkFile">pdf06-weirdchars-11.jpg</Metadata> 
     429    <Metadata name="srcicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[srclinkFile]&quot; width=&quot;[ImageWidth]&quot; height=&quot;[ImageHeight]&quot;&gt;</Metadata> 
     430    <Metadata name="ThumbType">gif</Metadata> 
     431    <Metadata name="Thumb">pdf06-weirdchars-11_thumb.gif</Metadata> 
     432    <Metadata name="thumbicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[Thumb]&quot; alt=&quot;[Thumb]&quot; width=&quot;[ThumbWidth]&quot; height=&quot;[ThumbHeight]&quot;&gt;</Metadata> 
     433    <Metadata name="ThumbWidth">71</Metadata> 
     434    <Metadata name="ThumbHeight">100</Metadata> 
     435    <Metadata name="ScreenType">jpeg</Metadata> 
     436    <Metadata name="Screen">pdf06-weirdchars-11_screen.jpeg</Metadata> 
     437    <Metadata name="screenicon">&lt;img src=&quot;_httpprefix_/collect/[collection]/index/assoc/[parent(Top):assocfilepath]/[Screen]&quot; width=[ScreenWidth] height=[ScreenHeight]&gt;</Metadata> 
     438    <Metadata name="ScreenWidth">707</Metadata> 
     439    <Metadata name="ScreenHeight">1000</Metadata> 
     440    <Metadata name="FileFormat">PagedImage</Metadata> 
     441    <Metadata name="NoText">1</Metadata> 
     442    <Metadata name="Title">12</Metadata> 
     443  </Description> 
     444  <Content>This document has no text.</Content> 
     445</Section> 
    446446</Section> 
    447447</Archive> 
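In r27958 the `gsdlassocfile` entries for `pdf06-weirdchars` listed pages 10 and 11 between pages 0 and 1, and the section with `PageNum` 1 pointed at `pdf06-weirdchars-10.jpg`. In r27976 the images follow the numeric page index, so `-10.jpg` now belongs to page 11 and `-11.jpg` to page 12. The following is a minimal Python sketch of that ordering. It assumes the page index is the integer after the last hyphen in the file name, and that non-image files such as `doc.pdf` stay after the images. `page_index` and `sort_assoc_files` are hypothetical names, not Greenstone functions.

```python
import re

# Assumption: generated page images are named <base>-<index>[_thumb|_screen].<ext>,
# as in the gsdlassocfile entries above.
_PAGE_RE = re.compile(r"-(\d+)(?:_thumb|_screen)?\.[A-Za-z]+$")


def page_index(filename):
    """Return the numeric page index encoded in a generated image name, or None."""
    match = _PAGE_RE.search(filename)
    return int(match.group(1)) if match else None


def sort_assoc_files(names):
    """Put page images in numeric page order, followed by all other files.

    sorted() is stable, so the .jpg, _thumb.gif and _screen.jpeg entries for
    one page keep their existing relative order.
    """
    images = [n for n in names if page_index(n) is not None]
    others = [n for n in names if page_index(n) is None]
    return sorted(images, key=page_index) + others


old_order = ["pdf06-weirdchars-0.jpg", "pdf06-weirdchars-10.jpg",
             "pdf06-weirdchars-11.jpg", "pdf06-weirdchars-1.jpg", "doc.pdf"]
print(sort_assoc_files(old_order))
# ['pdf06-weirdchars-0.jpg', 'pdf06-weirdchars-1.jpg', 'pdf06-weirdchars-10.jpg', 'pdf06-weirdchars-11.jpg', 'doc.pdf']
```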
  • other-projects/nightly-tasks/diffcol/trunk/model-collect/Enhanced-PDF/archives/earliestDatestamp

    r27958 r27976  
    1 1375336454 
     11375690515 
  • other-projects/nightly-tasks/diffcol/trunk/model-collect/Enhanced-PDF/etc/collect.cfg

    r27958 r27976  
    2626plugin  EmailPlugin 
    2727plugin  PDFPlugin -process_exp notext.*\.pdf -convert_to pagedimg_jpg 
     28plugin  PDFPlugin -use_sections -convert_to html 
    2829plugin  RTFPlugin 
    2930plugin  WordPlugin 
     
    3536plugin  NulPlugin 
    3637plugin  EmbeddedMetadataPlugin 
    37 plugin  PDFPlugin -use_sections -convert_to html 
    3838plugin  MetadataXMLPlugin 
    3939plugin  ArchivesInfPlugin -sort 
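In r27976 the general `PDFPlugin -use_sections -convert_to html` line moves up so that it sits directly after the `notext` PDFPlugin. The Python sketch below models plugin selection as a first-match scan over the configured list. That model is an assumption made for illustration. The second pattern is a stand-in for PDFPlugin's default file expression and is also assumed, not taken from this changeset. Under the model, the `notext.*\.pdf` line has to stay ahead of the general PDF line so that the image-only PDFs are converted with `pagedimg_jpg`.

```python
import re

# Hypothetical model of the r27976 plugin order: each entry pairs a plugin
# line from collect.cfg with the file pattern it is assumed to accept.
PLUGINS = [
    ("PDFPlugin -process_exp notext.*\\.pdf -convert_to pagedimg_jpg",
     re.compile(r"notext.*\.pdf")),
    ("PDFPlugin -use_sections -convert_to html",
     re.compile(r"\.pdf$", re.IGNORECASE)),  # assumed default pattern
]


def first_matching_plugin(path, plugins=PLUGINS):
    """Return the first plugin whose pattern matches path, or None."""
    for name, pattern in plugins:
        if pattern.search(path):
            return name
    return None


for path in ("import/pdf01.pdf", "import/notext/pdf05-notext.pdf", "import/readme.txt"):
    print(path, "->", first_matching_plugin(path))
```

Reversing the two entries in this model would send the `notext` files to the HTML conversion as well, which is why the relative order of the two PDFPlugin lines matters here.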
  • other-projects/nightly-tasks/diffcol/trunk/model-collect/Enhanced-PDF/gli.col

    r27958 r27976  
    4242            <Argument enabled="true" name="verbosity">5</Argument> 
    4343        </Import> 
    44         <Schedule/> 
     44        <Schedule> 
     45            <Argument enabled="false" name="frequency">daily</Argument> 
     46            <Argument enabled="false" name="action">add</Argument> 
     47            <Argument enabled="false" name="toaddr"/> 
     48            <Argument enabled="false" name="fromaddr"/> 
     49            <Argument enabled="false" name="smtp"/> 
     50        </Schedule> 
    4551    </BuildConfig> 
    4652</GathererCollection> 
  • other-projects/nightly-tasks/diffcol/trunk/model-collect/Enhanced-PDF/index/build.cfg

    r27958 r27976  
    1 builddate   1375336472 
     1builddate   1375690533 
    22buildtype   mgpp 
    3 earliestdatestamp   1375336454 
     3earliestdatestamp   1375690515 
    44indexfieldmap   text->TX    dc.Title,ex.dc.Title,Title->TI  Source->SO 
    55indexfields text    dc.Title,ex.dc.Title,Title  Source