Ticket #356 (closed defect: worksforme)

Opened 11 years ago

Last modified 9 years ago

import.pl generates an error using arg -maxdocs along with args -OIDtype and -OIDmetadata

Reported by: tonyh
Owned by: kjdon
Priority: moderate
Milestone: 2.84 Release
Component: Collection Building
Severity: minor
Keywords: import.pl args maxdocs OIDtype OIDmetadata error
Cc:

Description

Running import.pl with -maxdocs in conjunction with -OIDtype and -OIDmetadata produces the message "no dc.Identifier metadata found, generating hash id" for each document, and the documents are given hash ids instead of their assigned identifiers.

Argument combinations that work:

  • OIDtype and OIDmetadata work fine without maxdocs
  • maxdocs works fine without OIDtype and OIDmetadata
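
The fallback named in the message can be modelled with a small sketch. This is not Greenstone's code: the function assign_oid, the dict-based metadata, the identifier BL-0004 and the hash format are all hypothetical, chosen only to illustrate what the log message suggests happens when -OIDtype assigned is used and the field named by -OIDmetadata is missing.

```python
import hashlib


def assign_oid(doc_metadata, content, oid_metadata="dc.Identifier"):
    """Hypothetical model of '-OIDtype assigned'.

    Use the named metadata field as the document id if it is present,
    otherwise fall back to an id derived from a hash of the content.
    """
    value = doc_metadata.get(oid_metadata)
    if value:
        return value, "assigned"
    # Same wording as the message seen in the import log.
    print(f"no {oid_metadata} metadata found, generating hash id")
    digest = hashlib.md5(content).hexdigest()
    # Illustrative id format only; the real format is an assumption here.
    return "HASH" + digest[:24], "hash"


# Metadata present (for example, supplied by metadata.xml): id is assigned.
print(assign_oid({"dc.Identifier": "BL-0004"}, b"document text"))
# Metadata missing: the fallback path reported in this ticket is taken.
print(assign_oid({}, b"document text"))
```

In this model the failing run corresponds to the second call: the document reaches the id step without any dc.Identifier value attached.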

Command line with args maxdocs, OIDtype, OIDmetadata:

perl -S import.pl -maxdocs 2 -removeold -OIDtype assigned -OIDmetadata dc.Identifier -verbosity 3 BL

Output with errors:

RecPlug: getting directory C:\Program Files\Greenstone\collect\BL\import
RecPlug metadata recurring: 0004.pdf
RecPlug metadata recurring: 0008.pdf
RecPlug: preparing metadata for 0004.pdf
RecPlug recurring: 0004.pdf
Converting 0004.pdf to HTML format
BasPlug: reading C:\Program Files\Greenstone\collect\BL\tmp\0004.html as (utf8,en)
HTMLPlug: processing C:\Program Files\Greenstone\collect\BL\tmp\0004.html
 extracted "GENERATOR" metadata "pdftohtml 0.36"
 extracted "date" metadata "2008-03-18T04:58:21+00:00"
 extracted Title metadata "หนังสือชุด ..." from first 100 chars
no dc.Identifier metadata found, generating hash id
no dc.Identifier metadata found, generating hash id
RecPlug: preparing metadata for 0008.pdf
RecPlug recurring: 0008.pdf
Converting 0008.pdf to HTML format
BasPlug: reading C:\Program Files\Greenstone\collect\BL\tmp\0008.html as (utf8,en)
HTMLPlug: processing C:\Program Files\Greenstone\collect\BL\tmp\0008.html
 extracted "GENERATOR" metadata "pdftohtml 0.36"
 extracted "date" metadata "2008-03-18T09:14:55+00:00"
 extracted Title metadata "..." from first 100 chars
no dc.Identifier metadata found, generating hash id
no dc.Identifier metadata found, generating hash id

At this point, buildcol.pl fails as there is no data:

ivf.pass1 : Inverted buffer size:           5242880 bytes
ivf.pass1 : Max memory needed for 1 chunk:        1 bytes
ivf.pass1 : Number of chunks written:             1
ivf.pass1 : Number of documents:                  2
ivf.pass1 : Number of fragments:                  0
ivf.pass1 : Number of words:                      0
ivf.pass1 : Size of word dictionary:              0
ivf.pass1 : Size of tag dictionary:               2
Stats (Creating index dc.Creator,dc.Description^abstract,dc.Publisher,dc.Subject,dc.Title;)
Total bytes in collection: 201836
Total bytes in dc.Creator,dc.Description^abstract,dc.Publisher,dc.Subject,dc.Title;: 0
***************
WARNING: There is very little or no text to process for dc.Creator,dc.Description^abstract,dc.Publisher,dc.Subject,dc.Title;
         Was this your intention?
***************

However, this works fine if the arg maxdocs is not used:

Command line with args OIDtype, OIDmetadata:

perl -S import.pl -removeold -OIDtype assigned -OIDmetadata dc.Identifier -verbosity 3 BL

Output:

RecPlug: getting directory C:\Program Files\Greenstone\collect\BL\import
RecPlug metadata recurring: 0004.pdf
RecPlug metadata recurring: 0008.pdf
RecPlug metadata recurring: 0009.pdf
RecPlug metadata recurring: metadata.xml
RecPlug: preparing metadata for 0004.pdf
File "0004.pdf" matches filespec "0004\.pdf"
RecPlug recurring: 0004.pdf
Converting 0004.pdf to HTML format
BasPlug: reading C:\Program Files\Greenstone\collect\BL\tmp\0004.html as (utf8,en)
HTMLPlug: processing C:\Program Files\Greenstone\collect\BL\tmp\0004.html
 extracted "GENERATOR" metadata "pdftohtml 0.36"
 extracted "date" metadata "2008-03-18T04:58:21+00:00"
 extracted Title metadata "หนังสือชุด ..." from first 100 chars
RecPlug: preparing metadata for 0008.pdf
File "0008.pdf" matches filespec "0008\.pdf"
RecPlug recurring: 0008.pdf
Converting 0008.pdf to HTML format
BasPlug: reading C:\Program Files\Greenstone\collect\BL\tmp\0008.html as (utf8,en)
HTMLPlug: processing C:\Program Files\Greenstone\collect\BL\tmp\0008.html
 extracted "GENERATOR" metadata "pdftohtml 0.36"
 extracted "date" metadata "2008-03-18T09:14:55+00:00"
 extracted Title metadata "..." from first 100 chars
RecPlug: preparing metadata for 0009.pdf
File "0009.pdf" matches filespec "0009\.pdf"
RecPlug recurring: 0009.pdf
Converting 0009.pdf to HTML format
BasPlug: reading C:\Program Files\Greenstone\collect\BL\tmp\0009.html as (utf8,en)
HTMLPlug: processing C:\Program Files\Greenstone\collect\BL\tmp\0009.html
 extracted "GENERATOR" metadata "pdftohtml 0.36"
 extracted "date" metadata "2008-03-17T04:48:09+00:00"
 extracted Title metadata "..." from first 100 chars
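
One visible difference between the two logs above is that metadata.xml is listed in RecPlug's metadata pass only in the run without -maxdocs. The sketch below is a rough way to compare such logs. The function name summarize_import_log is made up for illustration, and the two strings are excerpts of the output above. Whether -maxdocs actually causes metadata.xml to be skipped is an inference from the logs, not something this ticket confirms.

```python
import re

# Patterns for the two log line shapes of interest, copied from the output above.
METADATA_PASS = re.compile(r"^RecPlug metadata recurring: (.+)$")
FALLBACK = re.compile(r"^no (\S+) metadata found, generating hash id$")


def summarize_import_log(log_text):
    """Return (files seen in the metadata pass, count of hash-id fallbacks)."""
    files, fallbacks = [], 0
    for line in log_text.splitlines():
        line = line.strip()
        match = METADATA_PASS.match(line)
        if match:
            files.append(match.group(1))
        elif FALLBACK.match(line):
            fallbacks += 1
    return files, fallbacks


# Excerpt of the run with -maxdocs 2.
failing = """\
RecPlug metadata recurring: 0004.pdf
RecPlug metadata recurring: 0008.pdf
no dc.Identifier metadata found, generating hash id
no dc.Identifier metadata found, generating hash id
no dc.Identifier metadata found, generating hash id
no dc.Identifier metadata found, generating hash id
"""

# Excerpt of the run without -maxdocs.
working = """\
RecPlug metadata recurring: 0004.pdf
RecPlug metadata recurring: 0008.pdf
RecPlug metadata recurring: 0009.pdf
RecPlug metadata recurring: metadata.xml
"""

for name, log in (("with -maxdocs", failing), ("without -maxdocs", working)):
    files, fallbacks = summarize_import_log(log)
    print(name, "| metadata.xml seen:", "metadata.xml" in files,
          "| hash-id fallbacks:", fallbacks)
```

Running the same function over a full log captured with -verbosity 3 would show whether the pattern holds for other collections.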

Change History

Changed 11 years ago by tonyh

  • summary changed from import.pl gernates an error using arg -maxdocs along with args -OIDtype and -OIDmetadata to import.pl generates an error using arg -maxdocs along with args -OIDtype and -OIDmetadata

Changed 10 years ago by kjdon

  • owner changed from nobody to kjdon
  • milestone set to Collection building wishlist

Changed 9 years ago by kjdon

  • milestone changed from Collection building wishlist to 2.84 Release

Changed 9 years ago by kjdon

  • status changed from new to closed
  • resolution set to worksforme

Works for me.
