Opened 13 years ago

Closed 11 years ago

#356 closed defect (worksforme)

import.pl generates an error using arg -maxdocs along with args -OIDtype and -OIDmetadata

Reported by: tonyh Owned by: kjdon
Priority: moderate Milestone: 2.84 Release
Component: Collection Building Severity: minor
Keywords: import.pl args maxdocs OIDtype OIDmetadata error Cc:

Description

When import.pl is run with the -maxdocs arg in conjunction with -OIDtype and -OIDmetadata, it reports "no dc.Identifier metadata found, generating hash id" for every document.

Arg combinations that work:

  • OIDtype and OIDmetadata work fine without maxdocs
  • maxdocs works fine without OIDtype and OIDmetadata

Command line with args maxdocs, OIDtype, OIDmetadata:

perl -S import.pl -maxdocs 2 -removeold -OIDtype assigned -OIDmetadata dc.Identifier -verbosity 3 BL

Output with errors:

RecPlug: getting directory C:\Program Files\Greenstone\collect\BL\import
RecPlug metadata recurring: 0004.pdf
RecPlug metadata recurring: 0008.pdf
RecPlug: preparing metadata for 0004.pdf
RecPlug recurring: 0004.pdf
Converting 0004.pdf to HTML format
BasPlug: reading C:\Program Files\Greenstone\collect\BL\tmp\0004.html as (utf8,en)
HTMLPlug: processing C:\Program Files\Greenstone\collect\BL\tmp\0004.html
 extracted "GENERATOR" metadata "pdftohtml 0.36"
 extracted "date" metadata "2008-03-18T04:58:21+00:00"
 extracted Title metadata "หนังสือชุด ..." from first 100 chars
no dc.Identifier metadata found, generating hash id
no dc.Identifier metadata found, generating hash id
RecPlug: preparing metadata for 0008.pdf
RecPlug recurring: 0008.pdf
Converting 0008.pdf to HTML format
BasPlug: reading C:\Program Files\Greenstone\collect\BL\tmp\0008.html as (utf8,en)
HTMLPlug: processing C:\Program Files\Greenstone\collect\BL\tmp\0008.html
 extracted "GENERATOR" metadata "pdftohtml 0.36"
 extracted "date" metadata "2008-03-18T09:14:55+00:00"
 extracted Title metadata "..." from first 100 chars
no dc.Identifier metadata found, generating hash id
no dc.Identifier metadata found, generating hash id

At this point, buildcol.pl fails as there is no data:

ivf.pass1 : Inverted buffer size:           5242880 bytes
ivf.pass1 : Max memory needed for 1 chunk:        1 bytes
ivf.pass1 : Number of chunks written:             1
ivf.pass1 : Number of documents:                  2
ivf.pass1 : Number of fragments:                  0
ivf.pass1 : Number of words:                      0
ivf.pass1 : Size of word dictionary:              0
ivf.pass1 : Size of tag dictionary:               2
Stats (Creating index dc.Creator,dc.Description^abstract,dc.Publisher,dc.Subject,dc.Title;)
Total bytes in collection: 201836
Total bytes in dc.Creator,dc.Description^abstract,dc.Publisher,dc.Subject,dc.Title;: 0
***************
WARNING: There is very little or no text to process for dc.Creator,dc.Description^abstract,dc.Publisher,dc.Subject,dc.Title;
         Was this your intention?
***************
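The "no data" condition can also be read straight from the ivf.pass1 statistics above: two documents were counted, but zero words. A minimal sketch of such a check follows, assuming the statistics lines always have the shape shown in this ticket. The function name parse_pass1_stats and the regular expression are illustrative only and are not part of Greenstone.

```python
import re

# Assumed line shape, taken from the output in this ticket:
#   ivf.pass1 : <label>:   <integer>[ bytes]
STAT_LINE = re.compile(r"^ivf\.pass1 : (.+?):\s+(\d+)(?: bytes)?$")

def parse_pass1_stats(log_text):
    """Collect the ivf.pass1 counters into a dict keyed by label (hypothetical helper)."""
    stats = {}
    for line in log_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats

# Excerpt copied from the buildcol.pl output above.
buildcol_log = """\
ivf.pass1 : Inverted buffer size:           5242880 bytes
ivf.pass1 : Number of documents:                  2
ivf.pass1 : Number of fragments:                  0
ivf.pass1 : Number of words:                      0
"""

stats = parse_pass1_stats(buildcol_log)
# Documents were imported, yet nothing was indexed for them.
empty_index = stats["Number of documents"] > 0 and stats["Number of words"] == 0
print(empty_index)
```

For the excerpt above the check is true, which matches the warning that there is very little or no text to process.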

However, this works fine if the arg maxdocs is not used:

Command line with args OIDtype, OIDmetadata:

perl -S import.pl -removeold -OIDtype assigned -OIDmetadata dc.Identifier -verbosity 3 BL

Output:

RecPlug: getting directory C:\Program Files\Greenstone\collect\BL\import
RecPlug metadata recurring: 0004.pdf
RecPlug metadata recurring: 0008.pdf
RecPlug metadata recurring: 0009.pdf
RecPlug metadata recurring: metadata.xml
RecPlug: preparing metadata for 0004.pdf
File "0004.pdf" matches filespec "0004\.pdf"
RecPlug recurring: 0004.pdf
Converting 0004.pdf to HTML format
BasPlug: reading C:\Program Files\Greenstone\collect\BL\tmp\0004.html as (utf8,en)
HTMLPlug: processing C:\Program Files\Greenstone\collect\BL\tmp\0004.html
 extracted "GENERATOR" metadata "pdftohtml 0.36"
 extracted "date" metadata "2008-03-18T04:58:21+00:00"
 extracted Title metadata "หนังสือชุด ..." from first 100 chars
RecPlug: preparing metadata for 0008.pdf
File "0008.pdf" matches filespec "0008\.pdf"
RecPlug recurring: 0008.pdf
Converting 0008.pdf to HTML format
BasPlug: reading C:\Program Files\Greenstone\collect\BL\tmp\0008.html as (utf8,en)
HTMLPlug: processing C:\Program Files\Greenstone\collect\BL\tmp\0008.html
 extracted "GENERATOR" metadata "pdftohtml 0.36"
 extracted "date" metadata "2008-03-18T09:14:55+00:00"
 extracted Title metadata "..." from first 100 chars
RecPlug: preparing metadata for 0009.pdf
File "0009.pdf" matches filespec "0009\.pdf"
RecPlug recurring: 0009.pdf
Converting 0009.pdf to HTML format
BasPlug: reading C:\Program Files\Greenstone\collect\BL\tmp\0009.html as (utf8,en)
HTMLPlug: processing C:\Program Files\Greenstone\collect\BL\tmp\0009.html
 extracted "GENERATOR" metadata "pdftohtml 0.36"
 extracted "date" metadata "2008-03-17T04:48:09+00:00"
 extracted Title metadata "..." from first 100 chars
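One visible difference between the two logs: in the -maxdocs run the "RecPlug metadata recurring" lines list only 0004.pdf and 0008.pdf, while the run without -maxdocs also lists 0009.pdf and metadata.xml. This ticket does not confirm whether that is the cause of the missing dc.Identifier values. The following is a minimal sketch for comparing the two logs, assuming the log lines keep the format shown here. The helper name metadata_pass_files is hypothetical and not part of Greenstone.

```python
import re

# Matches the lines RecPlug prints for each file during its metadata pass.
METADATA_PASS = re.compile(r"^RecPlug metadata recurring: (.+)$")

def metadata_pass_files(log_text):
    """Return the file names from 'RecPlug metadata recurring' lines (hypothetical helper)."""
    files = []
    for line in log_text.splitlines():
        match = METADATA_PASS.match(line.strip())
        if match:
            files.append(match.group(1))
    return files

# Excerpts copied from the two runs in this ticket.
with_maxdocs = """\
RecPlug metadata recurring: 0004.pdf
RecPlug metadata recurring: 0008.pdf
RecPlug: preparing metadata for 0004.pdf
"""

without_maxdocs = """\
RecPlug metadata recurring: 0004.pdf
RecPlug metadata recurring: 0008.pdf
RecPlug metadata recurring: 0009.pdf
RecPlug metadata recurring: metadata.xml
RecPlug: preparing metadata for 0004.pdf
"""

# Files seen in the metadata pass only when -maxdocs is absent.
missing = sorted(set(metadata_pass_files(without_maxdocs))
                 - set(metadata_pass_files(with_maxdocs)))
print(missing)  # ['0009.pdf', 'metadata.xml']
```

Running the same comparison on full logs saved from both commands would show whether metadata.xml is consistently skipped when -maxdocs is given.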

Change History (4)

comment:1 by tonyh, 13 years ago

Summary: changed from "import.pl gernates an error using arg -maxdocs along with args -OIDtype and -OIDmetadata" to "import.pl generates an error using arg -maxdocs along with args -OIDtype and -OIDmetadata"

comment:2 by kjdon, 12 years ago

Milestone: set to Collection building wishlist
Owner: changed from nobody to kjdon

comment:3 by kjdon, 11 years ago

Milestone: changed from Collection building wishlist to 2.84 Release

comment:4 by kjdon, 11 years ago

Resolution: set to worksforme
Status: changed from new to closed

works for me.
