Opened 16 years ago
Closed 14 years ago
#356 closed defect (worksforme)
import.pl generates an error using arg -maxdocs along with args -OIDtype and -OIDmetadata
Reported by: | tonyh | Owned by: | kjdon |
---|---|---|---|
Priority: | moderate | Milestone: | 2.84 Release |
Component: | Collection Building | Severity: | minor |
Keywords: | import.pl args maxdocs OIDtype OIDmetadata error | Cc: |
Description
Running import.pl with -maxdocs together with -OIDtype and -OIDmetadata produces the message "no dc.Identifier metadata found, generating hash id" for each document.
Working argument combinations:
- OIDtype and OIDmetadata work fine without maxdocs
- maxdocs works fine without OIDtype and OIDmetadata
Command line with args maxdocs, OIDtype, OIDmetadata:
perl -S import.pl -maxdocs 2 -removeold -OIDtype assigned -OIDmetadata dc.Identifier -verbosity 3 BL
Output with errors:
RecPlug: getting directory C:\Program Files\Greenstone\collect\BL\import
RecPlug metadata recurring: 0004.pdf
RecPlug metadata recurring: 0008.pdf
RecPlug: preparing metadata for 0004.pdf
RecPlug recurring: 0004.pdf
Converting 0004.pdf to HTML format
BasPlug: reading C:\Program Files\Greenstone\collect\BL\tmp\0004.html as (utf8,en)
HTMLPlug: processing C:\Program Files\Greenstone\collect\BL\tmp\0004.html
extracted "GENERATOR" metadata "pdftohtml 0.36"
extracted "date" metadata "2008-03-18T04:58:21+00:00"
extracted Title metadata "หนังสือชุด ..." from first 100 chars
no dc.Identifier metadata found, generating hash id
no dc.Identifier metadata found, generating hash id
RecPlug: preparing metadata for 0008.pdf
RecPlug recurring: 0008.pdf
Converting 0008.pdf to HTML format
BasPlug: reading C:\Program Files\Greenstone\collect\BL\tmp\0008.html as (utf8,en)
HTMLPlug: processing C:\Program Files\Greenstone\collect\BL\tmp\0008.html
extracted "GENERATOR" metadata "pdftohtml 0.36"
extracted "date" metadata "2008-03-18T09:14:55+00:00"
extracted Title metadata "..." from first 100 chars
no dc.Identifier metadata found, generating hash id
no dc.Identifier metadata found, generating hash id
At this point, buildcol.pl fails as there is no data:
ivf.pass1 : Inverted buffer size: 5242880 bytes
ivf.pass1 : Max memory needed for 1 chunk: 1 bytes
ivf.pass1 : Number of chunks written: 1
ivf.pass1 : Number of documents: 2
ivf.pass1 : Number of fragments: 0
ivf.pass1 : Number of words: 0
ivf.pass1 : Size of word dictionary: 0
ivf.pass1 : Size of tag dictionary: 2
Stats (Creating index dc.Creator,dc.Description^abstract,dc.Publisher,dc.Subject,dc.Title;)
Total bytes in collection: 201836
Total bytes in dc.Creator,dc.Description^abstract,dc.Publisher,dc.Subject,dc.Title;: 0
***************
WARNING: There is very little or no text to process for dc.Creator,dc.Description^abstract,dc.Publisher,dc.Subject,dc.Title;
Was this your intention?
***************
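The symptom in this buildcol.pl output is that the index pass counts two documents but zero words. As a minimal sketch for spotting that condition in a captured log, assuming the "ivf.pass1 : label: value" layout shown above (the helper names `parse_pass1_stats` and `index_is_empty` are hypothetical, not part of Greenstone):

```python
import re

# Matches "ivf.pass1 : <label>: <integer>" entries as they appear in the
# buildcol.pl output quoted in this ticket (layout assumed from that output).
_STAT = re.compile(r"ivf\.pass1 : ([A-Za-z0-9 ]+?): (\d+)")


def parse_pass1_stats(log_text):
    """Return a dict mapping each ivf.pass1 label to its integer value."""
    return {label: int(value) for label, value in _STAT.findall(log_text)}


def index_is_empty(stats):
    """True when documents were counted but no words were indexed."""
    return stats.get("Number of documents", 0) > 0 and stats.get("Number of words", 0) == 0


if __name__ == "__main__":
    # Excerpt of the output quoted in this ticket.
    sample = (
        "ivf.pass1 : Inverted buffer size: 5242880 bytes\n"
        "ivf.pass1 : Number of documents: 2\n"
        "ivf.pass1 : Number of words: 0\n"
    )
    print(index_is_empty(parse_pass1_stats(sample)))
```

The check only reads text that buildcol.pl has already printed; it does not call into Greenstone.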
However, this works fine if the arg maxdocs is not used:
Command line with args OIDtype, OIDmetadata:
perl -S import.pl -removeold -OIDtype assigned -OIDmetadata dc.Identifier -verbosity 3 BL
Output:
RecPlug: getting directory C:\Program Files\Greenstone\collect\BL\import
RecPlug metadata recurring: 0004.pdf
RecPlug metadata recurring: 0008.pdf
RecPlug metadata recurring: 0009.pdf
RecPlug metadata recurring: metadata.xml
RecPlug: preparing metadata for 0004.pdf
File "0004.pdf" matches filespec "0004\.pdf"
RecPlug recurring: 0004.pdf
Converting 0004.pdf to HTML format
BasPlug: reading C:\Program Files\Greenstone\collect\BL\tmp\0004.html as (utf8,en)
HTMLPlug: processing C:\Program Files\Greenstone\collect\BL\tmp\0004.html
extracted "GENERATOR" metadata "pdftohtml 0.36"
extracted "date" metadata "2008-03-18T04:58:21+00:00"
extracted Title metadata "หนังสือชุด ..." from first 100 chars
RecPlug: preparing metadata for 0008.pdf
File "0008.pdf" matches filespec "0008\.pdf"
RecPlug recurring: 0008.pdf
Converting 0008.pdf to HTML format
BasPlug: reading C:\Program Files\Greenstone\collect\BL\tmp\0008.html as (utf8,en)
HTMLPlug: processing C:\Program Files\Greenstone\collect\BL\tmp\0008.html
extracted "GENERATOR" metadata "pdftohtml 0.36"
extracted "date" metadata "2008-03-18T09:14:55+00:00"
extracted Title metadata "..." from first 100 chars
RecPlug: preparing metadata for 0009.pdf
File "0009.pdf" matches filespec "0009\.pdf"
RecPlug recurring: 0009.pdf
Converting 0009.pdf to HTML format
BasPlug: reading C:\Program Files\Greenstone\collect\BL\tmp\0009.html as (utf8,en)
HTMLPlug: processing C:\Program Files\Greenstone\collect\BL\tmp\0009.html
extracted "GENERATOR" metadata "pdftohtml 0.36"
extracted "date" metadata "2008-03-17T04:48:09+00:00"
extracted Title metadata "..." from first 100 chars
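One difference between the two runs is visible in the logs themselves: the run without -maxdocs lists metadata.xml among its "RecPlug metadata recurring" entries, while the -maxdocs 2 run lists only the two PDFs. Whether that explains the missing dc.Identifier values is an assumption, not something confirmed in this ticket. A minimal sketch for comparing two captured logs this way (the function names are hypothetical; the line prefix is taken from the output above):

```python
import re

# Prefix as it appears in the import.pl output quoted in this ticket.
_METADATA_PASS = re.compile(r"RecPlug metadata recurring: (\S+)")


def files_in_metadata_pass(log_text):
    """Return the set of file names RecPlug reported in its metadata pass."""
    return set(_METADATA_PASS.findall(log_text))


def missing_from_limited_run(full_log, limited_log):
    """Files seen in the unrestricted run but not in the -maxdocs run, sorted."""
    return sorted(files_in_metadata_pass(full_log) - files_in_metadata_pass(limited_log))


if __name__ == "__main__":
    # Abbreviated excerpts of the two logs quoted in this ticket.
    with_maxdocs = (
        "RecPlug metadata recurring: 0004.pdf "
        "RecPlug metadata recurring: 0008.pdf"
    )
    without_maxdocs = (
        "RecPlug metadata recurring: 0004.pdf "
        "RecPlug metadata recurring: 0008.pdf "
        "RecPlug metadata recurring: 0009.pdf "
        "RecPlug metadata recurring: metadata.xml"
    )
    print(missing_from_limited_run(without_maxdocs, with_maxdocs))
```

For the excerpts above this reports 0009.pdf and metadata.xml as absent from the -maxdocs run.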
Change History (4)
comment:1 by , 16 years ago
Summary: | import.pl gernates an error using arg -maxdocs along with args -OIDtype and -OIDmetadata → import.pl generates an error using arg -maxdocs along with args -OIDtype and -OIDmetadata |
---|
comment:2 by , 15 years ago
Milestone: | → Collection building wishlist |
---|---|
Owner: | changed from | to
comment:3 by , 14 years ago
Milestone: | Collection building wishlist → 2.84 Release |
---|
comment:4 by , 14 years ago
Resolution: | → worksforme |
---|---|
Status: | new → closed |
works for me.