gsarch seg faults
|Reported by:||oranfry||Owned by:||nobody|
There is something weird going on with the gsarch collection. Some documents cause it to seg fault. This is because it's trying to get the childtype, which doesn't exist, e.g. classifytype = response.docInfo[numparents-1].metadatachildtype.values; in browsetools. To stop the seg faulting, I added checks for the existence of childtype metadata before using it. Now it doesn't seg fault, but the documents that used to seg fault have a table of contents full of untitled, empty nodes.
Adding format DocumentContents false to the config file makes the gsarch collection look good, but it doesn't fix the underlying problem: some documents, e.g. those with identifiers like (18.104.22.168.2.20030212094334.00affdf0-mail.wrlc.org), get treated as hierarchical, but others like (3FF885B4.4050402-cs.waikato.ac.nz) don't. Why? Is it because the first one has only numbers in between the dots?
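The existence check mentioned above can be sketched roughly as follows. This is only an illustration, not the actual browsetools change: MetadataMap is a stand-in built from standard containers (the real Greenstone metadata types differ), and first_value_or is a hypothetical helper name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-in for a per-document metadata map (assumption: the real
// Greenstone types are different; this only illustrates the guard).
using MetadataMap = std::map<std::string, std::vector<std::string>>;

// Hypothetical helper: return the first value stored under `key`, or
// `fallback` when the key is missing or has no values.
// find() is used instead of operator[] so that a missing key is neither
// inserted into the map nor indexed into as an empty value list.
std::string first_value_or(const MetadataMap& metadata,
                           const std::string& key,
                           const std::string& fallback) {
    auto it = metadata.find(key);
    if (it == metadata.end() || it->second.empty()) {
        return fallback;
    }
    return it->second.front();
}
```

A caller would then use something like first_value_or(metadata, "childtype", "") and skip the table of contents when the result is empty, rather than reading the values unconditionally.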
error.txt was getting messages like "receptionist: no childtype element in metadata map!" and "Error: call to filter failed for 433D413B.3090402-gmx.net in OIDtools::get_info (protocol error)". These documents were not getting Title metadata set when the document was viewed. Looking in OIDtools.cpp shows that is_top() assumes that the presence of a "." in the doc ID means it is a section and not a document. EMAILPlug has now been changed so that dots are removed from the document IDs (we were using the email message ID as the unique doc ID).
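To illustrate why a dotted message ID gets misclassified, here is a simplified sketch. It is not the code from OIDtools.cpp or EMAILPlug: is_top_sketch only mirrors the "any dot means section" assumption described above, and strip_dots is a hypothetical helper showing the kind of cleanup now applied to the IDs.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Simplified version of the assumption described above: an ID that
// contains a '.' is taken to be a section, so only dot-free IDs count
// as top-level documents.
bool is_top_sketch(const std::string& oid) {
    return oid.find('.') == std::string::npos;
}

// Hypothetical helper: remove every '.' from a message ID before it is
// used as a document ID (erase-remove idiom).
std::string strip_dots(std::string id) {
    id.erase(std::remove(id.begin(), id.end(), '.'), id.end());
    return id;
}
```

With these definitions, is_top_sketch("433D413B.3090402-gmx.net") is false, so the document is treated as a section, while the stripped ID "433D413B3090402-gmxnet" is treated as a top-level document.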