source: trunk/gsdl/perllib

Diff Rev Age Author Log Message
(edit) @2835   22 years dmm9 Corrected pluginfo entry and renamed extract_date to …
(edit) @2819   23 years sjboddie Altered HTMLPlug's description_tags option a bit so it should now also …
(edit) @2818   23 years sjboddie * empty log message *
(edit) @2817   23 years sjboddie Implemented a description_tags option to HTMLPlug for splitting an …
(edit) @2816   23 years sjboddie Added cover_image option to BasPlug for associating a jpeg image as a …
(edit) @2813   23 years sjboddie Altered RecPlug's -use_metadata_files option to use better XML files …
(edit) @2812   23 years sjboddie * empty log message *
(edit) @2811   23 years sjboddie * empty log message *
(edit) @2810   23 years sjboddie Created GAPlug (and XMLPlug base class) to replace the old GMLPlug. …
(edit) @2808   23 years sjboddie * empty log message *
(edit) @2804   23 years sjboddie * empty log message *
(edit) @2803   23 years sjboddie * empty log message *
(edit) @2799   23 years sjboddie Fixed a bug where Word documents containing non-ascii characters …
(edit) @2797   23 years sjboddie * empty log message *
(edit) @2796   23 years sjboddie * empty log message *
(edit) @2795   23 years sjboddie Got ZIPPlug working under windows
(edit) @2793   23 years sjboddie * empty log message *
(edit) @2785   23 years sjboddie The build process now creates a summary of how many files were …
(edit) @2781   23 years jrm21 oops - left off a '$' at end of a pattern match.
(edit) @2779   23 years jrm21 Be a little more flexible when looking for boundary field in a …
(edit) @2772   23 years kjm18 changes to enable language specific collectionmeta in collect.cfg …
(edit) @2771   23 years kjm18 updated this to include the browselist/doclist stuff that's now in …
(edit) @2761   23 years sjboddie added HTMLPlug2 temporarily while testing a new extract_subsections option
(edit) @2755   23 years jrm21 import.pl now takes an option for saving file conversion failures to a …
(edit) @2754   23 years jrm21 oops - left a debugging statement in there.
(edit) @2751   23 years sjboddie Had a go at enriching the default document structure. Added …
(edit) @2735   23 years sjboddie Fixed up bugs I introduced with recent change to BasPlug
(edit) @2734   23 years sjboddie Chinese text segmentation is now done whenever language="zh" instead …
(edit) @2733   23 years jrm21 minor regex fixes/improvements.
(edit) @2732   23 years jrm21 needed <pre> tags when using the text/plain part of a multipart message.
(edit) @2730   23 years jrm21 1) Non-ascii characters should now work for any encoding handled by …
(edit) @2717   23 years jrm21 Do some email munging - @ symbols become &#64;. Both netscape and IE …
(edit) @2713   23 years sjboddie * empty log message *
(edit) @2711   23 years sjboddie Removed the "beta" collect.cfg option to avoid awkward questions from …
(edit) @2700   23 years cs025 fixed this up for building under windows
(edit) @2695   23 years jrm21 Allow spaces in img src=... tags if surrounded with dbl quotes.
(edit) @2685   23 years jrm21 Improved regex for when the last category is too small, and we need to …
(edit) @2681   23 years jrm21 fixed a few more minor MIME header parsing cases.
(edit) @2680   23 years jrm21 1. we escape 'and' chars in headers so greenstone doesn't try to …
(edit) @2667   23 years jrm21 protect against < and > chars, as <pre> tags don't preserve them.
(edit) @2666   23 years jrm21 Modified phind classifier so that special delimiters are always …
(edit) @2662   23 years jrm21 oops, that's a bit stupid (of me) - changed: if …
(edit) @2661   23 years jrm21 added a default block exp of "" so it doesn't inherit HTMLPlugs…
(edit) @2658   23 years jrm21 fixed a typo
(edit) @2657   23 years jrm21 fixed a bug when #including a macro (ie no "... or <... on the line)
(edit) @2652   23 years jrm21 Needed to replace \s with s. Also checked for multipart/related.
(edit) @2638   23 years jrm21 typo in regexp broke import... encoding type should have had [\s], …
(edit) @2632   23 years jrm21 added an option "-bymonth=1", to group by (eg) 2000-January, …
(edit) @2631   23 years jrm21 Don't assume funny dates are 20th C - eg 101 -> 19101 - add to 1900 …
(edit) @2630   23 years jrm21 Mime support for multipart messages. Doesn't extract attachments …
(edit) @2604   23 years jrm21 when extracting email addresses, we now include people in the .net …
(edit) @2601   23 years jrm21 modified usage to not mention HTMLplug blocking rtf.
(edit) @2576   23 years sjboddie Moved phind's stopword directory from etc to etc/packages/phind
(edit) @2564   23 years jrm21 Added RTFPlug. (It's the smallest one so far - 1511 bytes - yay!) …
(edit) @2539   23 years sjboddie * empty log message *
(edit) @2529   23 years sjboddie added quoting to system calls in phind classifier - needed when …
(edit) @2525   23 years kjm18 removed unneeded output
(edit) @2516   23 years sjboddie * empty log message *
(edit) @2515   23 years sjboddie Fixed a couple of bugs/inconsistencies in word and pdf plugins that …
(edit) @2510   23 years sjboddie renamed phind's stopwords directory and contents to use Win3.1 …
(edit) @2509   23 years sjboddie Fixed (bypassed really) a problem with the phind classifier on windows …
(edit) @2507   23 years sjboddie Tidied up the phind client a little more. It now belongs to the …
(edit) @2506   23 years dmm9 added writing of collection document list to db (OID browselist)
(edit) @2505   23 years dmm9 added collection of collection document list
(edit) @2503   23 years sjboddie fixed a small bug in the datelist classifier that caused year ranges …
(edit) @2500   23 years sjboddie Removed test for phindcgi from phind classifier as it is no longer used
(edit) @2493   23 years paynter Changed at the request of Marcio - see mailing list.
(edit) @2492   23 years paynter Fixed trivial bug in the new set_OID function.
(edit) @2489   23 years dmm9 adding the browse interface as a classifier option
(edit) @2487   23 years sjboddie Changes to get phind working under windows
(edit) @2484   23 years say1 Changed SplitPlug to allow control over the OID. Changed BibTexPlug to …
(edit) @2483   23 years say1 added a "if" to catch the case where someone tries to convert an …
(edit) @2481   23 years kjm18 changed mgpp system calls to use the new executable names
(edit) @2480   23 years kjm18 added the store_text option as done in mgbuildproc.pm
(edit) @2479   23 years kjm18 added indexmap and indexfieldmap to build.cfg fields
(edit) @2478   23 years kjm18 brought it in line with changes to buildcol.pl, mgbuilder.pm now uses …
(edit) @2453   23 years jrm21 Slightly smarter title extraction from body's text.
(edit) @2452   23 years jrm21 -title_sub works now -- previously had a leading "--" argument, which …
(edit) @2451   23 years jrm21 PSPlug now uses the -title_sub option to TEXTPlug, to remove any …
(edit) @2450   23 years jrm21 now accepts the "-title_sub" option, a regexp to remove when …
(edit) @2432   23 years say1 switched the order of removing the symbolic link and checking for …
(edit) @2412   23 years sjboddie Added a tar archive of all the perl modules required to make ping.pl work
(edit) @2364   23 years jrm21 turn "\" into " " so that we don't lose backslashes along the way…
(edit) @2363   23 years jrm21 fixed nasty bug where </srclink></a><srclink> was being matched …
(edit) @2359   23 years sjboddie Altered the help text a little for mkcol.pl, import.pl, buildcol.pl, …
(edit) @2356   23 years sjboddie Renamed HBSPlug to BookPlug in the hope that it's a little less cryptic
(edit) @2355   23 years sjboddie All options to import.pl and buildcol.pl may now be specified from …
(edit) @2342   23 years sjboddie renamed HTMLPlug's w3mir option to file_is_url
(edit) @2336   23 years sjboddie added a -no_text option to buildcol.pl to allow collections to be …
(edit) @2333   23 years kjm18 closed all filehandles that had remained open, to fix the bug that was …
(edit) @2327   23 years sjboddie * empty log message *
(edit) @2326   23 years sjboddie fixed a small bug in the new XML gml code that caused metadata tags …
(edit) @2267   23 years davidb GML file syntax altered to be XML compliant. This basically meant …
(edit) @2241   23 years sjboddie Tidied up the ConvertToPlug stuff to get it working on Windows 95/98
(edit) @2237   23 years sjboddie Added a unicode2koi8r function to unicode.pm (because I needed one). …
(edit) @2235   23 years sjboddie Hacked the textcat package about so that it only reads all the …
(edit) @2230   23 years paynter User can request a "Screen" image - essentially a second thumbnail. …
(edit) @2228   23 years paynter The -use_metadata_files option tells RecPlug to read any metadata XML …
(edit) @2226   23 years paynter Image size metadata fixed, dummy text added, Image filename fixed.
(edit) @2224   23 years paynter When the document has associated files, a metadata element …