source: trunk/gsdl/perllib/plugins/BasPlug.pm

Diff Rev Age Author Log Message
(edit) @13968   17 years kjdon Added a new option to HTMLPlug (tidy_html) - if set, will use HTMLTidy …
(edit) @12970   18 years kjdon set_keepold and self->{'keepold'} have been changed to set_incremental …
(edit) @12630   18 years mdewsnip Plugins now output a <Processes> and <Blocks> information field, so …
(edit) @12624   18 years mdewsnip Added an extra argument to print_xml_usage for specifying whether to …
(edit) @12546   18 years kjdon changed parse2::parse so that it returns -1 on error, 0 on success, or …
(edit) @12270   18 years kjdon set_OIDtype now takes two arguments, the type and the metadata (used …
(edit) @11966   18 years mdewsnip (Profiling) Creating new textcat objects (one for each plugin) is …
(edit) @11880   18 years kjdon added a #" to line 1100 so that emacs colouring is not stuffed up
(edit) @11834   18 years mdewsnip Replaced all "_httpcollection_" in metadata (especially srclink) with …
(edit) @11681   18 years kjdon print_xml_usage and print_xml_header now take arguments
(edit) @11669   18 years kjdon need to pass a parameter to print_xml_header so it knows which DTD to …
(edit) @11389   18 years jrm21 try to get the encoding from a '<meta http-equiv' tag if HTML. make …
(edit) @11368   18 years kjdon For some reason smart_block was hidden for gli, so made it visible
(edit) @11332   18 years mdewsnip Added a mechanism for plugins to do tidying up after exploding. …
(edit) @11122   18 years davidb Introduction of -associate_tail_re option to BasPlug. This is a …
(edit) @11089   18 years kjdon removed a couple of unnecessary bits of code like repeated arguments, …
(edit) @11069   18 years mdewsnip Added an option to use Kea 4.0 -- this isn't included with Greenstone, …
(edit) @11044   18 years mdewsnip The "-extract_keyphrase" and "-extract_keyphrase_options" arguments …
(edit) @10833   18 years jrm21 store the names of files we've already checked when looking for a …
(edit) @10620   19 years kjdon now prints out some gli tags when bad args are encountered for plugins …
(edit) @10579   19 years kjdon copied classify.pm and BasClas.pm, added -gsdlinfo flag - if this is …
(edit) @10478   19 years kjdon arcPlug now knows about keepold, and if it's not set, it won't try to do …
(edit) @10446   19 years chi Modifications for converting windows-1252 to windows_1252.
(edit) @10442   19 years chi To retrieve encoding information for the HTML file generated from …
(edit) @10347   19 years kjdon removed the unneeded 'use parsargv'
(edit) @10329   19 years mdewsnip Changed the default_language string to be of type "string", since …
(edit) @10280   19 years chi Some major changes to allow secondary plugin setting.
(edit) @10254   19 years kjdon added 'use strict' to all plugins, and made modifications (mostly …
(edit) @10229   19 years kjdon fixed up some stuff for printing args (pluginfo.pl, classinfo.pl)
(edit) @10218   19 years kjdon Jeffrey's new parsing modifications, committed approx 6 July, 15.16
(edit) @10155   19 years davidb deinit subroutine added that balances out init routine. 'init' called …
(edit) @9961   19 years davidb Minor refinement made to print statements (warnings) generated by BasPlug.
(edit) @9853   19 years kjdon fixed up maxdocs - now pass an extra parameter to the read function
(edit) @9703   19 years mdewsnip Improvement to previous change so "file not processed" messages are …
(edit) @9586   19 years mdewsnip Added a ProcessingError message so the GLI knows when a file failed to …
(edit) @9584   19 years mdewsnip Plugins that return -1 from their read function now must output the …
(edit) @9413   19 years jrm21 if we are trying to automatically determine the encoding, look for a …
(edit) @9403   19 years jrm21 need to 'bless' an object before you can call functions in it (for …
(edit) @9398   19 years davidb Introduction of GISBasPlug for Geographic Information System support. …
(edit) @9351   19 years davidb Two changes: 1. Fusing files with the same root filename is meant …
(edit) @9067   19 years kjdon moved smart blocking stuff in htmlplug metadata_read into basplug …
(edit) @8915   19 years chi Add an option -smart_block_BN for BN Portugal Collection.
(edit) @8908   19 years davidb BasPlug now sets a piece of metadata [hascover] if document has a …
(edit) @8892   19 years davidb Addition of new minus option to BasPlug: -associate_ext. This new …
(edit) @8818   19 years mdewsnip Title tags over multiple lines will now be removed correctly before …
(edit) @8814   19 years mdewsnip Updated files for Kea 3.0, thanks to Olena.
(edit) @8789   19 years mdewsnip Better documentation of the extract keyphrases (Kea) code, thanks to Olena.
(edit) @8761   19 years mdewsnip XML plugin descriptions now include an <Explodes> tag that records …
(edit) @8716   19 years kjdon added some changes made by Emanuel Dejanu (Simple Words)
(edit) @8678   19 years kjdon cover images are now turned on by default, and the option is changed …
(edit) @8510   19 years chi Add a new method metadata_read to deal with specific (or external) …
(edit) @8166   20 years mdewsnip Added FileSize metadata in most plugins.
(edit) @7818   20 years jrm21 improvements to the handling of textcat's guessed encoding
(edit) @7668   20 years jrm21 renamed "kea" to "Keyphrase" metadata, and add one for each extracted …
(edit) @7645   20 years jrm21 don't fail if we can't load the diagnostics package.
(edit) @7644   20 years jrm21 don't print "wrong encoding" message for text in english. textcat …
(edit) @7508   20 years kjdon changed the plugin metadata - instead of having eg HTMLPlug metadata …
(edit) @7504   20 years davidb ImagePlug, MP3Plug, UnknownPlug modified to set Title metadata based …
(edit) @7362   20 years kjdon plugin read functions now return 'undef' - didn't recognise, '-1' - …
(edit) @7105   20 years kjdon changed the max century arg to a string instead of an int - need to be …
(edit) @7023   20 years kjdon fixed up the <tag> display for pluginfo and classinfo. < and > should …
(edit) @6987   20 years mdewsnip Missed changing some print()s to gsprintf()s.
(edit) @6945   20 years mdewsnip Updated the resource bundle handling code some more. Strings are first …
(edit) @6932   20 years kjdon changed the output slightly, and now outputs the classifier/plugin …
(edit) @6925   20 years mdewsnip Changed the way display in different languages is done. Instead of …
(edit) @6918   20 years mdewsnip Removed some code I commented out.
(edit) @6584   20 years kjdon Fiddled around with segmenting for chinese text. Haven't changed how …
(edit) @6408   20 years jmt12 Added two new attributes for script arguments. HiddenGLI controls …
(edit) @6332   20 years jmt12 When -gli argument is provided to calling script these modules will …
(edit) @5924   20 years kjdon changed the new metadata to eg WordPlug instead of Word, cos a clash …
(edit) @5919   20 years kjdon each plugin now adds a metadata field to the doc obj based on the …
(edit) @5681   21 years mdewsnip Rewritten option display code (used by all plugins) to use the new …
(edit) @4873   21 years mdewsnip Further work on standardising option descriptions. Specifically, in …
(edit) @4845   21 years jrm21 use add_metadata instead of add_utf8_metadata for Source and URL …
(edit) @4785   21 years mdewsnip Commented out print_usage functions - plugins should now call …
(edit) @4778   21 years mdewsnip Modified the code for generating the usage texts to use the methods in …
(edit) @4764   21 years mdewsnip Replaced call to removed function print_generic_usage() with a call to …
(edit) @4750   21 years mdewsnip Improved formatting of usage texts automatically generated from John's …
(edit) @4746   21 years mdewsnip Initial attempt at a generic print usage function which works with the …
(edit) @4744   21 years mdewsnip Tidied up and structures (representing the options of the plugin) in …
(edit) @3834   21 years sjboddie Prevent "use bytes" from causing errors for older perls
(edit) @3767   21 years sjboddie Scattered some "use bytes" pragmas around to try to prevent perl-5.8 …
(edit) @3731   21 years jrm21 If textcat returns too many possibilities, use the default language …
(edit) @3540   21 years kjdon added John T's changes into CVS - added info to enable retrieval of …
(edit) @3515   21 years jrm21 call a plugin's set_OID() method if one exists, otherwise use the …
(edit) @3427   22 years sjboddie The input encoding will now default to utf8 instead of iso-8859-1. …
(edit) @3086   22 years nzdl * empty log message *
(edit) @2835   22 years dmm9 Corrected pluginfo entry and renamed extract_date to …
(edit) @2816   22 years sjboddie Added cover_image option to BasPlug for associating a jpeg image as a …
(edit) @2811   22 years sjboddie * empty log message *
(edit) @2796   23 years sjboddie * empty log message *
(edit) @2795   23 years sjboddie Got ZIPPlug working under Windows
(edit) @2785   23 years sjboddie The build process now creates a summary of how many files were …
(edit) @2755   23 years jrm21 import.pl now takes an option for saving file conversion failures to a …
(edit) @2751   23 years sjboddie Had a go at enriching the default document structure. Added …
(edit) @2734   23 years sjboddie Chinese text segmentation is now done whenever language="zh" instead …
(edit) @2604   23 years jrm21 when extracting email addresses, we now include people in the .net …
(edit) @2601   23 years jrm21 modified usage to not mention HTMLplug blocking rtf.
(edit) @2327   23 years sjboddie * empty log message *
(edit) @2235   23 years sjboddie Hacked the textcat package about so that it only reads all the …