source: trunk/gsdl/perllib/plugins/BasPlug.pm

Diff Rev Age Author Log Message
(edit) @8908   19 years davidb BasPlug now sets a piece of metadata [hascover] if document has a …
(edit) @8892   19 years davidb Addition of new minus option to BasPlug: -associate_ext. This new …
(edit) @8818   19 years mdewsnip Title tags over multiple lines will now be removed correctly before …
(edit) @8814   19 years mdewsnip Updated files for Kea 3.0, thanks to Olena.
(edit) @8789   19 years mdewsnip Better documentation of the extract keyphrases (Kea) code, thanks to Olena.
(edit) @8761   19 years mdewsnip XML plugin descriptions now include an <Explodes> tag that records …
(edit) @8716   19 years kjdon added some changes made by Emanuel Dejanu (Simple Words)
(edit) @8678   19 years kjdon cover images are now turned on by default, and the option is changed …
(edit) @8510   19 years chi Add a new method metadat_read to deal with specific (or external) …
(edit) @8166   20 years mdewsnip Added FileSize metadata in most plugins.
(edit) @7818   20 years jrm21 improvements to the handling of textcat's guessed encoding
(edit) @7668   20 years jrm21 renamed "kea" to "Keyphrase" metadata, and add one for each extracted …
(edit) @7645   20 years jrm21 don't fail if we can't load the diagnostics package.
(edit) @7644   20 years jrm21 don't print "wrong encoding" message for text in english. textcat …
(edit) @7508   20 years kjdon changed the plugin metadata - instead of having eg HTMLPlug metadata …
(edit) @7504   20 years davidb ImagePlug, MP3Plug, UnknownPlug modified to set Title metadata based …
(edit) @7362   20 years kjdon plugin read functions now return 'undef' - didn't recognise, '-1' - …
(edit) @7105   20 years kjdon changed the max century arg to a string instead of an int - need to be …
(edit) @7023   20 years kjdon fixed up the <tag> display for pluginfo and clasinfo. < and > should …
(edit) @6987   20 years mdewsnip Missed changing some print()s to gsprintf()s.
(edit) @6945   20 years mdewsnip Updated the resource bundle handling code some more. Strings are first …
(edit) @6932   20 years kjdon changed the output slightly, and now outputs the classifier/plugin …
(edit) @6925   20 years mdewsnip Changed the way display in different languages is done. Instead of …
(edit) @6918   20 years mdewsnip Removed some code I commented out.
(edit) @6584   20 years kjdon Fiddled around with segmenting for chinese text. Haven't changed how …
(edit) @6408   20 years jmt12 Added two new attributes for script arguments. HiddenGLI controls …
(edit) @6332   20 years jmt12 When -gli argument is provided to calling script these modules will …
(edit) @5924   20 years kjdon changed the new metadata to eg WordPlug instead of Word, cos a clash …
(edit) @5919   20 years kjdon each plugin now adds a metadata field to the doc obj based on the …
(edit) @5681   21 years mdewsnip Rewritten option display code (used by all plugins) to use the new …
(edit) @4873   21 years mdewsnip Further work on standardising option descriptions. Specifically, in …
(edit) @4845   21 years jrm21 use add_metadata instead of add_utf8_metadata for Source and URL …
(edit) @4785   21 years mdewsnip Commented out print_usage functions - plugins should now call …
(edit) @4778   21 years mdewsnip Modified the code for generating the usage texts to use the methods in …
(edit) @4764   21 years mdewsnip Replaced call to removed function print_generic_usage() with a call to …
(edit) @4750   21 years mdewsnip Improved formatting of usage texts automatically generated from John's …
(edit) @4746   21 years mdewsnip Initial attempt at a generic print usage function which works with the …
(edit) @4744   21 years mdewsnip Tidied up and structures (representing the options of the plugin) in …
(edit) @3834   21 years sjboddie Prevent "use bytes" from causing errors for older perls
(edit) @3767   21 years sjboddie Scattered some "use bytes" pragmas around to try to prevent perl-5.8 …
(edit) @3731   21 years jrm21 If textcat returns too many possibilities, use the default language …
(edit) @3540   21 years kjdon added John T's changes into CVS - added info to enable retrieval of …
(edit) @3515   22 years jrm21 call a plugin's set_OID() method if one exists, otherwise use the …
(edit) @3427   22 years sjboddie The input encoding will now default to utf8 instead of iso-8859-1. …
(edit) @3086   22 years nzdl * empty log message *
(edit) @2835   22 years dmm9 Corrected pluginfo entry and renamed extract_date to …
(edit) @2816   23 years sjboddie Added cover_image option to BasPlug for associating a jpeg image as a …
(edit) @2811   23 years sjboddie * empty log message *
(edit) @2796   23 years sjboddie * empty log message *
(edit) @2795   23 years sjboddie Got ZIPPlug working under Windows
(edit) @2785   23 years sjboddie The build process now creates a summary of how many files were …
(edit) @2755   23 years jrm21 import.pl now takes an option for saving file conversion failures to a …
(edit) @2751   23 years sjboddie Had a go at enriching the default document structure. Added …
(edit) @2734   23 years sjboddie Chinese text segmentation is now done whenever language="zh" instead …
(edit) @2604   23 years jrm21 when extracting email addresses, we now include people in the .net …
(edit) @2601   23 years jrm21 modified usage to not mention HTMLplug blocking rtf.
(edit) @2327   23 years sjboddie * empty log message *
(edit) @2235   23 years sjboddie Hacked the textcat package about so that it only reads all the …
(edit) @2219   23 years sjboddie Had another go at suppressing the "subroutine redefined" warnings as …
(edit) @2084   23 years jrm21 usage message is now formatted to fit within 80 columns.
(edit) @1999   23 years sjboddie Fixed a small problem with language detection code.
(edit) @1954   23 years jmt14 * empty log message *
(edit) @1903   23 years sjboddie We now use textcat's best guess if it returns 3 or fewer possibilities …
(edit) @1874   23 years sjboddie * empty log message *
(edit) @1870   23 years sjboddie Tidied up language support stuff.
(edit) @1868   23 years sjboddie Made a bunch of changes to the building code to support lots of new …
(edit) @1857   23 years dmm9 date extraction options documented
(edit) @1855   23 years paynter Trivial change to warning message.
(edit) @1846   23 years sjboddie Removed a call to a function that I removed in my previous changes - oops
(edit) @1845   23 years paynter Changed a "!=" to a "ne".
(edit) @1844   23 years sjboddie Added an 'auto' argument to BasPlug's '-input_encoding' option ('auto' …
(edit) @1838   23 years sjboddie Added support for Cyrillic languages (windows codepage 1251) - yet to …
(edit) @1756   23 years say1 added detection and handling of unreadable files
(edit) @1720   23 years dmm9 Added information to the usage text about date extraction option
(edit) @1719   23 years dmm9 Added information to the usage text about date extraction option
(edit) @1718   23 years dmm9 Added information to the usage text about date extraction option
(edit) @1686   23 years jrm21 HTMLPlug no longer blocks .pdf files. (also updated reference to this …
(edit) @1605   24 years say1 fixed some of my earlier mistakes. sorry Stefan
(edit) @1602   24 years say1 metadata extraction work. (email addresses, generalised HTML tags, …
(edit) @1424   24 years sjboddie Added a -out option to most of the perl building scripts to allow …
(edit) @1411   24 years dmm9 added the options for the date extractor
(edit) @1396   24 years say1 changed initialisation code for acronyms
(edit) @1393   24 years say1 acronym markup functionality
(edit) @1384   24 years paynter Changed language extraction to ignore encoding information, so that …
(edit) @1379   24 years paynter Fixed bug that gave gsdlsourcedocument metadata relative path instead …
(edit) @1360   24 years say1 clarified status messages
(edit) @1335   24 years say1 many acronym changes
(edit) @1317   24 years paynter Added -extract_language option, which uses the textcat language …
(edit) @1244   24 years sjboddie Caught up most general plugins (that's the ones in …
(edit) @1242   24 years sjboddie Added Stuart Yeates's acronym extraction code and made it a standard …
(edit) @1229   24 years sjboddie fixed bug in options
(edit) @1227   24 years sjboddie Modified the perl code for importing arabic encoded documents. Plugins …
(edit) @1219   24 years sjboddie Made BasPlug take options (these options are available to all plugins …
(edit) @839   24 years davidb added extra_metadata function
(edit) @537   25 years sjboddie added GPL headers
(edit) @317   25 years sjboddie Added maxdocs option
(add) @4   25 years sjboddie Initial revision