source: trunk/gsdl/perllib

Diff Rev Age Author Log Message
(edit) @1388   24 years sjboddie fixed a bit of a bug (more of a typo really) in the recent changes …
(edit) @1384   24 years paynter Changed language extraction to ignore encoding information, so that …
(edit) @1382   24 years paynter Less common languages moved into a subdirectory of textcat so that the …
(edit) @1379   24 years paynter Fixed bug that gave gsdlsourcedocument metadata relative path instead …
(edit) @1377   24 years paynter Added "mirror interval N" command for use with update.pl
(edit) @1374   24 years sjboddie made set_OID use original document text instead of document object
(edit) @1362   24 years say1 removed use statement so other files could be compiled with use strict …
(edit) @1361   24 years say1 rewrote recursively to handle stop words and more cases
(edit) @1360   24 years say1 clarified status messages
(edit) @1358   24 years nzdl Fixed bug I recently introduced into HTMLPlug (<pre> tags were being …
(edit) @1341   24 years paynter Licensing information for TextCat language models.
(edit) @1336   24 years say1 fixed acronym extraction so it now runs in time linear to the …
(edit) @1335   24 years say1 many acronym changes
(edit) @1317   24 years paynter Added -extract_language option, which uses the textcat language …
(edit) @1316   24 years paynter The textcat language identification package.
(edit) @1315   24 years paynter Language models for the textcat language identification package.
(edit) @1313   24 years sjboddie Added David's version of AZCompactList which handles multiple value metadata
(edit) @1312   24 years sjboddie fixed a bug in the HTML plugin that showed up under windows
(edit) @1304   24 years sjboddie fixed an intermittent bug (I hope) when building under windows
(edit) @1302   24 years kjm18 buildtype and indexfields added to configuration file entries. these …
(edit) @1301   24 years kjm18 building now writes 'buildtype mgpp' to build.cfg - indicates an mgpp …
(edit) @1287   24 years sjboddie Implemented a -sortmeta option for import.pl to sort archives.inf file …
(edit) @1269   24 years sjboddie Added ZIPPlug plugin for handling input documents that have been …
(edit) @1252   24 years sjboddie Building code now extracts a couple more statistics from mg and …
(edit) @1251   24 years sjboddie Added some stat reporting and a warning message to the build code. Now …
(edit) @1250   24 years sjboddie Tidied up the classifiers slightly, made them a little more object …
(edit) @1246   24 years sjboddie Now prevent "notbuilt" field from going in the build.cfg file unless …
(edit) @1245   24 years sjboddie Fixed a bug that davidb found in a couple of regular expressions
(edit) @1244   24 years sjboddie Caught up most general plugins (that's the ones in …
(edit) @1243   24 years sjboddie Caught HTMLPlug up with BasPlug. A few minor changes to some …
(edit) @1242   24 years sjboddie Added Stuart Yeates' acronym extraction code and made it a standard …
(edit) @1241   24 years sjboddie merged ascii_doc.pm and doc.pm back together (removing basedoc.pm). To …
(edit) @1240   24 years gwp Resolved conflicts between previous two versions.
(edit) @1239   24 years gwp Replaced references to @_ in subroutine parse with a new variable …
(edit) @1235   24 years nzdl * empty log message *
(edit) @1231   24 years gwp Bug fix on the H1 metadata option: if the file has no <H1> tag, …
(edit) @1230   24 years gwp Added an additional H1 metadata field that extracts the text between …
(edit) @1229   24 years sjboddie fixed bug in options
(edit) @1227   24 years sjboddie Modified the perl code for importing arabic encoded documents. Plugins …
(edit) @1225   24 years sjboddie minor change to parsargv.pm to allow for parsing of options within …
(edit) @1224   24 years sjboddie added handling of arabic encoding and ability to read in an entire …
(edit) @1223   24 years sjboddie added an arabic2unicode conversion function to unicode.pm
(edit) @1222   24 years sjboddie changed some ghtml.pm regular expressions to handle multiline strings
(edit) @1221   24 years sjboddie Added a new HBSPlug which is kind of a generalisation of HBPlug …
(edit) @1220   24 years sjboddie Caught HTMLPlug up with the changes I made to BasPlug. HTMLPlug now …
(edit) @1219   24 years sjboddie Made BasPlug take options (these options are available to all plugins …
(edit) @1218   24 years sjboddie fixed bug in gb.pm preventing gb encoding text from being translated …
(edit) @1206   24 years gwp A thorough rewrite; some of the metadata was flawed in such a way that …
(edit) @1204   24 years gwp updated htmlsafe to substitute quotes with &quot;
(edit) @1190   24 years gwp The first 200 chars of body text can now be extracted as metadata by …
(edit) @1181   24 years sjboddie got end-user collection building to work (almost) on windows 95. …
(edit) @1178   24 years sjboddie modified perl dmsafe function to handle backslashes
(edit) @1086   24 years sjboddie Added AZCompactList.pm to distribution (and altered List.pm slightly …
(edit) @1072   24 years sjboddie Fixed bug - Control B's and C's were only being removed from body of …
(edit) @1046   24 years sjboddie added comment to make me feel better for having spent an hour testing …
(edit) @1044   24 years nzdl don't output doctype field to gdbm if document already has metadata …
(edit) @1020   24 years sjboddie changed paths to collection images (again!)
(edit) @1010   24 years sjboddie renamed old html module ghtml -- it clashed with builtin html module …
(edit) @1006   24 years sjboddie fixed bug in previous changes
(edit) @983   24 years sjboddie link() function isn't supported on windows - use copy
(edit) @973   24 years sjboddie new path to images
(edit) @965   24 years sjboddie fixed bug - added assoc_files option
(edit) @932   24 years kjm18 new building programs for mgpp added
(edit) @918   24 years kjm18 fixed bug where it was creating two doc_obj per file instead of just one.
(edit) @900   24 years sjboddie tweaked the way associated files are handled at build time - some …
(edit) @899   24 years sjboddie small change to doc data structure to allow for some hacking in WebPlug
(edit) @898   24 years sjboddie fixed small bug (groupsize had no default)
(edit) @897   24 years sjboddie lots of stuff
(edit) @863   24 years sjboddie fixed a couple of bugs that I introduced when including David's stuff
(edit) @862   24 years sjboddie fixed a couple of bugs that were preventing multiple document gml files …
(edit) @850   24 years sjboddie added use strict - tidied a few things up etc.
(edit) @849   24 years sjboddie Fixed a bit of a bug
(edit) @847   24 years sjboddie fixed CVS burp
(edit) @846   24 years sjboddie don't use hashdoc for now
(edit) @842   24 years davidb base object for 'doc' objects (UTF8 or ASCII)
(edit) @840   24 years davidb Optimisations to make plugin go faster
(edit) @839   24 years davidb added extra_metadata function
(edit) @838   24 years davidb added options passed into 'new' subroutine
(edit) @837   24 years davidb added alpha_numeric search
(edit) @836   24 years davidb improvements to utils
(edit) @835   24 years davidb added 'begin' and 'end' function for plugins
(edit) @834   24 years davidb 'groupsize' added
(edit) @833   24 years davidb new doc type for ascii only documents (lots faster than doc.pm)
(edit) @832   24 years davidb Object modified to have basedoc
(edit) @831   24 years davidb added support for multiple metavalues for a metadata type
(edit) @813   24 years sjboddie plugins now take options and classifiers are handled properly
(edit) @812   24 years sjboddie hard_link returns if link destination already exists
(edit) @811   24 years sjboddie classifiers are loaded up more like plugins
(edit) @810   24 years sjboddie plugins now take options, files are associated at build time as well …
(edit) @809   24 years sjboddie plugins now take options, maxdocs is always defined
(edit) @808   24 years sjboddie New html plugin with options
(edit) @796   24 years sjboddie semi-colon;;;;
(edit) @784   24 years sjboddie added -keepold option
(edit) @782   24 years sjboddie removed gettext.pl - added debug, mode and index options to …
(edit) @780   24 years sjboddie added dontgdbm configuration option
(edit) @779   24 years sjboddie fixed bug in title option
(edit) @775   24 years sjboddie urlsafe now converts '/' characters
(edit) @741   25 years sjboddie fixed up a bit of a bug - should fix this properly some time
(edit) @740   25 years sjboddie windows specific bug
(edit) @734   25 years sjboddie removed old out of date comments