source: trunk/gsdl/perllib

Rev Age Author Log Message
@1483   24 years sjboddie added -out option to classifiers
@1482   24 years davidb Small modification so Index files can be in subdirectories of an …
@1467   24 years dmm9 pre-Christian date support
@1454   24 years stefan Lots of changes to perl building code for collectoraction
@1448   24 years paynter Changed regular expressions for extracting metadata from META tags …
@1446   24 years paynter Major overhauls; works with the new gsConvert.pl instead of …
@1442   24 years dmm9 date->Coverage
@1436   24 years davidb Due to rearrangement of ConvertTo hierarchy, this file is now redundant.
@1435   24 years davidb Rearrangement of ConvertTo inheritance so HTMLPlug and TextPlug do not …
@1431   24 years sjboddie Made a few minor adjustments to perl building code for use with …
@1424   24 years sjboddie Added a -out option to most of the perl building scripts to allow …
@1420   24 years davidb Moved read_file and read from ConvertToBasPlug to ConvertToPlug.
@1418   24 years davidb Small modification to improve handling of file names with spaces in.
@1417   24 years davidb Additions so ConvertPlug etc. can handle filenames with spaces in them.
@1415   24 years davidb Removed some diagnostic print statements.
@1412   24 years dmm9 adding the date extractor
@1411   24 years dmm9 added the options for the date extractor
@1410   24 years davidb Introduction of "ConvertTo" family of plugins. This establishes a new …
@1405   24 years say1 fixed acronym bugs
@1404   24 years say1 fixed acronyms option file. trimmed text at start of bibliographies to …
@1403   24 years say1 taught HTMLPlug about shtml, asp, cgi, php and html query files …
@1401   24 years davidb Fixed small problem with associated files.
@1400   24 years davidb General tidying of code.
@1396   24 years say1 changed initialisation code for acronyms
@1393   24 years say1 acronym markup functionality
@1388   24 years sjboddie fixed a bit of a bug (more of a typo really) in the recent changes …
@1384   24 years paynter Changed language extraction to ignore encoding information, so that …
@1382   24 years paynter Less common languages moved into a subdirectory of textcat so that the …
@1379   24 years paynter Fixed bug that gave gsdlsourcedocument metadata relative path instead …
@1377   24 years paynter Added "mirror interval N" command for use with update.pl
@1374   24 years sjboddie made set_OID use original document text instead of document object
@1362   24 years say1 removed use statement so other files could be compiled with use strict …
@1361   24 years say1 rewrote recursively to handle stop words and more cases
@1360   24 years say1 clarified status messages
@1358   24 years nzdl Fixed bug I recently introduced into HTMLPlug (<pre> tags were being …
@1341   24 years paynter Licensing information for TextCat language models.
@1336   24 years say1 fixed acronym extraction so it now runs in time linear to the …
@1335   24 years say1 many acronym changes
@1317   24 years paynter Added -extract_language option, which uses the textcat language …
@1316   24 years paynter The textcat language identification package.
@1315   24 years paynter Language models for the textcat language identification package.
@1313   24 years sjboddie Added David's version of AZCompactList which handles multiple value metadata
@1312   24 years sjboddie fixed a bug in the HTML plugin that showed up under windows
@1304   24 years sjboddie fixed an intermittent bug (I hope) when building under windows
@1302   24 years kjm18 buildtype and indexfields added to configuration file entries. these …
@1301   24 years kjm18 building now writes 'buildtype mgpp' to build.cfg - indicates an mgpp …
@1287   24 years sjboddie Implemented a -sortmeta option for import.pl to sort archives.inf file …
@1269   24 years sjboddie Added ZIPPlug plugin for handling input documents that have been …
@1252   24 years sjboddie Building code now extracts a couple more statistics from mg and …
@1251   24 years sjboddie Added some stat reporting and a warning message to the build code. Now …
@1250   24 years sjboddie Tidied up the classifiers slightly, made them a little more object …
@1246   24 years sjboddie Now prevent "notbuilt" field from going in the build.cfg file unless …
@1245   24 years sjboddie Fixed a bug that davidb found in a couple of regular expressions
@1244   24 years sjboddie Caught up most general plugins (that's the ones in …
@1243   24 years sjboddie Caught HTMLPlug up with BasPlug. A few minor changes to some …
@1242   24 years sjboddie Added Stuart Yeate's acronym extraction code and made it a standard …
@1241   24 years sjboddie merged ascii_doc.pm and doc.pm back together (removing basedoc.pm). To …
@1240   24 years gwp Resolved conflicts between previous two versions.
@1239   24 years gwp Replaced references to @_ in subroutine parse with a new variable …
@1235   24 years nzdl * empty log message *
@1231   24 years gwp Bug fix on the H1 metadata option: if the file has no <H1> tag, …
@1230   24 years gwp Added an additional H1 metadata field that extracts the text between …
@1229   24 years sjboddie fixed bug in options
@1227   24 years sjboddie Modified the perl code for importing arabic encoded documents. Plugins …
@1225   24 years sjboddie minor change to parsargv.pm to allow for parsing of options within …
@1224   24 years sjboddie added handling of arabic encoding and ability to read in an entire …
@1223   24 years sjboddie added an arabic2unicode conversion function to unicode.pm
@1222   24 years sjboddie changed some ghtml.pm regular expressions to handle multiline strings
@1221   24 years sjboddie Added a new HBSPlug which is kind of a generalisation of HBPlug …
@1220   24 years sjboddie Caught HTMLPlug up with the changes I made to BasPlug. HTMLPlug now …
@1219   24 years sjboddie Made BasPlug take options (these options are available to all plugins …
@1218   24 years sjboddie fixed bug in gb.pm preventing gb encoding text from being translated …
@1206   24 years gwp A thorough rewrite; some of the metadata was flawed in such a way that …
@1204   24 years gwp updated htmlsafe to substitute quotes with &quot;
@1190   24 years gwp The first 200 chars of body text can now be extracted as metadata by …
@1181   24 years sjboddie got end-user collection building to work (almost) on windows 95. …
@1178   24 years sjboddie modified perl dmsafe function to handle backslashes
@1086   24 years sjboddie Added AZCompactList.pm to distribution (and altered List.pm slightly …
@1072   24 years sjboddie Fixed bug - Control B's and C's were only being removed from body of …
@1046   24 years sjboddie added comment to make me feel better for having spent an hour testing …
@1044   24 years nzdl don't output doctype field to gdbm if document already has metadata …
@1020   24 years sjboddie changed paths to collection images (again!)
@1010   24 years sjboddie renamed old html module ghtml -- it clashed with builtin html module …
@1006   24 years sjboddie fixed bug in previous changes
@983   24 years sjboddie link() function isn't supported on windows - use copy
@973   24 years sjboddie new path to images
@965   24 years sjboddie fixed bug - added assoc_files option
@932   24 years kjm18 new building programs for mgpp added
@918   24 years kjm18 fixed bug where it was creating two doc_obj per file instead of just one.
@900   24 years sjboddie tweaked the way associated files are handled at build time - some …
@899   24 years sjboddie small change to doc data structure to allow for some hacking in WebPlug
@898   24 years sjboddie fixed small bug (groupsize had no default)
@897   24 years sjboddie lots of stuff
@863   24 years sjboddie fixed a couple of bugs that I introduced when including David's stuff
@862   24 years sjboddie fixed a couple of bugs that were preventing multiple document gml files …
@850   24 years sjboddie added use strict - tidied a few things up etc.
@849   24 years sjboddie Fixed a bit of a bug
@847   24 years sjboddie fixed CVS burp
@846   24 years sjboddie don't use hashdoc for now
@842   24 years davidb base object for 'doc' objects (UTF8 or ASCII)