|
|
@1362
|
24 years |
say1 |
removed use statement so other files could be compiled with use strict …
|
|
|
@1361
|
24 years |
say1 |
rewrote recursively to handle stop words and more cases
|
|
|
@1360
|
24 years |
say1 |
clarified status messages
|
|
|
@1358
|
24 years |
nzdl |
Fixed bug I recently introduced into HTMLPlug (<pre> tags were being …
|
|
|
@1341
|
24 years |
paynter |
Licensing information for TextCat language models.
|
|
|
@1336
|
24 years |
say1 |
fixed acronym extraction so it now runs in time linear to the …
|
|
|
@1335
|
24 years |
say1 |
many acronym changes
|
|
|
@1317
|
24 years |
paynter |
Added -extract_language option, which uses the textcat language …
|
|
|
@1316
|
24 years |
paynter |
The textcat language identification package.
|
|
|
@1315
|
24 years |
paynter |
Language models for the textcat language identification package.
|
|
|
@1313
|
24 years |
sjboddie |
Added David's version of AZCompactList, which handles multiple-value metadata
|
|
|
@1312
|
24 years |
sjboddie |
fixed a bug in the HTML plugin that showed up under windows
|
|
|
@1304
|
24 years |
sjboddie |
fixed an intermittent bug (I hope) when building under windows
|
|
|
@1302
|
24 years |
kjm18 |
buildtype and indexfields added to configuration file entries. these …
|
|
|
@1301
|
24 years |
kjm18 |
building now writes 'buildtype mgpp' to build.cfg - indicates an mgpp …
|
|
|
@1287
|
24 years |
sjboddie |
Implemented a -sortmeta option for import.pl to sort archives.inf file …
|
|
|
@1269
|
24 years |
sjboddie |
Added ZIPPlug plugin for handling input documents that have been …
|
|
|
@1252
|
24 years |
sjboddie |
Building code now extracts a couple more statistics from mg and …
|
|
|
@1251
|
24 years |
sjboddie |
Added some stat reporting and a warning message to the build code. Now …
|
|
|
@1250
|
24 years |
sjboddie |
Tidied up the classifiers slightly, made them a little more object …
|
|
|
@1246
|
24 years |
sjboddie |
Now prevent "notbuilt" field from going in the build.cfg file unless …
|
|
|
@1245
|
24 years |
sjboddie |
Fixed a bug that davidb found in a couple of regular expressions
|
|
|
@1244
|
24 years |
sjboddie |
Caught up most general plugins (that's the ones in …
|
|
|
@1243
|
24 years |
sjboddie |
Caught HTMLPlug up with BasPlug. A few minor changes to some …
|
|
|
@1242
|
24 years |
sjboddie |
Added Stuart Yeates' acronym extraction code and made it a standard …
|
|
|
@1241
|
24 years |
sjboddie |
merged ascii_doc.pm and doc.pm back together (removing basedoc.pm). To …
|
|
|
@1240
|
24 years |
gwp |
Resolved conflicts between previous two versions.
|
|
|
@1239
|
24 years |
gwp |
Replaced references to @_ in subroutine parse with a new variable …
|
|
|
@1235
|
24 years |
nzdl |
* empty log message *
|
|
|
@1231
|
24 years |
gwp |
Bug fix on the H1 metadata option: if the file has no <H1> tag, …
|
|
|
@1230
|
24 years |
gwp |
Added an additional H1 metadata field that extracts the text between …
|
|
|
@1229
|
24 years |
sjboddie |
fixed bug in options
|
|
|
@1227
|
24 years |
sjboddie |
Modified the perl code for importing arabic encoded documents. Plugins …
|
|
|
@1225
|
24 years |
sjboddie |
minor change to parsargv.pm to allow for parsing of options within …
|
|
|
@1224
|
24 years |
sjboddie |
added handling of arabic encoding and ability to read in an entire …
|
|
|
@1223
|
24 years |
sjboddie |
added an arabic2unicode conversion function to unicode.pm
|
|
|
@1222
|
24 years |
sjboddie |
changed some ghtml.pm regular expressions to handle multiline strings
|
|
|
@1221
|
24 years |
sjboddie |
Added a new HBSPlug which is kind of a generalisation of HBPlug …
|
|
|
@1220
|
24 years |
sjboddie |
Caught HTMLPlug up with the changes I made to BasPlug. HTMLPlug now …
|
|
|
@1219
|
24 years |
sjboddie |
Made BasPlug take options (these options are available to all plugins …
|
|
|
@1218
|
24 years |
sjboddie |
fixed bug in gb.pm preventing gb encoding text from being translated …
|
|
|
@1206
|
24 years |
gwp |
A thorough rewrite; some of the metadata was flawed in such a way that …
|
|
|
@1204
|
24 years |
gwp |
updated htmlsafe to substitute quotes with &quot;
|
|
|
@1190
|
24 years |
gwp |
The first 200 chars of body text can now be extracted as metadata by …
|
|
|
@1181
|
24 years |
sjboddie |
got end-user collection building to work (almost) on windows 95. …
|
|
|
@1178
|
24 years |
sjboddie |
modified perl dmsafe function to handle backslashes
|
|
|
@1086
|
24 years |
sjboddie |
Added AZCompactList.pm to distribution (and altered List.pm slightly …
|
|
|
@1072
|
24 years |
sjboddie |
Fixed bug - Control B's and C's were only being removed from body of …
|
|
|
@1046
|
24 years |
sjboddie |
added comment to make me feel better for having spent an hour testing …
|
|
|
@1044
|
24 years |
nzdl |
don't output doctype field to gdbm if document already has metadata …
|
|
|
@1020
|
24 years |
sjboddie |
changed paths to collection images (again!)
|
|
|
@1010
|
24 years |
sjboddie |
renamed old html module ghtml -- it clashed with builtin html module …
|
|
|
@1006
|
24 years |
sjboddie |
fixed bug in previous changes
|
|
|
@983
|
24 years |
sjboddie |
link() function isn't supported on windows - use copy
|
|
|
@973
|
24 years |
sjboddie |
new path to images
|
|
|
@965
|
24 years |
sjboddie |
fixed bug - added assoc_files option
|
|
|
@932
|
24 years |
kjm18 |
new building programs for mgpp added
|
|
|
@918
|
24 years |
kjm18 |
fixed bug where it was creating two doc_obj per file instead of just one.
|
|
|
@900
|
24 years |
sjboddie |
tweaked the way associated files are handled at build time - some …
|
|
|
@899
|
24 years |
sjboddie |
small change to doc data structure to allow for some hacking in WebPlug
|
|
|
@898
|
24 years |
sjboddie |
fixed small bug (groupsize had no default)
|
|
|
@897
|
24 years |
sjboddie |
lots of stuff
|
|
|
@863
|
24 years |
sjboddie |
fixed a couple of bugs that I introduced when including David's stuff
|
|
|
@862
|
24 years |
sjboddie |
fixed a couple of bugs that were preventing multiple document gml files …
|
|
|
@850
|
24 years |
sjboddie |
added use strict - tidied a few things up etc.
|
|
|
@849
|
24 years |
sjboddie |
Fixed a bit of a bug
|
|
|
@847
|
25 years |
sjboddie |
fixed CVS burp
|
|
|
@846
|
25 years |
sjboddie |
don't use hashdoc for now
|
|
|
@842
|
25 years |
davidb |
base object for 'doc' objects (UTF8 or ASCII)
|
|
|
@840
|
25 years |
davidb |
Optimisations to make plugin go faster
|
|
|
@839
|
25 years |
davidb |
added extra_metadata function
|
|
|
@838
|
25 years |
davidb |
added options passed into 'new' subroutine
|
|
|
@837
|
25 years |
davidb |
added alpha_numeric search
|
|
|
@836
|
25 years |
davidb |
improvements to utils
|
|
|
@835
|
25 years |
davidb |
added 'begin' and 'end' function for plugins
|
|
|
@834
|
25 years |
davidb |
'groupsize' added
|
|
|
@833
|
25 years |
davidb |
new doc type for ascii only documents (lots faster than doc.pm)
|
|
|
@832
|
25 years |
davidb |
Object modified to have basedoc
|
|
|
@831
|
25 years |
davidb |
added support for multiple metavalues for a metadata type
|
|
|
@813
|
25 years |
sjboddie |
plugins now take options and classifiers are handled properly
|
|
|
@812
|
25 years |
sjboddie |
hard_link returns if link destination already exists
|
|
|
@811
|
25 years |
sjboddie |
classifiers are loaded up more like plugins
|
|
|
@810
|
25 years |
sjboddie |
plugins now take options, files are associated at build time as well …
|
|
|
@809
|
25 years |
sjboddie |
plugins now take options, maxdocs is always defined
|
|
|
@808
|
25 years |
sjboddie |
New html plugin with options
|
|
|
@796
|
25 years |
sjboddie |
semi-colon;;;;
|
|
|
@784
|
25 years |
sjboddie |
added -keepold option
|
|
|
@782
|
25 years |
sjboddie |
removed gettext.pl - added debug, mode and index options to …
|
|
|
@780
|
25 years |
sjboddie |
added dontgdbm configuration option
|
|
|
@779
|
25 years |
sjboddie |
fixed bug in title option
|
|
|
@775
|
25 years |
sjboddie |
urlsafe now converts '/' characters
|
|
|
@741
|
25 years |
sjboddie |
fixed up a bit of a bug - should fix this properly some time
|
|
|
@740
|
25 years |
sjboddie |
windows specific bug
|
|
|
@734
|
25 years |
sjboddie |
removed old out of date comments
|
|
|
@733
|
25 years |
sjboddie |
just minor changes to book cover image stuff
|
|
|
@732
|
25 years |
sjboddie |
prevent from overriding Title metadata that may have been passed in …
|
|
|
@721
|
25 years |
davidb |
Support functions to help with the generation of webpages from Perl …
|
|
|
@717
|
25 years |
sjboddie |
caught HTML classifier up with new browsing structure
|
|
|
@709
|
25 years |
sjboddie |
no longer need classifytype metadata added from plugin
|
|
|
@708
|
25 years |
sjboddie |
fixed problem with titles beginning with tags or html elements
|
|
|