@2096 | 23 years | jrm21 | Minor changes to regexes, so that header fields have to be at start of …
@2086 | 23 years | jrm21 | We create a copy of any args to new() because parsargs might modify …
@2085 | 23 years | jrm21 | When importing, we need to escape any escape codes, otherwise mg(?) …
@2084 | 23 years | jrm21 | usage message is now formatted to fit within 80 columns.
@2083 | 23 years | paynter | Fixed a stupid mistake that I know I've fixed before.
@2082 | 23 years | jrm21 | added bzip2 support (untested).
@2080 | 23 years | jrm21 | When creating nodes, we now need to pass -buttonname instead of -title.
@2079 | 23 years | paynter | Added a new binary field to the savephrases output that indicates …
@2064 | 23 years | paynter | Sort thesaurus phrases by frequency then type.
@2048 | 23 years | sjboddie | * empty log message *
@2041 | 23 years | jrm21 | don't strip all whitespace from tmp filename, only from base name. …
@2040 | 23 years | sjboddie | * empty log message *
@2039 | 23 years | jrm21 | do eval{symlink()} because platforms that don't support symlink …
@2036 | 23 years | jrm21 | don't use strict; anymore, as we want to be able to write error msgs …
@2029 | 23 years | jrm21 | Return 0 instead of "" on error in read() so that RecPlug can continue.
@2027 | 23 years | jrm21 | read() is now completely independent of BasPlug::read(), as the latter …
@2025 | 23 years | paynter | You can now have several phind classifiers on one collection. This …
@2024 | 23 years | paynter | Store classifier-specific parameters in gdbm file if required. …
@2022 | 23 years | sjboddie | Caught some of the classifiers up with the documentation (finally). …
@2018 | 23 years | jrm21 | removed "use BasPlug" lines from metadata extractors, as they …
@2008 | 23 years | paynter | Marginally better support for non-English documents.
@2007 | 23 years | sjboddie | * empty log message *
@2001 | 23 years | sjboddie | Added a hack that mysteriously converts iso639 language codes …
@2000 | 23 years | sjboddie | Re-added iso639.pm
@1999 | 23 years | sjboddie | Fixed a small problem with language detection code.
@1995 | 23 years | jmt14 | * empty log message *
@1989 | 23 years | jmt14 | * empty log message *
@1974 | 23 years | cs025 | Fixed omission of encoding from parameters in read_file
@1973 | 23 years | kjm18 | fixed up language stuff
@1972 | 23 years | jmt14 | * empty log message *
@1954 | 23 years | jmt14 | * empty log message *
@1949 | 23 years | paynter | Fixed bug that prevented tokeniser from distinguishing between languages.
@1948 | 23 years | jrm21 | Updated to now pass arguments using the new parsargv list format, …
@1947 | 23 years | dmm9 | updated documentation
@1929 | 23 years | dg5 | Modified: ConvertToPlug and HTMLPlug to handle files in binary mode to …
@1920 | 23 years | sjboddie | * empty log message *
@1919 | 23 years | sjboddie | * empty log message *
@1917 | 23 years | kjm18 | minor changes
@1905 | 23 years | sjboddie | * empty log message *
@1904 | 23 years | sjboddie | Added support for a couple more encodings that I'm told are in common …
@1903 | 23 years | sjboddie | We now use textcat's best guess if it returns 3 or fewer possibilities …
@1901 | 23 years | sjboddie | * empty log message *
@1897 | 23 years | paynter | Convert_gml_into_tokens function a little more language tolerant, and …
@1895 | 23 years | jrm21 | Email plug now uses SplitPlug for mbox mail files. Hopefully this …
@1894 | 23 years | jrm21 | updated by copying BasPlug's new language/encoding stuff over for the …
@1891 | 23 years | paynter | Named characters like é and ì are translated to UTF8 …
@1890 | 23 years | paynter | When multiple metadata fields have multiple values, get them all. …
@1885 | 23 years | paynter | Added a classinfo.pl script, analogous to pluginfo.pl, that provides …
@1884 | 23 years | paynter | Added some documentation.
@1883 | 23 years | paynter | Supports new parameters of suffix program and new stopword file …
@1874 | 23 years | sjboddie | * empty log message *
@1871 | 23 years | paynter | Use two-letter codes for language names, updated docs.
@1870 | 23 years | sjboddie | Tidied up language support stuff.
@1869 | 23 years | paynter | Regular expression fix.
@1868 | 23 years | sjboddie | Made a bunch of changes to the building code to support lots of new …
@1857 | 23 years | dmm9 | date extraction options documented
@1855 | 23 years | paynter | Trivial change to warning message.
@1852 | 23 years | kjm18 | heaps of changes
@1851 | 23 years | kjm18 | added levels and buildtype for mgpp collections
@1846 | 23 years | sjboddie | Removed a call to a function that I removed in my previous changes - oops
@1845 | 23 years | paynter | Changed a "!=" to a "ne".
@1844 | 23 years | sjboddie | Added an 'auto' argument to BasPlug's '-input_encoding' option ('auto' …
@1843 | 23 years | sjboddie | Re-included some languages for which we had removed support
@1840 | 23 years | paynter | Changed default suffix size, cleaned up phrases.3 file
@1839 | 23 years | paynter | Updated classifiers to use the parsearg library instead of ad-hoc …
@1838 | 23 years | sjboddie | Added support for Cyrillic languages (Windows codepage 1251) - yet to …
@1829 | 23 years | paynter | Accept a "thesaurus=name" option that identifies a thesaurus in a …
@1812 | 23 years | sjboddie | ZIPPlug is now disabled under Windows
@1810 | 23 years | sjboddie | Fixed a bug that showed up when using Perl 5.6 on Windows
@1808 | 23 years | paynter | Option to save the phind phrases to a text file.
@1803 | 23 years | paynter | Moved the phind classifier's data directory into the index directory. …
@1799 | 23 years | sjboddie | fixed a little bug in the building code that caused an endless loop if …
@1787 | 23 years | jrm21 | "allow_extra_options" missing, to get inherited options
@1778 | 23 years | sjboddie | Implemented the new MailServer, LogEvents, EmailEvents and …
@1772 | 23 years | kjm18 | removed Paragraph stuff - now only has Document and Section; added …
@1762 | 23 years | sjboddie | Added support for the new LogEvents, EmailEvents, EmailUserEvents and …
@1758 | 23 years | say1 | added minimum image size and a few bug fixes
@1757 | 23 years | say1 | tightened the criteria for email files to avoid matching all dynamic …
@1756 | 23 years | say1 | added detection and handling of unreadable files
@1755 | 23 years | say1 | added better cycle detection (but still not perfect)
@1754 | 23 years | say1 | added support for jar files (which are actually just fancy zip files)
@1744 | 23 years | say1 | about a billion changes to ImagePlug
@1742 | 23 years | jrm21 | Added a comment to the usage stuff about PRESCRIPT.
@1741 | 23 years | sjboddie | Fixed a little bug that was causing pluginfo.pl to print some dodgy …
@1740 | 23 years | jrm21 | We now escape underscores so that any macros in source code (with respect to …
@1735 | 23 years | say1 | fixed about a billion little Image things.
@1733 | 23 years | say1 | new plugin for images
@1732 | 23 years | say1 | check metadata before adding
@1731 | 23 years | jrm21 | New and improved! Now gets #include information from std C files as …
@1730 | 23 years | jrm21 | removed a debugging statement left in accidentally…
@1729 | 23 years | jrm21 | title regexp should have started "\s*", not "\s+" - it's optional …
@1728 | 23 years | jrm21 | Minor change so that leading whitespace is skipped when grabbing the …
@1720 | 23 years | dmm9 | Added information to the usage text about date extraction option
@1719 | 23 years | dmm9 | Added information to the usage text about date extraction option
@1718 | 23 years | dmm9 | Added information to the usage text about date extraction option
@1716 | 23 years | jrm21 | minor change to allow the -title option to display correctly on HTML page.
@1712 | 23 years | say1 | cleaned up metadata extraction.
@1711 | 23 years | say1 | fixed minor spelling mistake
@1710 | 23 years | say1 | RecPlug now skips CVS directories.
@1707 | 23 years | jrm21 | Plugin for source code (primarily for putting Greenstone src into a …