|
|
@2735
|
23 years |
sjboddie |
Fixed up bugs I introduced with recent change to BasPlug
|
|
|
@2734
|
23 years |
sjboddie |
Chinese text segmentation is now done whenever language="zh" instead …
|
|
|
@2733
|
23 years |
jrm21 |
minor regex fixes/improvements.
|
|
|
@2732
|
23 years |
jrm21 |
needed <pre> tags when using the text/plain part of a multipart message.
|
|
|
@2730
|
23 years |
jrm21 |
1) Non-ascii characters should now work for any encoding handled by …
|
|
|
@2717
|
23 years |
jrm21 |
Do some email munging - @ symbols become &#64;. Both netscape and IE …
|
|
|
@2713
|
23 years |
sjboddie |
* empty log message *
|
|
|
@2711
|
23 years |
sjboddie |
Removed the "beta" collect.cfg option to avoid awkward questions from …
|
|
|
@2700
|
23 years |
cs025 |
fixed this up for building under windows
|
|
|
@2695
|
23 years |
jrm21 |
Allow spaces in img src=... tags if surrounded with dbl quotes.
|
|
|
@2685
|
23 years |
jrm21 |
Improved regex for when the last category is too small, and we need to …
|
|
|
@2681
|
23 years |
jrm21 |
fixed a few more minor MIME header parsing cases.
|
|
|
@2680
|
23 years |
jrm21 |
1. we escape 'and' chars in headers so greenstone doesn't try to …
|
|
|
@2667
|
23 years |
jrm21 |
protect against < and > chars, as <pre> tags don't preserve them.
|
|
|
@2666
|
23 years |
jrm21 |
Modified phind classifier so that special delimiters are always …
|
|
|
@2662
|
23 years |
jrm21 |
oops, that's a bit stupid (of me) - changed:
if …
|
|
|
@2661
|
23 years |
jrm21 |
added a default block exp of "" so it doesn't inherit HTMLPlugs…
|
|
|
@2658
|
23 years |
jrm21 |
fixed a typo
|
|
|
@2657
|
23 years |
jrm21 |
fixed a bug when #including a macro (ie no "... or <... on the line)
|
|
|
@2652
|
23 years |
jrm21 |
Needed to replace \s with s. Also checked for multipart/related.
|
|
|
@2638
|
23 years |
jrm21 |
typo in regexp broke import... encoding type should have had [\s], …
|
|
|
@2632
|
23 years |
jrm21 |
added an option "-bymonth=1", to group by (eg) 2000-January, …
|
|
|
@2631
|
23 years |
jrm21 |
Don't assume funny dates are 20th C - eg 101 -> 19101 - add to 1900 …
|
|
|
@2630
|
23 years |
jrm21 |
Mime support for multipart messages. Doesn't extract attachments …
|
|
|
@2604
|
23 years |
jrm21 |
when extracting email addresses, we now include people in the .net …
|
|
|
@2601
|
23 years |
jrm21 |
modified usage to not mention HTMLplug blocking rtf.
|
|
|
@2576
|
23 years |
sjboddie |
Moved phind's stopword directory from etc to etc/packages/phind
|
|
|
@2564
|
23 years |
jrm21 |
Added RTFPlug. (It's the smallest one so far - 1511 bytes - yay!)
…
|
|
|
@2539
|
23 years |
sjboddie |
* empty log message *
|
|
|
@2529
|
23 years |
sjboddie |
added quoting to system calls in phind classifier - needed when …
|
|
|
@2525
|
23 years |
kjm18 |
removed unneeded output
|
|
|
@2516
|
23 years |
sjboddie |
* empty log message *
|
|
|
@2515
|
23 years |
sjboddie |
Fixed a couple of bugs/inconsistencies in word and pdf plugins that …
|
|
|
@2510
|
23 years |
sjboddie |
renamed phind's stopwords directory and contents to use Win3.1 …
|
|
|
@2509
|
23 years |
sjboddie |
Fixed (bypassed really) a problem with the phind classifier on windows …
|
|
|
@2507
|
23 years |
sjboddie |
Tidied up the phind client a little more. It now belongs to the …
|
|
|
@2506
|
23 years |
dmm9 |
added writing of collection document list to db (OID browselist)
|
|
|
@2505
|
23 years |
dmm9 |
added collection of collection document list
|
|
|
@2503
|
23 years |
sjboddie |
fixed a small bug in the datelist classifier that caused year ranges …
|
|
|
@2500
|
23 years |
sjboddie |
Removed test for phindcgi from phind classifier as it is no longer used
|
|
|
@2493
|
23 years |
paynter |
Changed at the request of Marcio - see mailing list.
|
|
|
@2492
|
23 years |
paynter |
Fixed trivial bug in the new set_OID function.
|
|
|
@2489
|
23 years |
dmm9 |
adding the browse interface as a classifier option
|
|
|
@2487
|
23 years |
sjboddie |
Changes to get phind working under windows
|
|
|
@2484
|
23 years |
say1 |
Changed SplitPlug to allow control over the OID. Changed BibTexPlug to …
|
|
|
@2483
|
23 years |
say1 |
added an "if" to catch the case where someone tries to convert an …
|
|
|
@2481
|
23 years |
kjm18 |
changed mgpp system calls to use the new executable names
|
|
|
@2480
|
23 years |
kjm18 |
added the store_text option as done in mgbuildproc.pm
|
|
|
@2479
|
23 years |
kjm18 |
added indexmap and indexfieldmap to build.cfg fields
|
|
|
@2478
|
23 years |
kjm18 |
brought it in line with changes to buildcol.pl, mgbuilder.pm
now uses …
|
|
|
@2453
|
23 years |
jrm21 |
Slightly smarter title extraction from body's text.
|
|
|
@2452
|
23 years |
jrm21 |
-title_sub works now -- previously had a leading "--" argument, which …
|
|
|
@2451
|
23 years |
jrm21 |
PSPlug now uses the -title_sub option to TEXTPlug, to remove any …
|
|
|
@2450
|
23 years |
jrm21 |
now accepts the "-title_sub" option, a regexp to remove when …
|
|
|
@2432
|
23 years |
say1 |
switched the order of removing the symbolic link and checking for …
|
|
|
@2412
|
23 years |
sjboddie |
Added a tar archive of all the perl modules required to make ping.pl work
|
|
|
@2364
|
23 years |
jrm21 |
turn "\" into " " so that we don't lose backslashes along the way…
|
|
|
@2363
|
23 years |
jrm21 |
fixed nasty bug where </srclink></a><srclink> was being matched …
|
|
|
@2359
|
23 years |
sjboddie |
Altered the help text a little for mkcol.pl, import.pl, buildcol.pl, …
|
|
|
@2356
|
23 years |
sjboddie |
Renamed HBSPlug to BookPlug in the hope that it's a little less cryptic
|
|
|
@2355
|
23 years |
sjboddie |
All options to import.pl and buildcol.pl may now be specified from …
|
|
|
@2342
|
23 years |
sjboddie |
renamed HTMLPlug's w3mir option to file_is_url
|
|
|
@2336
|
23 years |
sjboddie |
added a -no_text option to buildcol.pl to allow collections to be …
|
|
|
@2333
|
23 years |
kjm18 |
closed all filehandles that had remained open, to fix the bug that was …
|
|
|
@2327
|
23 years |
sjboddie |
* empty log message *
|
|
|
@2326
|
23 years |
sjboddie |
fixed a small bug in the new XML gml code that caused metadata tags …
|
|
|
@2267
|
23 years |
davidb |
GML file syntax altered to be XML compliant. This basically meant …
|
|
|
@2241
|
23 years |
sjboddie |
Tidied up the ConvertToPlug stuff to get it working on Windows 95/98
|
|
|
@2237
|
23 years |
sjboddie |
Added a unicode2koi8r function to unicode.pm (because I needed one). …
|
|
|
@2235
|
23 years |
sjboddie |
Hacked the textcat package about so that it only reads all the …
|
|
|
@2230
|
23 years |
paynter |
User can request a "Screen" image - essentially a second thumbnail. …
|
|
|
@2228
|
23 years |
paynter |
The -use_metadata_files option tells RecPlug to read any metadata XML …
|
|
|
@2226
|
23 years |
paynter |
Image size metadata fixed, dummy text added, Image filename fixed.
|
|
|
@2224
|
23 years |
paynter |
When the document has associated files, a metadata element …
|
|
|
@2219
|
23 years |
sjboddie |
Had another go at suppressing the "subroutine redefined" warnings as …
|
|
|
@2209
|
23 years |
sjboddie |
Suppressed some annoying perl warnings
|
|
|
@2207
|
23 years |
paynter |
Bugfixes: read returns number of files instead of file type, and …
|
|
|
@2206
|
23 years |
paynter |
Annoying bug.
|
|
|
@2193
|
23 years |
sjboddie |
soft_link function now acts as a simple "copy" function on windows
|
|
|
@2096
|
23 years |
jrm21 |
Minor changes to regexes, so that header fields have to be at start of …
|
|
|
@2086
|
23 years |
jrm21 |
We create a copy of any args to new() because parsargs might modify …
|
|
|
@2085
|
23 years |
jrm21 |
When importing, we need to escape any escape codes otherwise mg(?) …
|
|
|
@2084
|
23 years |
jrm21 |
usage message is now formatted to fit within 80 columns.
|
|
|
@2083
|
23 years |
paynter |
Fixed a stupid mistake that I know I've fixed before.
|
|
|
@2082
|
23 years |
jrm21 |
added bzip2 support (untested).
|
|
|
@2080
|
23 years |
jrm21 |
When creating nodes, now need to pass -buttonname instead of -title.
|
|
|
@2079
|
23 years |
paynter |
Added a new binary field to the savephrases output that indicates …
|
|
|
@2064
|
23 years |
paynter |
Sort thesaurus phrases by frequency then type.
|
|
|
@2048
|
23 years |
sjboddie |
* empty log message *
|
|
|
@2041
|
23 years |
jrm21 |
don't strip all whitespace from tmp filename, only from base name. …
|
|
|
@2040
|
23 years |
sjboddie |
* empty log message *
|
|
|
@2039
|
23 years |
jrm21 |
do eval{symlink()} because platforms that don't support symlink …
|
|
|
@2036
|
23 years |
jrm21 |
don't use strict; anymore, as we want to be able to write error msgs …
|
|
|
@2029
|
23 years |
jrm21 |
Return 0 instead of "" on error in read() so that RecPlug can continue.
|
|
|
@2027
|
23 years |
jrm21 |
read() is now completely independent of BasPlug::read(), as the latter …
|
|
|
@2025
|
23 years |
paynter |
You can now have several phind classifiers on one collection. This …
|
|
|
@2024
|
23 years |
paynter |
Store classifier-specific parameters in gdbm file if required. …
|
|
|
@2022
|
23 years |
sjboddie |
Caught some of the classifiers up with the documentation (finally). …
|
|
|
@2018
|
23 years |
jrm21 |
removed "use BasPlug" lines from metadata extractors, as they …
|
|
|
@2008
|
23 years |
paynter |
Marginally better support for non-English documents.
|
|
|