|
|
@2811
|
23 years |
sjboddie |
* empty log message *
|
|
|
@2810
|
23 years |
sjboddie |
Created GAPlug (and XMLPlug base class) to replace the old GMLPlug. …
|
|
|
@2799
|
23 years |
sjboddie |
Fixed a bug where Word documents containing non-ascii characters …
|
|
|
@2796
|
23 years |
sjboddie |
* empty log message *
|
|
|
@2795
|
23 years |
sjboddie |
Got ZIPPlug working under windows
|
|
|
@2785
|
23 years |
sjboddie |
The build process now creates a summary of how many files were …
|
|
|
@2781
|
23 years |
jrm21 |
oops - left off a '$' at end of a pattern match.
|
|
|
@2779
|
23 years |
jrm21 |
Be a little more flexible when looking for boundary field in a …
|
|
|
@2761
|
23 years |
sjboddie |
added HTMLPlug2 temporarily while testing a new extract_subsections option
|
|
|
@2755
|
23 years |
jrm21 |
import.pl now takes an option for saving file conversion failures to a …
|
|
|
@2754
|
23 years |
jrm21 |
oops - left a debugging statement in there.
|
|
|
@2751
|
23 years |
sjboddie |
Had a go at enriching the default document structure.
Added …
|
|
|
@2735
|
23 years |
sjboddie |
Fixed up bugs I introduced with recent change to BasPlug
|
|
|
@2734
|
23 years |
sjboddie |
Chinese text segmentation is now done whenever language="zh" instead …
|
|
|
@2733
|
23 years |
jrm21 |
minor regex fixes/improvements.
|
|
|
@2732
|
23 years |
jrm21 |
needed <pre> tags when using the text/plain part of a multipart message.
|
|
|
@2730
|
23 years |
jrm21 |
1) Non-ascii characters should now work for any encoding handled by …
|
|
|
@2717
|
23 years |
jrm21 |
Do some email munging - @ symbols become @. Both netscape and IE …
|
|
|
@2695
|
23 years |
jrm21 |
Allow spaces in img src=... tags if surrounded with dbl quotes.
|
|
|
@2681
|
23 years |
jrm21 |
fixed a few more minor MIME header parsing cases.
|
|
|
@2680
|
23 years |
jrm21 |
1. we escape 'and' chars in headers so greenstone doesn't try to …
|
|
|
@2667
|
23 years |
jrm21 |
protect against < and > chars, as <pre> tags don't preserve them.
|
|
|
@2662
|
23 years |
jrm21 |
oops, that's a bit stupid (of me) - changed:
if …
|
|
|
@2661
|
23 years |
jrm21 |
added a default block exp of "" so it doesn't inherit HTMLPlugs…
|
|
|
@2657
|
23 years |
jrm21 |
fixed a bug when #including a macro (ie no "... or <... on the line)
|
|
|
@2652
|
23 years |
jrm21 |
Needed to replace \s with s. Also checked for multipart/related.
|
|
|
@2638
|
23 years |
jrm21 |
typo in regexp broke import... encoding type should have had [\s], …
|
|
|
@2630
|
23 years |
jrm21 |
Mime support for multipart messages. Doesn't extract attachments …
|
|
|
@2604
|
23 years |
jrm21 |
when extracting email addresses, we now include people in the .net …
|
|
|
@2601
|
23 years |
jrm21 |
modified usage to not mention HTMLplug blocking rtf.
|
|
|
@2564
|
23 years |
jrm21 |
Added RTFPlug. (It's the smallest one so far - 1511 bytes - yay!)
…
|
|
|
@2515
|
23 years |
sjboddie |
Fixed a couple of bugs/inconsistencies in word and pdf plugins that …
|
|
|
@2493
|
23 years |
paynter |
Changed at the request of Marcio - see mailing list.
|
|
|
@2492
|
23 years |
paynter |
Fixed trivial bug in the new set_OID function.
|
|
|
@2484
|
23 years |
say1 |
Changed SplitPlug to allow control over the OID. Changed BibTexPlug to …
|
|
|
@2453
|
23 years |
jrm21 |
Slightly smarter title extraction from body's text.
|
|
|
@2452
|
23 years |
jrm21 |
-title_sub works now -- previously had a leading "--" argument, which …
|
|
|
@2451
|
23 years |
jrm21 |
PSPlug now uses the -title_sub option to TEXTPlug, to remove any …
|
|
|
@2450
|
23 years |
jrm21 |
now accepts the "-title_sub" option, a regexp to remove when …
|
|
|
@2432
|
23 years |
say1 |
switched the order of removing the symbolic link and checking for …
|
|
|
@2364
|
23 years |
jrm21 |
turn "\" into " " so that we don't lose backslashes along the way…
|
|
|
@2363
|
23 years |
jrm21 |
fixed nasty bug where </srclink></a><srclink> was being matched …
|
|
|
@2356
|
23 years |
sjboddie |
Renamed HBSPlug to BookPlug in the hope that it's a little less cryptic
|
|
|
@2342
|
23 years |
sjboddie |
renamed HTMLPlug's w3mir option to file_is_url
|
|
|
@2327
|
23 years |
sjboddie |
* empty log message *
|
|
|
@2326
|
23 years |
sjboddie |
fixed a small bug in the new XML gml code that caused metadata tags …
|
|
|
@2267
|
23 years |
davidb |
GML file syntax altered to be XML compliant. This basically meant …
|
|
|
@2241
|
23 years |
sjboddie |
Tidied up the ConvertToPlug stuff to get it working on Windows 95/98
|
|
|
@2235
|
23 years |
sjboddie |
Hacked the textcat package about so that it only reads all the …
|
|
|
@2230
|
23 years |
paynter |
User can request a "Screen" image - essentially a second thumbnail. …
|
|
|
@2228
|
23 years |
paynter |
The -use_metadata_files option tells RecPlug to read any metadata XML …
|
|
|
@2226
|
23 years |
paynter |
Image size metadata fixed, dummy text added, Image filename fixed.
|
|
|
@2219
|
23 years |
sjboddie |
Had another go at suppressing the "subroutine redefined" warnings as …
|
|
|
@2209
|
23 years |
sjboddie |
Suppressed some annoying perl warnings
|
|
|
@2207
|
23 years |
paynter |
Bugfixes: read returns number of files instead of file type, and …
|
|
|
@2096
|
23 years |
jrm21 |
Minor changes to regexs, so that header fields have to be at start of …
|
|
|
@2086
|
23 years |
jrm21 |
We create a copy of any args to new() because parsargs might modify …
|
|
|
@2085
|
23 years |
jrm21 |
When importing, we need to escape any escape codes otherwise mg(?) …
|
|
|
@2084
|
23 years |
jrm21 |
usage message is now formatted to fit within 80 columns.
|
|
|
@2082
|
23 years |
jrm21 |
added bzip2 support (untested).
|
|
|
@2041
|
23 years |
jrm21 |
don't strip all whitespace from tmp filename, only from base name. …
|
|
|
@2036
|
23 years |
jrm21 |
don't use strict; anymore, as we want to be able to write error msgs …
|
|
|
@2029
|
23 years |
jrm21 |
Return 0 instead of "" on error in read() so that RecPlug can continue.
|
|
|
@2027
|
23 years |
jrm21 |
read() is now completely independent of BasPlug::read(), as the latter …
|
|
|
@2007
|
23 years |
sjboddie |
* empty log message *
|
|
|
@1999
|
23 years |
sjboddie |
Fixed a small problem with language detection code.
|
|
|
@1974
|
23 years |
cs025 |
Fixed omission of encoding from parameters in read_file
|
|
|
@1954
|
23 years |
jmt14 |
* empty log message *
|
|
|
@1929
|
23 years |
dg5 |
Modified: ConvertToPlug and HTMLPlug to handle files in binary mode to …
|
|
|
@1903
|
23 years |
sjboddie |
We now use textcat's best guess if it returns 3 or fewer possibilities …
|
|
|
@1895
|
23 years |
jrm21 |
Email plug now uses SplitPlug for mbox mail files. Hopefully this …
|
|
|
@1894
|
23 years |
jrm21 |
updated by copying BasPlug's new language/encoding stuff over for the …
|
|
|
@1891
|
23 years |
paynter |
Named characters like é and ì are translated
to UTF8 …
|
|
|
@1874
|
23 years |
sjboddie |
* empty log message *
|
|
|
@1870
|
23 years |
sjboddie |
Tidied up language support stuff.
|
|
|
@1869
|
23 years |
paynter |
Regular expression fix.
|
|
|
@1868
|
23 years |
sjboddie |
Made a bunch of changes to the building code to support lots of new …
|
|
|
@1857
|
23 years |
dmm9 |
date extraction options documented
|
|
|
@1855
|
23 years |
paynter |
Trivial change to warning message.
|
|
|
@1846
|
23 years |
sjboddie |
Removed a call to a function that I removed in my previous changes - oops
|
|
|
@1845
|
23 years |
paynter |
Changed a "!=" to a "ne".
|
|
|
@1844
|
23 years |
sjboddie |
Added an 'auto' argument to BasPlug's '-input_encoding' option ('auto' …
|
|
|
@1838
|
23 years |
sjboddie |
Added support for Cyrillic languages (windows codepage 1251) - yet to …
|
|
|
@1812
|
23 years |
sjboddie |
ZIPPlug is now disabled under windows
|
|
|
@1810
|
23 years |
sjboddie |
Fixed a bug that showed up when using Perl 5.6 on windows
|
|
|
@1787
|
23 years |
jrm21 |
"allow_extra_options" missing, to get inherited options
|
|
|
@1758
|
23 years |
say1 |
added minimum image size and a few bug fixes
|
|
|
@1757
|
23 years |
say1 |
tightened the criteria for email files to avoid matching all dynamic …
|
|
|
@1756
|
23 years |
say1 |
added detection and handling of unreadable files
|
|
|
@1755
|
23 years |
say1 |
added better cycle detection (but still not perfect)
|
|
|
@1754
|
23 years |
say1 |
added support for jar files (which are actually just fancy zip files)
|
|
|
@1744
|
23 years |
say1 |
about a billion changes to ImagePlug
|
|
|
@1742
|
23 years |
jrm21 |
Added a comment to the usage stuff about PRESCRIPT.
|
|
|
@1741
|
23 years |
sjboddie |
Fixed a little bug that was causing pluginfo.pl to print some dodgy …
|
|
|
@1740
|
23 years |
jrm21 |
We now escape underscores so that any macros in source code (wrt to …
|
|
|
@1735
|
23 years |
say1 |
fixed about a billion little Image things.
|
|
|
@1733
|
23 years |
say1 |
new plugin for images
|
|
|
@1731
|
23 years |
jrm21 |
New and improved! Now gets #include information from std C files as …
|
|
|
@1730
|
23 years |
jrm21 |
removed a debugging statement left in accidentally…
|
|
|
@1729
|
23 years |
jrm21 |
title regexp should have started "\s*", not "\s+" - it's optional …
|
|
|