|
|
@3721
|
21 years |
jrm21 |
bug where some text/plain messages weren't having < > & properly …
|
|
|
@3720
|
21 years |
sjboddie |
Added options to PDFPlug to take advantage of the improvements in …
|
|
|
@3719
|
21 years |
sjboddie |
Prevent dodgy format_string_english() functions from destroying any …
|
|
|
@3718
|
21 years |
sjboddie |
Added new language model so textcat can detect UTF-8 encoded Russian
|
|
|
@3708
|
21 years |
sjboddie |
Fixed a bug where HTMLPlug failed to associate files whose filenames …
|
|
|
@3665
|
21 years |
sjboddie |
Prevent occurrences of 70 or more hyphens in metadata values from …
|
|
|
@3639
|
21 years |
kjdon |
modified the default sorting element, and fixed a bug.
|
|
|
@3630
|
22 years |
jrm21 |
1) Correct typo in print_usage(): process_exp -> split_exp
2) Fixed …
|
|
|
@3629
|
22 years |
jrm21 |
need to look for associated files in the assocfilepath, if this …
|
|
|
@3628
|
22 years |
jrm21 |
hard_link returns 1 on error. (But not if it attempts to copy - always …
|
|
|
@3627
|
22 years |
jrm21 |
added less-obfuscated quoted-printable parsing in qp_decode()
|
|
|
@3614
|
22 years |
jrm21 |
modified section-handling stuff to work with output from v.0.34 of …
|
|
|
@3590
|
22 years |
jrm21 |
modified the split regular expression so it works with newer versions …
|
|
|
@3587
|
22 years |
jrm21 |
removed comments about storing "BibTex" metadata as we don't do that …
|
|
|
@3542
|
22 years |
jrm21 |
ghtml returns utf8, not iso-8859-1, so any html entities were being …
|
|
|
@3540
|
22 years |
kjdon |
added John T's changes into CVS - added info to enable retrieval of …
|
|
|
@3539
|
22 years |
kjdon |
added jpe to the process and block expressions
|
|
|
@3537
|
22 years |
jrm21 |
if process() returns undef, then the plugin couldn't process that …
|
|
|
@3536
|
22 years |
jrm21 |
set doc title to "" if it is undefined
|
|
|
@3529
|
22 years |
jrm21 |
fixed oversight where alpha_numeric_cmp was no longer being called …
|
|
|
@3524
|
22 years |
kjdon |
added the help message for the previous change
|
|
|
@3523
|
22 years |
kjdon |
now EMAILplug accepts the split_exp option - a regular expression that …
|
|
|
@3520
|
22 years |
jrm21 |
wrong variable name meant we were throwing away the first line of each …
|
|
|
@3517
|
22 years |
davidb |
ImagePlug modified so 'Source' metadata set to be consistent with …
|
|
|
@3515
|
22 years |
jrm21 |
call a plugin's set_OID() method if one exists, otherwise use the …
|
|
|
@3510
|
22 years |
jrm21 |
need to check that remove_prefix is defined before checking its length
|
|
|
@3508
|
22 years |
jrm21 |
modified copyright statement
|
|
|
@3507
|
22 years |
jrm21 |
updated to also allow '..."foo" ...' as the enclosing quotes (for …
|
|
|
@3506
|
22 years |
jrm21 |
need to allow escaped \" inside a multiline "...". Eg
…
|
|
|
@3472
|
22 years |
kjdon |
renamed phind.pm to Phind.pm in keeping with the names of the other …
|
|
|
@3433
|
22 years |
jrm21 |
If a metadata value becomes empty (because of the removeprefix option) …
|
|
|
@3430
|
22 years |
jrm21 |
Added MARCPlug, mostly done by David Bainbridge. It needs a …
|
|
|
@3427
|
22 years |
sjboddie |
The input encoding will now default to utf8 instead of iso-8859-1. …
|
|
|
@3426
|
22 years |
jrm21 |
Don't add \n to the end of each metadata value.
|
|
|
@3418
|
22 years |
jrm21 |
Allow fields to stretch over multiple lines if enclosed in double …
|
|
|
@3416
|
22 years |
jrm21 |
Fix up problem if no documents were processed and accepted.
|
|
|
@3415
|
22 years |
jrm21 |
don't try to write to and close an archive file if one wasn't opened …
|
|
|
@3414
|
22 years |
jrm21 |
Need to escape "_" characters so that greenstone doesn't interpret them…
|
|
|
@3413
|
22 years |
jrm21 |
Added "\" to the characters we need to escape for classifying.
|
|
|
@3411
|
22 years |
jrm21 |
Now takes a "-use_sections" option to make a section per page.
|
|
|
@3402
|
22 years |
sjboddie |
import.pl now tells user where the fail.log lives
|
|
|
@3400
|
22 years |
sjboddie |
WordPlug now handles .dot files as well as .doc files.
|
|
|
@3398
|
22 years |
jrm21 |
Oops... the last change to the regex was too permissive... fixed up to …
|
|
|
@3397
|
22 years |
jrm21 |
minor change to the regex for marking up urls (to allow #anchor at the end)
|
|
|
@3369
|
22 years |
sjboddie |
HTMLPlug will no longer prevent metadata extraction when the …
|
|
|
@3352
|
22 years |
jrm21 |
We can now properly handle messages with a content type of …
|
|
|
@3351
|
22 years |
jrm21 |
If a message is in an unsupported encoding, we assume iso8859-1. …
|
|
|
@3350
|
22 years |
sjboddie |
Added -use_strings option to ConvertToPlug. The default behaviour for …
|
|
|
@3349
|
22 years |
sjboddie |
Bug fix.
|
|
|
@3329
|
22 years |
jrm21 |
Oops, removed debugging statement!
|
|
|
@3328
|
22 years |
jrm21 |
Make sure that sender's name is more than 0 chars long, otherwise use …
|
|
|
@3307
|
22 years |
davidb |
Some minor modifications to Image Plugin: filenames can now
include …
|
|
|
@3306
|
22 years |
davidb |
Removed some debugging print statements
|
|
|
@3303
|
22 years |
davidb |
Classifier extended to support frequency sort option through -freqsort …
|
|
|
@3302
|
22 years |
davidb |
Classifier modified so it does not include A-Z letters at top of
page …
|
|
|
@3249
|
22 years |
jrm21 |
1) add a space when joining consecutive lines, just in case.
2) Don't …
|
|
|
@3248
|
22 years |
jrm21 |
If we convert to HTML, we post-process to change named entities (eg …
|
|
|
@3247
|
22 years |
jrm21 |
Modified automatic title extraction to also recognise utf-8 nbsp as …
|
|
|
@3244
|
22 years |
jrm21 |
we no longer exit with an error if the suffix program failed to create …
|
|
|
@3226
|
22 years |
jrm21 |
Don't allow fields Encoding or Language for search - these are internal?!?
|
|
|
@3215
|
22 years |
jrm21 |
Fixed up some regexes for mime header encodings - eg people with …
|
|
|
@3206
|
22 years |
jrm21 |
Oops! Bad things were happening when the headers said utf-8 encoding, …
|
|
|
@3196
|
22 years |
sjboddie |
Added to the list of entities that HTMLPlug doesn't convert to utf-8
|
|
|
@3195
|
22 years |
kjdon |
create_shortname (turns a long metadata name into 2 char name) changed …
|
|
|
@3181
|
22 years |
sjboddie |
Altered the getcharequiv() function so it now converts entities to raw …
|
|
|
@3158
|
22 years |
kjdon |
the indexfieldmap list is now in sorted order with TextOnly at the …
|
|
|
@3156
|
22 years |
jrm21 |
Added a few extra accented characters, and recognise some …
|
|
|
@3148
|
22 years |
jrm21 |
If a document has associated files that are also given a subdirectory, …
|
|
|
@3146
|
22 years |
sjboddie |
textcat now returns "id" for Indonesian instead of "in"
|
|
|
@3144
|
22 years |
kjdon |
added mgpp's metadata field map to the gdbm file
For metadata, it uses …
|
|
|
@3143
|
22 years |
jrm21 |
Minor tweak for badly formatted dates. We now use a window, so …
|
|
|
@3142
|
22 years |
jrm21 |
1) We can't use "Date" for the year metadata, as greenstone assumes …
|
|
|
@3137
|
22 years |
paynter |
Changed the way Width, Height, Size and Type metadata is calculated. …
|
|
|
@3136
|
22 years |
paynter |
Reconciled John's version of my changes to EMAILPlug with my version …
|
|
|
@3135
|
22 years |
jrm21 |
modified process_exp to process php3 -named files too.
|
|
|
@3134
|
22 years |
jrm21 |
1) Convert headers to detected charset if possible.
2) Convert header …
|
|
|
@3132
|
22 years |
jrm21 |
Try to determine the encoding used in the headers in case it is not …
|
|
|
@3130
|
22 years |
jrm21 |
Added map files for iso-8859-15 encoding, which is basically Latin1 …
|
|
|
@3116
|
22 years |
sjboddie |
RecPlug will now die with an error if it finds a metadata.xml file …
|
|
|
@3115
|
22 years |
jrm21 |
Redirect mg(pp)_passes stderr to /dev/null if the "-out xxx" option is …
|
|
|
@3112
|
22 years |
jrm21 |
minor changes to formatted values (eg if enclosed in { and } ) and …
|
|
|
@3111
|
22 years |
jrm21 |
Allow .eml extension (IE and mozilla default to this for individual …
|
|
|
@3109
|
22 years |
jrm21 |
When getting first char for classification, s/(.).*$/$1/g isn't good …
|
|
|
@3108
|
22 years |
jrm21 |
Don't recurse into directories if they are symbolic links and point …
|
|
|
@3107
|
22 years |
jrm21 |
fixed problem where documents after a "bad" document would not be
read …
|
|
|
@3095
|
22 years |
jrm21 |
Added check for reading an empty file (ie read_line() returns undef).
|
|
|
@3094
|
22 years |
jrm21 |
Needed to add failhandle to the init() function, to pass to BasPlug.
|
|
|
@3086
|
22 years |
nzdl |
* empty log message *
|
|
|
@3073
|
22 years |
jrm21 |
1) Default Title now correctly escapes [ and ] chars.
2) …
|
|
|
@3038
|
22 years |
jrm21 |
Put \" \" around href for srclink, in case the collection name has …
|
|
|
@3037
|
22 years |
jrm21 |
title_sub seems to always get defined by parsargv, so we test that it …
|
|
|
@3019
|
22 years |
jrm21 |
Fixes for when on windows - it was having a lot of trouble sorting out …
|
|
|
@2996
|
22 years |
sjboddie |
* empty log message *
|
|
|
@2995
|
22 years |
sjboddie |
Fixed a bug preventing HTML headers from being removed correctly when …
|
|
|
@2994
|
22 years |
jrm21 |
Added some mime types, and gave a url for "the list" of types at iana.org
|
|
|
@2990
|
22 years |
jrm21 |
Do MS Excel using ConvertToPlug, which currently uses the xlhtml package.
|
|
|
@2981
|
22 years |
jrm21 |
Added a minimal powerpoint plugin that causes an external converter to …
|
|
|
@2980
|
22 years |
jrm21 |
Added converted_to, which tells us what format the last input file we …
|
|
|
@2979
|
22 years |
jrm21 |
Use self->converted_to instead of convert_to, in case the file could …
|
|
|
@2975
|
22 years |
jrm21 |
Tidied up usage info to fit in 80 columns. Fixed title_sub stuff, so …
|
|
|