source: trunk/gsdl/perllib

Rev Age Author Log Message
@3727   21 years sjboddie Reverted back last change to AZList.pm. Attempting to make it work …
@3726   21 years jrm21 minor fix for "_" chars in urls... escape them after, not before. …
@3724   21 years kde2 Submission of Interface Translation Agency
@3721   21 years jrm21 bug where some text/plain messages weren't having < > & properly …
@3720   21 years sjboddie Added options to PDFPlug to take advantage of the improvements in …
@3719   21 years sjboddie Prevent dodgy format_string_english() functions from destroying any …
@3718   21 years sjboddie Added new language model so textcat can detect UTF-8 encoded Russian
@3708   21 years sjboddie Fixed a bug where HTMLPlug failed to associate files whose filenames …
@3665   21 years sjboddie Prevent occurrences of 70 or more hyphens in metadata values from …
@3639   21 years kjdon modified the default sorting element, and fixed a bug.
@3630   21 years jrm21 1) Correct typo in print_usage(): process_exp -> split_exp 2) Fixed …
@3629   21 years jrm21 need to look for associated files in the assocfilepath, if this …
@3628   21 years jrm21 hard_link returns 1 on error. (But not if it attempts to copy - always …
@3627   21 years jrm21 added less-obfuscated quote-printable parsing in qp_decode()
@3614   21 years jrm21 modified section-handling stuff to work with output from v.0.34 of …
@3590   21 years jrm21 modified the split regular expression so it works with newer versions …
@3587   21 years jrm21 removed comments about storing "BibTex" metadata as we don't do that …
@3542   22 years jrm21 ghtml returns utf8, not iso-8859-1, so any html entities were being …
@3540   22 years kjdon added John T's changes into CVS - added info to enable retrieval of …
@3539   22 years kjdon added jpe to the process and block expressions
@3537   22 years jrm21 if process() returns undef, then the plugin couldn't process that …
@3536   22 years jrm21 set doc title to "" if it is undefined
@3529   22 years jrm21 fixed oversight where alpha_numeric_cmp was no longer being called …
@3524   22 years kjdon added the help message for the previous change
@3523   22 years kjdon now EMAILplug accepts the split_exp option - a regular expression that …
@3520   22 years jrm21 wrong variable name meant we were throwing away the first line of each …
@3517   22 years davidb ImagePlug modified so 'Source' metadata set to be consistent with …
@3515   22 years jrm21 call a plugin's set_OID() method if one exists, otherwise use the …
@3510   22 years jrm21 need to check that remove_prefix is defined before checking its length
@3508   22 years jrm21 modified copyright statement
@3507   22 years jrm21 updated to also allow '..."foo" ...' as the enclosing quotes (for …
@3506   22 years jrm21 need to allow escaped \" inside a multiline "...". Eg …
@3472   22 years kjdon renamed phind.pm to Phind.pm in keeping with the names of the other …
@3433   22 years jrm21 If a metadata value becomes empty (because of the removeprefix option) …
@3430   22 years jrm21 Added MARCPlug, mostly done by David Bainbridge. It needs a …
@3427   22 years sjboddie The input encoding will now default to utf8 instead of iso-8859-1. …
@3426   22 years jrm21 Don't add \n to the end of each metadata value.
@3418   22 years jrm21 Allow fields to stretch over multiple lines if enclosed in double …
@3416   22 years jrm21 Fix up problem if no documents were processed and accepted.
@3415   22 years jrm21 don't try to write to and close an archive file if one wasn't opened …
@3414   22 years jrm21 Need to escape "_" characters so that greenstone doesn't interpret them…
@3413   22 years jrm21 Added "\" to the characters we need to escape for classifying.
@3411   22 years jrm21 Now takes a "-use_sections" option to make a section per page.
@3402   22 years sjboddie import.pl now tells user where the fail.log lives
@3400   22 years sjboddie WordPlug now handles .dot files as well as .doc files.
@3398   22 years jrm21 Oops... the last change to the regex was too permissive... fixed up to …
@3397   22 years jrm21 minor change to the regex for marking up urls (to allow #anchor at the end)
@3369   22 years sjboddie HTMLPlug will no longer prevent metadata extraction when the …
@3352   22 years jrm21 We can now properly handle messages with a content type of …
@3351   22 years jrm21 If a message is in an unsupported encoding, we assume iso8859-1. …
@3350   22 years sjboddie Added -use_strings option to ConvertToPlug. The default behaviour for …
@3349   22 years sjboddie Bug fix.
@3329   22 years jrm21 Oops, removed debugging statement!
@3328   22 years jrm21 Make sure that sender's name is more than 0 chars long, otherwise use …
@3307   22 years davidb Some minor modifications to Image Plugin: filenames can now include …
@3306   22 years davidb Removed some debugging print statements
@3303   22 years davidb Classifier extended to support frequency sort option through -freqsort …
@3302   22 years davidb Classifier modified so it does not include A-Z letters at top of page …
@3249   22 years jrm21 1) add a space when joining consecutive lines, just in case. 2) Don't …
@3248   22 years jrm21 If we convert to HTML, we post-process to change named entities (eg …
@3247   22 years jrm21 Modified automatic title extraction to also recognise utf-8 nbsp as …
@3244   22 years jrm21 we no longer exit with an error if the suffix program failed to create …
@3226   22 years jrm21 Don't allow fields Encoding or Language for search - these are internal?!?
@3215   22 years jrm21 Fixed up some regexs for mime header encodings - eg people with …
@3206   22 years jrm21 Oops! Bad things were happening when the headers said utf-8 encoding, …
@3196   22 years sjboddie Added &nbsp; to the list of entities that HTMLPlug doesn't convert to utf-8
@3195   22 years kjdon create_shortname (turns a long metadata name into 2 char name) changed …
@3181   22 years sjboddie Altered the getcharequiv() function so it now converts entities to raw …
@3158   22 years kjdon the indexfieldmap list is now in sorted order with TextOnly at the …
@3156   22 years jrm21 Added a few extra accented characters, and recognise some …
@3148   22 years jrm21 If a document has associated files that are also given a subdirectory, …
@3146   22 years sjboddie textcat now returns "id" for Indonesian instead of "in"
@3144   22 years kjdon added mgpp's metadata field map to the gdbm file For metadata, it uses …
@3143   22 years jrm21 Minor tweak for badly formatted dates. We now use a window, so …
@3142   22 years jrm21 1) We can't use "Date" for the year metadata, as greenstone assumes …
@3137   22 years paynter Changed the way Width, Height, Size and Type metadata is calculated. …
@3136   22 years paynter Reconciled John's version of my changes to EMAILPlug with my version …
@3135   22 years jrm21 modified process_exp to process php3 -named files too.
@3134   22 years jrm21 1) Convert headers to detected charset if possible. 2) Convert header …
@3132   22 years jrm21 Try to determine the encoding used in the headers in case it is not …
@3130   22 years jrm21 Added map files for iso-8859-15 encoding, which is basically Latin1 …
@3116   22 years sjboddie RecPlug will now die with an error if it finds a metadata.xml file …
@3115   22 years jrm21 Redirect mg(pp)_passes stderr to /dev/null if the "-out xxx" option is …
@3112   22 years jrm21 minor changes to formatted values (eg if enclosed in { and } ) and …
@3111   22 years jrm21 Allow .eml extension (IE and mozilla default to this for individual …
@3109   22 years jrm21 When getting first char for classification, s/(.).*$/$1/g isn't good …
@3108   22 years jrm21 Don't recurse into directories if they are symbolic links and point …
@3107   22 years jrm21 fixed problem where documents after a "bad" document would not be read …
@3095   22 years jrm21 Added check for reading an empty file (ie read_line() returns undef).
@3094   22 years jrm21 Needed to add failhandle to the init() function, to pass to BasPlug.
@3086   22 years nzdl * empty log message *
@3073   22 years jrm21 1) Default Title now correctly escapes [ and ] chars. 2) …
@3038   22 years jrm21 Put \" \" around href for srclink, in case the collection name has …
@3037   22 years jrm21 title_sub seems to always get defined by parsargv, so we test that it …
@3019   22 years jrm21 Fixes for when on windows - it was having a lot of trouble sorting out …
@2996   22 years sjboddie * empty log message *
@2995   22 years sjboddie Fixed a bug preventing HTML headers from being removed correctly when …
@2994   22 years jrm21 Added some mime types, and gave a url for "the list" of types at iana.org
@2990   22 years jrm21 Do MS Excel using ConvertToPlug, which currently uses the xlhtml package.
@2981   22 years jrm21 Added a minimal powerpoint plugin that causes an external converter to …