source: trunk/gsdl/perllib

Rev Age Author Log Message
@3132   22 years jrm21 Try to determine the encoding used in the headers in case it is not …
@3130   22 years jrm21 Added map files for iso-8859-15 encoding, which is basically Latin1 …
@3116   22 years sjboddie RecPlug will now die with an error if it finds a metadata.xml file …
@3115   22 years jrm21 Redirect mg(pp)_passes stderr to /dev/null if the "-out xxx" option is …
@3112   22 years jrm21 minor changes to formatted values (eg if enclosed in { and } ) and …
@3111   22 years jrm21 Allow .eml extension (IE and mozilla default to this for individual …
@3109   22 years jrm21 When getting first char for classification, s/(.).*$/$1/g isn't good …
@3108   22 years jrm21 Don't recurse into directories if they are symbolic links and point …
@3107   22 years jrm21 fixed problem where documents after a "bad" document would not be read …
@3095   22 years jrm21 Added check for reading an empty file (ie read_line() returns undef).
@3094   22 years jrm21 Needed to add failhandle to the init() function, to pass to BasPlug.
@3086   22 years nzdl * empty log message *
@3073   22 years jrm21 1) Default Title now correctly escapes [ and ] chars. 2) …
@3038   22 years jrm21 Put \" \" around href for srclink, in case the collection name has …
@3037   22 years jrm21 title_sub seems to always get defined by parsargv, so we test that it …
@3019   22 years jrm21 Fixes for when on windows - it was having a lot of trouble sorting out …
@2996   22 years sjboddie * empty log message *
@2995   22 years sjboddie Fixed a bug preventing HTML headers from being removed correctly when …
@2994   22 years jrm21 Added some mime types, and gave a url for "the list" of types at iana.org
@2990   22 years jrm21 Do MS Excel using ConvertToPlug, which currently uses the xlhtml package.
@2981   22 years jrm21 Added a minimal powerpoint plugin that causes an external converter to …
@2980   22 years jrm21 Added converted_to, which tells us what format the last input file we …
@2979   22 years jrm21 Use self->converted_to instead of convert_to, in case the file could …
@2975   22 years jrm21 Tidied up usage info to fit in 80 columns. Fixed title_sub stuff, so …
@2974   22 years jrm21 added a newline to soft link error message
@2973   22 years sjboddie Fixed a bug in the Hierarchy classifier
@2956   22 years jrm21 Added Don Gourley's changes for getting Sections to work properly.
@2955   22 years jrm21 Added removeprefix option. Added better usage information of the options.
@2954   22 years jrm21 added a remove_prefix option to strip from metadata before sorting for …
@2925   22 years sjboddie Altered the format of the GreenstoneArchive and …
@2918   22 years jrm21 Add [Title] metadata so that the default format strings will show …
@2916   22 years jrm21 Tidied up the usage output.
@2901   22 years jrm21 We now interpret some LaTeX commands in the input, mostly to do with …
@2899   22 years sjboddie Added Alan Christensen's W3ImagePlug
@2897   22 years sjboddie Added AZCompactSectionList which was contributed by Don Gourley …
@2896   22 years sjboddie Fixed a small bug in the way XMLPlug was implemented - previously it …
@2891   22 years jrm21 Don't print out segment number if verbosity is set to zero.
@2890   22 years sjboddie Added xml_entity function to XMLPlug
@2889   22 years jrm21 Need to define $outhandle before using it in reclassify.
@2888   22 years sjboddie Removed extra white space that was being added inside all <Content> …
@2886   22 years jrm21 Fixed some encoding issues - need to convert to utf-8 after …
@2883   22 years paynter This Plugin can be used to import any file to Greenstone, regardless …
@2882   22 years paynter Compensate for change to "convert" output (size data goes to STDERR …
@2858   22 years sjboddie * empty log message *
@2847   22 years sjboddie Altered EMAILPlug a little so it now treats all text that it used to …
@2846   22 years sjboddie * empty log message *
@2845   22 years sjboddie Caught SplitPlug up with recent changes
@2837   22 years sjboddie added hlist_at_top option to Hierarchy classifier
@2835   22 years dmm9 Corrected pluginfo entry and renamed extract_date to …
@2819   22 years sjboddie Altered HTMLPlug's description_tags option a bit so it should now also …
@2818   22 years sjboddie * empty log message *
@2817   22 years sjboddie Implemented a description_tags option to HTMLPlug for splitting an …
@2816   22 years sjboddie Added cover_image option to BasPlug for associating a jpeg image as a …
@2813   23 years sjboddie Altered RecPlug's -use_metadata_files option to use better XML files …
@2812   23 years sjboddie * empty log message *
@2811   23 years sjboddie * empty log message *
@2810   23 years sjboddie Created GAPlug (and XMLPlug base class) to replace the old GMLPlug. …
@2808   23 years sjboddie * empty log message *
@2804   23 years sjboddie * empty log message *
@2803   23 years sjboddie * empty log message *
@2799   23 years sjboddie Fixed a bug where Word documents containing non-ascii characters …
@2797   23 years sjboddie * empty log message *
@2796   23 years sjboddie * empty log message *
@2795   23 years sjboddie Got ZIPPlug working under windows
@2793   23 years sjboddie * empty log message *
@2785   23 years sjboddie The build process now creates a summary of how many files were …
@2781   23 years jrm21 oops - left off a '$' at end of a pattern match.
@2779   23 years jrm21 Be a little more flexible when looking for boundary field in a …
@2772   23 years kjm18 changes to enable language specific collectionmeta in collect.cfg …
@2771   23 years kjm18 updated this to include the browselist/doclist stuff that's now in …
@2761   23 years sjboddie added HTMLPlug2 temporarily while testing a new extract_subsections option
@2755   23 years jrm21 import.pl now takes an option for saving file conversion failures to a …
@2754   23 years jrm21 oops - left a debugging statement in there.
@2751   23 years sjboddie Had a go at enriching the default document structure. Added …
@2735   23 years sjboddie Fixed up bugs I introduced with recent change to BasPlug
@2734   23 years sjboddie Chinese text segmentation is now done whenever language="zh" instead …
@2733   23 years jrm21 minor regex fixes/improvements.
@2732   23 years jrm21 needed <pre> tags when using the text/plain part of a multipart message.
@2730   23 years jrm21 1) Non-ascii characters should now work for any encoding handled by …
@2717   23 years jrm21 Do some email munging - @ symbols become &#64;. Both netscape and IE …
@2713   23 years sjboddie * empty log message *
@2711   23 years sjboddie Removed the "beta" collect.cfg option to avoid awkward questions from …
@2700   23 years cs025 fixed this up for building under windows
@2695   23 years jrm21 Allow spaces in img src=... tags if surrounded with dbl quotes.
@2685   23 years jrm21 Improved regex for when the last category is too small, and we need to …
@2681   23 years jrm21 fixed a few more minor MIME header parsing cases.
@2680   23 years jrm21 1. we escape 'and' chars in headers so greenstone doesn't try to …
@2667   23 years jrm21 protect against < and > chars, as <pre> tags don't preserve them.
@2666   23 years jrm21 Modified phind classifier so that special delimiters are always …
@2662   23 years jrm21 oops, that's a bit stupid (of me) - changed: if …
@2661   23 years jrm21 added a default block exp of "" so it doesn't inherit HTMLPlugs…
@2658   23 years jrm21 fixed a typo
@2657   23 years jrm21 fixed a bug when #including a macro (ie no "... or <... on the line)
@2652   23 years jrm21 Needed to replace \s with s. Also checked for multipart/related.
@2638   23 years jrm21 typo in regexp broke import... encoding type should have had [\s], …
@2632   23 years jrm21 added an option "-bymonth=1", to group by (eg) 2000-January, …
@2631   23 years jrm21 Don't assume funny dates are 20th C - eg 101 -> 19101 - add to 1900 …
@2630   23 years jrm21 Mime support for multipart messages. Doesn't extract attachments …
@2604   23 years jrm21 when extracting email addresses, we now include people in the .net …
@2601   23 years jrm21 modified usage to not mention HTMLplug blocking rtf.
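The @2631 entry describes a date fix. A year stored as an offset from 1900 (e.g. 101 for 2001) turns into 19101 if the code assumes the 20th century and simply prefixes "19". Adding the value to 1900 gives 2001 instead. The Python sketch below illustrates that arithmetic only. It is not the Greenstone Perl code, and it makes two assumptions. First, the year arrives as an integer. Second, any value below 1000 counts from 1900. The function names naive_year and normalize_year are hypothetical.

```python
def naive_year(raw_year: int) -> str:
    # Buggy approach: assume the 20th century and prefix "19" as a string.
    return "19" + str(raw_year)


def normalize_year(raw_year: int) -> int:
    # Assumption: values below 1000 are years counted from 1900
    # (as in C's struct tm or Perl's localtime), so 101 means 2001.
    if raw_year < 1000:
        return 1900 + raw_year
    # Four-digit years are passed through unchanged.
    return raw_year


print(naive_year(101))       # 19101 (wrong)
print(normalize_year(101))   # 2001
print(normalize_year(99))    # 1999
print(normalize_year(2002))  # 2002
```

Adding to 1900 handles old two-digit values (99 becomes 1999) and overflowed three-digit values (101 becomes 2001) with a single rule. String prefixing handles only the two-digit case.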