source: trunk/gsdl/perllib/plugins/HTMLPlug.pm

Diff Rev Age Author Log Message
(edit) @3540   22 years kjdon added John T's changes into CVS - added info to enable retrieval of …
(edit) @3539   22 years kjdon added jpe to the process and block expressions
(edit) @3369   22 years sjboddie HTMLPlug will no longer prevent metadata extraction when the …
(edit) @3349   22 years sjboddie Bug fix.
(edit) @3247   22 years jrm21 Modified automatic title extraction to also recognise utf-8 nbsp as …
(edit) @3196   22 years sjboddie Added &nbsp; to the list of entities that HTMLPlug doesn't convert to utf-8
(edit) @3181   22 years sjboddie Altered the getcharequiv() function so it now converts entities to raw …
(edit) @3148   22 years jrm21 If a document has associated files that are also given a subdirectory, …
(edit) @3135   22 years jrm21 modified process_exp to process php3 -named files too.
(edit) @3019   22 years jrm21 Fixes for when on windows - it was having a lot of trouble sorting out …
(edit) @2995   22 years sjboddie Fixed a bug preventing HTML headers from being removed correctly when …
(edit) @2975   22 years jrm21 Tidied up usage info to fit in 80 columns. Fixed title_sub stuff, so …
(edit) @2819   23 years sjboddie Altered HTMLPlug's description_tags option a bit so it should now also …
(edit) @2817   23 years sjboddie Implemented a description_tags option to HTMLPlug for splitting an …
(edit) @2735   23 years sjboddie Fixed up bugs I introduced with recent change to BasPlug
(edit) @2695   23 years jrm21 Allow spaces in img src=... tags if surrounded with dbl quotes.
(edit) @2564   23 years jrm21 Added RTFPlug. (It's the smallest one so far - 1511 bytes - yay!) …
(edit) @2453   23 years jrm21 Slightly smarter title extraction from body's text.
(edit) @2364   23 years jrm21 turn "\" into " " so that we don't lose backslashes along the way…
(edit) @2342   23 years sjboddie renamed HTMLPlug's w3mir option to file_is_url
(edit) @2219   23 years sjboddie Had another go at suppressing the "subroutine redefined" warnings as …
(edit) @2209   23 years sjboddie Suppressed some annoying perl warnings
(edit) @1929   23 years dg5 Modified: ConvertToPlug and HTMLPlug to handle files in binary mode to …
(edit) @1891   23 years paynter Named characters like é and ì are translated to UTF8 …
(edit) @1844   23 years sjboddie Added an 'auto' argument to BasPlug's '-input_encoding' option ('auto' …
(edit) @1699   24 years say1 fixed the bug in HTML plug which broke images for Dave
(edit) @1686   24 years jrm21 HTMLPlug no longer blocks .pdf files. (also updated reference to this …
(edit) @1653   24 years paynter Fixed a few bugs where incorrect variable names were used.
(edit) @1609   24 years say1 fixed print_usage
(edit) @1605   24 years say1 fixed some of my earlier mistakes. sorry Stefan
(edit) @1602   24 years say1 metadata extraction work. (email addresses, generalised HTML tags, …
(edit) @1448   24 years paynter Changed regular expressions for extracting metadata from META tags …
(edit) @1435   24 years davidb Rearrangement of ConvertTo inheritance so HTMLPlug and TextPlug do not …
(edit) @1431   24 years sjboddie Made a few minor adjustments to perl building code for use with …
(edit) @1424   24 years sjboddie Added a -out option to most of the perl building scripts to allow …
(edit) @1410   24 years davidb Introduction of "ConvertTo" family of plugins. This establishes a new …
(edit) @1403   24 years say1 taught HTMLPlug about shtml, asp, cgi, php and html query files …
(edit) @1400   24 years davidb General tidying of code.
(edit) @1358   24 years nzdl Fixed bug I recently introduced into HTMLPlug (<pre> tags were being …
(edit) @1312   24 years sjboddie fixed a bug in the HTML plugin that showed up under windows
(edit) @1245   24 years sjboddie Fixed a bug that davidb found in a couple of regular expressions
(edit) @1244   24 years sjboddie Caught up most general plugins (that's the ones in …
(edit) @1243   24 years sjboddie Caught HTMLPlug up with BasPlug. A few minor changes to some …
(edit) @1231   24 years gwp Bug fix on the H1 metadata option: if the file has no <H1> tag, …
(edit) @1230   24 years gwp Added an additional H1 metadata field that extracts the text between …
(edit) @1220   24 years sjboddie Caught HTMLPlug up with the changes I made to BasPlug. HTMLPlug now …
(edit) @1190   24 years gwp The first 200 chars of body text can now be extracted as metadata by …
(edit) @1020   24 years sjboddie changed paths to collection images (again!)
(edit) @1010   24 years sjboddie renamed old html module ghtml -- it clashed with builtin html module …
(edit) @965   24 years sjboddie fixed bug - added assoc_files option
(edit) @900   24 years sjboddie tweaked the way associated files are handled at build time - some …
(edit) @897   24 years sjboddie lots of stuff
(edit) @850   24 years sjboddie added use strict - tidied a few things up etc.
(edit) @808   25 years sjboddie New html plugin with options
(edit) @734   25 years sjboddie removed old out of date comments
(edit) @732   25 years sjboddie prevent from overriding Title metadata that may have been passed in …
(edit) @721   25 years davidb Support functions to help with the generation of webpages from Perl …
(edit) @617   25 years sjboddie a few fixes
(edit) @589   25 years sjboddie fixed bug in regular expression
(add) @585   25 years sjboddie new plugin