|
|
@10347
|
19 years |
kjdon |
removed the unneeded 'use parsargv'
|
|
|
@10277
|
19 years |
chi |
tidy up the filename in add_file().
|
|
|
@10218
|
19 years |
kjdon |
Jeffrey's new parsing modifications, committed approx 6 July, 15.16
|
|
|
@10121
|
19 years |
mdewsnip |
Added the "sectionalise_using_h_tags" option to HTMLPlug, which …
|
|
|
@9747
|
19 years |
davidb |
Encountered new circumstance -- table -- for HTML tags that reference …
|
|
|
@9228
|
19 years |
davidb |
Changed setting URL metadata back to always being done (regardless of …
|
|
|
@9169
|
19 years |
davidb |
HTMLPlug was always setting URL metadata. This only makes sense if …
|
|
|
@9143
|
19 years |
davidb |
Added handling of <embed> tag in a similar fashion to <img>
Also, …
|
|
|
@9125
|
19 years |
mdewsnip |
Added a substr function to unicode.pm that should work correctly on …
|
|
|
@9067
|
19 years |
kjdon |
moved smart blocking stuff in htmlplug metadata_read into basplug …
|
|
|
@9057
|
19 years |
kjdon |
tidied up previous commit
|
|
|
@9056
|
19 years |
kjdon |
added an option to not strip html tags from metadata in description …
|
|
|
@9053
|
19 years |
kjdon |
changed the description tags metadata handling again. now uses an …
|
|
|
@8914
|
19 years |
chi |
Add a smart_block option to deal with associated files of HTML document.
|
|
|
@8843
|
20 years |
jrm21 |
fix problem for -metadata_fields if tag1<Tag2> given for mapping to a …
|
|
|
@8794
|
20 years |
jrm21 |
remove trailing \n from meta tags (bug reported by Tim Finney, 13 Dec 2004)
|
|
|
@8767
|
20 years |
jrm21 |
add 'use utf8' so hopefully substr() is smart enough to cut between …
|
|
|
@8716
|
20 years |
kjdon |
added some changes made by Emanuel Dejanu (Simple Words)
|
|
|
@8668
|
20 years |
kjdon |
when processing description tags, it used to use …
|
|
|
@8509
|
20 years |
chi |
Add new methods (with a smart_block option) to store the blocked …
|
|
|
@8366
|
20 years |
kjdon |
added script to the list of tags to process as relative links, and js …
|
|
|
@8225
|
20 years |
jrm21 |
support tag<tagname> as described in the pluginfo for HTMLPlug. The …
|
|
|
@8121
|
20 years |
chi |
Add the "FileFormat" metadata to each of the Plugins.
|
|
|
@8071
|
20 years |
davidb |
When title metadata is derived from first 100 chars of text,
extra =~ …
|
|
|
@7966
|
20 years |
mdewsnip |
Updated my fix from yesterday, so the collections will work correctly …
|
|
|
@7949
|
20 years |
mdewsnip |
Added a bit of a hack for the wv 0.7.1 bug under Windows that causes …
|
|
|
@7640
|
20 years |
mdewsnip |
Removed the reference to WebPlug, which no longer exists.
|
|
|
@7595
|
20 years |
mdewsnip |
Seem to have fixed the problem with anchors being added to images (for …
|
|
|
@7235
|
20 years |
kjdon |
fixed a couple of bugs and added a bit of output to do with extracting …
|
|
|
@7202
|
20 years |
jrm21 |
rewrote the <meta> tag handling to be more robust and more efficient.
|
|
|
@6812
|
20 years |
mdewsnip |
Additions for the GsdlCollageApplet: a classifier that displays a …
|
|
|
@6651
|
20 years |
kjdon |
fixed a bug I introduced last time
|
|
|
@6649
|
20 years |
kjdon |
changed the regex for getting info out of meta tags so it now works if …
|
|
|
@6408
|
20 years |
jmt12 |
Added two new attributes for script arguments. HiddenGLI controls …
|
|
|
@6332
|
21 years |
jmt12 |
When -gli argument is provided to calling script these modules will …
|
|
|
@5924
|
21 years |
kjdon |
changed the new metadata to eg WordPlug instead of Word, cos a clash …
|
|
|
@5919
|
21 years |
kjdon |
each plugin now adds a metadata field to the doc obj based on the …
|
|
|
@5680
|
21 years |
mdewsnip |
Moved plugin descriptions into the resource bundle …
|
|
|
@5096
|
21 years |
jmt12 |
Metadata fields actually has nothing to do with the metadata elements …
|
|
|
@5066
|
21 years |
kjdon |
changed HTMLPlug to extract multiple values for the same metadata name
|
|
|
@4873
|
21 years |
mdewsnip |
Further work on standardising option descriptions. Specifically, in …
|
|
|
@4845
|
21 years |
jrm21 |
use add_metadata instead of add_utf8_metadata for Source and URL …
|
|
|
@4821
|
21 years |
jrm21 |
corrected extract_first_NNNN function so that it doesn't get confused …
|
|
|
@4785
|
21 years |
mdewsnip |
Commented out print_usage functions - plugins should now call …
|
|
|
@4748
|
21 years |
mdewsnip |
Changed "metadatum" type to "metadata".
|
|
|
@4744
|
21 years |
mdewsnip |
Tidied up and structures (representing the options of the plugin) in …
|
|
|
@3708
|
21 years |
sjboddie |
Fixed a bug where HTMLPlug failed to associate files whose filenames …
|
|
|
@3540
|
22 years |
kjdon |
added John T's changes into CVS - added info to enable retrieval of …
|
|
|
@3539
|
22 years |
kjdon |
added jpe to the process and block expressions
|
|
|
@3369
|
22 years |
sjboddie |
HTMLPlug will no longer prevent metadata extraction when the …
|
|
|
@3349
|
22 years |
sjboddie |
Bug fix.
|
|
|
@3247
|
22 years |
jrm21 |
Modified automatic title extraction to also recognise utf-8 nbsp as …
|
|
|
@3196
|
22 years |
sjboddie |
Added to the list of entities that HTMLPlug doesn't convert to utf-8
|
|
|
@3181
|
22 years |
sjboddie |
Altered the getcharequiv() function so it now converts entities to raw …
|
|
|
@3148
|
22 years |
jrm21 |
If a document has associated files that are also given a subdirectory, …
|
|
|
@3135
|
22 years |
jrm21 |
modified process_exp to process php3 -named files too.
|
|
|
@3019
|
22 years |
jrm21 |
Fixes for when on windows - it was having a lot of trouble sorting out …
|
|
|
@2995
|
22 years |
sjboddie |
Fixed a bug preventing HTML headers from being removed correctly when …
|
|
|
@2975
|
22 years |
jrm21 |
Tidied up usage info to fit in 80 columns. Fixed title_sub stuff, so …
|
|
|
@2819
|
23 years |
sjboddie |
Altered HTMLPlug's description_tags option a bit so it should now also …
|
|
|
@2817
|
23 years |
sjboddie |
Implemented a description_tags option to HTMLPlug for splitting an …
|
|
|
@2735
|
23 years |
sjboddie |
Fixed up bugs I introduced with recent change to BasPlug
|
|
|
@2695
|
23 years |
jrm21 |
Allow spaces in img src=... tags if surrounded with dbl quotes.
|
|
|
@2564
|
23 years |
jrm21 |
Added RTFPlug. (It's the smallest one so far - 1511 bytes - yay!)
…
|
|
|
@2453
|
23 years |
jrm21 |
Slightly smarter title extraction from body's text.
|
|
|
@2364
|
23 years |
jrm21 |
turn "\" into " " so that we don't lose backslashes along the way…
|
|
|
@2342
|
23 years |
sjboddie |
renamed HTMLPlug's w3mir option to file_is_url
|
|
|
@2219
|
23 years |
sjboddie |
Had another go at suppressing the "subroutine redefined" warnings as …
|
|
|
@2209
|
23 years |
sjboddie |
Suppressed some annoying perl warnings
|
|
|
@1929
|
23 years |
dg5 |
Modified: ConvertToPlug and HTMLPlug to handle files in binary mode to …
|
|
|
@1891
|
23 years |
paynter |
Named characters like é and ì are translated
to UTF8 …
|
|
|
@1844
|
23 years |
sjboddie |
Added an 'auto' argument to BasPlug's '-input_encoding' option ('auto' …
|
|
|
@1699
|
24 years |
say1 |
fixed the bug in HTML plug which broke images for Dave
|
|
|
@1686
|
24 years |
jrm21 |
HTMLPlug no longer blocks .pdf files. (also updated reference to this …
|
|
|
@1653
|
24 years |
paynter |
Fixed a few bugs where incorrect variable names were used.
|
|
|
@1609
|
24 years |
say1 |
fixed print_usage
|
|
|
@1605
|
24 years |
say1 |
fixed some of my earlier mistakes. sorry Stefan
|
|
|
@1602
|
24 years |
say1 |
metadata extraction work. (email addresses, generalised HTML tags, …
|
|
|
@1448
|
24 years |
paynter |
Changed regular expressions for extracting metadata from META tags …
|
|
|
@1435
|
24 years |
davidb |
Rearrangement of ConvertTo inheritance so HTMLPlug and TextPlug do not …
|
|
|
@1431
|
24 years |
sjboddie |
Made a few minor adjustments to perl building code for use with …
|
|
|
@1424
|
24 years |
sjboddie |
Added a -out option to most of the perl building scripts to allow …
|
|
|
@1410
|
24 years |
davidb |
Introduction of "ConvertTo" family of plugins. This establishes
a new …
|
|
|
@1403
|
24 years |
say1 |
taught HTMLPlug about shtml, asp, cgi, php and html query files …
|
|
|
@1400
|
24 years |
davidb |
General tidying of code.
|
|
|
@1358
|
24 years |
nzdl |
Fixed bug I recently introduced into HTMLPlug (<pre> tags were being …
|
|
|
@1312
|
24 years |
sjboddie |
fixed a bug in the HTML plugin that showed up under windows
|
|
|
@1245
|
24 years |
sjboddie |
Fixed a bug that davidb found in a couple of regular expressions
|
|
|
@1244
|
24 years |
sjboddie |
Caught up most general plugins (that's the ones in …
|
|
|
@1243
|
24 years |
sjboddie |
Caught HTMLPlug up with BasPlug. A few minor changes to some …
|
|
|
@1231
|
24 years |
gwp |
Bug fix on the H1 metadata option: if the file has no <H1> tag, …
|
|
|
@1230
|
24 years |
gwp |
Added an additional H1 metadata field that extracts the text
between …
|
|
|
@1220
|
24 years |
sjboddie |
Caught HTMLPlug up with the changes I made to BasPlug. HTMLPlug now …
|
|
|
@1190
|
24 years |
gwp |
The first 200 chars of body text can now be extracted as metadata
by …
|
|
|
@1020
|
24 years |
sjboddie |
changed paths to collection images (again!)
|
|
|
@1010
|
24 years |
sjboddie |
renamed old html module ghtml -- it clashed with builtin html module …
|
|
|
@965
|
24 years |
sjboddie |
fixed bug - added assoc_files option
|
|
|
@900
|
24 years |
sjboddie |
tweaked the way associated files are handled at build time - some …
|
|
|
@897
|
24 years |
sjboddie |
lots of stuff
|
|
|
@850
|
24 years |
sjboddie |
added use strict - tidied a few things up etc.
|
|
|