|
|
@3517
|
22 years |
davidb |
ImagePlug modified so 'Source' metadata set to be consistent with …
|
|
|
@3515
|
22 years |
jrm21 |
call a plugin's set_OID() method if one exists, otherwise use the …
|
|
|
@3508
|
22 years |
jrm21 |
modified copyright statement
|
|
|
@3430
|
22 years |
jrm21 |
Added MARCPlug, mostly done by David Bainbridge. It needs a …
|
|
|
@3427
|
22 years |
sjboddie |
The input encoding will now default to utf8 instead of iso-8859-1. …
|
|
|
@3426
|
22 years |
jrm21 |
Don't add \n to the end of each metadata value.
|
|
|
@3414
|
22 years |
jrm21 |
Need to escape "_" characters so that greenstone doesn't interpret them…
|
|
|
@3411
|
22 years |
jrm21 |
Now takes a "-use_sections" option to make a section per page.
|
|
|
@3400
|
22 years |
sjboddie |
WordPlug now handles .dot files as well as .doc files.
|
|
|
@3398
|
22 years |
jrm21 |
Oops... the last change to the regex was too permissive... fixed up to …
|
|
|
@3397
|
22 years |
jrm21 |
minor change to the regex for marking up urls (to allow #anchor at the end)
|
|
|
@3369
|
22 years |
sjboddie |
HTMLPlug will no longer prevent metadata extraction when the …
|
|
|
@3352
|
22 years |
jrm21 |
We can now properly handle messages with a content type of …
|
|
|
@3351
|
22 years |
jrm21 |
If a message is in an unsupported encoding, we assume iso8859-1. …
|
|
|
@3350
|
22 years |
sjboddie |
Added -use_strings option to ConvertToPlug. The default behaviour for …
|
|
|
@3349
|
22 years |
sjboddie |
Bug fix.
|
|
|
@3329
|
22 years |
jrm21 |
Oops, removed debugging statement!
|
|
|
@3328
|
22 years |
jrm21 |
Make sure that sender's name is more than 0 chars long, otherwise use …
|
|
|
@3307
|
22 years |
davidb |
Some minor modifications to Image Plugin: filenames can now include …
|
|
|
@3249
|
22 years |
jrm21 |
1) add a space when joining consecutive lines, just in case.
2) Don't …
|
|
|
@3248
|
22 years |
jrm21 |
If we convert to HTML, we post-process to change named entities (eg …
|
|
|
@3247
|
22 years |
jrm21 |
Modified automatic title extraction to also recognise utf-8 nbsp as …
|
|
|
@3215
|
22 years |
jrm21 |
Fixed up some regexes for mime header encodings - eg people with …
|
|
|
@3206
|
22 years |
jrm21 |
Oops! Bad things were happening when the headers said utf-8 encoding, …
|
|
|
@3196
|
22 years |
sjboddie |
Added to the list of entities that HTMLPlug doesn't convert to utf-8
|
|
|
@3181
|
22 years |
sjboddie |
Altered the getcharequiv() function so it now converts entities to raw …
|
|
|
@3156
|
22 years |
jrm21 |
Added a few extra accented characters, and recognise some …
|
|
|
@3148
|
22 years |
jrm21 |
If a document has associated files that are also given a subdirectory, …
|
|
|
@3143
|
22 years |
jrm21 |
Minor tweak for badly formatted dates. We now use a window, so …
|
|
|
@3142
|
22 years |
jrm21 |
1) We can't use "Date" for the year metadata, as greenstone assumes …
|
|
|
@3137
|
22 years |
paynter |
Changed the way Width, Height, Size and Type metadata is calculated. …
|
|
|
@3136
|
22 years |
paynter |
Reconciled John's version of my changes to EMAILPlug with my version …
|
|
|
@3135
|
22 years |
jrm21 |
modified process_exp to process php3 -named files too.
|
|
|
@3134
|
22 years |
jrm21 |
1) Convert headers to detected charset if possible.
2) Convert header …
|
|
|
@3132
|
22 years |
jrm21 |
Try to determine the encoding used in the headers in case it is not …
|
|
|
@3116
|
22 years |
sjboddie |
RecPlug will now die with an error if it finds a metadata.xml file …
|
|
|
@3112
|
22 years |
jrm21 |
minor changes to formatted values (eg if enclosed in { and } ) and …
|
|
|
@3111
|
22 years |
jrm21 |
Allow .eml extension (IE and mozilla default to this for individual …
|
|
|
@3108
|
22 years |
jrm21 |
Don't recurse into directories if they are symbolic links and point …
|
|
|
@3107
|
22 years |
jrm21 |
fixed problem where documents after a "bad" document would not be read …
|
|
|
@3094
|
22 years |
jrm21 |
Needed to add failhandle to the init() function, to pass to BasPlug.
|
|
|
@3086
|
22 years |
nzdl |
* empty log message *
|
|
|
@3073
|
22 years |
jrm21 |
1) Default Title now correctly escapes [ and ] chars.
2) …
|
|
|
@3038
|
22 years |
jrm21 |
Put \" \" around href for srclink, in case the collection name has …
|
|
|
@3037
|
22 years |
jrm21 |
title_sub seems to always get defined by parsargv, so we test that it …
|
|
|
@3019
|
22 years |
jrm21 |
Fixes for when on windows - it was having a lot of trouble sorting out …
|
|
|
@2996
|
22 years |
sjboddie |
* empty log message *
|
|
|
@2995
|
22 years |
sjboddie |
Fixed a bug preventing HTML headers from being removed correctly when …
|
|
|
@2990
|
22 years |
jrm21 |
Do MS Excel using ConvertToPlug, which currently uses the xlhtml package.
|
|
|
@2981
|
22 years |
jrm21 |
Added a minimal powerpoint plugin that causes an external converter to …
|
|
|
@2980
|
22 years |
jrm21 |
Added converted_to, which tells us what format the last input file we …
|
|
|
@2979
|
22 years |
jrm21 |
Use self->converted_to instead of convert_to, in case the file could …
|
|
|
@2975
|
22 years |
jrm21 |
Tidied up usage info to fit in 80 columns. Fixed title_sub stuff, so …
|
|
|
@2925
|
22 years |
sjboddie |
Altered the format of the GreenstoneArchive and …
|
|
|
@2918
|
22 years |
jrm21 |
Add [Title] metadata so that the default format strings will show …
|
|
|
@2901
|
22 years |
jrm21 |
We now interpret some latex commands in the input, mostly to do with …
|
|
|
@2899
|
22 years |
sjboddie |
Added Alan Christensen's W3ImagePlug
|
|
|
@2896
|
22 years |
sjboddie |
Fixed a small bug in the way XMLPlug was implemented - previously it …
|
|
|
@2891
|
22 years |
jrm21 |
Don't print out segment number if verbosity is set to zero.
|
|
|
@2890
|
22 years |
sjboddie |
Added xml_entity function to XMLPlug
|
|
|
@2886
|
23 years |
jrm21 |
Fixed some encoding issues - need to convert to utf-8 after …
|
|
|
@2883
|
23 years |
paynter |
This Plugin can be used to import any file to Greenstone, regardless …
|
|
|
@2882
|
23 years |
paynter |
Compensate for change to "convert" output (size data goes to STDERR …
|
|
|
@2847
|
23 years |
sjboddie |
Altered EMAILPlug a little so it now treats all text that it used to …
|
|
|
@2845
|
23 years |
sjboddie |
Caught SplitPlug up with recent changes
|
|
|
@2835
|
23 years |
dmm9 |
Corrected pluginfo entry and renamed extract_date to …
|
|
|
@2819
|
23 years |
sjboddie |
Altered HTMLPlug's description_tags option a bit so it should now also …
|
|
|
@2818
|
23 years |
sjboddie |
* empty log message *
|
|
|
@2817
|
23 years |
sjboddie |
Implemented a description_tags option to HTMLPlug for splitting an …
|
|
|
@2816
|
23 years |
sjboddie |
Added cover_image option to BasPlug for associating a jpeg image as a …
|
|
|
@2813
|
23 years |
sjboddie |
Altered RecPlug's -use_metadata_files option to use better XML files …
|
|
|
@2812
|
23 years |
sjboddie |
* empty log message *
|
|
|
@2811
|
23 years |
sjboddie |
* empty log message *
|
|
|
@2810
|
23 years |
sjboddie |
Created GAPlug (and XMLPlug base class) to replace the old GMLPlug. …
|
|
|
@2799
|
23 years |
sjboddie |
Fixed a bug where Word documents containing non-ascii characters …
|
|
|
@2796
|
23 years |
sjboddie |
* empty log message *
|
|
|
@2795
|
23 years |
sjboddie |
Got ZIPPlug working under windows
|
|
|
@2785
|
23 years |
sjboddie |
The build process now creates a summary of how many files were …
|
|
|
@2781
|
23 years |
jrm21 |
oops - left off a '$' at end of a pattern match.
|
|
|
@2779
|
23 years |
jrm21 |
Be a little more flexible when looking for boundary field in a …
|
|
|
@2761
|
23 years |
sjboddie |
added HTMLPlug2 temporarily while testing a new extract_subsections option
|
|
|
@2755
|
23 years |
jrm21 |
import.pl now takes an option for saving file conversion failures to a …
|
|
|
@2754
|
23 years |
jrm21 |
oops - left a debugging statement in there.
|
|
|
@2751
|
23 years |
sjboddie |
Had a go at enriching the default document structure.
Added …
|
|
|
@2735
|
23 years |
sjboddie |
Fixed up bugs I introduced with recent change to BasPlug
|
|
|
@2734
|
23 years |
sjboddie |
Chinese text segmentation is now done whenever language="zh" instead …
|
|
|
@2733
|
23 years |
jrm21 |
minor regex fixes/improvements.
|
|
|
@2732
|
23 years |
jrm21 |
needed <pre> tags when using the text/plain part of a multipart message.
|
|
|
@2730
|
23 years |
jrm21 |
1) Non-ascii characters should now work for any encoding handled by …
|
|
|
@2717
|
23 years |
jrm21 |
Do some email munging - @ symbols become @. Both netscape and IE …
|
|
|
@2695
|
23 years |
jrm21 |
Allow spaces in img src=... tags if surrounded with dbl quotes.
|
|
|
@2681
|
23 years |
jrm21 |
fixed a few more minor MIME header parsing cases.
|
|
|
@2680
|
23 years |
jrm21 |
1. we escape 'and' chars in headers so greenstone doesn't try to …
|
|
|
@2667
|
23 years |
jrm21 |
protect against < and > chars, as <pre> tags don't preserve them.
|
|
|
@2662
|
23 years |
jrm21 |
oops, that's a bit stupid (of me) - changed:
if …
|
|
|
@2661
|
23 years |
jrm21 |
added a default block exp of "" so it doesn't inherit HTMLPlugs…
|
|
|
@2657
|
23 years |
jrm21 |
fixed a bug when #including a macro (ie no "... or <... on the line)
|
|
|
@2652
|
23 years |
jrm21 |
Needed to replace \s with s. Also checked for multipart/related.
|
|
|
@2638
|
23 years |
jrm21 |
typo in regexp broke import... encoding type should have had [\s], …
|
|
|
@2630
|
23 years |
jrm21 |
Mime support for multipart messages. Doesn't extract attachments …
|
|
|