Rev | Age | Author | Log message
@9703 | 19 years | mdewsnip | Improvement to previous change so "file not processed" messages are …
@9586 | 19 years | mdewsnip | Added a ProcessingError message so the GLI knows when a file failed to …
@9584 | 19 years | mdewsnip | Plugins that return -1 from their read function now must output the …
@9413 | 19 years | jrm21 | if we are trying to automatically determine the encoding, look for a …
@9403 | 19 years | jrm21 | need to 'bless' an object before you can call functions in it (for …
@9398 | 19 years | davidb | Introduction of GISBasPlug for Geographic Information System support. …
@9351 | 19 years | davidb | Two changes: 1. Fusing files with the same root filename is meant …
@9067 | 19 years | kjdon | moved smart blocking stuff in htmlplug metadata_read into basplug …
@8915 | 19 years | chi | Add an option -smart_block_BN for BN Portugal Collection.
@8908 | 19 years | davidb | BasPlug now sets a piece of metadata [hascover] if document has a …
@8892 | 19 years | davidb | Addition of new minus option to BasPlug: -associate_ext. This new …
@8818 | 20 years | mdewsnip | Title tags over multiple lines will now be removed correctly before …
@8814 | 20 years | mdewsnip | Updated files for Kea 3.0, thanks to Olena.
@8789 | 20 years | mdewsnip | Better documentation of the extract keyphrases (Kea) code, thanks to Olena.
@8761 | 20 years | mdewsnip | XML plugin descriptions now include an <Explodes> tag that records …
@8716 | 20 years | kjdon | added some changes made by Emanuel Dejanu (Simple Words)
@8678 | 20 years | kjdon | cover images are now turned on by default, and the option is changed …
@8510 | 20 years | chi | Add a new method metadata_read to deal with specific (or external) …
@8166 | 20 years | mdewsnip | Added FileSize metadata in most plugins.
@7818 | 20 years | jrm21 | improvements to the handling of textcat's guessed encoding
@7668 | 20 years | jrm21 | renamed "kea" to "Keyphrase" metadata, and add one for each extracted …
@7645 | 20 years | jrm21 | don't fail if we can't load the diagnostics package.
@7644 | 20 years | jrm21 | don't print "wrong encoding" message for text in english. textcat …
@7508 | 20 years | kjdon | changed the plugin metadata - instead of having eg HTMLPlug metadata …
@7504 | 20 years | davidb | ImagePlug, MP3Plug, UnknownPlug modified to set Title metadata based …
@7362 | 20 years | kjdon | plugin read functions now return 'undef' - didn't recognise, '-1' - …
@7105 | 20 years | kjdon | changed the max century arg to a string instead of an int - need to be …
@7023 | 20 years | kjdon | fixed up the <tag> display for pluginfo and clasinfo. < and > should …
@6987 | 20 years | mdewsnip | Missed changing some print()s to gsprintf()s.
@6945 | 20 years | mdewsnip | Updated the resource bundle handling code some more. Strings are first …
@6932 | 20 years | kjdon | changed the output slightly, and now outputs the classifier/plugin …
@6925 | 20 years | mdewsnip | Changed the way display in different languages is done. Instead of …
@6918 | 20 years | mdewsnip | Removed some code I commented out.
@6584 | 20 years | kjdon | Fiddled around with segmenting for chinese text. Haven't changed how …
@6408 | 20 years | jmt12 | Added two new attributes for script arguments. HiddenGLI controls …
@6332 | 21 years | jmt12 | When -gli argument is provided to calling script these modules will …
@5924 | 21 years | kjdon | changed the new metadata to eg WordPlug instead of Word, cos a clash …
@5919 | 21 years | kjdon | each plugin now adds a metadata field to the doc obj based on the …
@5681 | 21 years | mdewsnip | Rewritten option display code (used by all plugins) to use the new …
@4873 | 21 years | mdewsnip | Further work on standardising option descriptions. Specifically, in …
@4845 | 21 years | jrm21 | use add_metadata instead of add_utf8_metadata for Source and URL …
@4785 | 21 years | mdewsnip | Commented out print_usage functions - plugins should now call …
@4778 | 21 years | mdewsnip | Modified the code for generating the usage texts to use the methods in …
@4764 | 21 years | mdewsnip | Replaced call to removed function print_generic_usage() with a call to …
@4750 | 21 years | mdewsnip | Improved formatting of usage texts automatically generated from John's …
@4746 | 21 years | mdewsnip | Initial attempt at a generic print usage function which works with the …
@4744 | 21 years | mdewsnip | Tidied up and structures (representing the options of the plugin) in …
@3834 | 21 years | sjboddie | Prevent "use bytes" from causing errors for older perls
@3767 | 21 years | sjboddie | Scattered some "use bytes" pragmas around to try to prevent perl-5.8 …
@3731 | 21 years | jrm21 | If textcat returns too many possibilities, use the default language …
@3540 | 22 years | kjdon | added John T's changes into CVS - added info to enable retrieval of …
@3515 | 22 years | jrm21 | call a plugin's set_OID() method if one exists, otherwise use the …
@3427 | 22 years | sjboddie | The input encoding will now default to utf8 instead of iso-8859-1. …
@3086 | 22 years | nzdl | * empty log message *
@2835 | 23 years | dmm9 | Corrected pluginfo entry and renamed extract_date to …
@2816 | 23 years | sjboddie | Added cover_image option to BasPlug for associating a jpeg image as a …
@2811 | 23 years | sjboddie | * empty log message *
@2796 | 23 years | sjboddie | * empty log message *
@2795 | 23 years | sjboddie | Got ZIPPlug working under windows
@2785 | 23 years | sjboddie | The build process now creates a summary of how many files were …
@2755 | 23 years | jrm21 | import.pl now takes an option for saving file conversion failures to a …
@2751 | 23 years | sjboddie | Had a go at enriching the default document structure. Added …
@2734 | 23 years | sjboddie | Chinese text segmentation is now done whenever language="zh" instead …
@2604 | 23 years | jrm21 | when extracting email addresses, we now include people in the .net …
@2601 | 23 years | jrm21 | modified usage to not mention HTMLplug blocking rtf.
@2327 | 23 years | sjboddie | * empty log message *
@2235 | 23 years | sjboddie | Hacked the textcat package about so that it only reads all the …
@2219 | 23 years | sjboddie | Had another go at suppressing the "subroutine redefined" warnings as …
@2084 | 23 years | jrm21 | usage message is now formatted to fit within 80 columns.
@1999 | 23 years | sjboddie | Fixed a small problem with language detection code.
@1954 | 23 years | jmt14 | * empty log message *
@1903 | 23 years | sjboddie | We now use textcat's best guess if it returns 3 or less possibilities …
@1874 | 23 years | sjboddie | * empty log message *
@1870 | 23 years | sjboddie | Tidied up language support stuff.
@1868 | 23 years | sjboddie | Made a bunch of changes to the building code to support lots of new …
@1857 | 23 years | dmm9 | date extraction options documented
@1855 | 23 years | paynter | Trivial change to warning message.
@1846 | 23 years | sjboddie | Removed a call to a function that I removed in my previous changes - oops
@1845 | 23 years | paynter | Changed a "!=" to a "ne".
@1844 | 23 years | sjboddie | Added an 'auto' argument to BasPlug's '-input_encoding' option ('auto' …
@1838 | 23 years | sjboddie | Added support for Cyrillic languages (windows codepage 1251) - yet to …
@1756 | 24 years | say1 | added detection and handling of unreadable files
@1720 | 24 years | dmm9 | Added information to the usage text about date extraction option
@1719 | 24 years | dmm9 | Added information to the usage text about date extraction option
@1718 | 24 years | dmm9 | Added information to the usage text about date extraction option
@1686 | 24 years | jrm21 | HTMLPlug no longer blocks .pdf files. (also updated reference to this …
@1605 | 24 years | say1 | fixed some of my earlier mistakes. sorry Stefan
@1602 | 24 years | say1 | metadata extraction work. (email addresses, generalised HTML tags, …
@1424 | 24 years | sjboddie | Added a -out option to most of the perl building scripts to allow …
@1411 | 24 years | dmm9 | added the options for the date extractor
@1396 | 24 years | say1 | changed initialisation code for acronyms
@1393 | 24 years | say1 | acronym markup functionality
@1384 | 24 years | paynter | Changed language extraction to ignore encoding information, so that …
@1379 | 24 years | paynter | Fixed bug that gave gsdlsourcedocument metadata relative path instead …
@1360 | 24 years | say1 | clarified status messages
@1335 | 24 years | say1 | many acronym changes
@1317 | 24 years | paynter | Added -extract_language option, which uses the textcat language …
@1244 | 24 years | sjboddie | Caught up most general plugins (that's the ones in …
@1242 | 24 years | sjboddie | Added Stuart Yeate's acronym extraction code and made it a standard …
@1229 | 24 years | sjboddie | fixed bug in options