|
|
@16765
|
16 years |
ak19 |
Only removes comments in head tag now when working out the encoding
|
|
|
@16753
|
16 years |
ak19 |
get_language_encoding for HTMLFiles strips out the comments before …
|
|
|
@16735
|
16 years |
ak19 |
When a directory of interlinking html files is dropped into GLI, …
|
|
|
@16726
|
16 years |
mdewsnip |
Added quote marks around sqlite executable path so it works when …
|
|
|
@16725
|
16 years |
mdewsnip |
Undid one of my changes from yesterday -- turns out the oaiserver …
|
|
|
@16724
|
16 years |
ak19 |
1. Dr Bainbridge added some language-encoding related methods that …
|
|
|
@16719
|
16 years |
ak19 |
Dr Bainbridge has changed several methods so that they can now be …
|
|
|
@16704
|
16 years |
mdewsnip |
Fixed two bugs with resumption token support.
|
|
|
@16700
|
16 years |
kjdon |
changed a comment
|
|
|
@16699
|
16 years |
kjdon |
added auxiliary parameter to new - needed if you want to do new …
|
|
|
@16698
|
16 years |
kjdon |
added auxiliary parameter to new - needed if you want to do new …
|
|
|
@16697
|
16 years |
kjdon |
if marc mapping file cannot be located, print a warning about can't …
|
|
|
@16696
|
16 years |
kjdon |
added an option to XML parser to strip out namespaces. did this so …
|
|
|
@16695
|
16 years |
kjdon |
the last commit was by mistake - this one removes the print statements …
|
|
|
@16694
|
16 years |
kjdon |
MARCXMLPlugin uses textcat_language_and_encoding method from …
|
|
|
@16693
|
16 years |
kjdon |
MARCXMLPlugin uses textcat_language_and_encoding method from …
|
|
|
@16692
|
16 years |
kjdon |
code to read in marc mapping files moved from MARCXMLPlugin to …
|
|
|
@16677
|
16 years |
davidb |
Minor tweak to EmailPlugin to avoid directories that match \d+ being …
|
|
|
@16674
|
16 years |
ak19 |
Added caching for textcat results on filecontents as well: a second …
|
|
|
@16673
|
16 years |
ak19 |
Removed comment. URL encode and URL decode subroutines added.
|
|
|
@16672
|
16 years |
ak19 |
URL encode and URL decode subroutines added.
|
|
|
@16670
|
16 years |
ak19 |
Instead of base64 encoding the gsdl_source_filename, it now URL …
|
|
|
@16667
|
16 years |
kjdon |
get_language_encoding was setting ->input_encoding, which means its …
|
|
|
@16647
|
16 years |
kjdon |
removed the segmentation lines from store_saved_metadata - this only …
|
|
|
@16646
|
16 years |
kjdon |
now segments all metadata as well as text
|
|
|
@16644
|
16 years |
kjdon |
now uses CJKTextSegmenter to add segmentation functionality to text …
|
|
|
@16643
|
16 years |
kjdon |
removed a couple of 'use xxx' lines that are not needed
|
|
|
@16642
|
16 years |
kjdon |
separate_cjk option and code moved to CJKTextSegmenter, and used by …
|
|
|
@16641
|
16 years |
kjdon |
upgraded this (using unicode 4.0) to include more Chinese characters …
|
|
|
@16640
|
16 years |
kjdon |
helper plugin to separate cjk text into individual characters
|
|
|
@16639
|
16 years |
kjdon |
moved the require diagnostics line to here from ReadTextFile
|
|
|
@16638
|
16 years |
kjdon |
modified store_block_files: includes script (js) files, don't add a …
|
|
|
@16636
|
16 years |
kjdon |
seems to be no longer used - replaced by iso639.pm
|
|
|
@16635
|
16 years |
kjdon |
modified the line where we get rid of the #Updated 13-Mar-2007 bit …
|
|
|
@16634
|
16 years |
kjdon |
removed NULPlugin.add_metadata_as_text as the translation was …
|
|
|
@16632
|
16 years |
ak19 |
Work on supporting non-utf8 characters in filenames
|
|
|
@16580
|
16 years |
ak19 |
Shared subroutine tmp_area_convert_file now ensures that the tailname …
|
|
|
@16578
|
16 years |
ak19 |
1. Base64 encoded gsdlsourcefilename to preserve original filename. 2. …
|
|
|
@16557
|
16 years |
ak19 |
Auto filename encoding has several additional settings now, these are …
|
|
|
@16556
|
16 years |
ak19 |
Added strings for additional types of auto settings for …
|
|
|
@16555
|
16 years |
ak19 |
Instead of sub get_language_encoding applying function ensure_utf8 on …
|
|
|
@16554
|
16 years |
ak19 |
Added subroutines classify_cached and clear_cache. The first of these …
|
|
|
@16553
|
16 years |
ak19 |
Added method check_is_utf8 that will return 1/true if the given string …
|
|
|
@16521
|
16 years |
kjdon |
pass in the file extension to get_tmp_filename otherwise it doesn't …
|
|
|
@16520
|
16 years |
kjdon |
made smart_block option description say deprecated, and added a …
|
|
|
@16506
|
16 years |
mdewsnip |
Now adds gs2:docOID attributes into "<Sec>" tags as well, to prevent …
|
|
|
@16504
|
16 years |
mdewsnip |
Changed some variable names in preparation for fixing the Lucene …
|
|
|
@16462
|
16 years |
ak19 |
1. FEDORA_VERSION has become the secondary environment variable when …
|
|
|
@16442
|
16 years |
ak19 |
Fixed yesterday's adjustment to envvar_prepend and envvar_append to …
|
|
|
@16436
|
16 years |
ak19 |
Moved the utility subroutine is_dir_empty from gsConvert.pl into here …
|
|
|
@16431
|
16 years |
mdewsnip |
Now passes the Greenstone document OID into GS2LuceneIndexer, to help …
|
|
|
@16426
|
16 years |
ak19 |
Minor changes. Although it may not be necessary, using filename_cat to …
|
|
|
@16414
|
16 years |
ak19 |
Slightly better way of dealing with GSDL3HOME not being set in the …
|
|
|
@16411
|
16 years |
ak19 |
Correction to previous 'bugfix' which was actually a mistake. Use of …
|
|
|
@16407
|
16 years |
ak19 |
Corrected change made yesterday: dealing with the undefined case (as …
|
|
|
@16406
|
16 years |
ak19 |
fedora_client_bin is added to PATH using the util package, since it …
|
|
|
@16404
|
16 years |
ak19 |
Subroutines envvar_prepend and envvar_append now only append a new …
|
|
|
@16398
|
16 years |
ak19 |
Need to take into account that catalina_home is undefined for gs2, …
|
|
|
@16396
|
16 years |
ak19 |
Bugfix that caused ingest to fail in GS3 remote cases but (for some …
|
|
|
@16395
|
16 years |
ak19 |
1. For Fedora CATALINA_HOME ought to be its own tomcat, but when …
|
|
|
@16392
|
16 years |
kjdon |
global block pass: read_block is no more, use can_process_this_file to …
|
|
|
@16391
|
16 years |
kjdon |
global block pass: this plugin now does the blocking - when reading …
|
|
|
@16390
|
16 years |
kjdon |
global block pass: read_block is no more. blocking done in a first …
|
|
|
@16388
|
16 years |
kjdon |
global block pass: added in empty file_block_read method
|
|
|
@16386
|
16 years |
kjdon |
global block pass: now uses process_exp instead of block_exp. during …
|
|
|
@16384
|
16 years |
kjdon |
global block pass: new block_hash arg to read and metadata_read. Also …
|
|
|
@16383
|
16 years |
kjdon |
make sure filename is in utf8 before calling generate_images
|
|
|
@16382
|
16 years |
kjdon |
filename_no_path arg to generate_images must now be in utf8, and then …
|
|
|
@16381
|
16 years |
kjdon |
global block pass: added in plugin:file_block_read, which is the …
|
|
|
@16380
|
16 years |
kjdon |
added two methods, get_full_filenames (which used to be in …
|
|
|
@16379
|
16 years |
kjdon |
global block pass: added in extra argument to plugin::read calls
|
|
|
@16375
|
16 years |
kjdon |
need no strict refs for isisplugin
|
|
|
@16363
|
16 years |
ak19 |
Minor changes to output text
|
|
|
@16341
|
16 years |
kjdon |
save attachments in binary mode so they work on windows. Use …
|
|
|
@16339
|
16 years |
davidb |
Added quotes around exec of 'txt2db' so it will work from within a …
|
|
|
@16308
|
16 years |
kjdon |
unhide separate_cjk option in GLI - no longer a global option, just a …
|
|
|
@16301
|
16 years |
ak19 |
sub tmp_area_convert_file--called to replace a plain text source file …
|
|
|
@16300
|
16 years |
mdewsnip |
Fixed another case where '<' and '>' characters in metadata weren't …
|
|
|
@16281
|
16 years |
mdewsnip |
Changed the "-create" parameter to GS2LuceneIndexer to "-removeold" to …
|
|
|
@16266
|
16 years |
davidb |
get_tmp_filename() can now optionally take an argument that is the …
|
|
|
@16259
|
16 years |
mdewsnip |
Removed the "-incremental" option from buildcol.pl (because it didn't …
|
|
|
@16257
|
16 years |
mdewsnip |
Tidied up the block of code that determines whether each doc.xml file …
|
|
|
@16254
|
16 years |
mdewsnip |
Fixed up some crazy XML parsing code and removed unnecessary …
|
|
|
@16252
|
16 years |
mdewsnip |
Fixes to get_doc_dir() and get_new_doc_dir() so if you are importing …
|
|
|
@16247
|
16 years |
ak19 |
Regular expression that processes imagelinks is slightly modified by …
|
|
|
@16240
|
16 years |
mdewsnip |
Added "CREATE INDEX" on document_metadata(docOID) to hugely increase …
|
|
|
@16226
|
16 years |
mdewsnip |
Changed a "DELETE FROM" then "INSERT INTO" into a "INSERT OR REPLACE", …
|
|
|
@16225
|
16 years |
mdewsnip |
Added "IF NOT EXISTS" to the two "CREATE TABLE" commands, to prevent …
|
|
|
@16224
|
16 years |
mdewsnip |
Added "IF NOT EXISTS" to close_infodb_write_handle_sqlite(), just in case.
|
|
|
@16223
|
16 years |
mdewsnip |
Added a couple of comments and now creates an index on the …
|
|
|
@16222
|
16 years |
mdewsnip |
Added a "store_metadata_coverage" option to the collect.cfg file to …
|
|
|
@16193
|
16 years |
kjdon |
forgot to define outhandle last commit
|
|
|
@16178
|
16 years |
mdewsnip |
Greatly improved SQLite database writing speed by adding "BEGIN …
|
|
|
@16177
|
16 years |
mdewsnip |
Variable name change for consistency.
|
|
|
@16176
|
16 years |
mdewsnip |
Added a close_infodb_write_handle() function and initial versions for …
|
|
|
@16136
|
16 years |
osborn |
Additions for GLI Scheduling Component
|
|
|
@16125
|
16 years |
osborn |
new parser for GLI scheduling component
|
|
|
@16124
|
16 years |
osborn |
Added the check for quotestr type in subroutine processArg
|
|
|
@16104
|
16 years |
kjdon |
tried to make the 'xxxplugin processing file' print statements more …
|
|
|
@16102
|
16 years |
davidb |
Some minor adjustments to ingesting documents into a Fedora …
|
|
|