source: gsdl/trunk/perllib

Diff Rev Age Author Log Message
(edit) @16695   16 years kjdon the last commit was by mistake - this one removes the print statements …
(edit) @16694   16 years kjdon MARCXMLPlugin uses textcat_language_and_encoding method from …
(edit) @16693   16 years kjdon MARCXMLPlugin uses textcat_language_and_encoding method from …
(edit) @16692   16 years kjdon code to read in marc mapping files moved from MARCXMLPlugin to …
(edit) @16677   16 years davidb Minor tweak to EmailPlugin to avoid directories that match \d+ being …
(edit) @16674   16 years ak19 Added caching for textcat results on filecontents as well: a second …
(edit) @16673   16 years ak19 Removed comment. URL encode and URL decode subroutines added.
(edit) @16672   16 years ak19 URL encode and URL decode subroutines added.
(edit) @16670   16 years ak19 Instead of base64 encoding the gsdl_source_filename, it now URL …
(edit) @16667   16 years kjdon get_language_encoding was setting ->input_encoding, which means its …
(edit) @16647   16 years kjdon removed the segmentation lines from store_saved_metadata - this only …
(edit) @16646   16 years kjdon now segments all metadata as well as text
(edit) @16644   16 years kjdon now uses CJKTextSegmenter to add segmentation functionality to text …
(edit) @16643   16 years kjdon removed a couple of 'use xxx' lines that are not needed
(edit) @16642   16 years kjdon separate_cjk option and code moved to CJKTextSegmenter, and used by …
(edit) @16641   16 years kjdon upgraded this (using unicode 4.0) to include more Chinese characters …
(edit) @16640   16 years kjdon helper plugin to separate cjk text into individual characters
(edit) @16639   16 years kjdon moved the require diagnostics line to here from ReadTextFile
(edit) @16638   16 years kjdon modified store_block_files: includes script (js) files, don't add a …
(edit) @16636   16 years kjdon seems to be no longer used - replaced by iso639.pm
(edit) @16635   16 years kjdon modified the line where we get rid of the #Updated 13-Mar-2007 bit …
(edit) @16634   16 years kjdon removed NULPlugin.add_metadata_as_text as the translation was …
(edit) @16632   16 years ak19 Work on supporting non-utf8 characters in filenames
(edit) @16580   16 years ak19 Shared subroutine tmp_area_convert_file now ensures that the tailname …
(edit) @16578   16 years ak19 1. Base64 encoded gsdlsourcefilename to preserve original filename. 2. …
(edit) @16557   16 years ak19 Auto filename encoding has several additional settings now, these are …
(edit) @16556   16 years ak19 Added strings for additional types of auto settings for …
(edit) @16555   16 years ak19 Instead of sub get_language_encoding applying function ensure_utf8 on …
(edit) @16554   16 years ak19 Added subroutines classify_cached and clear_cache. The first of these …
(edit) @16553   16 years ak19 Added method check_is_utf8 that will return 1/true if the given string …
(edit) @16521   16 years kjdon pass in the file extension to get_tmp_filename otherwise it doesn't …
(edit) @16520   16 years kjdon made smart_block option description say deprecated, and added a …
(edit) @16506   16 years mdewsnip Now adds gs2:docOID attributes into "<Sec>" tags as well, to prevent …
(edit) @16504   16 years mdewsnip Changed some variable names in preparation for fixing the Lucene …
(edit) @16462   16 years ak19 1. FEDORA_VERSION has become the secondary environment variable when …
(edit) @16442   16 years ak19 Fixed yesterday's adjustment to envvar_prepend and envvar_append to …
(edit) @16436   16 years ak19 Moved the utility subroutine is_dir_empty from gsConvert.pl into here …
(edit) @16431   16 years mdewsnip Now passes the Greenstone document OID into GS2LuceneIndexer, to help …
(edit) @16426   16 years ak19 Minor changes. Although it may not be necessary, using filename_cat to …
(edit) @16414   16 years ak19 Slightly better way of dealing with GSDL3HOME not being set in the …
(edit) @16411   16 years ak19 Correction to previous 'bugfix' which was actually a mistake. Use of …
(edit) @16407   16 years ak19 Corrected change made yesterday: dealing with the undefined case (as …
(edit) @16406   16 years ak19 fedora_client_bin is added to PATH using the util package, since it …
(edit) @16404   16 years ak19 Subroutines envvar_prepend and envvar_append now only append a new …
(edit) @16398   16 years ak19 Need to take into account that catalina_home is undefined for gs2, …
(edit) @16396   16 years ak19 Bugfix that caused ingest to fail in GS3 remote cases but (for some …
(edit) @16395   16 years ak19 1. For Fedora CATALINA_HOME ought to be its own tomcat, but when …
(edit) @16392   16 years kjdon global block pass: read_block is no more, use can_process_this_file to …
(edit) @16391   16 years kjdon global block pass: this plugin now does the blocking - when reading …
(edit) @16390   16 years kjdon global block pass: read_block is no more. blocking done in a first …
(edit) @16388   16 years kjdon global block pass: added in empty file_block_read method
(edit) @16386   16 years kjdon global block pass: now uses process_exp instead of block_exp. during …
(edit) @16384   16 years kjdon global block pass: new block_hash arg to read and metadata_read. Also …
(edit) @16383   16 years kjdon make sure filename is in utf8 before calling generate_images
(edit) @16382   16 years kjdon filename_no_path arg to generate_images must now be in utf8, and then …
(edit) @16381   16 years kjdon global block pass: added in plugin:file_block_read, which is the …
(edit) @16380   16 years kjdon added two methods, get_full_filenames (which used to be in …
(edit) @16379   16 years kjdon global block pass: added in extra argument to plugin::read calls
(edit) @16375   16 years kjdon need no strict refs for isisplugin
(edit) @16363   16 years ak19 Minor changes to output text
(edit) @16341   16 years kjdon save attachments in binary mode so they work on windows. Use …
(edit) @16339   16 years davidb Added quotes around exec of 'txt2db' so it will work from within a …
(edit) @16308   16 years kjdon unhide separate_cjk option in GLI - no longer a global option, just a …
(edit) @16301   16 years ak19 sub tmp_area_convert_file--called to replace a plain text source file …
(edit) @16300   16 years mdewsnip Fixed another case where '<' and '>' characters in metadata weren't …
(edit) @16281   16 years mdewsnip Changed the "-create" parameter to GS2LuceneIndexer to "-removeold" to …
(edit) @16266   16 years davidb get_tmp_filename() can now optionally take an argument that is the …
(edit) @16259   16 years mdewsnip Removed the "-incremental" option from buildcol.pl (because it didn't …
(edit) @16257   16 years mdewsnip Tidied up the block of code that determines whether each doc.xml file …
(edit) @16254   16 years mdewsnip Fixed up some crazy XML parsing code and removed unnecessary …
(edit) @16252   16 years mdewsnip Fixes to get_doc_dir() and get_new_doc_dir() so if you are importing …
(edit) @16247   16 years ak19 Regular expression that processes imagelinks is slightly modified by …
(edit) @16240   16 years mdewsnip Added "CREATE INDEX" on document_metadata(docOID) to hugely increase …
(edit) @16226   16 years mdewsnip Changed a "DELETE FROM" then "INSERT INTO" into a "INSERT OR REPLACE", …
(edit) @16225   16 years mdewsnip Added "IF NOT EXISTS" to the two "CREATE TABLE" commands, to prevent …
(edit) @16224   16 years mdewsnip Added "IF NOT EXISTS" to close_infodb_write_handle_sqlite(), just in case.
(edit) @16223   16 years mdewsnip Added a couple of comments and now creates an index on the …
(edit) @16222   16 years mdewsnip Added a "store_metadata_coverage" option to the collect.cfg file to …
(edit) @16193   16 years kjdon forgot to define outhandle last commit
(edit) @16178   16 years mdewsnip Greatly improved SQLite database writing speed by adding "BEGIN …
(edit) @16177   16 years mdewsnip Variable name change for consistency.
(edit) @16176   16 years mdewsnip Added a close_infodb_write_handle() functions and initial versions for …
(edit) @16136   16 years osborn Additions for GLI Scheduling Component
(edit) @16125   16 years osborn new parser for GLI scheduling component
(edit) @16124   16 years osborn Added the check for quotestr type in subroutine processArg
(edit) @16104   16 years kjdon tried to make the 'xxxplugin processing file' print statements more …
(edit) @16102   16 years davidb Some minor adjustments to ingesting documents into a Fedora …
(edit) @16025   16 years kjdon added license info
(edit) @16024   16 years kjdon indented the file properly
(edit) @16022   16 years kjdon removed SourceUTF8 metadata, Source metadata is now utf8. Note, still …
(edit) @16021   16 years kjdon commented out input_encoding stuff cos we don't have that option …
(edit) @16019   16 years kjdon changed some more string keys
(edit) @16018   16 years kjdon added in some missing plugin strings
(edit) @16017   16 years kjdon renamed lots of keys - the ones where there wasn't a simple mapping …
(edit) @16016   16 years kjdon changed some key names for strings.properties
(edit) @16014   16 years kjdon changed some strings.properties key names.
(edit) @16013   16 years kjdon updated some plugin names in some of the keys for strings.properties
(edit) @16012   16 years kjdon moved the -first option to AutoExtractMetadata
(edit) @16011   16 years kjdon moved the -first option to here from ReadTextFile
(edit) @16010   16 years kjdon changed the imagemagick check before calling generate images