Changeset 32577 for main/trunk/greenstone2/perllib/plugins
- Timestamp: 2018-11-06T16:26:57+13:00
- File: 1 edited
--- main/trunk/greenstone2/perllib/plugins/GreenstoneSQLPlugin.pm  (r32575)
+++ main/trunk/greenstone2/perllib/plugins/GreenstoneSQLPlugin.pm  (r32577)
@@ -43,17 +43,9 @@
 # back in from the sql db while the remainder is to be read back in from the docsql .xml files.
 
-# TODO: deal with incremental vs removeold. If docs removed from import folder, then import step
-# won't delete it from archives but buildcol step will. Need to implement this with this database plugin or wherever the actual flow is
-
-# TODO Q: is "reindex" = del from db + add to db?
-# - is this okay for reindexing, or will it need to modify existing values (update table)
-# - if it's okay, what does reindex need to accomplish (and how) if the OID changes because hash id produced is different?
-# - delete is accomplished in GS SQL Plugin, during buildcol.pl. When should reindexing take place?
-# during SQL plugout/import.pl or during plugin? If adding is done by GSSQLPlugout, does it need to
-# be reimplemented in GSSQLPlugin to support the adding portion of reindexing.
-
 # TODO: Add public instructions on using this plugin and its plugout: start with installing mysql binary, changing pwd, running the server (and the client against it for checking: basic cmds like create and drop). Then discuss db name, table names (per coll), db cols and col types, and how the plugout and plugin work.
 # Discuss the plugin/plugout parameters.
 
+# TODO, test on windows and mac.
+# Note: if parsing fails (e.g. using wrong plugout like GS XML plugout, which chokes on args intended for SQL plugout) then SQL plugin init would have already been called and done connection, but disconnect would not have been done because SQL plugin disconnect would not have been called upon parse failure.
 
 # DONE:
@@ -78,5 +70,37 @@
 # effect that if the db doesn't exist, gssql::use_db() fails, as it won't create db.
 # This got fixed when GSSQLPlugin stopped connecting on init().
 #
+#
+#+ TODO: deal with incremental vs removeold. If docs removed from import folder, then import step
+# won't delete it from archives but buildcol step will. Need to implement this with this database plugin or wherever the actual flow is.
+#
+# + TODO Q: is "reindex" = del from db + add to db?
+# - is this okay for reindexing, or will it need to modify existing values (update table)
+# - if it's okay, what does reindex need to accomplish (and how) if the OID changes because hash id produced is different?
+# - delete is accomplished in GS SQL Plugin, during buildcol.pl. When should reindexing take place?
+# during SQL plugout/import.pl or during plugin? If adding is done by GSSQLPlugout, does it need to
+# be reimplemented in GSSQLPlugin to support the adding portion of reindexing.
+#
+# INCREMENTAL REBUILDING IMPLEMENTED CORRECTLY AND WORKS:
+# Overriding plugins' remove_all() method covered removeold.
+# Overriding plugins' remove_one() method is all I needed to do for reindex and deletion
+# (incremental and non-incremental) to work.
+# but doing all this needed an overhaul of gssql.pm and its use by the GS SQL plugin and plugout.
+# - needed to correct plugin.pm::remove_some() to process all files
+# - and needed to correct GreenstoneSQLPlugin::close_document() to setOID() after all
+# All incremental import and buildcol worked after that:
+# - deleting files and running incr-import and incr-buildcol (= "incr delete"),
+# - deleting files and running incr-import and buildcol (="non-incr delete")
+# - modifying meta and doing an incr rebuild
+# - modifying fulltext and doing an incr rebuild
+# - renaming a file forces a reindex: doc is removed from db and added back in, due to remove_one()
+# - tested CSV file: adding some records, changing some records
+# + CSVPlugin test (collection csvsql)
+# + MetadataCSVPlugin test (modified collection sqltest to have metadata.csv refer to the
+# filenames of sqltest's documents)
+# + shared image test (collection shareimg): if 2 html files reference the same image, the docs
+# are indeed both reindexed if the image is modified (e.g. I replaced the image with another
+# of the same name) which in the GS SQL plugin/plugout case is that the 2 docs are deleted
+# and added in again.
 
 ########################################################################################
@@ -189,4 +213,6 @@
     my ($pluginfo, $base_dir, $processor, $maxdocs) = @_;
 
+    $self->SUPER::remove_all(@_);
+
     print STDERR " Building with removeold option set, so deleting current collection's tables if they exist\n" if($self->{'verbosity'});
 
@@ -227,7 +253,9 @@
     # SO DON'T RETURN IF CAN'T_PROCESS_THIS_FILE
 
+
+    my $gs_sql = $self->{'gs_sql'} || return 0; # couldn't make the connection or no db etc
+
     print STDERR "*****************************\nAsked to remove_one oid\n***********************\n";
-
-    my $gs_sql = $self->{'gs_sql'} || return 0; # couldn't make the connection or no db etc
+    print STDERR "Num oids: " . scalar (@$oids) . "\n";
 
     my $proc_mode = $self->{'process_mode'};
@@ -368,12 +396,8 @@
 
 
-# TODO: only want to work with sql db if buildcol.pl. Unfortunately, also runs on import.pl.
-# During import, the GS SQL Plugin is called before the GS SQL Plugout with undesirable side
-# effect that if the db doesn't exist, gssql::use_db() fails, as it won't create db.
-
 # GS SQL Plugin::init() (and deinit()) is called by import.pl and also by buildcol.pl
 # This means it connects and deconnects during import.pl as well. This is okay
-# as removeold, which should drop the collection tables, happens during the import phase
-# and therefore also requires a db connection.
+# as removeold, which should drop the collection tables, happens during the import phase,
+# calling GreenstoneSQLPlugin::and therefore also requires a db connection.
 # TODO: Eventually can try moving get_gssql_instance into gssql.pm? That way both GS SQL Plugin
 # and Plugout would be using one connection during import.pl phase when both plugs exist.
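
The remove_all() hunk in this changeset adds a call to the parent class before the plugin drops the collection's tables, and the remove_one() hunk moves the connection guard ahead of the debug output. A minimal, self-contained sketch of that pattern follows. It is illustrative only and is not the plugin's actual code: the packages BasePluginSketch, SQLPluginSketch and FakeSQLSketch and the method drop_tables() are hypothetical stand-ins, and the real gssql.pm API may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parent plugin class.
package BasePluginSketch;
sub new {
    my ($class, %args) = @_;
    my $self = { removed_all => 0, %args };
    return bless $self, $class;
}
sub remove_all {
    my $self = shift;
    $self->{removed_all} = 1;   # record that the parent's cleanup ran
    return 1;
}

# Hypothetical stand-in for the db wrapper; drop_tables() is an assumed name.
package FakeSQLSketch;
sub new { my $class = shift; return bless { dropped => 0 }, $class; }
sub drop_tables { my $self = shift; $self->{dropped}++; return 1; }

# Hypothetical stand-in for the SQL plugin.
package SQLPluginSketch;
our @ISA = ('BasePluginSketch');

sub remove_all {
    my $self = shift;
    my ($pluginfo, $base_dir, $processor, $maxdocs) = @_;

    # Let the parent class do its own removeold work first.
    $self->SUPER::remove_all(@_);

    # Guard: without a connection there is nothing to drop.
    my $gs_sql = $self->{gs_sql} || return 0;

    $gs_sql->drop_tables();
    return 1;
}

package main;

my $db      = FakeSQLSketch->new();
my $plugin  = SQLPluginSketch->new(gs_sql => $db);
my $ok      = $plugin->remove_all(undef, "import", undef, -1);

my $no_db   = SQLPluginSketch->new();   # no connection available
my $skipped = $no_db->remove_all(undef, "import", undef, -1);

print "with connection: $ok, without connection: $skipped\n";
```

In this sketch the parent's remove_all() still runs when no connection exists, and only the table drop is skipped. Whether the real plugin should behave the same way is a design decision the changeset itself does not spell out.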