- Timestamp:
- 2018-10-26T15:10:47+13:00 (6 years ago)
- File:
- 1 edited
Legend:
- Unmodified
- Added
- Removed
main/trunk/greenstone2/perllib/plugins/GreenstoneSQLPlugin.pm
r32542 → r32543:

  # TODO:
  # - Run TODOs here, in Plugout and in gssql.pm by Dr Bainbridge.
- #   Ask about docsql naming convention adopted to identify OID. Better way?
+ # + Ask about docsql naming convention adopted to identify OID. Better way?
  # collection names -> table names: it seems hyphens not allowed. Changed to underscores.
  # + Startup parameters (except removeold/build_mode)
+ # - how do we detect we're to do removeold during plugout in import.pl phase
  # - incremental building: where do we need to add code to delete rows from our sql table after
  #   incrementally importing a coll with fewer docs (for instance)? What about deleted/modified meta?
- # - Courier documents in lucene-sql collection: character (degree symbol) not preserved. Is this because we encode in utf8 when putting into db and reading back in?
+ # - "Courier" demo documents in lucene-sql collection: character (degree symbol) not preserved in title. Is this because we encode in utf8 when putting into db and reading back in?
  # - Have not yet tested writing out just meta or just fulltxt to sql db and reading just that
  #   back in from the sql db while the remainder is to be read back in from the docsql .xml files.
  …
  # (sections) which in this case had made it easy to reconstruct the doc_obj in memory in the correct order

+ # TODO: deal with incremental vs removeold. If docs removed from import folder, then import step
+ # won't delete it from archives but buildcol step will. Need to implement this with this database plugin or wherever the actual flow is
+
+ # TODO: Add public instructions on using this plugin and its plugout: start with installing mysql binary, changing pwd, running the server (and the client against it for checking, basic cmds like create and drop). Then discuss db name, table names (per coll), db cols and col types, and how the plugout and plugin work.
+ # Discuss the plugin/plugout parameters.
+
  # GreenstoneSQLPlugin inherits from GreenstoneXMLPlugin so that if meta or fulltext
  # is still written out to doc.xml (docsql .xml), that will be processed as usual,
  …
  # is written out by GreenstoneSQLPlugout into the SQL db).

- # TODO:
- # no more docoid in docsql .xml filename, set OID as attribute of root element inside docsql.xml file instead
- # and parse it out
-
- # TODO: deal with incremental vs removeold. If docs removed from import folder, then import step
- # won't delete it from archives but buildcol step will. Need to implement this with this database plugin or wherever the actual flow is
-
- # TODO: Add public instructions on using this plugin and its plugout: start with installing mysql binary, changing pwd, running the server (and the client against it for checking, basic cmds like create and drop). Then discuss db name, table names (per coll), db cols and col types, and how the plugout and plugin work.
- # Discuss the plugin/plugout parameters.

  sub BEGIN {
  …
  my $oid = $self->{'doc_oid'}; # we stored current doc's OID during sub xml_start_tag()
- print $outhandle "====OID of document (meta|text) to be read in from DB: $oid\n"
+ print $outhandle "++++ OID of document (meta|text) to be read in from DB: $oid\n"
      if $self->{'verbosity'} > 1;

  # For now, we have access to doc_obj (until just before super::close_document() terminates)
- $self->{'doc_obj'}->set_OID($oid); # complex method. Is this necessary, since we just want to write meta and txt for the docobj to index?
-
- # checking that complicated looking method set_OID() hasn't modified oid
- if($oid ne $self->{'doc_obj'}->get_OID()) {
-     print STDERR "@@@@ WARNING: OID after setting on doc_obj = " . $self->{'doc_obj'}->get_OID() .
-         " and is not the same as original OID $oid from docsqloid.xml filename\n";
- }
-
- # TODO: This function is called on a per doc.xml file basis
- # but we can process all docs of a collection in one go when dealing with the SQL tables for
- # the collection. How and where should we read in the collection tables then?
- # TODO: Perhaps MySQLPlugout could write out a token file (.gssql) into archives during import.pl
- # and if that file is detected, then MySQLPlugin::read() is passed in that file during
- # buildcol.pl. And that file will trigger reading the 2 tables for the collection???
+ # no need to call $self->{'doc_obj'}->set_OID($oid);
+ # because either the OID is stored in the SQL db as meta 'Identifier' alongside other metadata
+ # or it's stored in the doc.xml as metadata 'Identifier' alongside other metadata.
+ # Either way, Identifier meta will be read into the docobj automatically with other meta.

  my $proc_mode = $self->{'process_mode'};
  if($proc_mode eq "all" || $proc_mode eq "meta_only") {
  …
  my $sth = $gs_sql->select_from_metatable_matching_docid($oid);
- print $outhandle "### stmt: ".$sth->{'Statement'}."\n" if $self->{'verbosity'} > 1;
+ print $outhandle "### SQL select stmt: ".$sth->{'Statement'}."\n"
+     if $self->{'verbosity'} > 1;

  print $outhandle "----------SQL DB contains meta-----------\n" if $self->{'verbosity'} > 1;
  …

- # TODO: only want to work with sql db if buildcol.pl. Unfortunately, also runs on import.pl
- # call init() not begin() because there can be multiple plugin passes
+ # TODO: only want to work with sql db if buildcol.pl. Unfortunately, also runs on import.pl.
+ # During import, the GS SQL Plugin is called before the GS SQL Plugout with undesirable side
+ # effect that if the db doesn't exist, gssql::use_db() fails, as it won't create db.
+
+ # Call init() not begin() because there can be multiple plugin passes
  # and init() should be called before all passes:
  # one for doc level and another for section level indexing
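The key behavioural change in this revision is that the plugin no longer calls set_OID() on the doc_obj: the OID comes back from the per-collection metadata table as an ordinary 'Identifier' metadata row. The read-back step can be sketched in plain DBI terms; note this is only an illustration, and the connection settings, table name (demo_collection_metadata), and column names (docid, metaname, metavalue) are assumptions, not the plugin's actual schema.

```perl
use strict;
use warnings;
use DBI;

sub read_meta_for_doc {
    my ($doc_obj, $oid) = @_;

    # Hypothetical connection settings; the real plugin takes the db name and
    # per-collection table names from its startup parameters. mysql_enable_utf8
    # asks DBD::mysql to hand back decoded UTF-8, which is relevant to the
    # degree-symbol issue noted in the TODOs above.
    my $dbh = DBI->connect("DBI:mysql:database=greenstone;host=localhost",
                           "gsuser", "gspassword",
                           { RaiseError => 1, mysql_enable_utf8 => 1 });

    # Fetch every metadata row for this document's OID, in the spirit of
    # select_from_metatable_matching_docid(). Schema names are assumptions.
    my $sth = $dbh->prepare(
        "SELECT metaname, metavalue FROM demo_collection_metadata WHERE docid = ?");
    $sth->execute($oid);

    while (my ($name, $value) = $sth->fetchrow_array()) {
        # 'Identifier' comes back like any other metadata row, so no separate
        # set_OID() call is needed on the doc_obj.
        $doc_obj->add_utf8_metadata($doc_obj->get_top_section(), $name, $value);
    }
    $dbh->disconnect();
}
```

Because the sketch requires a running MySQL server and a populated collection table, it is meant for reading alongside the diff rather than standalone execution.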