Changeset 32591 for main


Ignore:
Timestamp:
11/09/18 19:01:04 (2 years ago)
Author:
ak19
Message:
  1. gssql destructor DESTROY doesn't really do anything now, as DBI's destructor will ensure disconnection if not already disconnected. So we don't disconnect in gssql DESTROY, which we used to do but which would only happen on Cancel specifically as other gssql code would already disconnect on natural termination. 2. gssql no longer sets connection parameter fallbacks for those that have fallbacks, since the defaults for these parameters are set by the GS SQL Plugs' configuration options. 3. No need to (html) escape meta/full text when storing in DB or unescape when reading it out of DB, since DB is like a regular text file: can just put text in there. It's only html/xml files where text needs to be stored escaped. 4. Having committed doc.pm::add_utf8_textREF() method in previous commit, GreenstoneSQLPlugin can now pass the fulltxt by reference using this method when constructing the doc_obj from what's read in from the DB.
Location:
main/trunk/greenstone2/perllib
Files:
3 edited

Legend:

Unmodified
Added
Removed
  • main/trunk/greenstone2/perllib/gssql.pm

    r32588 r32591  
    5252# + TODO: remove unnecessary warn() since PrintError is active
    5353
    54 # TODO: drop table if exists and create table if exists are available in MySQL. Use those cmds
     54# + TODO: drop table if exists and create table if exists are available in MySQL. Use those cmds
    5555# instead of always first checking for existence ourselves? Only when subclassing to specific
    5656# mysql class?
     
    157157#
    158158# https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#disconnect
    159 # "Disconnects the database from the database handle. disconnect is typically only used before exitin# g the program. The handle is of little use after disconnecting.
     159# "Disconnects the database from the database handle. disconnect is typically only used before exiting the program. The handle is of little use after disconnecting.
    160160#
    161161# The transaction behaviour of the disconnect method is, sadly, undefined. Some database systems (such as Oracle and Ingres) will automatically commit any outstanding changes, but others (such as Informix) will rollback any outstanding changes. Applications not using AutoCommit should explicitly call commit or rollback before calling disconnect.
     
    166166#
    167167# If you disconnect from a database while you still have active statement handles (e.g., SELECT statement handles that may have more data to fetch), you will get a warning. The warning may indicate that a fetch loop terminated early, perhaps due to an uncaught error. To avoid the warning call the finish method on the active handles."
     168#
    168169#
    169170sub DESTROY {
     
    171172   
    172173    if (${^GLOBAL_PHASE} eq 'DESTRUCT') {
    173    
     174
    174175    if ($_dbh_instance) { # database handle still active. Use singleton handle!
    175 
    176         # rollback code has moved to finish_signal_handler() where it belongs?
    177        
    178         # NOTE: if RaiseError is set on dbi connection, then on any error, perl process will die()
    179         # which will end up calling this DESTROY. If it was a die() that called DESTROY
    180         # then need to rollback the db here. However, if it was not a die() but natural termination
    181         # of the perl process, destroy() will also get called. In that case we don't want to rollback
    182         # but do a commit() to the DB instead.
    183         # Perhaps detecting the difference may be accomplished by checking ref_count:
    184         # - If ref_count not 0 it may require a rollback?
    185         # - If ref_count 0 it may be a natural termination and require a commit? Except that ref_count
    186         # is set back to 0 in finished(), which will do the commit when ref_count becomes 0. So shouldn't
    187         # (have to) do that here.
     176                          # dbh instance being active implies build was cancelled
     177
     178        # rollback code has moved to finish_signal_handler() where it belongs
     179        # as rollback() should only happen on cancel/unnatural termination
     180        # vs commit() happening in finished() before disconnect, which is natural termination.
     181
    188182       
    189183        # We're now finally ready to disconnect, as is required for both natural and premature termination
    190         print STDERR "XXXXXXXX Global Destruct: Disconnecting from database\n";
    191         $_dbh_instance->disconnect or warn $_dbh_instance->errstr;
    192         $_dbh_instance = undef;
    193         $ref_count = 0;
     184        # (Though natural termination would have disconnected already)
     185        # We now leave DBI's own destructor to do the disconnection when perl calls its DESTROY()
     186        # We'll just print a message to stop anyone from worrying whether cancelling build
     187        # will ensure disconnection still happens. It happens, but silently.
     188        print STDERR "   Global Destruct Phase: DBI's own destructor will disconnect database\n";
     189        #$_dbh_instance->disconnect or warn $_dbh_instance->errstr;
     190        #$_dbh_instance = undef;
     191        #$ref_count = 0;
    194192    }
    195193    return;
     
    246244    }
    247245    if($params_map->{'autocommit'}) {
    248         print STDERR "   SQL DB CANCEL SUPPORT OFF.\n";
     246        print STDERR "   SQL DB CANCEL SUPPORT OFF.\n" if($params_map->{'verbosity'} > 2);
    249247    } else {
    250248        print STDERR "   SQL DB CANCEL SUPPORT ON.\n";
     
    259257    my $db_enc = "utf8mb4" if $params_map->{'db_encoding'} eq "utf8";
    260258
    261     # these are the params for connecting to MySQL
    262     my $db_driver = $params_map->{'db_driver'} || "mysql";
    263     my $db_user = $params_map->{'db_client_user'} || "root";
     259    # Params for connecting to MySQL
     260    # These params are ensured default/fallback values by the GS SQL Plugs
     261    # so no need to set it here
     262    my $db_driver = $params_map->{'db_driver'};
     263    my $db_host = $params_map->{'db_host'};
     264    my $db_user = $params_map->{'db_client_user'};
     265
     266    # params that can be undef are db_client_pwd and db_port
    264267    my $db_pwd = $params_map->{'db_client_pwd'}; # even if undef and password was necessary,
    265268                                     # we'll see a sensible error message when connect fails
    266     my $db_host = $params_map->{'db_host'} || "127.0.0.1";
    267269        # localhost doesn't work for us, but 127.0.0.1 works
    268270        # https://metacpan.org/pod/DBD::mysql
     
    355357    if($ref_count == 0) { # Only commit transaction when we're about to actually disconnect, not before
    356358   
    357     # TODO: If AutoCommit was off, meaning transactions were on/enabled,
     359    # + TODO: If AutoCommit was off, meaning transactions were on/enabled,
    358360    # then here is where we commit our one long transaction.
    359361    # https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#commit
     
    521523    }
    522524
    523     # TODO Q: commit here, so that future select statements work?
     525    # + TODO Q: commit here, so that future select statements work?
    524526    # See https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#Transactions
    525527}
  • main/trunk/greenstone2/perllib/plugins/GreenstoneSQLPlugin.pm

    r32589 r32591  
    4343# back in from the sql db while the remainder is to be read back in from the docsql .xml files.
    4444
    45 # TODO: Add public instructions on using this plugin and its plugout: start with installing mysql binary, changing pwd, running the server (and the client against it for checking: basic cmds like create and drop). Then discuss db name, table names (per coll), db cols and col types, and how the plugout and plugin work.
     45# + TODO: Add public instructions on using this plugin and its plugout: start with installing mysql binary, changing pwd, running the server (and the client against it for checking: basic cmds like create and drop). Then discuss db name, table names (per coll), db cols and col types, and how the plugout and plugin work.
    4646# Discuss the plugin/plugout parameters.
    4747
     
    145145        'desc' => "{GreenstoneSQLPlug.rollbacl_on_cancel}" } ];
    146146
     147# TODO: If subclassing gssql for other supporting databases and if they have different required
     148# connection parameters, we can check how WordPlugin, upon detecting Word is installed,
     149# dynamically loads Word specific configuration options.
    147150my $arguments =
    148151    [ { 'name' => "process_exp",
     
    161164    'type' => "enum",
    162165    'list' => $rollback_on_cancel_list,
    163     'deft' => "false", # TODO Q: what's the better default? If "true", any memory concerns?
     166    'deft' => "false", # better default than true
    164167    'reqd' => "no",
    165168    'hiddengli' => "no"},
     
    178181    'type' => "string",
    179182    'deft' => "",
    180     'reqd' => "no"}, # pwd required? NO.
     183    'reqd' => "no"}, # pwd not required: can create mysql accounts without pwd
    181184      { 'name' => "db_host",
    182185    'desc' => "{GreenstoneSQLPlug.db_host}",
    183186    'type' => "string",
    184     'deft' => "127.0.0.1",
     187    'deft' => "127.0.0.1", # NOTE: make this int? No default for port, since it's not a required connection param
    185188    'reqd' => "yes"},
    186189      { 'name' => "db_port",
     
    226229# as removeold, which should drop the collection tables, happens during the import phase,
    227230# calling GreenstoneSQLPlugin::and therefore also requires a db connection.
    228 # TODO: Eventually can try moving get_gssql_instance into gssql.pm? That way both GS SQL Plugin
     231# + TODO: Eventually can try moving get_gssql_instance into gssql.pm? That way both GS SQL Plugin
    229232# and Plugout would be using one connection during import.pl phase when both plugs exist.
    230233
     
    421424}
    422425
    423 # TODO Q: Why are there 4 passes when we're only indexing at doc and section level (2 passes)? What's the dummy pass, why is there a pass for infodb?
     426# There are multiple passes processing the document (see buildcol's mode parameter description):
     427# - compressing the text which may be a dummy pass for lucene/solr, wherein they still want the
     428# docobj for different purposes,
     429# - the pass(es) for indexing, e.g. doc/didx and section/sidx level passes
     430# - and an infodb pass for processing the classifiers. This pass too needs the docobj
     431# Since all passes need the doc_obj, all are read in from docsql + SQL db into the docobj in memory
    424432
    425433# We should only ever get here during the buildcol.pl phase
     
    472480        if $self->{'verbosity'} > 2;
    473481       
    474         # TODO:  we accessed the db in utf8 mode, so, we can call doc_obj->add_utf8_meta directly:
    475         $doc_obj->add_utf8_metadata($sid, $metaname, &docprint::unescape_text($metaval));
     482        # + TODO:  we accessed the db in utf8 mode, so, we can call doc_obj->add_utf8_meta directly:
     483        #$doc_obj->add_utf8_metadata($sid, $metaname, &docprint::unescape_text($metaval));
     484       
     485        # data stored unescaped in db: escaping only for html/xml files, not for txt files or db
     486        $doc_obj->add_utf8_metadata($sid, $metaname, $metaval);
    476487    }
    477488    print $outhandle "----------FIN READING DOC's META FROM SQL DB------------\n"
     
    501512       
    502513        # TODO - pass by ref?
    503         # TODO: we accessed the db in utf8 mode, so, we can call doc_obj->add_utf8_text directly:
    504         my $textref = &docprint::unescape_textref(\$text);
    505         $doc_obj->add_utf8_text($sid, $$textref);
     514        # + TODO: we accessed the db in utf8 mode, so, we can call doc_obj->add_utf8_text directly:
     515        # data stored unescaped in db: escaping is only for html/xml files, not for txt files or db
     516        #my $textref = &docprint::unescape_textref(\$text);
     517        $doc_obj->add_utf8_textref($sid, \$text);
    506518    }   
    507519    print $outhandle "----------FIN READING DOC's TXT FROM SQL DB------------\n"
  • main/trunk/greenstone2/perllib/plugouts/GreenstoneSQLPlugout.pm

    r32589 r32591  
    4141# + TODO: SIGTERM rollback and disconnect?
    4242# + TODO Q: what about verbosity for debugging, instead of current situation of printing out upon debug set at the expense of writing to db
    43 #X TODO Q: introduced site param to plugins and plugouts. Did I do it right? And should they have hiddengli = "yes". No longer a param
    44 # Did I do the pass by ref in docprint's escape and unescape textref functions correctly, and how they're called here?
     43#+ TODO Q: introduced site param to plugins and plugouts. Did I do it right? And should they have hiddengli = "yes". No longer a param
     44# !!!! Did I do the pass by ref in docprint's escape and unescape textref functions correctly, and how they're called here?
    4545#   Any more optimisation I can do around this?
    4646
     
    8888      'type' => "enum",
    8989      'list' => $rollback_on_cancel_list,
    90       'deft' => "false", # TODO Q: what's the better default? If "true", any memory concerns?
     90      'deft' => "false", # better default than true
    9191      'reqd' => "no",
    9292      'hiddengli' => "no"},
     
    105105      'type' => "string",
    106106      'deft' => "",
    107       'reqd' => "no"}, # pwd required? NO.
     107      'reqd' => "no"}, # pwd not required: can create mysql accounts without pwd
    108108    { 'name' => "db_host",
    109109      'desc' => "{GreenstoneSQLPlug.db_host}",
    110110      'type' => "string",
    111       'deft' => "127.0.0.1",
     111      'deft' => "127.0.0.1", # localhost doesn't work for us, but 127.0.0.1 works. See gsmysql.pm
    112112      'reqd' => "yes"},
    113113    { 'name' => "db_port",
     
    347347        # TODO: does it need to be stored escaped, as it requires unescaping when read back in
    348348        # from db (unlike for reading back in from doc.xml)
    349         my $escaped_meta_value = &docprint::escape_text($data->[1]);
     349
     350        # Treat db like a text file instead of an html/xml file: don't need to escape text
     351        # going into it
     352        #my $escaped_meta_value = &docprint::escape_text($data->[1]);
     353        my $meta_value = $data->[1];
    350354       
    351355        # Write out the current section's meta to collection db's METADATA table       
     
    356360        # OR if debugging, then it will print the SQL insert statement but not execute it
    357361       
    358         $gs_sql->insert_row_into_metadata_table($doc_oid, $section_name, $meta_name, $escaped_meta_value, $self->{'debug'});
     362        $gs_sql->insert_row_into_metadata_table($doc_oid, $section_name, $meta_name, $meta_value, $self->{'debug'});
    359363    }
    360364    }
     
    362366   
    363367    if($proc_mode eq "all" || $proc_mode eq "text_only" ) {
    364    
    365     my $section_textref = &docprint::escape_textref(\$section_ptr->{'text'});
     368
     369    # See above, no need to html-escape for db
     370    my $section_text = $section_ptr->{'text'}; #&docprint::escape_textref(\$section_ptr->{'text'});
    366371   
    367372    # fulltxt column can be SQL NULL. undef value gets written out as NULL:
     
    369374    # The following will do the SQL insertion
    370375    # or if debug, the following will print the SQL insert stmt without executing it
    371     $gs_sql->insert_row_into_fulltxt_table($doc_oid, $section_name, $section_textref, $self->{'debug'});
     376    $gs_sql->insert_row_into_fulltxt_table($doc_oid, $section_name, \$section_text, $self->{'debug'});
    372377   
    373378    }
Note: See TracChangeset for help on using the changeset viewer.