Timestamp:
2018-11-09T19:01:04+13:00
Author:
ak19
Message:
  1. gssql's destructor DESTROY no longer does anything, because DBI's own destructor ensures disconnection if the handle is not already disconnected. gssql DESTROY used to disconnect, but in practice that only happened on Cancel, since on normal termination other gssql code would already have disconnected.
  2. gssql no longer sets fallback values for the connection parameters that have them, since the defaults for these parameters are set by the GS SQL Plugs' configuration options.
  3. Meta and full text no longer need to be (HTML) escaped when stored in the DB, or unescaped when read back out of it. The DB is like a regular text file: text can be stored in it as-is. Only text in HTML/XML files needs to be stored escaped.
  4. Now that the doc.pm::add_utf8_textREF() method was committed in the previous commit, GreenstoneSQLPlugin can use it to pass the full text by reference when constructing the doc_obj from what is read in from the DB.
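Point 1 relies on Perl's destruction order: when the object that owns a database handle is freed, its own DESTROY runs first, and only afterwards is the handle it holds released, so the handle's destructor still gets a chance to disconnect. The sketch below illustrates that order with two hypothetical classes, FakeDbh (standing in for a DBI database handle) and FakeGssql (standing in for gssql). Neither is DBI or Greenstone code; FakeDbh's "disconnect if still connected" destructor only mimics what the message says DBI's destructor does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a DBI database handle (NOT DBI itself).
package FakeDbh;

our @log;    # records events so the destruction order can be inspected

sub new { my ($class) = @_; return bless({ connected => 1 }, $class); }

sub disconnect {
    my ($self) = @_;
    push @log, 'disconnect';
    $self->{connected} = 0;
}

sub DESTROY {
    my ($self) = @_;
    # Mimics the behaviour the commit relies on: the handle cleans up after itself.
    $self->disconnect if $self->{connected};
    push @log, 'handle destroyed';
}

# Hypothetical stand-in for gssql: it owns a handle, but its DESTROY
# no longer disconnects.
package FakeGssql;

sub new { my ($class) = @_; return bless({ db_handle => FakeDbh->new }, $class); }

sub DESTROY {
    # Intentionally does no disconnecting: the handle's own destructor does that.
    push @FakeDbh::log, 'gssql destroyed';
}

package main;

{
    my $gssql = FakeGssql->new;
    # ... leave the scope without calling disconnect, as on Cancel ...
}
print join(', ', @FakeDbh::log), "\n";
```

The outer object's DESTROY runs while it is still intact; freeing its hash afterwards drops the last reference to the handle, which triggers the handle's DESTROY and hence the disconnect.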
File:
1 edited

  • main/trunk/greenstone2/perllib/plugins/GreenstoneSQLPlugin.pm

    --- GreenstoneSQLPlugin.pm (r32589)
    +++ GreenstoneSQLPlugin.pm (r32591)
    @@ -43,5 +43,5 @@
     # back in from the sql db while the remainder is to be read back in from the docsql .xml files.
     
    -# TODO: Add public instructions on using this plugin and its plugout: start with installing mysql binary, changing pwd, running the server (and the client against it for checking: basic cmds like create and drop). Then discuss db name, table names (per coll), db cols and col types, and how the plugout and plugin work.
    +# + TODO: Add public instructions on using this plugin and its plugout: start with installing mysql binary, changing pwd, running the server (and the client against it for checking: basic cmds like create and drop). Then discuss db name, table names (per coll), db cols and col types, and how the plugout and plugin work.
     # Discuss the plugin/plugout parameters.
     
    @@ -145,4 +145,7 @@
             'desc' => "{GreenstoneSQLPlug.rollbacl_on_cancel}" } ];
     
    +# TODO: If subclassing gssql for other supporting databases and if they have different required
    +# connection parameters, we can check how WordPlugin, upon detecting Word is installed,
    +# dynamically loads Word specific configuration options.
     my $arguments =
         [ { 'name' => "process_exp",
    @@ -161,5 +164,5 @@
         'type' => "enum",
         'list' => $rollback_on_cancel_list,
    -    'deft' => "false", # TODO Q: what's the better default? If "true", any memory concerns?
    +    'deft' => "false", # better default than true
         'reqd' => "no",
         'hiddengli' => "no"},
    @@ -178,9 +181,9 @@
         'type' => "string",
         'deft' => "",
    -    'reqd' => "no"}, # pwd required? NO.
    +    'reqd' => "no"}, # pwd not required: can create mysql accounts without pwd
           { 'name' => "db_host",
         'desc' => "{GreenstoneSQLPlug.db_host}",
         'type' => "string",
    -    'deft' => "127.0.0.1",
    +    'deft' => "127.0.0.1", # NOTE: make this int? No default for port, since it's not a required connection param
         'reqd' => "yes"},
           { 'name' => "db_port",
    @@ -226,5 +229,5 @@
     # as removeold, which should drop the collection tables, happens during the import phase,
     # calling GreenstoneSQLPlugin::and therefore also requires a db connection.
    -# TODO: Eventually can try moving get_gssql_instance into gssql.pm? That way both GS SQL Plugin
    +# + TODO: Eventually can try moving get_gssql_instance into gssql.pm? That way both GS SQL Plugin
     # and Plugout would be using one connection during import.pl phase when both plugs exist.
     
    @@ -421,5 +424,10 @@
     }
     
    -# TODO Q: Why are there 4 passes when we're only indexing at doc and section level (2 passes)? What's the dummy pass, why is there a pass for infodb?
    +# There are multiple passes processing the document (see buildcol's mode parameter description):
    +# - compressing the text which may be a dummy pass for lucene/solr, wherein they still want the
    +# docobj for different purposes,
    +# - the pass(es) for indexing, e.g. doc/didx and section/sidx level passes
    +# - and an infodb pass for processing the classifiers. This pass too needs the docobj
    +# Since all passes need the doc_obj, all are read in from docsql + SQL db into the docobj in memory
     
     # We should only ever get here during the buildcol.pl phase
    @@ -472,6 +480,9 @@
             if $self->{'verbosity'} > 2;
     
    -        # TODO:  we accessed the db in utf8 mode, so, we can call doc_obj->add_utf8_meta directly:
    -        $doc_obj->add_utf8_metadata($sid, $metaname, &docprint::unescape_text($metaval));
    +        # + TODO:  we accessed the db in utf8 mode, so, we can call doc_obj->add_utf8_meta directly:
    +        #$doc_obj->add_utf8_metadata($sid, $metaname, &docprint::unescape_text($metaval));
    +
    +        # data stored unescaped in db: escaping only for html/xml files, not for txt files or db
    +        $doc_obj->add_utf8_metadata($sid, $metaname, $metaval);
         }
         print $outhandle "----------FIN READING DOC's META FROM SQL DB------------\n"
    @@ -501,7 +512,8 @@
     
             # TODO - pass by ref?
    -        # TODO: we accessed the db in utf8 mode, so, we can call doc_obj->add_utf8_text directly:
    -        my $textref = &docprint::unescape_textref(\$text);
    -        $doc_obj->add_utf8_text($sid, $$textref);
    +        # + TODO: we accessed the db in utf8 mode, so, we can call doc_obj->add_utf8_text directly:
    +        # data stored unescaped in db: escaping is only for html/xml files, not for txt files or db
    +        #my $textref = &docprint::unescape_textref(\$text);
    +        $doc_obj->add_utf8_textref($sid, \$text);
         }
         print $outhandle "----------FIN READING DOC's TXT FROM SQL DB------------\n"
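In the last hunk, r32591 replaces `$doc_obj->add_utf8_text($sid, $$textref)` with `$doc_obj->add_utf8_textref($sid, \$text)`, so the full text read from the DB is handed over as a reference. The sketch below contrasts the two calling styles using a hypothetical MiniDoc class; it is NOT Greenstone's doc.pm, and its internals are only an assumption about how such methods might store section text. The point it shows is that unpacking `@_` into `my` variables copies a by-value string argument, whereas the by-reference variant copies only the reference.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical minimal document object (NOT Greenstone's doc.pm).
package MiniDoc;

sub new { my ($class) = @_; return bless({ sections => {} }, $class); }

# By value: "my (...) = @_" makes a copy of the (possibly large) text.
sub add_text {
    my ($self, $sid, $text) = @_;
    $self->{sections}{$sid} .= $text;
}

# By reference: only the reference is copied; the text is read via $$textref.
sub add_textref {
    my ($self, $sid, $textref) = @_;
    $self->{sections}{$sid} .= $$textref;
}

package main;

my $text = "full text read from the SQL db " x 3;
my $doc  = MiniDoc->new;
$doc->add_text('1', $text);       # caller's string copied on unpack
$doc->add_textref('2', \$text);   # caller's string passed by reference
print $doc->{sections}{'1'} eq $doc->{sections}{'2'} ? "same\n" : "differ\n";
```

Both calls store identical text; the difference is only in how many copies of a large string are made on the way in.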