root/main/trunk/greenstone2/perllib/gsmysql.pm @ 32593

Revision 32593, 36.2 KB (checked in by ak19, 6 months ago)

Now that gssql.pm has become gsmysql.pm, it uses the presumably optimised MySQL-specific statements CREATE TABLE IF NOT EXISTS and DROP TABLE IF EXISTS, as opposed to my less efficient method of looping through the tables in the currently loaded db to check whether a table exists before deciding whether it needs to be created/deleted.
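The Perl sketch below illustrates the difference between the two approaches. It is not code from gsmysql.pm, and it rests on several assumptions. An in-memory database through DBD::SQLite stands in for DBD::mysql so that no server is needed; SQLite also accepts both IF [NOT] EXISTS clauses. The earlier looping check is only approximated using DBI's `table_info()`. The subroutine names and the `demo_metadata` table are hypothetical.

```perl
use strict;
use warnings;

# Assumption: DBD::SQLite stands in for DBD::mysql so the sketch runs without a
# server. SQLite also accepts CREATE TABLE IF NOT EXISTS and DROP TABLE IF EXISTS.
BEGIN {
    unless (eval { require DBI; require DBD::SQLite; 1 }) {
        print "DBI and/or DBD::SQLite not installed: skipping sketch\n";
        exit 0;
    }
}

# Older style (approximated): list the tables in the loaded db and look for a
# match, then decide in Perl whether a CREATE or DROP is needed.
sub table_exists_by_listing {
    my ($dbh, $name) = @_;
    my $sth = $dbh->table_info(undef, undef, '%', 'TABLE');
    my $found = 0;
    while (my $row = $sth->fetchrow_hashref) {
        if ($row->{TABLE_NAME} eq $name) { $found = 1; last; }
    }
    $sth->finish;    # we may stop fetching early, so finish the handle explicitly
    return $found;
}

# Newer style: let the database decide. Neither statement is an error when the
# table already exists (CREATE) or is already gone (DROP).
sub create_table_if_missing {
    my ($dbh, $name) = @_;
    return $dbh->do("CREATE TABLE IF NOT EXISTS $name (id INTEGER PRIMARY KEY, did VARCHAR(63) NOT NULL)");
}

sub drop_table_if_present {
    my ($dbh, $name) = @_;
    return $dbh->do("DROP TABLE IF EXISTS $name");
}

# RaiseError => 1 makes any SQL error fatal. Running each statement twice below
# therefore shows that the repeated calls are harmless.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, PrintError => 0, AutoCommit => 1 });
my $table = 'demo_metadata';    # hypothetical table name

create_table_if_missing($dbh, $table) for 1 .. 2;
print "after create: ", (table_exists_by_listing($dbh, $table) ? "exists" : "missing"), "\n";

drop_table_if_present($dbh, $table) for 1 .. 2;
print "after drop: ", (table_exists_by_listing($dbh, $table) ? "exists" : "missing"), "\n";

$dbh->disconnect;
```

Like gsmysql.pm, the sketch interpolates the table name directly into the SQL, so the name must already have been sanitised before it reaches these statements.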

###########################################################################
#
# gsmysql.pm -- Uses DBI for MySQL related utility functions used by
# GreenstoneSQLPlugout and GreenstoneSQLPlugin too.
# A component of the Greenstone digital library software
# from the New Zealand Digital Library Project at the
# University of Waikato, New Zealand.
#
# Copyright (C) 1999 New Zealand Digital Library Project
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
###########################################################################

package gsmysql;

use strict;
no strict 'refs';
no strict 'subs';

use DBI; # the central package for this module used by GreenstoneSQL Plugout and Plugin

#################
# Database functions that use the perl DBI module (with the DBD driver module for mysql)
# https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm
# https://metacpan.org/pod/DBD::mysql
#################

##############################

# TODO Q: If disconnect is automatically called when object destroyed, what does that mean
# for our file-global handle object, is disconnect only called at end of perl process?
# Does that mean we don't need to explicitly call disconnect in gsmysql object's destroy during
# the GLOBAL destruction phase?
# https://perldoc.perl.org/perlobj.html#Destructors

#+ TODO: add infrastructure for db_port, AutoCommit etc
# For port, see https://stackoverflow.com/questions/2248665/perl-script-to-connect-to-mysql-server-port-3307

# + TODO: remove unnecessary warn() since PrintError is active

# + TODO: drop table if exists and create table if not exists are available in MySQL. Use those cmds
# instead of always first checking for existence ourselves? Only when subclassing to specific
# mysql class?


# + TODO Q: What to do on cancelling a build: delete table? But what if it was a rebuild and the rebuild is cancelled (not the original build)?
# Do we create a copy of the orig database as backup, then start populating current db, and if cancelled, delete current db and RENAME backup table to current?
# https://stackoverflow.com/questions/3280006/duplicating-a-mysql-table-indexes-and-data
# BUT what if the table is HUGE? (Think of a collection with millions of docs.) Huge overhead in copying?
# The alternative is we just quit on cancel, but then: cancel could leave the table in a partially committed state, with no way of rolling back.
# Unless they do a full rebuild, which will recreate the table from scratch?
# SOLUTION-> rollback transaction on error, see https://www.effectiveperlprogramming.com/2010/07/set-custom-dbi-error-handlers/
# In that case, should set AutoCommit to off on connection, and remember to commit at end.

# + TODO: Consider AutoCommit status (and Autocommit off allowing commit or rollback for GS coll build cancel) later


##############################

# singleton connection
my $_dbh_instance = undef; # calls undef() function. See https://perlmaven.com/undef-and-defined-in-perl
my $ref_count = 0;

# Need params_map keys:
# - collection_name
# - db_encoding (db content encoding) - MySQL can set this at server, db, table levels. For MySQL
# we set the enc during connect at server level. Not sure whether other DBs support it at the
# same levels.

# For connection to MySQL, need:
#  - db_driver, db_client_user, db_client_pwd, db_host, (db_port is optional: only added to the connection string if specified)
# So these will be parameterised, but in a hashmap, for just the connect method.

# Parameterise (one or more methods may use them):
# - db_name (which is the GS3 sitename, or "greenstone2" for GS2)

# Database access related functions
# http://g2pc1.bu.edu/~qzpeng/manual/MySQL%20Commands.htm
# https://www.guru99.com/insert-into.html

# Add signal handlers to clean up and disconnect from db on sudden termination, incl cancel build
# https://www.perl.com/article/37/2013/8/18/Catch-and-Handle-Signals-in-Perl/
$SIG{INT}  = \&finish_signal_handler;
$SIG{TERM}  = \&finish_signal_handler;
$SIG{KILL}  = \&finish_signal_handler; # NOTE: SIGKILL can't be caught or handled, so this handler never runs for it

sub finish_signal_handler {
    my ($sig) = @_; # one of INT|TERM (a KILL signal can never reach this handler)

    if ($_dbh_instance) { # database handle (note, using singleton) still active.

        # If autocommit wasn't set, then this is a cancel operation.
        # If we've not disconnected from the sql db yet and if we've not committed
        # transactions yet, then cancel means we do a rollback here

        if($_dbh_instance->{AutoCommit} == 0) {
            print STDERR "   User cancelled: rolling back SQL database transaction.\n";
            $_dbh_instance->rollback(); # will warn on failure, nothing more we can/want to do
        }
    }

    die "Caught a $sig signal $!"; # die() will always call destructor (sub DESTROY)
}

sub new
{
    my $class = shift(@_);

    my ($params_map) = @_;

    # library_url: to be specified on the cmdline if not using a GS-included web server
    # the GSDL_LIBRARY_URL env var is useful when running cmdline buildcol.pl in the linux package manager versions of GS3

    # https://stackoverflow.com/questions/7083453/copying-a-hashref-in-perl
    # Making a shallow copy works, and can handle unknown params:
    #my $self = $params_map;

    # but being explicit for class params needed for MySQL:
    my $self = {
        'collection_name' => $params_map->{'collection_name'},
        'verbosity' => $params_map->{'verbosity'} || 1
    };

    # The db_encoding option is presently not passed in to this constructor as parameter.
    # Placed here to indicate it's sort of optional.
    # Since docxml files are all in utf8, the contents of the GS SQL database should be too,
    # so making utf8 the hidden default at present.
    $self->{'db_encoding'} = $params_map->{'db_encoding'} || "utf8";

    $self = bless($self, $class);

    $self->{'tablename_prefix'} = $self->sanitize_name($params_map->{'collection_name'});

    return $self;
}

# On die(), an object's destructor is called.
# See https://www.perl.com/article/37/2013/8/18/Catch-and-Handle-Signals-in-Perl/
# We want to ensure we've closed the db connection in such cases.
# "It’s common to call die when handling SIGINT and SIGTERM. die is useful because it will ensure that Perl stops correctly: for example Perl will execute a destructor method if present when die is called, but the destructor method will not be called if a SIGINT or SIGTERM is received and no signal handler calls die."
#
# Useful: https://perldoc.perl.org/perlobj.html#Destructors
# For more on when destroy is called, see https://www.perlmonks.org/?node_id=1020920
#
# https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#disconnect
# "Disconnects the database from the database handle. disconnect is typically only used before exiting the program. The handle is of little use after disconnecting.
#
# The transaction behaviour of the disconnect method is, sadly, undefined. Some database systems (such as Oracle and Ingres) will automatically commit any outstanding changes, but others (such as Informix) will rollback any outstanding changes. Applications not using AutoCommit should explicitly call commit or rollback before calling disconnect.
#
# The database is automatically disconnected by the DESTROY method if still connected when there are no longer any references to the handle. The DESTROY method for each driver should implicitly call rollback to undo any uncommitted changes. This is vital behaviour to ensure that incomplete transactions don't get committed simply because Perl calls DESTROY on every object before exiting. Also, do not rely on the order of object destruction during "global destruction", as it is undefined.
#
# Generally, if you want your changes to be committed or rolled back when you disconnect, then you should explicitly call "commit" or "rollback" before disconnecting.
#
# If you disconnect from a database while you still have active statement handles (e.g., SELECT statement handles that may have more data to fetch), you will get a warning. The warning may indicate that a fetch loop terminated early, perhaps due to an uncaught error. To avoid the warning call the finish method on the active handles."
#
#
sub DESTROY {
    my $self = shift;

    if (${^GLOBAL_PHASE} eq 'DESTRUCT') {

        if ($_dbh_instance) { # database handle still active. Use singleton handle!
                              # dbh instance being active implies build was cancelled

            # rollback code has moved to finish_signal_handler() where it belongs
            # as rollback() should only happen on cancel/unnatural termination
            # vs commit() happening in finished() before disconnect, which is natural termination.


            # We're now finally ready to disconnect, as is required for both natural and premature termination
            # (Though natural termination would have disconnected already)
            # We now leave DBI's own destructor to do the disconnection when perl calls its DESTROY()
            # We'll just print a message to stop anyone from worrying whether cancelling build
            # will ensure disconnection still happens. It happens, but silently.
            print STDERR "   Global Destruct Phase: DBI's own destructor will disconnect database\n";
            #$_dbh_instance->disconnect or warn $_dbh_instance->errstr;
            #$_dbh_instance = undef;
            #$ref_count = 0;
        }
        return;
    }

    # "Always include a call to $self->SUPER::DESTROY in our destructors (even if we don't yet have any base/parent classes). (p. 145)"
    # Superclass and destroy, call to SUPER: https://www.perlmonks.org/?node_id=879920
    # discussion also covers multiple-inheritance (MI)
    $self->SUPER::DESTROY if $self->can("SUPER::DESTROY");

}



################### BASIC DB OPERATIONS ##################

# THE NEW DB FUNCTIONS
# NOTE: FULLTEXT is a reserved keyword in (My)SQL. So we can't name a table or any of its columns "fulltext".
# https://dev.mysql.com/doc/refman/5.5/en/keywords.html



# SINGLETON / GET INSTANCE PATTERN
# https://stackoverflow.com/questions/16655603/perl-objects-class-variable-initialization
# https://stackoverflow.com/questions/7587157/how-can-i-set-a-static-variable-that-can-be-accessed-by-all-subclasses-of-the-sa
# Singleton without Moose: https://www.perl.com/article/52/2013/12/11/Implementing-the-singleton-pattern-in-Perl/

sub connect_to_db
{
    my $self= shift (@_);
    my ($params_map) = @_;

    $params_map->{'db_encoding'} = $self->{'db_encoding'};
    $params_map->{'verbosity'} = $self->{'verbosity'};

    $self->{'db_handle'} = &_get_connection_instance($params_map); # getting singleton (class method)
    if($self->{'db_handle'}) {
        $ref_count++; # if successful, keep track of the number of refs to the single db connection
        return $self->{'db_handle'};
    }
    return undef;
}

# SINGLETON METHOD #
# TODO: where should the defaults for these params be, here or in GS-SQLPlugin/Plugout?
sub _get_connection_instance
{
    #my $self= shift (@_); # singleton method doesn't use self, but callers don't need to know that
    my ($params_map) = @_;

    if($params_map->{'verbosity'}) {
        if(!defined $params_map->{'autocommit'}) {
            print STDERR "  Autocommit parameter not defined\n";
        }
        # an undefined autocommit falls back to AutoCommit on (see below), so cancel support is then off too
        if(!defined $params_map->{'autocommit'} || $params_map->{'autocommit'}) {
            print STDERR "   SQL DB CANCEL SUPPORT OFF.\n" if($params_map->{'verbosity'} > 2);
        } else {
            print STDERR "   SQL DB CANCEL SUPPORT ON.\n";
        }
    }

    return $_dbh_instance if($_dbh_instance);

    # or make the connection

    # For proper utf8 support in MySQL, encoding should be 'utf8mb4' as 'utf8' is insufficient.
    # (Not written as "my $db_enc = ... if ...;" since perlsyn leaves the behaviour of
    # my combined with a statement-modifier conditional undefined.)
    my $db_enc = ($params_map->{'db_encoding'} eq "utf8") ? "utf8mb4" : $params_map->{'db_encoding'};

    # Params for connecting to MySQL
    # The GS SQL Plugs ensure these params have default/fallback values,
    # so there's no need to set them here
    my $db_driver = $params_map->{'db_driver'};
    my $db_host = $params_map->{'db_host'};
    my $db_user = $params_map->{'db_client_user'};

    # params that can be undef are db_client_pwd and db_port
    my $db_pwd = $params_map->{'db_client_pwd'}; # even if undef and password was necessary,
                                                 # we'll see a sensible error message when connect fails
    # localhost doesn't work for us, but 127.0.0.1 works
    # https://metacpan.org/pod/DBD::mysql
    # "The hostname, if not specified or specified as '' or 'localhost', will default to a MySQL server
    # running on the local machine using the default for the UNIX socket. To connect to a MySQL server
    # on the local machine via TCP, you must specify the loopback IP address (127.0.0.1) as the host."
    my $db_port = $params_map->{'db_port'}; # leave as undef if unspecified,
                                            # as our tests never used port anyway (must have internally
                                            # defaulted to whatever default port is used for MySQL)


    #my $connect_str = "dbi:$db_driver:database=$db_name;host=$db_host";
    # But don't provide db now - this allows checking the db exists later when loading the db
    my $connect_str = "dbi:$db_driver:host=$db_host";
    $connect_str .= ";port=$db_port" if $db_port;

    if($params_map->{'verbosity'}) {
        print STDERR "Away to make connection to $db_driver database with:\n";
        print STDERR " - hostname $db_host; username: $db_user";
        print STDERR "; and the provided password" if $db_pwd;
        print STDERR "\nAssuming the mysql server has been started with: --character_set_server=utf8mb4\n" if $db_driver eq "mysql";
    }

    # DBI AutoCommit connection param is on/1 by default, so if a value for this is not defined
    # as a method parameter to _get_connection_instance, then fall back to the default of on/1
    # More: https://www.oreilly.com/library/view/programming-the-perl/1565926994/re44.html
    my $autocommit = (defined $params_map->{'autocommit'}) ? $params_map->{'autocommit'} : 1;

    my $dbh = DBI->connect("$connect_str", $db_user, $db_pwd,
                           {
                               ShowErrorStatement => 1, # more informative as DBI will append failed SQL stmt to error message
                               PrintError => 1, # on by default, but being explicit
                               RaiseError => 0, # off by default, but being explicit
                               AutoCommit => $autocommit,
                               mysql_enable_utf8mb4 => 1 # tells MySQL to use UTF-8 for communication and tells DBD::mysql to decode the data, see https://stackoverflow.com/questions/46727362/perl-mysql-utf8mb4-issue-possible-bug
                           });

    if(!$dbh) {
        # NOTE, despite handle dbh being undefined, error code will be in DBI->err (note caps)
        return 0;
    }

    # set encoding https://metacpan.org/pod/DBD::mysql
    # https://dev.mysql.com/doc/refman/5.7/en/charset.html
    # https://dev.mysql.com/doc/refman/5.7/en/charset-conversion.html
    # Setting the encoding at db server level: $dbh->do("set NAMES '" . $db_enc . "'");
    # HOWEVER:
    # It turned out to be insufficient to set the encoding to utf8, as that only supports utf8 chars that
    # need up to 3 bytes. We may need up to 4 bytes per utf8 character, e.g. for emoji and other chars
    # outside the Basic Multilingual Plane (chars with macron such as ā need only 2 bytes),
    # and for that, we need the encoding to be set to utf8mb4.
    # Setting up a MySQL db to use utf8mb4 requires configuration on the server side too.
    # https://stackoverflow.com/questions/10957238/incorrect-string-value-when-trying-to-insert-utf-8-into-mysql-via-jdbc
    # https://stackoverflow.com/questions/46727362/perl-mysql-utf8mb4-issue-possible-bug
    # To set up the db for utf8mb4, therefore,
    # the MySQL server needs to be configured for that char encoding by running the server as:
    # mysql-5.7.23-linux-glibc2.12-x86_64/bin>./mysqld_safe --datadir=/Scratch/ak19/mysql/data --character_set_server=utf8mb4
    # AND when connecting to the server, we can either set mysql_enable_utf8mb4 => 1
    # as a connection option
    # OR we need to do both "set NAMES utf8mb4" AND "$dbh->{mysql_enable_utf8mb4} = 1;" after connecting
    #
    # Search results for DBI Set Names imply the "SET NAMES '<enc>'" command is mysql specific too,
    # so setting the mysql specific option during connection above as "mysql_enable_utf8mb4 => 1"
    # is no more objectionable. It has the advantage of cutting out the 2 extra lines of doing
    # set NAMES '<enc>' and $dbh->{mysql_enable_utf8mb4} = 1 here.
    # These lines may be preferred if more db_driver options are to be supported in future:
    # then a separate method called set_db_encoding($enc) can work out what db_driver we're using
    # and if mysql and enc=utf8, then it can do the following whereas it will issue other do stmts
    # for other db_drivers, see https://www.perlmonks.org/?node_id=259456:

    #my $stmt = "set NAMES '" . $db_enc . "'";
    #$dbh->do($stmt) || warn("Unable to set charset encoding at db server level to: " . $db_enc . "\n"); # tells MySQL to use UTF-8 for communication
    #$dbh->{mysql_enable_utf8mb4} = 1; # tells DBD::mysql to decode the data

    # if we're here, then connection succeeded, store handle
    $_dbh_instance = $dbh;
    return $_dbh_instance;

}

# Will disconnect if this instance of gsmysql holds the last reference to the db connection
# If disconnecting and autocommit is off, then this will commit before disconnecting
sub finished {
    my $self= shift (@_);
    my $dbh = $self->{'db_handle'};

    my $rc = 1; # return code: everything went fine, regardless of whether we needed to commit
                # (AutoCommit on or off)

    $ref_count--;
    if($ref_count == 0) { # Only commit transaction when we're about to actually disconnect, not before

        # + TODO: If AutoCommit was off, meaning transactions were on/enabled,
        # then here is where we commit our one long transaction.
        # https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#commit
        if($dbh->{AutoCommit} == 0) {
            print STDERR "   Committing transaction to SQL database now.\n" if $self->{'verbosity'};
            $rc = $dbh->commit() or warn("SQL DB COMMIT FAILED: " . $dbh->errstr); # important problem
                                                                                  # worth embellishing error message
        }
        # else if autocommit was on, then we'd have committed after every db operation, so nothing to do

        $self->_force_disconnect_from_db();
    }

    return $rc;
}


# Call this method on die(), so that you're sure the perl process has disconnected from SQL db
# Disconnect from db - https://metacpan.org/pod/DBI#disconnect
# TODO: make sure to have committed or rolled back before disconnect
# and that you've called finish() on statement handles if any fetch remnants remain
sub _force_disconnect_from_db {
    my $self= shift (@_);

    if($_dbh_instance) {
        # make sure any active stmt handles are finished
        # NO: "When all the data has been fetched from a SELECT statement, the driver will automatically call finish for you. So you should not call it explicitly except when you know that you've not fetched all the data from a statement handle and the handle won't be destroyed soon."

        print STDERR "    GSMySQL disconnecting from database\n" if $self->{'verbosity'};
        # Just go through the singleton db handle to disconnect
        $_dbh_instance->disconnect or warn $_dbh_instance->errstr;
        $_dbh_instance = undef;
    }
    # Number of gsmysql objects that share a live connection is now 0, as the connection's dead
    # either because the last gsmysql object finished() or because connection was killed (force)
    $ref_count = 0;
}


# Load the designated database, i.e. 'use <dbname>;'.
# If the database doesn't yet exist, creates it and loads it.
# (Don't create the collection's tables yet, though)
# At the end it will have loaded the requested database (in MySQL: "use <db>;") on success.
# As usual, returns success or failure value that can be evaluated in a boolean context.
sub use_db {
    my $self= shift (@_);
    my ($db_name) = @_;
    my $dbh = $self->{'db_handle'};
    $db_name = $self->sanitize_name($db_name);

    print STDERR "Attempting to use database $db_name\n" if($self->{'verbosity'});

    # perl DBI switch database: https://www.perlmonks.org/?node_id=995434
    # do() returns undef on error.
    # connection succeeded, try to load our database. If that didn't work, attempt to create db
    my $success = $dbh->do("use $db_name");

    if(!$success && $dbh->err == 1049) { # "Unknown database" error has code 1049 (mysql only?) meaning db doesn't exist yet

        print STDERR "Database $db_name didn't exist, creating it...\n" if($self->{'verbosity'});

        # attempt to create the db (the collection's tables are created separately)
        $self->create_db($db_name) || return 0;

        print STDERR "   Created database $db_name\n" if($self->{'verbosity'} > 1);

        # once more attempt to use db, now that it exists
        $dbh->do("use $db_name") || return 0;
        #$dbh->do("use $db_name") or die "Error (code" . $dbh->err ."): " . $dbh->errstr . "\n";

        $success = 1;
    }
    elsif($success) { # database existed and loaded successfully
        # (the current collection's tables are not checked here: see the ensure_*_table_exists() methods)

        print STDERR "@@@ DATABASE $db_name EXISTED\n" if($self->{'verbosity'} > 2);
    }

    return $success; # could still be false, if database failed to load with an error code != 1049
}


# We should already have done "use <database>;" if this gets called.
# Just ensure this collection's metadata table exists (creating it if necessary)
sub ensure_meta_table_exists {
    my $self = shift (@_);

    my $tablename = $self->get_metadata_table_name();
    # if(!$self->table_exists($tablename)) {
    #   $self->create_metadata_table() || return 0;
    # } else {
    #   print STDERR "@@@ Meta table exists\n" if($self->{'verbosity'} > 2);
    # }
    $self->create_metadata_table() || return 0; # will now only create it if it doesn't already exist
    return 1;
}

# We should already have done "use <database>;" if this gets called.
# Just ensure this collection's fulltext table exists (creating it if necessary)
sub ensure_fulltxt_table_exists {
    my $self = shift (@_);

    my $tablename = $self->get_fulltext_table_name();
    # if(!$self->table_exists($tablename)) {
    #   $self->create_fulltext_table() || return 0;
    # } else {
    #   print STDERR "@@@ Fulltxt table exists\n" if($self->{'verbosity'} > 2);
    # }
    $self->create_fulltext_table() || return 0; # will now only create it if it doesn't already exist
    return 1;
}


sub create_db {
    my $self= shift (@_);
    my ($db_name) = @_;
    my $dbh = $self->{'db_handle'};
    $db_name = $self->sanitize_name($db_name);

    # https://stackoverflow.com/questions/5025768/how-can-i-create-a-mysql-database-from-a-perl-script
    return $dbh->do("create database $db_name"); # do() will return undef on fail, https://metacpan.org/pod/DBI#do
}

## NOTE: these 2 create_table methods use mysql specific "CREATE TABLE IF NOT EXISTS" syntax
## vs general SQL CREATE TABLE syntax which would produce an error message if the table
## already existed
## And unless do() fails, these two create methods will now always return true,
## even if table existed and didn't need to be created.
sub create_metadata_table {
    my $self= shift (@_);
    my $dbh = $self->{'db_handle'};

    my $table_name = $self->get_metadata_table_name();
    print STDERR "   Will create table $table_name if it doesn't exist\n" if($self->{'verbosity'} > 2);

    # If using an auto incremented primary key:
    my $stmt = "CREATE TABLE IF NOT EXISTS $table_name (id INT NOT NULL AUTO_INCREMENT, did VARCHAR(63) NOT NULL, sid VARCHAR(63) NOT NULL, metaname VARCHAR(127) NOT NULL, metavalue VARCHAR(1023) NOT NULL, PRIMARY KEY(id));";
    return $dbh->do($stmt);
}

# TODO: Investigate: https://dev.mysql.com/doc/search/?d=10&p=1&q=FULLTEXT
# 12.9.1 Natural Language Full-Text Searches
# to see whether we have to index the 'fulltxt' column of the 'fulltext' tables
# or let user edit this file, or add it as another option
sub create_fulltext_table {
    my $self= shift (@_);
    my $dbh = $self->{'db_handle'};

    my $table_name = $self->get_fulltext_table_name();
    print STDERR "   Will create table $table_name if it doesn't exist\n" if($self->{'verbosity'} > 2);

    # If using an auto incremented primary key:
    my $stmt = "CREATE TABLE IF NOT EXISTS $table_name (id INT NOT NULL AUTO_INCREMENT, did VARCHAR(63) NOT NULL, sid VARCHAR(63) NOT NULL, fulltxt LONGTEXT, PRIMARY KEY(id));";
    return $dbh->do($stmt);

}

## NOTE: this method uses mysql specific "DROP TABLE IF EXISTS" syntax vs general SQL DROP TABLE
## syntax which would produce an error message if the table didn't exist
sub delete_collection_tables {
    my $self= shift (@_);
    my $dbh = $self->{'db_handle'};

    # drop table <tablename>
    # my $table = $self->get_metadata_table_name();
    # if($self->table_exists($table)) {
    #   $dbh->do("drop table $table");
    # }
    # $table = $self->get_fulltext_table_name();
    # if($self->table_exists($table)) {
    #   $dbh->do("drop table $table");
    # }
    my $table = $self->get_metadata_table_name();
    $dbh->do("drop table if exists $table");

    $table = $self->get_fulltext_table_name();
    $dbh->do("drop table if exists $table");

    # If prepared select statement handles already exist, would need to commit here
    # so that future select statements using those prepared handles work.
    # See https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#Transactions
}

# Don't call this: it will delete the meta and full text tables for ALL collections in $db_name (localsite by default)!
# This method is just here for debugging (for testing creating a database when there is none)
#
# "IF EXISTS is used to prevent an error from occurring if the database does not exist. ... DROP DATABASE returns the number of tables that were removed. The DROP DATABASE statement removes from the given database directory those files and directories that MySQL itself may create during normal operation."
# MySQL 8.0 Reference Manual :: 13.1.22 DROP DATABASE Syntax
# https://dev.mysql.com/doc/en/drop-database.html
sub _delete_database {
    my $self= shift (@_);
    my ($db_name) = @_;
    my $dbh = $self->{'db_handle'};
    $db_name = $self->sanitize_name($db_name);

    print STDERR "!!! Deleting database $db_name\n" if($self->{'verbosity'});

    # "drop database dbname"
    $dbh->do("drop database $db_name") || return 0;

    return 1;
}

562
563########################### DB STATEMENTS ###########################
564
565# USEFUL: https://metacpan.org/pod/DBI
566# "Many methods have an optional \%attr parameter which can be used to pass information to the driver implementing the method. Except where specifically documented, the \%attr parameter can only be used to pass driver specific hints. In general, you can ignore \%attr parameters or pass it as undef."
567
568# More efficient to use prepare() to prepare an SQL statement once and then execute() it many times
569# (binding different values to placeholders) than running do() which will prepare each time and
570# execute each time. Also, do() is not useful with SQL select statements as it doesn't fetch rows.
571# Can prepare and cache prepared statements or retrieve prepared statements if cached in one step:
572# https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#prepare_cached
573
574# https://www.guru99.com/insert-into.html
575# and https://dev.mysql.com/doc/refman/8.0/en/example-auto-increment.html
576#     for inserting multiple rows at once
577# https://www.perlmonks.org/bare/?node_id=316183
578# https://metacpan.org/pod/DBI#do
579# https://www.quora.com/What-is-the-difference-between-prepare-and-do-statements-in-Perl-while-we-make-a-connection-to-the-database-for-executing-the-query
580# https://docstore.mik.ua/orelly/linux/dbi/ch05_05.htm
581
582# https://metacpan.org/pod/DBI#performance
583# 'The q{...} style quoting used in this example avoids clashing with quotes that may be used in the SQL statement. Use the double-quote like qq{...} operator if you want to interpolate variables into the string. See "Quote and Quote-like Operators" in perlop for more details.'
584#
585# This method uses lazy loading to prepare the SQL insert stmt once for a table and store it,
586# then execute the (stored) statement each time it's needed for that table.
sub insert_row_into_metadata_table {
    my $self = shift (@_);
    my ($doc_oid, $section_name, $meta_name, $escaped_meta_value, $debug_only) = @_;

    my $dbh = $self->{'db_handle'};

    my $tablename = $self->get_metadata_table_name();
    my $sth = $dbh->prepare_cached(qq{INSERT INTO $tablename (did, sid, metaname, metavalue) VALUES (?, ?, ?, ?)});# || warn("Could not prepare insert statement for metadata table\n");

    # Now we're ready to execute the command, unless we're only debugging

    if($debug_only) {
        # just print the statement we were going to execute
        print STDERR $sth->{'Statement'} . "($doc_oid, $section_name, $meta_name, $escaped_meta_value)\n";
    }
    else {
        print STDERR $sth->{'Statement'} . "($doc_oid, $section_name, $meta_name, $escaped_meta_value)\n" if $self->{'verbosity'} > 2;

        $sth->execute($doc_oid, $section_name, $meta_name, $escaped_meta_value)
            || warn ("Unable to write metadata row to db:\n\tOID $doc_oid, section $section_name,\n\tmeta name: $meta_name, val: $escaped_meta_value");
        # On failure DBI also prints its own error message, since the db connection sets PrintError
    }
}
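
=pod

A minimal pure-Perl sketch of the caching that prepare_cached() gives the two insert methods. It uses only core Perl, so it runs without a database. fake_prepare() and the %statement_cache hash are hypothetical stand-ins: real DBI keeps its cache in the database handle's CachedKids attribute and returns genuine statement handles. Because qq{} interpolates the table name into the SQL text while leaving the ? placeholders untouched, each table's INSERT string is a distinct cache key, so the statement is prepared once per table and never once per row.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $dbh->prepare(): counts how often it runs and
# returns a hash ref in place of a real statement handle.
my $prepare_calls = 0;
sub fake_prepare {
    my ($sql) = @_;
    $prepare_calls++;
    return { Statement => $sql };
}

# Cache keyed on the SQL text, mimicking the idea behind prepare_cached():
# the first request for a statement prepares it, later requests reuse it.
my %statement_cache;
sub prepare_cached_sketch {
    my ($sql) = @_;
    $statement_cache{$sql} //= fake_prepare($sql);
    return $statement_cache{$sql};
}

# qq{} interpolates $tablename; the ? placeholders stay in the SQL text and
# would be bound by execute(), so row values never become part of the key.
sub metadata_insert_sql {
    my ($tablename) = @_;
    return qq{INSERT INTO $tablename (did, sid, metaname, metavalue) VALUES (?, ?, ?, ?)};
}

# Simulate three inserts into one table and one insert into another.
my @handles = map { prepare_cached_sketch(metadata_insert_sql("demo_metadata")) } 1 .. 3;
my $other   = prepare_cached_sketch(metadata_insert_sql("other_metadata"));

print "prepare calls: $prepare_calls\n";   # prepare calls: 2
print "$handles[0]{Statement}\n";
```

The table names demo_metadata and other_metadata are hypothetical. Only the first request per table reaches fake_prepare(), which is the behaviour the comment above calls lazy loading.

=cut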

# As above, this uses prepare_cached() to prepare the SQL INSERT statement once per table
# and re-execute the cached statement handle on every later call for that table.
sub insert_row_into_fulltxt_table {
    my $self = shift (@_);
    #my ($did, $sid, $fulltext) = @_;
    my ($doc_oid, $section_name, $section_textref, $debug_only) = @_;

    my $dbh = $self->{'db_handle'};

    my $tablename = $self->get_fulltext_table_name();
    my $sth = $dbh->prepare_cached(qq{INSERT INTO $tablename (did, sid, fulltxt) VALUES (?, ?, ?)});# || warn("Could not prepare insert statement for fulltxt table\n");

    # Now we're ready to execute the command, unless we're only debugging

    # don't display the fulltxt value as it could be too long
    my $txt_repr = $$section_textref ? "<TXT>" : "NULL";
    if($debug_only) { # only print statement, don't execute it
        print STDERR $sth->{'Statement'} . "($doc_oid, $section_name, $txt_repr)\n";
    }
    else {
        print STDERR $sth->{'Statement'} . "($doc_oid, $section_name, $txt_repr)\n" if $self->{'verbosity'} > 2;

        $sth->execute($doc_oid, $section_name, $$section_textref)
            || warn ("Unable to write fulltxt row to db for row:\n\tOID $doc_oid, section $section_name"); # On failure DBI also prints its own error message, since the db connection sets PrintError
    }
}
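
=pod

The log line in insert_row_into_fulltxt_table prints <TXT> or NULL depending on the truthiness of $$section_textref. The small core-Perl sketch below shows how that shorthand behaves. A section whose text is "" or "0" is logged as NULL even though execute() would store that string, because DBI binds only undef as SQL NULL. txt_repr_defined() is a hypothetical alternative for comparison and is not part of gsmysql. Passing the text by reference, as insert_row_into_fulltxt_table expects, also avoids copying a potentially long string on each call.

```perl
use strict;
use warnings;

# Same expression as the logging code in insert_row_into_fulltxt_table.
sub txt_repr_truthy {
    my ($textref) = @_;
    return $$textref ? "<TXT>" : "NULL";
}

# Hypothetical variant: report NULL only when execute() would bind SQL NULL,
# i.e. when the referenced value is undef.
sub txt_repr_defined {
    my ($textref) = @_;
    return defined $$textref ? "<TXT>" : "NULL";
}

# Hypothetical section texts.
my %samples = (
    long_text => "Full text of a document section ...",
    zero      => "0",
    empty     => "",
    undefined => undef,
);

foreach my $name (sort keys %samples) {
    my $textref = \$samples{$name};   # pass by reference, no copy of the text
    printf "%-10s truthy=%-5s defined=%s\n",
        $name, txt_repr_truthy($textref), txt_repr_defined($textref);
}
```

=cut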


## The 2 select statements used by GreenstoneSQLPlugin

# Using fetchall_arrayref on statement handle, to run on prepared and executed stmt
#   https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#fetchall_arrayref
# instead of selectall_arrayref on database handle which will prepare, execute and fetch
#   https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#selectall_arrayref
#
# Prepares and executes a "SELECT * FROM <COLL>_metadata WHERE did = $oid" SQL statement
# and returns all matching rows as the array reference produced by fetchall_arrayref(),
# holding one array reference per row. No statement handle is returned: callers iterate
# over this array reference instead of calling fetchrow_array().
# Have to use prepare() and execute() instead of do() since do() does
# not allow for fetching result set thereafter:
# do(): "This method  is typically most useful for non-SELECT statements that either cannot be prepared in advance (due to a limitation of the driver) or do not need to be executed repeatedly. It should not be used for SELECT statements because it does not return a statement handle (so you can't fetch any data)." https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#do
sub select_from_metatable_matching_docid {
    my $self= shift (@_);
    my ($oid, $outhandle) = @_;

    my $dbh = $self->{'db_handle'};
    my $tablename = $self->get_metadata_table_name();

    my $sth = $dbh->prepare_cached(qq{SELECT * FROM $tablename WHERE did = ?});
    $sth->execute( $oid ); # will print msg on fail

    print $outhandle "### SQL select stmt: ".$sth->{'Statement'}."\n"
        if ($self->{'verbosity'} > 2);

    my $rows_ref = $sth->fetchall_arrayref();
    # "If an error occurs, fetchall_arrayref returns the data fetched thus far, which may be none.
    # You should check $sth->err afterwards (or use the RaiseError attribute) to discover if the
    # data is complete or was truncated due to an error."
    # https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#fetchall_arrayref
    # https://www.oreilly.com/library/view/programming-the-perl/1565926994/ch04s05.html
    warn("Data fetching from $tablename terminated early by error: " . $sth->errstr) if $sth->err;
    return $rows_ref;
}
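
=pod

select_from_metatable_matching_docid returns the array reference produced by fetchall_arrayref(), with one array reference per row. The sketch below shows how a caller might walk that structure. It assumes each row's columns arrive in the order (did, sid, metaname, metavalue). The table definition is not in this part of the file, so the indices would need to match the real column order returned by SELECT *. The sample rows and group_meta_rows_by_section() are hypothetical. Calling fetchall_arrayref({}) instead returns one hash reference per row, keyed by column name, which avoids relying on column positions.

```perl
use strict;
use warnings;

# Group fetched metadata rows as section id => metadata name => [values].
# Assumes each row is (did, sid, metaname, metavalue); adjust the indices if
# SELECT * returns the columns in a different order.
sub group_meta_rows_by_section {
    my ($rows_ref) = @_;
    my %by_section;
    foreach my $row (@$rows_ref) {
        my ($did, $sid, $metaname, $metavalue) = @$row;
        push @{ $by_section{$sid}{$metaname} }, $metavalue;
    }
    return \%by_section;
}

# Hypothetical rows in the shape fetchall_arrayref() returns: an array
# reference holding one array reference per row.
my $rows_ref = [
    [ "HASH01", "root", "dc.Title",   "A title" ],
    [ "HASH01", "1",    "dc.Subject", "Maps" ],
    [ "HASH01", "1",    "dc.Subject", "Charts" ],
];

my $grouped = group_meta_rows_by_section($rows_ref);
print join(", ", @{ $grouped->{"1"}{"dc.Subject"} }), "\n";   # Maps, Charts
print scalar(@$rows_ref), " rows fetched\n";                  # 3 rows fetched
```

=cut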


# See select_from_metatable_matching_docid() above.
# Prepares and executes a "SELECT * FROM <COLL>_fulltxt WHERE did = $oid" SQL statement
# and returns all matching rows as the array reference produced by fetchall_arrayref(),
# holding one array reference per row.
sub select_from_texttable_matching_docid {
    my $self= shift (@_);
    my ($oid, $outhandle) = @_;

    my $dbh = $self->{'db_handle'};
    my $tablename = $self->get_fulltext_table_name();

    my $sth = $dbh->prepare_cached(qq{SELECT * FROM $tablename WHERE did = ?});
    $sth->execute( $oid ); # will print msg on fail

    print $outhandle "### SQL select stmt: ".$sth->{'Statement'}."\n"
        if ($self->{'verbosity'} > 2);

    my $rows_ref = $sth->fetchall_arrayref();
    # Need explicit warning:
    warn("Data fetching from $tablename terminated early by error: " . $sth->errstr) if $sth->err;
    return $rows_ref;
}

# delete all records in metatable with specified docid
# https://www.tutorialspoint.com/mysql/mysql-delete-query.htm
# DELETE FROM table_name [WHERE Clause]
# see example under 'do' at https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm
sub delete_recs_from_metatable_with_docid {
    my $self= shift (@_);
    my ($oid) = @_;

    my $dbh = $self->{'db_handle'};

    my $tablename = $self->get_metadata_table_name();
    my $sth = $dbh->prepare_cached(qq{DELETE FROM $tablename WHERE did = ?});
    $sth->execute( $oid ) or warn $sth->errstr; # dbh is set to print errors even without this warn()
}

# delete all records in fulltxt table with specified docid
sub delete_recs_from_texttable_with_docid {
    my $self= shift (@_);
    my ($oid) = @_;

    my $dbh = $self->{'db_handle'};

    my $tablename = $self->get_fulltext_table_name();
    my $sth = $dbh->prepare_cached(qq{DELETE FROM $tablename WHERE did = ?});
    $sth->execute( $oid ) or warn $sth->errstr; # dbh is set to print errors even without this warn()
}

# Call this after a successful connection to get the database handle, dbh,
# when a DB operation (SQL statement, create/delete) is needed that this
# class does not already provide as a method.
sub get_db_handle {
    my $self= shift (@_);
    return $self->{'db_handle'};
}
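
=pod

A sketch of using get_db_handle() for a query that gsmysql does not wrap. Here it counts one document's metadata rows with DBI's selectrow_array(), whose arguments are the statement, an attribute hash ref (undef here) and the bind values. count_meta_rows_for_doc() is hypothetical. MockDbh and MockGsmysql are hypothetical stand-ins for a connected DBI handle and a gsmysql object, included only so the sketch runs without a MySQL server. With a real connection the body of count_meta_rows_for_doc() would stay the same.

```perl
use strict;
use warnings;

# Hypothetical helper: count the metadata rows stored for one document.
sub count_meta_rows_for_doc {
    my ($gs, $oid) = @_;
    my $dbh       = $gs->get_db_handle();
    my $tablename = $gs->get_metadata_table_name();
    # selectrow_array($statement, \%attr, @bind_values) prepares, executes and
    # returns the first row; here that row holds the single COUNT(*) value.
    my ($count) = $dbh->selectrow_array(
        qq{SELECT COUNT(*) FROM $tablename WHERE did = ?}, undef, $oid);
    return $count;
}

# Stand-ins so the sketch runs without a database (hypothetical).
{
    package MockDbh;
    sub new {
        my ($class, %rows_per_doc) = @_;
        return bless \%rows_per_doc, $class;
    }
    # Ignores the SQL and returns a canned count for the bound doc id.
    sub selectrow_array {
        my ($self, $sql, $attr, @bind) = @_;
        return ($self->{ $bind[0] } // 0);
    }
}
{
    package MockGsmysql;
    sub new {
        my ($class, $dbh) = @_;
        my $self = { db_handle => $dbh, tablename_prefix => "demo" };
        return bless $self, $class;
    }
    sub get_db_handle { return $_[0]{db_handle}; }
    sub get_metadata_table_name { return $_[0]{tablename_prefix} . "_metadata"; }
}

my $gs = MockGsmysql->new(MockDbh->new(HASH01 => 3));
print count_meta_rows_for_doc($gs, "HASH01"), "\n";   # 3
print count_meta_rows_for_doc($gs, "HASH02"), "\n";   # 0
```

=cut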

################ HELPER METHODS ##############

# More basic helper methods
sub get_metadata_table_name {
    my $self= shift (@_);
    my $table_name = $self->{'tablename_prefix'} . "_metadata";
    return $table_name;
}

# FULLTEXT is a reserved keyword in (My)SQL. https://dev.mysql.com/doc/refman/5.5/en/keywords.html
# So we can't name a table or any of its columns "fulltext" without quoting. We use "fulltxt" instead.
sub get_fulltext_table_name {
    my $self= shift (@_);
    my $table_name = $self->{'tablename_prefix'} . "_fulltxt";
    return $table_name;
}

# Attempt to make sure the name parameter (for db or table name) is acceptable syntax
# for the db in question, e.g. for mysql. For example, unquoted (My)SQL table and
# database names cannot contain '-' (hyphens), so hyphens are replaced with underscores.
sub sanitize_name {
    my $self= shift (@_);
    my ($name) = @_;
    $name =~ s/-/_/g;
    return $name;
}
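
=pod

A standalone sketch of how the helpers above combine: sanitizing a hypothetical name, demo-coll, and deriving both table names from it as the prefix. In gsmysql the prefix comes from $self->{'tablename_prefix'}, which is set outside this part of the file. sanitize_name() here copies the method's substitution without the object. strict_sanitize_name() is a hypothetical stricter variant that replaces every character outside MySQL's basic unquoted-identifier set (ASCII letters, digits, '$' and '_'). It is shown for comparison only.

```perl
use strict;
use warnings;

# Same substitution as gsmysql::sanitize_name, without the object.
sub sanitize_name {
    my ($name) = @_;
    $name =~ s/-/_/g;
    return $name;
}

# Hypothetical stricter variant: keep only characters MySQL allows in basic
# unquoted identifiers (ASCII letters, digits, '$' and '_').
sub strict_sanitize_name {
    my ($name) = @_;
    $name =~ s/[^0-9A-Za-z\$_]/_/g;
    return $name;
}

# Derive the two table names the same way get_metadata_table_name and
# get_fulltext_table_name do.
sub table_names_for_prefix {
    my ($prefix) = @_;
    return ($prefix . "_metadata", $prefix . "_fulltxt");
}

my ($meta, $text) = table_names_for_prefix(sanitize_name("demo-coll"));
print "$meta $text\n";                             # demo_coll_metadata demo_coll_fulltxt
print strict_sanitize_name("demo coll.v2"), "\n";  # demo_coll_v2
```

=cut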


# My version of table_exists below works, but it's not ideal.
# Interestingly, MySQL has the non-standard statements CREATE TABLE IF NOT EXISTS and DROP TABLE IF EXISTS,
# see https://www.perlmonks.org/bare/?node=DBI%20Recipes
#    That page also has a table_exists function that could work with a proper comparison.
# TODO Q: Couldn't get the first solution at https://www.perlmonks.org/bare/?node_id=500050 to work though
# Note: the regex below is unanchored, so it also matches any table whose name merely contains $table_name.
sub table_exists {
    my $self = shift (@_);
    my $dbh = $self->{'db_handle'};
    my ($table_name) = @_;

    my @table_list = $dbh->tables;
    #my $tables_str = @table_list[0];
    foreach my $table (@table_list) {
        return 1 if ($table =~ m/$table_name/);
    }
    return 0;
}
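
=pod

The loop in table_exists() uses an unanchored regex, so any table whose name merely contains $table_name counts as a match. Regex metacharacters in the name are also not escaped. Below is a sketch of the "proper comparison" the comment above mentions. It assumes, as the DBI documentation describes, that $dbh->tables may return names with a schema prefix and identifier quoting, e.g. `greenstone`.`demo_metadata` (the database name here is hypothetical). It also assumes no table name itself contains a dot. bare_table_name() and table_exists_in_list() are hypothetical helpers that work on a plain list, so the sketch runs without a database.

```perl
use strict;
use warnings;

# Reduce a possibly qualified, quoted name such as `greenstone`.`demo_metadata`
# to the bare table name demo_metadata. Assumes table names contain no dots.
sub bare_table_name {
    my ($qualified) = @_;
    my @parts = split /\./, $qualified;
    my $last = $parts[-1];
    $last =~ s/^[`"]|[`"]$//g;   # drop surrounding identifier quotes
    return $last;
}

# Exact-match version of the loop in table_exists(), over a plain list.
sub table_exists_in_list {
    my ($table_name, @table_list) = @_;
    foreach my $table (@table_list) {
        return 1 if bare_table_name($table) eq $table_name;
    }
    return 0;
}

# Hypothetical output of $dbh->tables for a collection with prefix "demo".
my @table_list = ('`greenstone`.`demo_metadata`', '`greenstone`.`demo_fulltxt`');

my $name = "demo";
my $substring_hits = grep { m/$name/ } @table_list;   # what the regex loop sees
print "regex matches for '$name': $substring_hits\n";                                # 2
print "exact match for '$name': ", table_exists_in_list($name, @table_list), "\n";   # 0
print "exact match for 'demo_metadata': ",
      table_exists_in_list("demo_metadata", @table_list), "\n";                      # 1
```

=cut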

1;