source: main/trunk/greenstone2/perllib/gsmysql.pm@ 32594

Last change on this file since 32594 was 32594, checked in by ak19, 5 years ago

If rollback_on_cancel is turned on, then before the database connection happens, the (warning) message suggested by Dr Bainbridge is issued to the user with a 5 second sleep: to backup their archives and index folders to keep it in sync with what the GS SQL database will do on cancel (which is to rollback until before the import or buildcol script, whichever was run). Sample copy commands, specific to the user's OS, are displayed. When cancel is pressed, if a rollback is performed, then the final message is displayed reminding the user to restore their backed up archives and index folders.

File size: 38.1 KB
RevLine 
[32529]1###########################################################################
2#
[32592]3# gsmysql.pm -- Uses DBI for MySQL related utility functions used by
[32583]4# GreenstoneSQLPlugout and GreenstoneSQLPlugin too.
[32529]5# A component of the Greenstone digital library software
6# from the New Zealand Digital Library Project at the
7# University of Waikato, New Zealand.
8#
9# Copyright (C) 1999 New Zealand Digital Library Project
10#
11# This program is free software; you can redistribute it and/or modify
12# it under the terms of the GNU General Public License as published by
13# the Free Software Foundation; either version 2 of the License, or
14# (at your option) any later version.
15#
16# This program is distributed in the hope that it will be useful,
17# but WITHOUT ANY WARRANTY; without even the implied warranty of
18# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
19# GNU General Public License for more details.
20#
21# You should have received a copy of the GNU General Public License
22# along with this program; if not, write to the Free Software
23# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
24#
25###########################################################################
26
[32592]27package gsmysql;
[32529]28
29use strict;
30no strict 'refs';
31no strict 'subs';
32
[32536]33use DBI; # the central package for this module used by GreenstoneSQL Plugout and Plugin
[32594]34use FileUtils;
35use gsprintf;
[32529]36
[32588]37#################
38# Database functions that use the perl DBI module (with the DBD driver module for mysql)
39# https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm
40# https://metacpan.org/pod/DBD::mysql
41#################
[32581]42
[32580]43##############################
44
[32588]45# TODO Q: If disconnect is automatically called when object destroyed, what does that mean
46# for our file-global handle object, is disconnect only called at end of perl process?
[32592]47# Does that mean we don't need to explicitly call disconnect in gsmysql object's destroy during
[32588]48# the GLOBAL destruction phase?
49# https://perldoc.perl.org/perlobj.html#Destructors
50
51#+ TODO: add infrastructure for db_port, AutoCommit etc
[32580]52# For port, see https://stackoverflow.com/questions/2248665/perl-script-to-connect-to-mysql-server-port-3307
53
54# + TODO: remove unnecessary warn() since PrintError is active
55
[32591]56# + TODO: drop table if exists and create table if exists are available in MySQL. Use those cmds
[32585]57# instead of always first checking for existence ourselves? Only when subclassing to specific
[32588]58# mysql class?
59
60
61# + TODO Q: What on cancelling a build: delete table? But what if it was a rebuild and the rebuild is cancelled (not the original build)?
62# Do we create a copy of the orig database as backup, then start populating current db, and if cancelled, delete current db and RENAME backup table to current?
63# https://stackoverflow.com/questions/3280006/duplicating-a-mysql-table-indexes-and-data
64# BUT what if the table is HUGE? (Think of a collection with millions of docs.) Huge overhead in copying?
65# The alternative is we just quit on cancel, but then: cancel could leave the table in a partial committed state, with no way of rolling back.
66# Unless they do a full rebuild, which will recreate the table from scratch?
67# SOLUTION-> rollback transaction on error, see https://www.effectiveperlprogramming.com/2010/07/set-custom-dbi-error-handlers/
68# In that case, should set AutoCommit to off on connection, and remember to commit at end.
69
70# + TODO: Consider AutoCommit status (and Autocommit off allowing commit or rollback for GS coll build cancel) later
71
72
[32580]73##############################
74
[32578]75# singleton connection
76my $_dbh_instance = undef; # calls undef() function. See https://perlmaven.com/undef-and-defined-in-perl
77my $ref_count = 0;
78
[32529]79# Need params_map keys:
80# - collection_name
[32530]81# - db_encoding (db content encoding) - MySQL can set this at server, db, table levels. For MySQL
82# we set the enc during connect at server level. Not sure whether other DB's support it at the
83# same levels.
84
85# For connection to MySQL, need:
[32529]86# - db_driver, db_client_user, db_client_pwd, db_host, (db_port not used at present)
[32530]87# So these will be parameterised, but in a hashmap, for just the connect method.
88
89# Parameterise (one or more methods may use them):
[32583]90# - db_name (which is the GS3 sitename, or "greenstone2" for GS2)
[32530]91
[32588]92# Database access related functions
93# http://g2pc1.bu.edu/~qzpeng/manual/MySQL%20Commands.htm
94# https://www.guru99.com/insert-into.html
[32582]95
[32588]96# Add signal handlers to cleanup and disconnect from db on sudden termination, incl cancel build
97# https://www.perl.com/article/37/2013/8/18/Catch-and-Handle-Signals-in-Perl/
[32582]98$SIG{INT} = \&finish_signal_handler;
99$SIG{TERM} = \&finish_signal_handler;
100$SIG{KILL} = \&finish_signal_handler;
101
102sub finish_signal_handler {
103 my ($sig) = @_; # one of INT|KILL|TERM
104
105 if ($_dbh_instance) { # database handle (note, using singleton) still active.
106
[32588]107 # If autocommit wasn't set, then this is a cancel operation.
[32582]108 # If we've not disconnected from the sql db yet and if we've not committed
109 # transactions yet, then cancel means we do a rollback here
110
111 if($_dbh_instance->{AutoCommit} == 0) {
112 print STDERR " User cancelled: rolling back SQL database transaction.\n";
113 $_dbh_instance->rollback(); # will warn on failure, nothing more we can/want to do,
[32594]114
115 print STDERR "****************************\n";
116 &gsprintf::gsprintf(STDERR, "{gsmysql.restore_backups_on_build_cancel_msg}\n");
117 print STDERR "****************************\n";
[32582]118 }
119 }
120
121 die "Caught a $sig signal $!"; # die() will always call destructor (sub DESTROY)
122}
123
[32529]124sub new
[32561]125{
[32529]126 my $class = shift(@_);
127
128 my ($params_map) = @_;
129
130 # library_url: to be specified on the cmdline if not using a GS-included web server
131 # the GSDL_LIBRARY_URL env var is useful when running cmdline buildcol.pl in the linux package manager versions of GS3
132
[32531]133 # https://stackoverflow.com/questions/7083453/copying-a-hashref-in-perl
134 # Making a shallow copy works, and can handle unknown params:
135 #my $self = $params_map;
[32529]136
[32531]137 # but being explicit for class params needed for MySQL:
138 my $self = {
139 'collection_name' => $params_map->{'collection_name'},
[32560]140 'verbosity' => $params_map->{'verbosity'} || 1
[32531]141 };
142
[32559]143 # The db_encoding option is presently not passed in to this constructor as parameter.
144 # Placed here to indicate it's sort of optional.
145 # Since docxml are all in utf8, the contents of the GS SQL database should be too,
146 # So making utf8 the hidden default at present.
147 $self->{'db_encoding'} = $params_map->{'db_encoding'} || "utf8";
148
[32561]149 $self = bless($self, $class);
150
151 $self->{'tablename_prefix'} = $self->sanitize_name($params_map->{'collection_name'});
152
153 return $self;
[32529]154}
155
[32581]156# On die(), an object's destructor is called.
157# See https://www.perl.com/article/37/2013/8/18/Catch-and-Handle-Signals-in-Perl/
158# We want to ensure we've closed the db connection in such cases.
159# "It’s common to call die when handling SIGINT and SIGTERM. die is useful because it will ensure that Perl stops correctly: for example Perl will execute a destructor method if present when die is called, but the destructor method will not be called if a SIGINT or SIGTERM is received and no signal handler calls die."
[32582]160#
[32583]161# Useful: https://perldoc.perl.org/perlobj.html#Destructors
[32588]162# For more on when destroy is called, see https://www.perlmonks.org/?node_id=1020920
[32582]163#
164# https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#disconnect
[32591]165# "Disconnects the database from the database handle. disconnect is typically only used before exiting the program. The handle is of little use after disconnecting.
[32582]166#
167# The transaction behaviour of the disconnect method is, sadly, undefined. Some database systems (such as Oracle and Ingres) will automatically commit any outstanding changes, but others (such as Informix) will rollback any outstanding changes. Applications not using AutoCommit should explicitly call commit or rollback before calling disconnect.
168#
169# The database is automatically disconnected by the DESTROY method if still connected when there are no longer any references to the handle. The DESTROY method for each driver should implicitly call rollback to undo any uncommitted changes. This is vital behaviour to ensure that incomplete transactions don't get committed simply because Perl calls DESTROY on every object before exiting. Also, do not rely on the order of object destruction during "global destruction", as it is undefined.
170#
171# Generally, if you want your changes to be committed or rolled back when you disconnect, then you should explicitly call "commit" or "rollback" before disconnecting.
172#
173# If you disconnect from a database while you still have active statement handles (e.g., SELECT statement handles that may have more data to fetch), you will get a warning. The warning may indicate that a fetch loop terminated early, perhaps due to an uncaught error. To avoid the warning call the finish method on the active handles."
174#
[32591]175#
[32581]176sub DESTROY {
177 my $self = shift;
[32585]178
[32581]179 if (${^GLOBAL_PHASE} eq 'DESTRUCT') {
[32591]180
[32582]181 if ($_dbh_instance) { # database handle still active. Use singleton handle!
[32591]182 # dbh instance being active implies build was cancelled
[32582]183
[32591]184 # rollback code has moved to finish_signal_handler() where it belongs
185 # as rollback() should only happen on cancel/unnatural termination
186 # vs commit() happening in finished() before disconnect, which is natural termination.
187
[32583]188
189 # We're now finally ready to disconnect, as is required for both natural and premature termination
[32591]190 # (Though natural termination would have disconnected already)
191 # We now leave DBI's own destructor to do the disconnection when perl calls its DESTROY()
192 # We'll just print a message to stop anyone from worrying whether cancelling build
193 # will ensure disconnection still happens. It happens, but silently.
194 print STDERR " Global Destruct Phase: DBI's own destructor will disconnect database\n";
195 #$_dbh_instance->disconnect or warn $_dbh_instance->errstr;
196 #$_dbh_instance = undef;
197 #$ref_count = 0;
[32581]198 }
199 return;
200 }
[32583]201
[32585]202 # "Always include a call to $self->SUPER::DESTROY in our destructors (even if we don't yet have any base/parent classes). (p. 145)"
203 # Superclass and destroy, call to SUPER: https://www.perlmonks.org/?node_id=879920
[32588]204 # discussion also covers multiple-inheritance (MI)
[32585]205 $self->SUPER::DESTROY if $self->can("SUPER::DESTROY");
206
[32581]207}
[32578]208
[32529]209
210
[32538]211################### BASIC DB OPERATIONS ##################
212
[32529]213# THE NEW DB FUNCTIONS
214# NOTE: FULLTEXT is a reserved keyword in (My)SQL. So we can't name a table or any of its columns "fulltext".
215# https://dev.mysql.com/doc/refman/5.5/en/keywords.html
216
217
[32578]218
219# SINGLETON / GET INSTANCE PATTERN
220# https://stackoverflow.com/questions/16655603/perl-objects-class-variable-initialization
221# https://stackoverflow.com/questions/7587157/how-can-i-set-a-static-variable-that-can-be-accessed-by-all-subclasses-of-the-sa
222# Singleton without Moose: https://www.perl.com/article/52/2013/12/11/Implementing-the-singleton-pattern-in-Perl/
223
224sub connect_to_db
225{
[32529]226 my $self= shift (@_);
[32594]227 my ($params_map) = @_; # map instead of named vars with an eye on gssql inheritance
[32578]228
229 $params_map->{'db_encoding'} = $self->{'db_encoding'};
230 $params_map->{'verbosity'} = $self->{'verbosity'};
231
232 $self->{'db_handle'} = &_get_connection_instance($params_map); # getting singleton (class method)
233 if($self->{'db_handle'}) {
234 $ref_count++; # if successful, keep track of the number of refs to the single db connection
235 return $self->{'db_handle'};
236 }
237 return undef;
238}
239
240# SINGLETON METHOD #
241# TODO: where should the defaults for these params be, here or in GS-SQLPlugin/Plugout?
242sub _get_connection_instance
243{
244 #my $self= shift (@_); # singleton method doesn't use self, but callers don't need to know that
[32594]245 my ($params_map) = @_;
246
[32582]247
[32578]248 return $_dbh_instance if($_dbh_instance);
[32594]249 # or make the connection
[32578]250
[32594]251
252 # some useful user messages first
253 if(!defined $params_map->{'autocommit'} && $params_map->{'verbosity'}) {
254 print STDERR " Autocommit parameter not defined\n";
255 }
256 if($params_map->{'autocommit'}) {
257 print STDERR " SQL DB CANCEL SUPPORT OFF.\n" if($params_map->{'verbosity'} > 2);
258 } else { # rollback on cancel support on
259 &issue_backup_on_build_message();
260 }
261
[32578]262
[32559]263 # For proper utf8 support in MySQL, encoding should be 'utf8mb4' as 'utf8' is insufficient
[32578]264 my $db_enc = "utf8mb4" if $params_map->{'db_encoding'} eq "utf8";
[32529]265
[32591]266 # Params for connecting to MySQL
267 # These params are ensured default/fallback values by the GS SQL Plugs
268 # so no need to set it here
269 my $db_driver = $params_map->{'db_driver'};
270 my $db_host = $params_map->{'db_host'};
271 my $db_user = $params_map->{'db_client_user'};
272
273 # params that can be undef are db_client_pwd and db_port
[32580]274 my $db_pwd = $params_map->{'db_client_pwd'}; # even if undef and password was necessary,
275 # we'll see a sensible error message when connect fails
276 # localhost doesn't work for us, but 127.0.0.1 works
277 # https://metacpan.org/pod/DBD::mysql
278 # "The hostname, if not specified or specified as '' or 'localhost', will default to a MySQL server
279 # running on the local machine using the default for the UNIX socket. To connect to a MySQL server
280 # on the local machine via TCP, you must specify the loopback IP address (127.0.0.1) as the host."
281 my $db_port = $params_map->{'db_port'}; # leave as undef if unspecified,
282 # as our tests never used port anyway (must have internally
283 # defaulted to whatever default port is used for MySQL)
284
285
[32529]286 #my $connect_str = "dbi:$db_driver:database=$db_name;host=$db_host";
[32580]287 # But don't provide db now - this allows checking the db exists later when loading the db
288 my $connect_str = "dbi:$db_driver:host=$db_host";
289 $connect_str .= ";port=$db_port" if $db_port;
[32558]290
[32578]291 if($params_map->{'verbosity'}) {
[32560]292 print STDERR "Away to make connection to $db_driver database with:\n";
293 print STDERR " - hostname $db_host; username: $db_user";
[32563]294 print STDERR "; and the provided password" if $db_pwd;
[32560]295 print STDERR "\nAssuming the mysql server has been started with: --character_set_server=utf8mb4\n" if $db_driver eq "mysql";
296 }
[32582]297
298 # DBI AutoCommit connection param is on/1 by default, so if a value for this is not defined
299 # as a method parameter to _get_connection_instance, then fallback to the default of on/1
[32588]300 # More: https://www.oreilly.com/library/view/programming-the-perl/1565926994/re44.html
[32582]301 my $autocommit = (defined $params_map->{'autocommit'}) ? $params_map->{'autocommit'} : 1;
[32558]302
[32529]303 my $dbh = DBI->connect("$connect_str", $db_user, $db_pwd,
304 {
305 ShowErrorStatement => 1, # more informative as DBI will append failed SQL stmt to error message
306 PrintError => 1, # on by default, but being explicit
307 RaiseError => 0, # off by default, but being explicit
[32582]308 AutoCommit => $autocommit,
[32557]309 mysql_enable_utf8mb4 => 1 # tells MySQL to use UTF-8 for communication and tells DBD::mysql to decode the data, see https://stackoverflow.com/questions/46727362/perl-mysql-utf8mb4-issue-possible-bug
[32529]310 });
311
312 if(!$dbh) {
[32557]313 # NOTE, despite handle dbh being undefined, error code will be in DBI->err (note caps)
[32529]314 return 0;
315 }
316
317 # set encoding https://metacpan.org/pod/DBD::mysql
318 # https://dev.mysql.com/doc/refman/5.7/en/charset.html
319 # https://dev.mysql.com/doc/refman/5.7/en/charset-conversion.html
[32557]320 # Setting the encoding at db server level: $dbh->do("set NAMES '" . $db_enc . "'");
321 # HOWEVER:
322 # It turned out insufficient setting the encoding to utf8, as that only supports utf8 chars that
323 # need up to 3 bytes. We may need up to 4 bytes per utf8 character, e.g. chars with macron,
324 # and for that, we need the encoding to be set to utf8mb4.
325 # To set up a MySQL db to use utf8mb4 requires configuration on the server side too.
326 # https://stackoverflow.com/questions/10957238/incorrect-string-value-when-trying-to-insert-utf-8-into-mysql-via-jdbc
327 # https://stackoverflow.com/questions/46727362/perl-mysql-utf8mb4-issue-possible-bug
328 # To set up the db for utf8mb4, therefore,
329 # the MySQL server needs to be configured for that char encoding by running the server as:
330 # mysql-5.7.23-linux-glibc2.12-x86_64/bin>./mysqld_safe --datadir=/Scratch/ak19/mysql/data --character_set_server=utf8mb4
331 # AND when connecting to the server, we can can either set mysql_enable_utf8mb4 => 1
332 # as a connection option
333 # OR we need to do both "set NAMES utf8mb4" AND "$dbh->{mysql_enable_utf8mb4} = 1;" after connecting
334 #
335 # Search results for DBI Set Names imply the "SET NAMES '<enc>'" command is mysql specific too,
336 # so setting the mysql specific option during connection above as "mysql_enable_utf8mb4 => 1"
337 # is no more objectionable. It has the advantage of cutting out the 2 extra lines of doing
338 # set NAMES '<enc>' and $dbh->{mysql_enable_utf8mb4} = 1 here.
339 # These lines may be preferred if more db_driver options are to be supported in future:
340 # then a separate method called set_db_encoding($enc) can work out what db_driver we're using
341 # and if mysql and enc=utfy, then it can do the following whereas it will issue other do stmts
342 # for other db_drivers, see https://www.perlmonks.org/?node_id=259456:
[32529]343
[32557]344 #my $stmt = "set NAMES '" . $db_enc . "'";
345 #$dbh->do($stmt) || warn("Unable to set charset encoding at db server level to: " . $db_enc . "\n"); # tells MySQL to use UTF-8 for communication
346 #$dbh->{mysql_enable_utf8mb4} = 1; # tells DBD::mysql to decode the data
347
[32529]348 # if we're here, then connection succeeded, store handle
[32578]349 $_dbh_instance = $dbh;
350 return $_dbh_instance;
351
[32529]352}
353
[32592]354# Will disconnect if this instance of gsmysql holds the last reference to the db connection
[32583]355# If disconnecting and autocommit is off, then this will commit before disconnecting
[32579]356sub finished {
[32578]357 my $self= shift (@_);
[32583]358 my $dbh = $self->{'db_handle'};
[32582]359
[32583]360 my $rc = 1; # return code: everything went fine, regardless of whether we needed to commit
361 # (AutoCommit on or off)
362
[32578]363 $ref_count--;
[32583]364 if($ref_count == 0) { # Only commit transaction when we're about to actually disconnect, not before
[32582]365
[32591]366 # + TODO: If AutoCommit was off, meaning transactions were on/enabled,
[32583]367 # then here is where we commit our one long transaction.
368 # https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#commit
369 if($dbh->{AutoCommit} == 0) {
370 print STDERR " Committing transaction to SQL database now.\n" if $self->{'verbosity'};
371 $rc = $dbh->commit() or warn("SQL DB COMMIT FAILED: " . $dbh->errstr); # important problem
372 # worth embellishing error message
373 }
374 # else if autocommit was on, then we'd have committed after every db operation, so nothing to do
375
376 $self->_force_disconnect_from_db();
[32582]377 }
378
379 return $rc;
[32578]380}
381
[32582]382
[32578]383# Call this method on die(), so that you're sure the perl process has disconnected from SQL db
384# Disconnect from db - https://metacpan.org/pod/DBI#disconnect
385# TODO: make sure to have committed or rolled back before disconnect
386# and that you've call finish() on statement handles if any fetch remnants remain
[32583]387sub _force_disconnect_from_db {
[32578]388 my $self= shift (@_);
389
390 if($_dbh_instance) {
391 # make sure any active stmt handles are finished
392 # NO: "When all the data has been fetched from a SELECT statement, the driver will automatically call finish for you. So you should not call it explicitly except when you know that you've not fetched all the data from a statement handle and the handle won't be destroyed soon."
393
[32592]394 print STDERR " GSMySQL disconnecting from database\n" if $self->{'verbosity'};
[32578]395 # Just go through the singleton db handle to disconnect
396 $_dbh_instance->disconnect or warn $_dbh_instance->errstr;
397 $_dbh_instance = undef;
398 }
[32592]399 # Number of gsmysql objects that share a live connection is now 0, as the connection's dead
400 # either because the last gsmysql object finished() or because connection was killed (force)
[32578]401 $ref_count = 0;
402}
403
404
[32563]405# Load the designated database, i.e. 'use <dbname>;'.
406# If the database doesn't yet exist, creates it and loads it.
407# (Don't create the collection's tables yet, though)
408# At the end it will have loaded the requested database (in MySQL: "use <db>;") on success.
409# As usual, returns success or failure value that can be evaluated in a boolean context.
410sub use_db {
[32529]411 my $self= shift (@_);
[32563]412 my ($db_name) = @_;
[32529]413 my $dbh = $self->{'db_handle'};
[32561]414 $db_name = $self->sanitize_name($db_name);
[32529]415
[32560]416 print STDERR "Attempting to use database $db_name\n" if($self->{'verbosity'});
[32558]417
[32529]418 # perl DBI switch database: https://www.perlmonks.org/?node_id=995434
419 # do() returns undef on error.
420 # connection succeeded, try to load our database. If that didn't work, attempt to create db
421 my $success = $dbh->do("use $db_name");
422
423 if(!$success && $dbh->err == 1049) { # "Unknown database" error has code 1049 (mysql only?) meaning db doesn't exist yet
[32558]424
[32561]425 print STDERR "Database $db_name didn't exist, creating it along with the tables for the current collection...\n" if($self->{'verbosity'});
[32558]426
[32529]427 # attempt to create the db and its tables
428 $self->create_db($db_name) || return 0;
429
[32560]430 print STDERR " Created database $db_name\n" if($self->{'verbosity'} > 1);
[32529]431
432 # once more attempt to use db, now that it exists
433 $dbh->do("use $db_name") || return 0;
[32563]434 #$dbh->do("use $db_name") or die "Error (code" . $dbh->err ."): " . $dbh->errstr . "\n";
[32529]435
436 $success = 1;
437 }
438 elsif($success) { # database existed and loaded successfully, but
439 # before proceeding check that the current collection's tables exist
440
[32560]441 print STDERR "@@@ DATABASE $db_name EXISTED\n" if($self->{'verbosity'} > 2);
[32529]442 }
443
444 return $success; # could still return 0, if database failed to load with an error code != 1049
445}
446
[32571]447
[32563]448# We should already have done "use <database>;" if this gets called.
449# Just load this collection's metatable
450sub ensure_meta_table_exists {
451 my $self = shift (@_);
452
453 my $tablename = $self->get_metadata_table_name();
[32593]454 # if(!$self->table_exists($tablename)) {
455 # $self->create_metadata_table() || return 0;
456 # } else {
457 # print STDERR "@@@ Meta table exists\n" if($self->{'verbosity'} > 2);
458 # }
459 $self->create_metadata_table() || return 0; # will now only create it if it doesn't already exist
[32563]460 return 1;
461}
[32558]462
[32563]463# We should already have done "use <database>;" if this gets called.
464# Just load this collection's metatable
465sub ensure_fulltxt_table_exists {
466 my $self = shift (@_);
[32561]467
[32563]468 my $tablename = $self->get_fulltext_table_name();
[32593]469 # if(!$self->table_exists($tablename)) {
470 # $self->create_fulltext_table() || return 0;
471 # } else {
472 # print STDERR "@@@ Fulltxt table exists\n" if($self->{'verbosity'} > 2);
473 # }
474 $self->create_fulltext_table() || return 0; # will now only create it if it doesn't already exist
[32563]475 return 1;
[32529]476}
477
478
479sub create_db {
480 my $self= shift (@_);
[32557]481 my ($db_name) = @_;
[32529]482 my $dbh = $self->{'db_handle'};
[32561]483 $db_name = $self->sanitize_name($db_name);
[32529]484
485 # https://stackoverflow.com/questions/5025768/how-can-i-create-a-mysql-database-from-a-perl-script
486 return $dbh->do("create database $db_name"); # do() will return undef on fail, https://metacpan.org/pod/DBI#do
487}
488
[32593]489## NOTE: these 2 create_table methods use mysql specific "CREATE TABLE IF NOT EXISTS" syntax
490## vs general SQL CREATE TABLE syntax which would produce an error message if the table
491## already existed
492## And unless do() fails, these two create methods will now always return true,
493## even if table existed and didn't need to be created.
[32529]494sub create_metadata_table {
495 my $self= shift (@_);
496 my $dbh = $self->{'db_handle'};
497
498 my $table_name = $self->get_metadata_table_name();
[32593]499 print STDERR " Will create table $table_name if it doesn't exist\n" if($self->{'verbosity'} > 2);
[32558]500
[32529]501 # If using an auto incremented primary key:
[32593]502 my $stmt = "CREATE TABLE IF NOT EXISTS $table_name (id INT NOT NULL AUTO_INCREMENT, did VARCHAR(63) NOT NULL, sid VARCHAR(63) NOT NULL, metaname VARCHAR(127) NOT NULL, metavalue VARCHAR(1023) NOT NULL, PRIMARY KEY(id));";
[32529]503 return $dbh->do($stmt);
504}
505
506# TODO: Investigate: https://dev.mysql.com/doc/search/?d=10&p=1&q=FULLTEXT
507# 12.9.1 Natural Language Full-Text Searches
508# to see whether we have to index the 'fulltxt' column of the 'fulltext' tables
509# or let user edit this file, or add it as another option
510sub create_fulltext_table {
511 my $self= shift (@_);
512 my $dbh = $self->{'db_handle'};
513
514 my $table_name = $self->get_fulltext_table_name();
[32593]515 print STDERR " Will create table $table_name if it doesn't exist\n" if($self->{'verbosity'} > 2);
[32558]516
[32529]517 # If using an auto incremented primary key:
[32593]518 my $stmt = "CREATE TABLE IF NOT EXISTS $table_name (id INT NOT NULL AUTO_INCREMENT, did VARCHAR(63) NOT NULL, sid VARCHAR(63) NOT NULL, fulltxt LONGTEXT, PRIMARY KEY(id));";
[32529]519 return $dbh->do($stmt);
520
521}
522
[32593]523## NOTE: this method uses mysql specific "DROP TABLE IF EXISTS" syntax vs general SQL DROP TABLE
524## syntax which would produce an error message if the table didn't exist
[32538]525sub delete_collection_tables {
526 my $self= shift (@_);
527 my $dbh = $self->{'db_handle'};
[32580]528
[32538]529 # drop table <tablename>
[32593]530 # my $table = $self->get_metadata_table_name();
531 # if($self->table_exists($table)) {
532 # $dbh->do("drop table $table");
533 # }
534 # $table = $self->get_fulltext_table_name();
535 # if($self->table_exists($table)) {
536 # $dbh->do("drop table $table");
537 # }
538 my $table = $self->get_metadata_table_name();
539 $dbh->do("drop table if exists $table");
540
[32538]541 $table = $self->get_fulltext_table_name();
[32593]542 $dbh->do("drop table if exists $table");
[32582]543
[32593]544 # If prepared select statement handles already exist, would need to commit here
545 # so that future select statements using those prepared handles work.
[32582]546 # See https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#Transactions
[32538]547}
[32529]548
[32538]549# Don't call this: it will delete the meta and full text tables for ALL collections in $db_name (localsite by default)!
[32541]550# This method is just here for debugging (for testing creating a database when there is none)
[32580]551#
552# "IF EXISTS is used to prevent an error from occurring if the database does not exist. ... DROP DATABASE returns the number of tables that were removed. The DROP DATABASE statement removes from the given database directory those files and directories that MySQL itself may create during normal operation.Jun 20, 2012"
553# MySQL 8.0 Reference Manual :: 13.1.22 DROP DATABASE Syntax
554# https://dev.mysql.com/doc/en/drop-database.html
[32538]555sub _delete_database {
556 my $self= shift (@_);
557 my ($db_name) = @_;
558 my $dbh = $self->{'db_handle'};
[32561]559 $db_name = $self->sanitize_name($db_name);
560
[32560]561 print STDERR "!!! Deleting database $db_name\n" if($self->{'verbosity'});
[32538]562
563 # "drop database dbname"
564 $dbh->do("drop database $db_name") || return 0;
565
566 return 1;
567}
568
569
570########################### DB STATEMENTS ###########################
571
[32529]572# USEFUL: https://metacpan.org/pod/DBI
573# "Many methods have an optional \%attr parameter which can be used to pass information to the driver implementing the method. Except where specifically documented, the \%attr parameter can only be used to pass driver specific hints. In general, you can ignore \%attr parameters or pass it as undef."
574
[32574]575# More efficient to use prepare() to prepare an SQL statement once and then execute() it many times
576# (binding different values to placeholders) than running do() which will prepare each time and
577# execute each time. Also, do() is not useful with SQL select statements as it doesn't fetch rows.
578# Can prepare and cache prepared statements or retrieve prepared statements if cached in one step:
579# https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#prepare_cached
[32529]580
581# https://www.guru99.com/insert-into.html
582# and https://dev.mysql.com/doc/refman/8.0/en/example-auto-increment.html
583# for inserting multiple rows at once
584# https://www.perlmonks.org/bare/?node_id=316183
585# https://metacpan.org/pod/DBI#do
586# https://www.quora.com/What-is-the-difference-between-prepare-and-do-statements-in-Perl-while-we-make-a-connection-to-the-database-for-executing-the-query
587# https://docstore.mik.ua/orelly/linux/dbi/ch05_05.htm
588
589# https://metacpan.org/pod/DBI#performance
590# 'The q{...} style quoting used in this example avoids clashing with quotes that may be used in the SQL statement. Use the double-quote like qq{...} operator if you want to interpolate variables into the string. See "Quote and Quote-like Operators" in perlop for more details.'
[32573]591#
592# This method uses lazy loading to prepare the SQL insert stmt once for a table and store it,
593# then execute the (stored) statement each time it's needed for that table.
594sub insert_row_into_metadata_table {
595 my $self = shift (@_);
[32580]596 my ($doc_oid, $section_name, $meta_name, $escaped_meta_value, $debug_only) = @_;
[32573]597
[32529]598 my $dbh = $self->{'db_handle'};
[32574]599
600 my $tablename = $self->get_metadata_table_name();
[32580]601 my $sth = $dbh->prepare_cached(qq{INSERT INTO $tablename (did, sid, metaname, metavalue) VALUES (?, ?, ?, ?)});# || warn("Could not prepare insert statement for metadata table\n");
[32529]602
[32573]603 # Now we're ready to execute the command, unless we're only debugging
[32529]604
[32573]605 if($debug_only) {
606 # just print the statement we were going to execute
[32580]607 print STDERR $sth->{'Statement'} . "($doc_oid, $section_name, $meta_name, $escaped_meta_value)\n";
[32576]608 }
609 else {
610 print STDERR $sth->{'Statement'} . "($doc_oid, $section_name, $meta_name, $escaped_meta_value)\n" if $self->{'verbosity'} > 2;
[32573]611
612 $sth->execute($doc_oid, $section_name, $meta_name, $escaped_meta_value)
613 || warn ("Unable to write metadata row to db:\n\tOID $doc_oid, section $section_name,\n\tmeta name: $meta_name, val: $escaped_meta_value");
614 # Execution failure will print out info anyway: since db connection sets PrintError
615 }
[32529]616}
617
[32573]618# As above. Likewise uses lazy loading to prepare the SQL insert stmt once for a table and store it,
619# then execute the (stored) statement each time it's needed for that table.
620sub insert_row_into_fulltxt_table {
[32529]621 my $self = shift (@_);
622 #my ($did, $sid, $fulltext) = @_;
[32580]623 my ($doc_oid, $section_name, $section_textref, $debug_only) = @_;
[32573]624
[32529]625 my $dbh = $self->{'db_handle'};
626
[32574]627 my $tablename = $self->get_fulltext_table_name();
[32580]628 my $sth = $dbh->prepare_cached(qq{INSERT INTO $tablename (did, sid, fulltxt) VALUES (?, ?, ?)});# || warn("Could not prepare insert statement for fulltxt table\n");
[32574]629
[32573]630 # Now we're ready to execute the command, unless we're only debugging
631
[32580]632 # don't display the fulltxt value as it could be too long
[32576]633 my $txt_repr = $$section_textref ? "<TXT>" : "NULL";
[32580]634 if($debug_only) { # only print statement, don't execute it
635 print STDERR $sth->{'Statement'} . "($doc_oid, $section_name, $txt_repr)\n";
[32576]636 }
637 else {
638 print STDERR $sth->{'Statement'} . "($doc_oid, $section_name, $txt_repr)\n" if $self->{'verbosity'} > 2;
639
[32573]640 $sth->execute($doc_oid, $section_name, $$section_textref)
[32580]641 || warn ("Unable to write fulltxt row to db for row:\n\tOID $doc_oid, section $section_name"); # Execution failure will print out info anyway: since db connection sets PrintError
[32573]642 }
[32529]643}
644
[32538]645
646## The 2 select statements used by GreenstoneSQLPlugin
647
[32575]648# Using fetchall_arrayref on statement handle, to run on prepared and executed stmt
649# https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#fetchall_arrayref
650# instead of selectall_arrayref on database handle which will prepare, execute and fetch
651# https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#selectall_arrayref
652#
[32538]653# Returns the statement handle that prepared and executed
654# a "SELECT * FROM <COLL>_metadata WHERE did = $oid" SQL statement.
655# Caller can call fetchrow_array() on returned statement handle, $sth
656# Have to use prepare() and execute() instead of do() since do() does
657# not allow for fetching result set thereafter:
658# do(): "This method is typically most useful for non-SELECT statements that either cannot be prepared in advance (due to a limitation of the driver) or do not need to be executed repeatedly. It should not be used for SELECT statements because it does not return a statement handle (so you can't fetch any data)." https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#do
659sub select_from_metatable_matching_docid {
[32529]660 my $self= shift (@_);
[32575]661 my ($oid, $outhandle) = @_;
[32538]662
[32529]663 my $dbh = $self->{'db_handle'};
[32575]664 my $tablename = $self->get_metadata_table_name();
[32529]665
[32575]666 my $sth = $dbh->prepare_cached(qq{SELECT * FROM $tablename WHERE did = ?});
[32538]667 $sth->execute( $oid ); # will print msg on fail
[32575]668
669 print $outhandle "### SQL select stmt: ".$sth->{'Statement'}."\n"
670 if ($self->{'verbosity'} > 2);
[32529]671
[32575]672 my $rows_ref = $sth->fetchall_arrayref();
673 # "If an error occurs, fetchall_arrayref returns the data fetched thus far, which may be none.
674 # You should check $sth->err afterwards (or use the RaiseError attribute) to discover if the
675 # data is complete or was truncated due to an error."
676 # https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#fetchall_arrayref
677 # https://www.oreilly.com/library/view/programming-the-perl/1565926994/ch04s05.html
678 warn("Data fetching from $tablename terminated early by error: " . $dbh->err) if $dbh->err;
679 return $rows_ref;
[32529]680}
681
[32574]682
[32575]683# See select_from_metatable_matching_docid() above.
[32538]684# Returns the statement handle that prepared and executed
685# a "SELECT * FROM <COLL>_metadata WHERE did = $oid" SQL statement.
686# Caller can call fetchrow_array() on returned statement handle, $sth
687sub select_from_texttable_matching_docid {
[32529]688 my $self= shift (@_);
[32575]689 my ($oid, $outhandle) = @_;
[32538]690
[32529]691 my $dbh = $self->{'db_handle'};
[32575]692 my $tablename = $self->get_fulltext_table_name();
[32529]693
[32575]694 my $sth = $dbh->prepare_cached(qq{SELECT * FROM $tablename WHERE did = ?});
[32538]695 $sth->execute( $oid ); # will print msg on fail
696
[32575]697 print $outhandle "### SQL select stmt: ".$sth->{'Statement'}."\n"
698 if ($self->{'verbosity'} > 2);
699
700 my $rows_ref = $sth->fetchall_arrayref();
701 # Need explicit warning:
702 warn("Data fetching from $tablename terminated early by error: " . $dbh->err) if $dbh->err;
703 return $rows_ref;
704
[32538]705}
[32529]706
[32544]707# delete all records in metatable with specified docid
708# https://www.tutorialspoint.com/mysql/mysql-delete-query.htm
709# DELETE FROM table_name [WHERE Clause]
710# see example under 'do' at https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm
711sub delete_recs_from_metatable_with_docid {
712 my $self= shift (@_);
713 my ($oid) = @_;
714
715 my $dbh = $self->{'db_handle'};
[32571]716
[32574]717 my $tablename = $self->get_metadata_table_name();
718 my $sth = $dbh->prepare_cached(qq{DELETE FROM $tablename WHERE did = ?});
[32571]719 $sth->execute( $oid ) or warn $dbh->errstr; # dbh set to print errors even without doing warn()
[32544]720}
[32538]721
[32544]722# delete all records in metatable with specified docid
723sub delete_recs_from_texttable_with_docid {
724 my $self= shift (@_);
725 my ($oid) = @_;
726
[32571]727 my $dbh = $self->{'db_handle'};
728
[32574]729 my $tablename = $self->get_fulltext_table_name();
730 my $sth = $dbh->prepare_cached(qq{DELETE FROM $tablename WHERE did = ?});
[32571]731 $sth->execute( $oid ) or warn $dbh->errstr; # dbh set to print errors even without doing warn()
[32544]732}
733
[32538]734# Can call this after connection succeeded to get the database handle, dbh,
735# if any specific DB operation (SQL statement, create/delete)
736# needs to be executed that is not already provided as a method of this class.
737sub get_db_handle {
738 my $self= shift (@_);
739 return $self->{'db_handle'};
[32529]740}
741
[32538]742################ HELPER METHODS ##############
743
[32529]744# More basic helper methods
745sub get_metadata_table_name {
746 my $self= shift (@_);
[32531]747 my $table_name = $self->{'tablename_prefix'} . "_metadata";
[32529]748 return $table_name;
749}
750
751# FULLTEXT is a reserved keyword in (My)SQL. https://dev.mysql.com/doc/refman/5.5/en/keywords.html
752# So we can't name a table or any of its columns "fulltext". We use "fulltxt" instead.
753sub get_fulltext_table_name {
754 my $self= shift (@_);
[32531]755 my $table_name = $self->{'tablename_prefix'} . "_fulltxt";
[32529]756 return $table_name;
757}
758
[32561]759# Attempt to make sure the name parameter (for db or table name) is acceptable syntax
760# for the db in question, e.g. for mysql. For example, (My)SQL doesn't like tables or
761# databases with '-' (hyphens) in their names
762sub sanitize_name {
763 my $self= shift (@_);
764 my ($name) = @_;
765 $name =~ s/-/_/g;
766 return $name;
767}
[32531]768
[32561]769
[32529]770# I can get my version of table_exists to work, but it's not so ideal
771# Interesting that MySQL has non-standard command to CREATE TABLE IF NOT EXISTS and DROP TABLE IF EXISTS,
772# see https://www.perlmonks.org/bare/?node=DBI%20Recipes
773# The page further has a table_exists function that could work with proper comparison
[32543]774# TODO Q: Couldn't get the first solution at https://www.perlmonks.org/bare/?node_id=500050 to work though
[32529]775sub table_exists {
776 my $self = shift (@_);
777 my $dbh = $self->{'db_handle'};
778 my ($table_name) = @_;
779
780 my @table_list = $dbh->tables;
781 #my $tables_str = @table_list[0];
782 foreach my $table (@table_list) {
783 return 1 if ($table =~ m/$table_name/);
784 }
785 return 0;
786}
787
[32594]788# regular function, not method
789# Called when rollback_on_cancel is on.
790# Warns they user to make backups of their archives and index dir
791# and sleeps for 5 seconds so they can do that
792sub issue_backup_on_build_message
793{
794 # warn the user they'll need to backup their archives (and index?) folders
795 # plugout stores archivedir in $self->{'output_dir'}, but not available in plugin
796 # But we're only making an example copy command anyway:
797 my $archivesdir = &FileUtils::filenameConcatenate($ENV{'GSDLCOLLECTDIR'}, "archives");
798 my $archives_rollbackdir = $archivesdir.".rollback";
799
800 # Assume user knows what they're doing if a rollback directory already exists
801 # instead of wasting time waiting for sleep to terminate
802 return if FileUtils::directoryExists("$archives_rollbackdir");
803
804 my $indexdir = &FileUtils::filenameConcatenate($ENV{'GSDLCOLLECTDIR'}, "index");
805
806
807 # use rsync command on unix
808 my $example_copy_cmds = "rsync -pavH $archivesdir $archivesdir.rollback\n";
809 $example_copy_cmds .= "rsync -pavH $indexdir $indexdir.rollback\n";
810
811 if (($ENV{'GSDLOS'} =~ /^windows$/i) && ($^O ne "cygwin")) {
812 # https://stackoverflow.com/questions/4601161/copying-all-contents-of-folder-to-another-folder-using-batch-file
813 $example_copy_cmds = "xcopy /EVI $archivesdir $archivesdir.rollback\n";
814 $example_copy_cmds .= "xcopy /EVI $indexdir $indexdir.rollback\n";
815
816 }
817 print STDERR "****************************\n";
818 &gsprintf::gsprintf(STDERR, "{gsmysql.backup_on_build_msg}\n", $example_copy_cmds);
819 print STDERR "****************************\n";
820 sleep 5; # 5s
821}
822
[32529]8231;
Note: See TracBrowser for help on using the repository browser.