source: main/trunk/greenstone2/perllib/gssql.pm@ 32582

Last change on this file since 32582 was 32582, checked in by ak19, 5 years ago

Now that previous commit(s) put sig handlers in place in gs_sql, have been able to add in Undo on build/import Cancel for the GS SQL Plugs. This utilizes AutoCommit vs Transaction (rollback/commit) behaviour. On cancel, a sig handler is triggered (SIGINT) and, if AutoCommit is off, does a rollback before die() which calls object destructor and disconnects from db. On regular program execution running to normal termination, the last finish() call on gs_sql that will trigger the disconnect, will now first do a commit(), if AutoCommit is off, before disconnecting. For now, the default for both GreenstoneSQLPlugs is to support Undo (i.e. transactions), which turns AutoCommit off. Not sure whether this will be robust: what if transactions take place in memory, we could be dealing with millions of docs of large full-txt. Another issue is that the SQL DB may be out of sync with archives and index folder on Cancel: archives and index just terminate and are in an intermediate state depending on when cancel was pressed. Whereas the GS SQL DB is in a rolled back state as if the import or build never took place. A third issue is that during buildcol (perhaps specifically during buildcol's doc processing phase), pressing Cancel still continues buildcol: the current perl process is cancelled but the next one continues, rather than terminating buildcol in entirety. What happens with the GS SQL DB is that any 'transaction' until then is rolled back, perhaps a transaction regarding one doc if the Cancel affects on a doc basis, and the next process (next doc processing?) continues and allows for further transactions that are all committed at the end on natural termination of buildcol. Need to whether Undo behavious is really what we want. But it's available now and we can simply change the default to not support Undo if we want the old behaviour again.

File size: 34.1 KB
Line 
1###########################################################################
2#
3# gssql.pm -- DBI for SQL related utility functions used by
4# GreenstoneSQLPlugout and hereafter by GreenstoneSQLPlugin too.
5# A component of the Greenstone digital library software
6# from the New Zealand Digital Library Project at the
7# University of Waikato, New Zealand.
8#
9# Copyright (C) 1999 New Zealand Digital Library Project
10#
11# This program is free software; you can redistribute it and/or modify
12# it under the terms of the GNU General Public License as published by
13# the Free Software Foundation; either version 2 of the License, or
14# (at your option) any later version.
15#
16# This program is distributed in the hope that it will be useful,
17# but WITHOUT ANY WARRANTY; without even the implied warranty of
18# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
19# GNU General Public License for more details.
20#
21# You should have received a copy of the GNU General Public License
22# along with this program; if not, write to the Free Software
23# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
24#
25###########################################################################
26
27package gssql;
28
29use strict;
30no strict 'refs';
31no strict 'subs';
32
33use DBI; # the central package for this module used by GreenstoneSQL Plugout and Plugin
34
35
36##############################
37
38# TODO: add infrastructure for db_port, AutoCommit etc
39# For port, see https://stackoverflow.com/questions/2248665/perl-script-to-connect-to-mysql-server-port-3307
40
41# + TODO: remove unnecessary warn() since PrintError is active
42# https://perldoc.perl.org/perlobj.html#Destructors
43
44# TODO: drop table if exists and create table if exists are available in MySQL. Use those cmds
45# instead of always first checking for existence ourselves?
46##############################
47
48# singleton connection
49my $_dbh_instance = undef; # calls undef() function. See https://perlmaven.com/undef-and-defined-in-perl
50my $ref_count = 0;
51
52# Need params_map keys:
53# - collection_name
54# - db_encoding (db content encoding) - MySQL can set this at server, db, table levels. For MySQL
55# we set the enc during connect at server level. Not sure whether other DB's support it at the
56# same levels.
57
58# For connection to MySQL, need:
59# - db_driver, db_client_user, db_client_pwd, db_host, (db_port not used at present)
60# So these will be parameterised, but in a hashmap, for just the connect method.
61
62# Parameterise (one or more methods may use them):
63# - db_name (which is the GS3 sitename)
64
65
66
67$SIG{INT} = \&finish_signal_handler;
68$SIG{TERM} = \&finish_signal_handler;
69$SIG{KILL} = \&finish_signal_handler;
70
71sub finish_signal_handler {
72 my ($sig) = @_; # one of INT|KILL|TERM
73
74 if ($_dbh_instance) { # database handle (note, using singleton) still active.
75
76 # TODO: If autocommit wasn't set, then this is a cancel operation.
77 # If we've not disconnected from the sql db yet and if we've not committed
78 # transactions yet, then cancel means we do a rollback here
79
80 if($_dbh_instance->{AutoCommit} == 0) {
81 print STDERR " User cancelled: rolling back SQL database transaction.\n";
82 $_dbh_instance->rollback(); # will warn on failure, nothing more we can/want to do,
83 }
84 }
85
86
87 die "Caught a $sig signal $!"; # die() will always call destructor (sub DESTROY)
88}
89
90sub new
91{
92 my $class = shift(@_);
93
94 my ($params_map) = @_;
95
96 # library_url: to be specified on the cmdline if not using a GS-included web server
97 # the GSDL_LIBRARY_URL env var is useful when running cmdline buildcol.pl in the linux package manager versions of GS3
98
99 # https://stackoverflow.com/questions/7083453/copying-a-hashref-in-perl
100 # Making a shallow copy works, and can handle unknown params:
101 #my $self = $params_map;
102
103 # but being explicit for class params needed for MySQL:
104 my $self = {
105 'collection_name' => $params_map->{'collection_name'},
106 'verbosity' => $params_map->{'verbosity'} || 1
107 };
108
109 # The db_encoding option is presently not passed in to this constructor as parameter.
110 # Placed here to indicate it's sort of optional.
111 # Since docxml are all in utf8, the contents of the GS SQL database should be too,
112 # So making utf8 the hidden default at present.
113 $self->{'db_encoding'} = $params_map->{'db_encoding'} || "utf8";
114
115 $self = bless($self, $class);
116
117 $self->{'tablename_prefix'} = $self->sanitize_name($params_map->{'collection_name'});
118
119 return $self;
120}
121
122# On die(), an object's destructor is called.
123# See https://www.perl.com/article/37/2013/8/18/Catch-and-Handle-Signals-in-Perl/
124# We want to ensure we've closed the db connection in such cases.
125# "It’s common to call die when handling SIGINT and SIGTERM. die is useful because it will ensure that Perl stops correctly: for example Perl will execute a destructor method if present when die is called, but the destructor method will not be called if a SIGINT or SIGTERM is received and no signal handler calls die."
126#
127# https://perldoc.perl.org/perlobj.html#Destructors
128#
129# https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#disconnect
130# "Disconnects the database from the database handle. disconnect is typically only used before exitin# g the program. The handle is of little use after disconnecting.
131#
132# The transaction behaviour of the disconnect method is, sadly, undefined. Some database systems (such as Oracle and Ingres) will automatically commit any outstanding changes, but others (such as Informix) will rollback any outstanding changes. Applications not using AutoCommit should explicitly call commit or rollback before calling disconnect.
133#
134# The database is automatically disconnected by the DESTROY method if still connected when there are no longer any references to the handle. The DESTROY method for each driver should implicitly call rollback to undo any uncommitted changes. This is vital behaviour to ensure that incomplete transactions don't get committed simply because Perl calls DESTROY on every object before exiting. Also, do not rely on the order of object destruction during "global destruction", as it is undefined.
135#
136# Generally, if you want your changes to be committed or rolled back when you disconnect, then you should explicitly call "commit" or "rollback" before disconnecting.
137#
138# If you disconnect from a database while you still have active statement handles (e.g., SELECT statement handles that may have more data to fetch), you will get a warning. The warning may indicate that a fetch loop terminated early, perhaps due to an uncaught error. To avoid the warning call the finish method on the active handles."
139#
140sub DESTROY {
141 my $self = shift;
142
143 if (${^GLOBAL_PHASE} eq 'DESTRUCT') {
144
145 if ($_dbh_instance) { # database handle still active. Use singleton handle!
146
147 # THIS CODE HAS MOVED TO finish_signal_handler() WHERE IT BELONGS
148 # If autocommit wasn't set, then this is a cancel operation.
149 # If we've not disconnected from the sql db yet and if we've not committed
150 # transactions yet, then cancel means we do a rollback here
151
152 # if($_dbh_instance->{AutoCommit} == 0) {
153
154 # $_dbh_instance->rollback(); # will warn on failure, nothing more we can/want to do,
155 # # don't do a die() here: possibility of infinite loop and we still want to disconnect
156 # }
157
158 # Either way, we're now finally ready to disconnect as is required for premature
159 # termination too
160 print STDERR "XXXXXXXX Global Destruct: Disconnecting from database\n";
161 $_dbh_instance->disconnect or warn $_dbh_instance->errstr;
162 $_dbh_instance = undef;
163 $ref_count = 0;
164 }
165 return;
166 }
167}
168
169#################################
170
171# Database access related functions
172# http://g2pc1.bu.edu/~qzpeng/manual/MySQL%20Commands.htm
173# https://www.guru99.com/insert-into.html
174
175# TODO Q: What on cancelling a build: delete table? But what if it was a rebuild and the rebuild is cancelled (not the original build)?
176# Do we create a copy of the orig database as backup, then start populating current db, and if cancelled, delete current db and RENAME backup table to current?
177# https://stackoverflow.com/questions/3280006/duplicating-a-mysql-table-indexes-and-data
178# BUT what if the table is HUGE? (Think of a collection with millions of docs.) Huge overhead in copying?
179# The alternative is we just quit on cancel, but then: cancel could leave the table in a partial committed state, with no way of rolling back.
180# Unless they do a full rebuild, which will recreate the table from scratch?
181# SOLUTION-> rollback transaction on error, see https://www.effectiveperlprogramming.com/2010/07/set-custom-dbi-error-handlers/
182# But then should set AutoCommit to off on connection, and remember to commit every time
183
184#################
185# Database functions that use the perl DBI module (with the DBD driver module for mysql)
186#################
187
188################### BASIC DB OPERATIONS ##################
189
190# THE NEW DB FUNCTIONS
191# NOTE: FULLTEXT is a reserved keyword in (My)SQL. So we can't name a table or any of its columns "fulltext".
192# https://dev.mysql.com/doc/refman/5.5/en/keywords.html
193
194# TODO: Consider AutoCommit status (and Autocommit off allowing commit or rollback for GS coll build cancel) later
195
196
197
198# SINGLETON / GET INSTANCE PATTERN
199# https://stackoverflow.com/questions/16655603/perl-objects-class-variable-initialization
200# https://stackoverflow.com/questions/7587157/how-can-i-set-a-static-variable-that-can-be-accessed-by-all-subclasses-of-the-sa
201# Singleton without Moose: https://www.perl.com/article/52/2013/12/11/Implementing-the-singleton-pattern-in-Perl/
202
203sub connect_to_db
204{
205 my $self= shift (@_);
206 my ($params_map) = @_;
207
208 $params_map->{'db_encoding'} = $self->{'db_encoding'};
209 $params_map->{'verbosity'} = $self->{'verbosity'};
210
211 $self->{'db_handle'} = &_get_connection_instance($params_map); # getting singleton (class method)
212 if($self->{'db_handle'}) {
213 $ref_count++; # if successful, keep track of the number of refs to the single db connection
214 return $self->{'db_handle'};
215 }
216 return undef;
217}
218
219# SINGLETON METHOD #
220# TODO: where should the defaults for these params be, here or in GS-SQLPlugin/Plugout?
221sub _get_connection_instance
222{
223 #my $self= shift (@_); # singleton method doesn't use self, but callers don't need to know that
224 my ($params_map) = @_;
225
226 if($params_map->{'verbosity'}) {
227 if(!defined $params_map->{'autocommit'}) {
228 print STDERR " Autocommit parameter not defined\n";
229 }
230 if($params_map->{'autocommit'}) {
231 print STDERR " SQL DB UNDO SUPPORT OFF.\n";
232 } else {
233 print STDERR " SQL DB UNDO SUPPORT ON.\n";
234 }
235 }
236
237 return $_dbh_instance if($_dbh_instance);
238
239 # or make the connection
240
241 # For proper utf8 support in MySQL, encoding should be 'utf8mb4' as 'utf8' is insufficient
242 my $db_enc = "utf8mb4" if $params_map->{'db_encoding'} eq "utf8";
243
244 # these are the params for connecting to MySQL
245 my $db_driver = $params_map->{'db_driver'} || "mysql";
246 my $db_user = $params_map->{'db_client_user'} || "root";
247 my $db_pwd = $params_map->{'db_client_pwd'}; # even if undef and password was necessary,
248 # we'll see a sensible error message when connect fails
249 my $db_host = $params_map->{'db_host'} || "127.0.0.1";
250 # localhost doesn't work for us, but 127.0.0.1 works
251 # https://metacpan.org/pod/DBD::mysql
252 # "The hostname, if not specified or specified as '' or 'localhost', will default to a MySQL server
253 # running on the local machine using the default for the UNIX socket. To connect to a MySQL server
254 # on the local machine via TCP, you must specify the loopback IP address (127.0.0.1) as the host."
255 my $db_port = $params_map->{'db_port'}; # leave as undef if unspecified,
256 # as our tests never used port anyway (must have internally
257 # defaulted to whatever default port is used for MySQL)
258
259
260 #my $connect_str = "dbi:$db_driver:database=$db_name;host=$db_host";
261 # But don't provide db now - this allows checking the db exists later when loading the db
262 my $connect_str = "dbi:$db_driver:host=$db_host";
263 $connect_str .= ";port=$db_port" if $db_port;
264
265 if($params_map->{'verbosity'}) {
266 print STDERR "Away to make connection to $db_driver database with:\n";
267 print STDERR " - hostname $db_host; username: $db_user";
268 print STDERR "; and the provided password" if $db_pwd;
269 print STDERR "\nAssuming the mysql server has been started with: --character_set_server=utf8mb4\n" if $db_driver eq "mysql";
270 }
271
272 # DBI AutoCommit connection param is on/1 by default, so if a value for this is not defined
273 # as a method parameter to _get_connection_instance, then fallback to the default of on/1
274 my $autocommit = (defined $params_map->{'autocommit'}) ? $params_map->{'autocommit'} : 1;
275
276 my $dbh = DBI->connect("$connect_str", $db_user, $db_pwd,
277 {
278 ShowErrorStatement => 1, # more informative as DBI will append failed SQL stmt to error message
279 PrintError => 1, # on by default, but being explicit
280 RaiseError => 0, # off by default, but being explicit
281 AutoCommit => $autocommit,
282 mysql_enable_utf8mb4 => 1 # tells MySQL to use UTF-8 for communication and tells DBD::mysql to decode the data, see https://stackoverflow.com/questions/46727362/perl-mysql-utf8mb4-issue-possible-bug
283 });
284
285 if(!$dbh) {
286 # NOTE, despite handle dbh being undefined, error code will be in DBI->err (note caps)
287 return 0;
288 }
289
290 # set encoding https://metacpan.org/pod/DBD::mysql
291 # https://dev.mysql.com/doc/refman/5.7/en/charset.html
292 # https://dev.mysql.com/doc/refman/5.7/en/charset-conversion.html
293 # Setting the encoding at db server level: $dbh->do("set NAMES '" . $db_enc . "'");
294 # HOWEVER:
295 # It turned out insufficient setting the encoding to utf8, as that only supports utf8 chars that
296 # need up to 3 bytes. We may need up to 4 bytes per utf8 character, e.g. chars with macron,
297 # and for that, we need the encoding to be set to utf8mb4.
298 # To set up a MySQL db to use utf8mb4 requires configuration on the server side too.
299 # https://stackoverflow.com/questions/10957238/incorrect-string-value-when-trying-to-insert-utf-8-into-mysql-via-jdbc
300 # https://stackoverflow.com/questions/46727362/perl-mysql-utf8mb4-issue-possible-bug
301 # To set up the db for utf8mb4, therefore,
302 # the MySQL server needs to be configured for that char encoding by running the server as:
303 # mysql-5.7.23-linux-glibc2.12-x86_64/bin>./mysqld_safe --datadir=/Scratch/ak19/mysql/data --character_set_server=utf8mb4
304 # AND when connecting to the server, we can can either set mysql_enable_utf8mb4 => 1
305 # as a connection option
306 # OR we need to do both "set NAMES utf8mb4" AND "$dbh->{mysql_enable_utf8mb4} = 1;" after connecting
307 #
308 # Search results for DBI Set Names imply the "SET NAMES '<enc>'" command is mysql specific too,
309 # so setting the mysql specific option during connection above as "mysql_enable_utf8mb4 => 1"
310 # is no more objectionable. It has the advantage of cutting out the 2 extra lines of doing
311 # set NAMES '<enc>' and $dbh->{mysql_enable_utf8mb4} = 1 here.
312 # These lines may be preferred if more db_driver options are to be supported in future:
313 # then a separate method called set_db_encoding($enc) can work out what db_driver we're using
314 # and if mysql and enc=utfy, then it can do the following whereas it will issue other do stmts
315 # for other db_drivers, see https://www.perlmonks.org/?node_id=259456:
316
317 #my $stmt = "set NAMES '" . $db_enc . "'";
318 #$dbh->do($stmt) || warn("Unable to set charset encoding at db server level to: " . $db_enc . "\n"); # tells MySQL to use UTF-8 for communication
319 #$dbh->{mysql_enable_utf8mb4} = 1; # tells DBD::mysql to decode the data
320
321 # if we're here, then connection succeeded, store handle
322 $_dbh_instance = $dbh;
323 return $_dbh_instance;
324
325}
326
327# Will disconnect if this instance of gssql holds the last reference to the db connection
328sub finished {
329 my $self= shift (@_);
330
331 # TODO: if AutoCommit was off, meaning transactions were on/enabled,
332 # then here is where we commit our one long transaction.
333 # https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#commit
334 my $rc = 1;
335
336 $ref_count--;
337 if($ref_count == 0) {
338 # Only commit transaction when we're about to disconnect, not before
339 # If autocommit was on, then we'd have committed after every db operation, so nothing to do
340 $rc = $self->do_commit_if_on();
341
342 $self->force_disconnect_from_db();
343 }
344
345 return $rc;
346}
347
348sub do_commit_if_on {
349 my $self= shift (@_);
350 my $dbh = $self->{'db_handle'};
351
352 my $rc = 1; # return code: everything went fine, regardless of whether we needed to commit
353 # (AutoCommit on or off)
354
355 # https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#commit
356 if($dbh->{AutoCommit} == 0) {
357 print STDERR " Committing transaction to SQL database now.\n" if $self->{'verbosity'};
358 $rc = $dbh->commit() or warn("SQL DB COMMIT FAILED: " . $dbh->errstr); # important problem
359 # worth embellishing error message
360 }
361 # If autocommit was on, then we'd have committed after every db operation, so nothing to do
362
363 return $rc;
364}
365
366# Call this method on die(), so that you're sure the perl process has disconnected from SQL db
367# Disconnect from db - https://metacpan.org/pod/DBI#disconnect
368# TODO: make sure to have committed or rolled back before disconnect
369# and that you've call finish() on statement handles if any fetch remnants remain
370sub force_disconnect_from_db {
371 my $self= shift (@_);
372
373 if($_dbh_instance) {
374 # make sure any active stmt handles are finished
375 # NO: "When all the data has been fetched from a SELECT statement, the driver will automatically call finish for you. So you should not call it explicitly except when you know that you've not fetched all the data from a statement handle and the handle won't be destroyed soon."
376
377 print STDERR " GSSQL disconnecting from database\n";
378 # Just go through the singleton db handle to disconnect
379 $_dbh_instance->disconnect or warn $_dbh_instance->errstr;
380 $_dbh_instance = undef;
381 }
382 # Number of gssql objects that share a live connection is now 0, as the connection's dead
383 # either because the last gssql object finished() or because connection was killed (force)
384 $ref_count = 0;
385}
386
387
388# Load the designated database, i.e. 'use <dbname>;'.
389# If the database doesn't yet exist, creates it and loads it.
390# (Don't create the collection's tables yet, though)
391# At the end it will have loaded the requested database (in MySQL: "use <db>;") on success.
392# As usual, returns success or failure value that can be evaluated in a boolean context.
393sub use_db {
394 my $self= shift (@_);
395 my ($db_name) = @_;
396 my $dbh = $self->{'db_handle'};
397 $db_name = $self->sanitize_name($db_name);
398
399 print STDERR "Attempting to use database $db_name\n" if($self->{'verbosity'});
400
401 # perl DBI switch database: https://www.perlmonks.org/?node_id=995434
402 # do() returns undef on error.
403 # connection succeeded, try to load our database. If that didn't work, attempt to create db
404 my $success = $dbh->do("use $db_name");
405
406 if(!$success && $dbh->err == 1049) { # "Unknown database" error has code 1049 (mysql only?) meaning db doesn't exist yet
407
408 print STDERR "Database $db_name didn't exist, creating it along with the tables for the current collection...\n" if($self->{'verbosity'});
409
410 # attempt to create the db and its tables
411 $self->create_db($db_name) || return 0;
412
413 print STDERR " Created database $db_name\n" if($self->{'verbosity'} > 1);
414
415 # once more attempt to use db, now that it exists
416 $dbh->do("use $db_name") || return 0;
417 #$dbh->do("use $db_name") or die "Error (code" . $dbh->err ."): " . $dbh->errstr . "\n";
418
419 $success = 1;
420 }
421 elsif($success) { # database existed and loaded successfully, but
422 # before proceeding check that the current collection's tables exist
423
424 print STDERR "@@@ DATABASE $db_name EXISTED\n" if($self->{'verbosity'} > 2);
425 }
426
427 return $success; # could still return 0, if database failed to load with an error code != 1049
428}
429
430
431# We should already have done "use <database>;" if this gets called.
432# Just load this collection's metatable
433sub ensure_meta_table_exists {
434 my $self = shift (@_);
435
436 my $tablename = $self->get_metadata_table_name();
437 if(!$self->table_exists($tablename)) {
438 #print STDERR " Creating metadata table $tablename\n" if($self->{'verbosity'} > 1);
439 $self->create_metadata_table() || return 0;
440 } else {
441 print STDERR "@@@ Meta table exists\n" if($self->{'verbosity'} > 2);
442 }
443 return 1;
444}
445
446# We should already have done "use <database>;" if this gets called.
447# Just load this collection's metatable
448sub ensure_fulltxt_table_exists {
449 my $self = shift (@_);
450
451 my $tablename = $self->get_fulltext_table_name();
452 if(!$self->table_exists($tablename)) {
453 #print STDERR " Creating fulltxt table $tablename\n" if($self->{'verbosity'} > 1);
454 $self->create_fulltext_table() || return 0;
455 } else {
456 print STDERR "@@@ Fulltxt table exists\n" if($self->{'verbosity'} > 2);
457 }
458 return 1;
459}
460
461
462sub create_db {
463 my $self= shift (@_);
464 my ($db_name) = @_;
465 my $dbh = $self->{'db_handle'};
466 $db_name = $self->sanitize_name($db_name);
467
468 # https://stackoverflow.com/questions/5025768/how-can-i-create-a-mysql-database-from-a-perl-script
469 return $dbh->do("create database $db_name"); # do() will return undef on fail, https://metacpan.org/pod/DBI#do
470}
471
472
473sub create_metadata_table {
474 my $self= shift (@_);
475 my $dbh = $self->{'db_handle'};
476
477 my $table_name = $self->get_metadata_table_name();
478 print STDERR " Creating table $table_name\n" if($self->{'verbosity'} > 1);
479
480 # If using an auto incremented primary key:
481 my $stmt = "CREATE TABLE $table_name (id INT NOT NULL AUTO_INCREMENT, did VARCHAR(63) NOT NULL, sid VARCHAR(63) NOT NULL, metaname VARCHAR(127) NOT NULL, metavalue VARCHAR(1023) NOT NULL, PRIMARY KEY(id));";
482 return $dbh->do($stmt);
483}
484
485# TODO: Investigate: https://dev.mysql.com/doc/search/?d=10&p=1&q=FULLTEXT
486# 12.9.1 Natural Language Full-Text Searches
487# to see whether we have to index the 'fulltxt' column of the 'fulltext' tables
488# or let user edit this file, or add it as another option
489sub create_fulltext_table {
490 my $self= shift (@_);
491 my $dbh = $self->{'db_handle'};
492
493 my $table_name = $self->get_fulltext_table_name();
494 print STDERR " Creating table $table_name\n" if($self->{'verbosity'} > 1);
495
496 # If using an auto incremented primary key:
497 my $stmt = "CREATE TABLE $table_name (id INT NOT NULL AUTO_INCREMENT, did VARCHAR(63) NOT NULL, sid VARCHAR(63) NOT NULL, fulltxt LONGTEXT, PRIMARY KEY(id));";
498 return $dbh->do($stmt);
499
500}
501
502
503sub delete_collection_tables {
504 my $self= shift (@_);
505 my $dbh = $self->{'db_handle'};
506
507 # drop table <tablename>
508 my $table = $self->get_metadata_table_name();
509 if($self->table_exists($table)) {
510 $dbh->do("drop table $table");# || warn("@@@ Couldn't delete $table");
511 }
512 $table = $self->get_fulltext_table_name();
513 if($self->table_exists($table)) {
514 $dbh->do("drop table $table");# || warn("@@@ Couldn't delete $table");
515 }
516
517 # TODO Q: commit here, so that future select statements work?
518 # See https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#Transactions
519}
520
521# Don't call this: it will delete the meta and full text tables for ALL collections in $db_name (localsite by default)!
522# This method is just here for debugging (for testing creating a database when there is none)
523#
524# "IF EXISTS is used to prevent an error from occurring if the database does not exist. ... DROP DATABASE returns the number of tables that were removed. The DROP DATABASE statement removes from the given database directory those files and directories that MySQL itself may create during normal operation.Jun 20, 2012"
525# MySQL 8.0 Reference Manual :: 13.1.22 DROP DATABASE Syntax
526# https://dev.mysql.com/doc/en/drop-database.html
527sub _delete_database {
528 my $self= shift (@_);
529 my ($db_name) = @_;
530 my $dbh = $self->{'db_handle'};
531 $db_name = $self->sanitize_name($db_name);
532
533 print STDERR "!!! Deleting database $db_name\n" if($self->{'verbosity'});
534
535 # "drop database dbname"
536 $dbh->do("drop database $db_name") || return 0;
537
538 return 1;
539}
540
541
542########################### DB STATEMENTS ###########################
543
544# USEFUL: https://metacpan.org/pod/DBI
545# "Many methods have an optional \%attr parameter which can be used to pass information to the driver implementing the method. Except where specifically documented, the \%attr parameter can only be used to pass driver specific hints. In general, you can ignore \%attr parameters or pass it as undef."
546
547# More efficient to use prepare() to prepare an SQL statement once and then execute() it many times
548# (binding different values to placeholders) than running do() which will prepare each time and
549# execute each time. Also, do() is not useful with SQL select statements as it doesn't fetch rows.
550# Can prepare and cache prepared statements or retrieve prepared statements if cached in one step:
551# https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#prepare_cached
552
553# https://www.guru99.com/insert-into.html
554# and https://dev.mysql.com/doc/refman/8.0/en/example-auto-increment.html
555# for inserting multiple rows at once
556# https://www.perlmonks.org/bare/?node_id=316183
557# https://metacpan.org/pod/DBI#do
558# https://www.quora.com/What-is-the-difference-between-prepare-and-do-statements-in-Perl-while-we-make-a-connection-to-the-database-for-executing-the-query
559# https://docstore.mik.ua/orelly/linux/dbi/ch05_05.htm
560
561# https://metacpan.org/pod/DBI#performance
562# 'The q{...} style quoting used in this example avoids clashing with quotes that may be used in the SQL statement. Use the double-quote like qq{...} operator if you want to interpolate variables into the string. See "Quote and Quote-like Operators" in perlop for more details.'
563#
564# This method uses lazy loading to prepare the SQL insert stmt once for a table and store it,
565# then execute the (stored) statement each time it's needed for that table.
566sub insert_row_into_metadata_table {
567 my $self = shift (@_);
568 my ($doc_oid, $section_name, $meta_name, $escaped_meta_value, $debug_only) = @_;
569
570 my $dbh = $self->{'db_handle'};
571
572 my $tablename = $self->get_metadata_table_name();
573 my $sth = $dbh->prepare_cached(qq{INSERT INTO $tablename (did, sid, metaname, metavalue) VALUES (?, ?, ?, ?)});# || warn("Could not prepare insert statement for metadata table\n");
574
575 # Now we're ready to execute the command, unless we're only debugging
576
577 if($debug_only) {
578 # just print the statement we were going to execute
579 print STDERR $sth->{'Statement'} . "($doc_oid, $section_name, $meta_name, $escaped_meta_value)\n";
580 }
581 else {
582 print STDERR $sth->{'Statement'} . "($doc_oid, $section_name, $meta_name, $escaped_meta_value)\n" if $self->{'verbosity'} > 2;
583
584 $sth->execute($doc_oid, $section_name, $meta_name, $escaped_meta_value)
585 || warn ("Unable to write metadata row to db:\n\tOID $doc_oid, section $section_name,\n\tmeta name: $meta_name, val: $escaped_meta_value");
586 # Execution failure will print out info anyway: since db connection sets PrintError
587 }
588}
589
590# As above. Likewise uses lazy loading to prepare the SQL insert stmt once for a table and store it,
591# then execute the (stored) statement each time it's needed for that table.
592sub insert_row_into_fulltxt_table {
593 my $self = shift (@_);
594 #my ($did, $sid, $fulltext) = @_;
595 my ($doc_oid, $section_name, $section_textref, $debug_only) = @_;
596
597 my $dbh = $self->{'db_handle'};
598
599 my $tablename = $self->get_fulltext_table_name();
600 my $sth = $dbh->prepare_cached(qq{INSERT INTO $tablename (did, sid, fulltxt) VALUES (?, ?, ?)});# || warn("Could not prepare insert statement for fulltxt table\n");
601
602 # Now we're ready to execute the command, unless we're only debugging
603
604 # don't display the fulltxt value as it could be too long
605 my $txt_repr = $$section_textref ? "<TXT>" : "NULL";
606 if($debug_only) { # only print statement, don't execute it
607 print STDERR $sth->{'Statement'} . "($doc_oid, $section_name, $txt_repr)\n";
608 }
609 else {
610 print STDERR $sth->{'Statement'} . "($doc_oid, $section_name, $txt_repr)\n" if $self->{'verbosity'} > 2;
611
612 $sth->execute($doc_oid, $section_name, $$section_textref)
613 || warn ("Unable to write fulltxt row to db for row:\n\tOID $doc_oid, section $section_name"); # Execution failure will print out info anyway: since db connection sets PrintError
614 }
615}
616
617
618## The 2 select statements used by GreenstoneSQLPlugin
619
620# Using fetchall_arrayref on statement handle, to run on prepared and executed stmt
621# https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#fetchall_arrayref
622# instead of selectall_arrayref on database handle which will prepare, execute and fetch
623# https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#selectall_arrayref
624#
625# Returns the statement handle that prepared and executed
626# a "SELECT * FROM <COLL>_metadata WHERE did = $oid" SQL statement.
627# Caller can call fetchrow_array() on returned statement handle, $sth
628# Have to use prepare() and execute() instead of do() since do() does
629# not allow for fetching result set thereafter:
630# do(): "This method is typically most useful for non-SELECT statements that either cannot be prepared in advance (due to a limitation of the driver) or do not need to be executed repeatedly. It should not be used for SELECT statements because it does not return a statement handle (so you can't fetch any data)." https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#do
631sub select_from_metatable_matching_docid {
632 my $self= shift (@_);
633 my ($oid, $outhandle) = @_;
634
635 my $dbh = $self->{'db_handle'};
636 my $tablename = $self->get_metadata_table_name();
637
638 my $sth = $dbh->prepare_cached(qq{SELECT * FROM $tablename WHERE did = ?});
639 $sth->execute( $oid ); # will print msg on fail
640
641 print $outhandle "### SQL select stmt: ".$sth->{'Statement'}."\n"
642 if ($self->{'verbosity'} > 2);
643
644 my $rows_ref = $sth->fetchall_arrayref();
645 # "If an error occurs, fetchall_arrayref returns the data fetched thus far, which may be none.
646 # You should check $sth->err afterwards (or use the RaiseError attribute) to discover if the
647 # data is complete or was truncated due to an error."
648 # https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm#fetchall_arrayref
649 # https://www.oreilly.com/library/view/programming-the-perl/1565926994/ch04s05.html
650 warn("Data fetching from $tablename terminated early by error: " . $dbh->err) if $dbh->err;
651 return $rows_ref;
652}
653
654
655# See select_from_metatable_matching_docid() above.
656# Returns the statement handle that prepared and executed
657# a "SELECT * FROM <COLL>_metadata WHERE did = $oid" SQL statement.
658# Caller can call fetchrow_array() on returned statement handle, $sth
659sub select_from_texttable_matching_docid {
660 my $self= shift (@_);
661 my ($oid, $outhandle) = @_;
662
663 my $dbh = $self->{'db_handle'};
664 my $tablename = $self->get_fulltext_table_name();
665
666 my $sth = $dbh->prepare_cached(qq{SELECT * FROM $tablename WHERE did = ?});
667 $sth->execute( $oid ); # will print msg on fail
668
669 print $outhandle "### SQL select stmt: ".$sth->{'Statement'}."\n"
670 if ($self->{'verbosity'} > 2);
671
672 my $rows_ref = $sth->fetchall_arrayref();
673 # Need explicit warning:
674 warn("Data fetching from $tablename terminated early by error: " . $dbh->err) if $dbh->err;
675 return $rows_ref;
676
677}
678
679# delete all records in metatable with specified docid
680# https://www.tutorialspoint.com/mysql/mysql-delete-query.htm
681# DELETE FROM table_name [WHERE Clause]
682# see example under 'do' at https://metacpan.org/pod/release/TIMB/DBI-1.634_50/DBI.pm
683sub delete_recs_from_metatable_with_docid {
684 my $self= shift (@_);
685 my ($oid) = @_;
686
687 my $dbh = $self->{'db_handle'};
688
689 my $tablename = $self->get_metadata_table_name();
690 my $sth = $dbh->prepare_cached(qq{DELETE FROM $tablename WHERE did = ?});
691 $sth->execute( $oid ) or warn $dbh->errstr; # dbh set to print errors even without doing warn()
692}
693
694# delete all records in metatable with specified docid
695sub delete_recs_from_texttable_with_docid {
696 my $self= shift (@_);
697 my ($oid) = @_;
698
699 my $dbh = $self->{'db_handle'};
700
701 my $tablename = $self->get_fulltext_table_name();
702 my $sth = $dbh->prepare_cached(qq{DELETE FROM $tablename WHERE did = ?});
703 $sth->execute( $oid ) or warn $dbh->errstr; # dbh set to print errors even without doing warn()
704}
705
706# Can call this after connection succeeded to get the database handle, dbh,
707# if any specific DB operation (SQL statement, create/delete)
708# needs to be executed that is not already provided as a method of this class.
709sub get_db_handle {
710 my $self= shift (@_);
711 return $self->{'db_handle'};
712}
713
714################ HELPER METHODS ##############
715
716# More basic helper methods
717sub get_metadata_table_name {
718 my $self= shift (@_);
719 my $table_name = $self->{'tablename_prefix'} . "_metadata";
720 return $table_name;
721}
722
723# FULLTEXT is a reserved keyword in (My)SQL. https://dev.mysql.com/doc/refman/5.5/en/keywords.html
724# So we can't name a table or any of its columns "fulltext". We use "fulltxt" instead.
725sub get_fulltext_table_name {
726 my $self= shift (@_);
727 my $table_name = $self->{'tablename_prefix'} . "_fulltxt";
728 return $table_name;
729}
730
731# Attempt to make sure the name parameter (for db or table name) is acceptable syntax
732# for the db in question, e.g. for mysql. For example, (My)SQL doesn't like tables or
733# databases with '-' (hyphens) in their names
734sub sanitize_name {
735 my $self= shift (@_);
736 my ($name) = @_;
737 $name =~ s/-/_/g;
738 return $name;
739}
740
741
742# I can get my version of table_exists to work, but it's not so ideal
743# Interesting that MySQL has non-standard command to CREATE TABLE IF NOT EXISTS and DROP TABLE IF EXISTS,
744# see https://www.perlmonks.org/bare/?node=DBI%20Recipes
745# The page further has a table_exists function that could work with proper comparison
746# TODO Q: Couldn't get the first solution at https://www.perlmonks.org/bare/?node_id=500050 to work though
747sub table_exists {
748 my $self = shift (@_);
749 my $dbh = $self->{'db_handle'};
750 my ($table_name) = @_;
751
752 my @table_list = $dbh->tables;
753 #my $tables_str = @table_list[0];
754 foreach my $table (@table_list) {
755 return 1 if ($table =~ m/$table_name/);
756 }
757 return 0;
758}
759
7601;
Note: See TracBrowser for help on using the repository browser.