source: other-projects/nightly-tasks/diffcol/trunk/gen-model-colls.sh@ 28069

Last change on this file since 28069 was 28069, checked in by ak19, 11 years ago
  1. When run in svnupdate mode, the script now performs an svn add on all the new items added to both model-collect's archives and index. There's a handy command that does a recursive add on all new items in a folder.
  2. Added the commit_message and debug flags as cmdline args. Note that now x is the cmdline argument shortcut for svndelete and d is the shortcut for debug mode. Running in debug mode means nothing gets committed to svn, but you get to see what the changes made to model-collect are.
#!/bin/bash

# PURPOSE
# This is not a nightly script. You use it to regenerate the model-collections
# if Greenstone has changed fundamentally, such as what HASH OIDs get assigned
# to documents or something that changes the contents of the index and
# archives folders. This has happened now with the commits
# http://trac.greenstone.org/changeset/28022 and
# http://trac.greenstone.org/changeset/28021
# These commits generate new stable HASH OIDs for the existing documents.


# USAGE
# Put this file in the toplevel of the Greenstone 2 binary/compiled SVN installation
# that you want to generate the model collections with.
# You can provide a list of collection names or none, in which case all the collections
# are processed.

# Pass in --svnupdate (-u) to copy across the contents of archives and index in the
# rebuilt collection, overwriting their equivalents in the svn model collection,
# but not removing any extraneous HASH folders already present.
# !!!!! IMPORTANT: if you pass in svnupdate, it leaves you to do the final commit on
# the (svn) model-collect folder!

# Pass in --svndelete (-x) to remove the archives and index from svn in the model-collect
# and replace them with the rebuilt archives and index.
# The --svndelete flag is useful for when the HASH directory naming has changed and everything
# in archives and index has to be wiped out and moved back in from the rebuilt col.
# Passing in --svndelete will do the final commits on the model-collect folder.

# If neither flag is passed in, then the collections are rebuilt but the svn model-collect
# is not updated and the repository is not updated.

# Pass in --message (-m) followed by a commit message to use for the commits made by --svndelete.
# Pass in --debug (-d) to see what changes are made to model-collect without anything
# getting committed to svn.

# Examples of usage:
# ./gen-model-colls.sh
# ./gen-model-colls.sh --svndelete
# ./gen-model-colls.sh --svnupdate Tudor-Basic Tudor-Enhanced

# The first just rebuilds all the collections in a new folder called collect and stops there.

# The second rebuilds all the collections in collect and svn removes the archives and the index
# folders in model-collect. Then it copies across the rebuilt archives and index into model-collect
# and svn adds them.

# The third example checks out all the model-collections again, but rebuilds only the 2 collections
# specified in the new collect folder. Then it copies across the *contents* of the archives and
# index folders of those 2 collections into their model-collect equivalents. You then still have to
# do the final svn commit on the model-collect folder after looking over the differences.

# Also valid examples:
# ./gen-model-colls.sh Tudor-Basic Tudor-Enhanced
# ./gen-model-colls.sh --svndelete Tudor-Basic Tudor-Enhanced
# ./gen-model-colls.sh --svnupdate

# PSEUDOCODE
# This script:
# Checks out the model-collections folder from SVN
# Makes a copy
# In the copy: gets rid of their .svn folders, and builds each collection in turn, moving building to index once done
# If --svndelete was passed in: svn removes model-collect/archives and model-collect/index and SVN COMMITS the removal,
# copies over collect/index and collect/archives into model-collect and svn adds model-collect/archives and
# model-collect/index. Then SVN COMMITS model-collect/archives and model-collect/index.
# If --svnupdate was passed in: copies collect/archives/* into model-collect/archives/*, and copies collect/index/*
# into model-collect/index/*, overwriting files that already existed but have now been updated upon rebuild, and
# svn adds any new items. However, --svnupdate will leave untouched any files and folders unique to model-collect.
# No SVN commit, that's LEFT UP TO YOU.

# See earlier version of this script:
# To svn remove what's unique to model-collect and svn add what's been rebuilt in index and archives
# see http://stackoverflow.com/questions/7502261/delete-folder-content-and-remove-from-version-control

# http://stackoverflow.com/questions/5044214/how-do-i-detect-and-or-delete-empty-subversion-directories
# http://stackoverflow.com/questions/1301203/removing-svn-files-from-all-directories

#*******************************GLOBAL VARIABLES***************************

# mode can be svndelete or svnupdate
mode=
debug_mode=0
commit_message=

#*****************************FUNCTIONS*****************************

# DON'T ADD ANY FURTHER ECHO STATEMENTS IN FUNCTION get_col_basename
# "you have to be really careful on what you have in this function, as having any code which will eventually echo will mean that you get incorrect return string."
# see http://stackoverflow.com/questions/3236871/how-to-return-a-string-value-from-a-bash-function
function get_col_basename () {
    collection=$1

    # get just the basename; the quotes keep a name containing spaces in one piece
    collection=$(basename "$collection")

    # returning a string does not work in bash: return can only set a numeric exit status,
    # so the string is echoed and the caller captures it with $(...)
    # see http://stackoverflow.com/questions/3236871/how-to-return-a-string-value-from-a-bash-function

    #return $collection
    echo "$collection"
}
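
# Illustrative sketch only, not called by this script: the echo-and-capture
# pattern used by get_col_basename, shown on its own. The function name
# col_basename_sketch and the sample path below are hypothetical.
function col_basename_sketch () {
    # basename prints the last path component; quoting "$1" preserves spaces
    basename "$1"
}
# Example of capturing the echoed string in the caller:
#   name=$(col_basename_sketch "collect/Tudor Basic")
#   echo "$name"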


# Function that handles the --svndelete flag (mode) of this script.
# Takes an optional list of collections; with none, everything in collect is processed.
function svn_delete () {

    # svn remove archives and index in each collection
    # commit them all
    # copy over newly rebuilt archives and index into each model-collection
    # svn add the new archives and index folders of each collection
    # commit them all

    if [ "x$1" == "x" ]; then
        for collection in collect/*; do
            _del_col_archives_index "$collection"
        done
    else
        for collection in "$@"; do
            _del_col_archives_index "$collection"
        done
    fi

    # svn commit all the svn rm statements done above in one go:
    # don't do `svn up` at this point, as doing so will then retrieve all the folders that just were svn-removed

    # Use the message given with -m if there is one, else a default for this step.
    # The default goes into a local variable: storing it in commit_message itself
    # would stop the step 2/2 default further down from ever being used.
    local message=$commit_message
    if [ "x$message" == "x" ]; then
        message="Clean rebuild of model collections 1/2. Clearing out deprecated archives and index."
    fi

    # Numerical comparisons: http://tldp.org/LDP/abs/html/comparison-ops.html
    if [ "$debug_mode" -eq "0" ]; then
        svn commit -m "AUTOCOMMIT by gen-model-colls.sh script. Message: $message" model-collect
    fi

    # Having svn committed the deletes, do an svn up to locally delete what was svn-removed above,
    # BEFORE copying from the rebuilt archives and index folders
    if [ "$debug_mode" -eq "0" ]; then
        svn up model-collect
    fi

    # copy from the rebuilt archives and index over into the svn model-collect and svn add them
    if [ "x$1" == "x" ]; then
        for collection in collect/*; do
            _add_col_archives_index "$collection"
        done
    else
        for collection in "$@"; do
            _add_col_archives_index "$collection"
        done
    fi

    # commit all the svn add statements done just above in one go
    message=$commit_message
    if [ "x$message" == "x" ]; then
        message="Clean rebuild of model collections 2/2. Adding rebuilt archives and index."
    fi

    if [ "$debug_mode" -eq "0" ]; then
        svn commit -m "AUTOCOMMIT by gen-model-colls.sh script. Message: $message" model-collect
    fi

    echo
    echo "*********************"
    echo "Done svn-deleting and re-adding the rebuilt model-collections"
    echo "*********************"
    echo
}

# To undo the changes made by svndelete, run the following manually
#   svn revert --depth infinity model-collect/$collection/archives/*
#   svn revert --depth infinity model-collect/$collection/index/*
# then remove both the local archives and index, and do an svn up to get original checkout back

# svn delete this collection's archives and index folders
# (The commit will be done in one step for all collections on which this function was called)
function _del_col_archives_index () {
    collection=$1

    # get just the basename of the collection
    collection=$(get_col_basename "$collection")

    if [ ! -e "model-collect/$collection" ]; then
        echo "del_col_archives_index: $collection does not exist in model-collect, will svn add this new collection shortly"
        return;
    fi

    # remove the entire archives and index folders from svn
    # (in debug mode they are only removed locally, so nothing is scheduled in svn)
    if [ "$debug_mode" -eq "0" ]; then
        svn rm "model-collect/$collection/archives"
        svn rm "model-collect/$collection/index"
    elif [ "$debug_mode" -eq "1" ]; then
        rm -rf "model-collect/$collection/archives"
        rm -rf "model-collect/$collection/index"
    fi
}
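
# Illustrative sketch only, not called by this script: the repeated
# `if [ "$debug_mode" -eq "0" ]` guards could be gathered into one wrapper.
# The name run_unless_debug_sketch is hypothetical; it assumes the global
# debug_mode variable defined at the top of this script.
function run_unless_debug_sketch () {
    if [ "${debug_mode:-0}" -eq 0 ]; then
        # normal mode: run the command exactly as given
        "$@"
    else
        # debug mode: only report what would have been run
        echo "DEBUG, skipping: $*"
    fi
}
# Example: run_unless_debug_sketch svn commit -m "some message" model-collect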


# copy and then svn add the collection's archives and index folders
function _add_col_archives_index () {
    collection=$1

    # get just the basename of the collection
    collection=$(get_col_basename "$collection")

    if [ ! -e "model-collect/$collection" ]; then
        # NOTE: despite the message, nothing is copied or svn added for a collection
        # that is missing from model-collect, since the function returns here
        echo "add_col_archives_index: Adding the new collection $collection to SVN"
        return;
    fi

    # copy the rebuilt archives and index folders into model-collect
    cp -r "collect/$collection/archives" "model-collect/$collection/."
    cp -r "collect/$collection/index" "model-collect/$collection/."

    if [ "$debug_mode" -eq "0" ]; then
        svn add "model-collect/$collection/archives"
        svn add "model-collect/$collection/index"
    fi
}


# UNUSED, but useful for spotting differences between the collect and model-collect
# after rebuild, before svn updating/deleting, as opposed to at the end of the script
function svn_process_single_collection () {
    collection=$1

    # get just the basename of the collection
    collection=$(get_col_basename "$collection")

    if [ ! -e "model-collect/$collection" ]; then
        echo "svn_process_single_collection: Adding new collection $collection to SVN"
        return;
    fi

    # return here if just deleting empty dirs
    #return

    # diff the svn model and rebuilt model collections
    diff_result=$(diff -rq "model-collect/$collection" "collect/$collection" | grep -v "\.svn")
    #echo "Diff result for collection $collection: $diff_result"

    # if no differences in the current collection, then we're done
    if [ "x$diff_result" == "x" ]; then
        echo "No differences in collection $collection"
        return;
    fi

    # check that none of the lines mention files outside the archives or index folders
    # http://en.gibney.org/tell_the_bash_to_split_by_newline_charac
    # http://forums.gentoo.org/viewtopic-p-3130541.html

    # http://wi-fizzle.com/article/276
    # http://stackoverflow.com/questions/918886/how-do-i-split-a-string-on-a-delimiter-in-bash
    # http://www.linuxquestions.org/questions/programming-9/split-a-string-on-newlines-bash-313206/
    # http://unix.stackexchange.com/questions/39473/command-substitution-splitting-on-newline-but-not-space

    # store backup of Internal Field Separator value, then set IFS to newline for splitting on newline

    IFS_BAK=$IFS
    #IFS='\n' # not a newline: inside single quotes this is a backslash and the letter n
    IFS='
'
    # in the lines returned from the diff, test for archives or index
    # http://stackoverflow.com/questions/229551/string-contains-in-bash
    for line in $diff_result; do
        #echo "LINE: $line"
        if [[ "$line" != *archives* && "$line" != *index* ]]; then
            # the file that is different is neither in index nor in archives, send this diffline to the report
            echo "$line" >> report.txt
        fi
    done

    IFS=$IFS_BAK
    IFS_BAK=
}
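
# Illustrative sketch only, not called by this script: an alternative to
# changing IFS is to read the diff output one line at a time. The function
# name lines_outside_archives_index_sketch is hypothetical; it reads diff
# lines on stdin and prints those that mention neither archives nor index.
function lines_outside_archives_index_sketch () {
    local line
    # IFS= and -r keep each line intact (no trimming, no backslash processing)
    while IFS= read -r line; do
        if [[ "$line" != *archives* && "$line" != *index* ]]; then
            echo "$line"
        fi
    done
}
# Example:
#   diff -rq model-collect/$collection collect/$collection | grep -v "\.svn" \
#       | lines_outside_archives_index_sketch >> report.txt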

# Function that takes care of the --svnupdate flag mode of this script for a single collection
function update_single_collection () {
    # work on the collection passed in, rather than on whatever the global
    # $collection variable was last set to by another function
    collection=$(get_col_basename "$1")

    # copy across the contents of the rebuilt model-collection's index and archives to the svn model-collect
    cp -r "collect/$collection/archives/"* "model-collect/$collection/archives/."
    cp -r "collect/$collection/index/"* "model-collect/$collection/index/."

    # now svn add any and all the NEW items in model-collect's archives and index
    # see http://stackoverflow.com/questions/1071857/how-do-i-svn-add-all-unversioned-files-to-svn
    #if [ "$debug_mode" -eq "0" ]; then
    svn add --force "model-collect/$collection/archives/"* --auto-props --parents --depth infinity -q
    svn add --force "model-collect/$collection/index/"* --auto-props --parents --depth infinity -q
    #fi

    echo "svn model-collect update process complete. CHECK AND COMMIT THE model-collect FOLDER!"

    # if etc/collect.cfg is different, copy it across too?

    echo
    echo "*********************"
    echo "Done updating the rebuilt LOCAL model-collection: model-collect/$collection"
    echo "*********************"
    echo
}
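
# Illustrative sketch only, not called by this script: the overlay behaviour
# that --svnupdate relies on, where copied files overwrite their equivalents
# and anything unique to the destination is left alone. The function name
# overlay_copy_sketch and its two directory arguments are hypothetical.
function overlay_copy_sketch () {
    local src=$1
    local dest=$2
    mkdir -p "$dest"
    # "$src/." copies the contents of src, dotfiles included, into dest
    cp -r "$src/." "$dest/"
}
# Example: overlay_copy_sketch collect/$collection/archives model-collect/$collection/archives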


# re-build a single collection in "collect" which is a copy of model-collect
function build_single_collection () {
    collection=$1

    collection=$(get_col_basename "$collection")

    import.pl -removeold "$collection"
    buildcol.pl -removeold "$collection"
    rm -rf "collect/$collection/index"
    mv "collect/$collection/building" "collect/$collection/index"

    echo
    echo "*********************"
    echo "Done rebuilding model collection: $collection"
    echo "*********************"
    echo
}


# http://stackoverflow.com/questions/16483119/example-of-how-to-use-getopt-in-bash
function usage() {
    # usage() { echo "Usage: $0 [-s <45|90>] [-p <string>]" 1>&2; exit 1; }

    echo "*******************************************"
    echo "Usage: $0 [-u|--svnupdate | -x|--svndelete] [-m|--message <commit message>] [-d|--debug] [col1 col2 col3 ...]";
    echo "If no collections are provided, all collections will be processed.";
    echo "If neither svnupdate nor svndelete is provided, the collections are rebuilt but model-collect is left untouched.";
    echo "*******************************************"
    exit 1;
}


#*******************************MAIN PROGRAM***************************

# process optional command line arguments
# http://blog.onetechnical.com/2012/07/16/bash-getopt-versus-getopts/
# Execute getopt
ARGS=$(getopt -o m:uxd -l "message:,svnupdate,svndelete,debug" -n "$0" -- "$@");

#Bad arguments
if [ $? -ne 0 ]; then
    usage
    exit 1
fi

eval set -- "$ARGS";


# -n: http://tldp.org/LDP/abs/html/testconstructs.html
while true; do
    case "$1" in
        -x|--svndelete)
            shift;
            if [ "x$mode" == "xsvnupdate" ]; then
                echo
                echo "Can't use both svndelete and svnupdate"
                usage
                exit 1
            else
                mode=svndelete
            fi
            ;;
        -u|--svnupdate)
            shift;
            if [ "x$mode" == "xsvndelete" ]; then
                echo
                echo "Can't use both svndelete and svnupdate"
                usage
                exit 1
            else
                mode=svnupdate
            fi
            ;;
        -d|--debug)
            shift;
            debug_mode=1
            ;;
        -m|--message)
            shift;
            if [ -n "$1" ]; then
                commit_message=$1
            fi
            # always consume the option's argument, even an empty one,
            # so that the loop cannot get stuck on it
            shift;
            ;;
        --)
            shift;
            break;
            ;;
        *)
            # not expected, since getopt always ends the options with --
            usage
            ;;
    esac
done

#echo "commit message: $commit_message"
#echo "Debug mode is: $debug_mode"
#exit

# If no mode provided (svndelete|svnupdate) as cmd line arg, then don't modify
# the svn model-collect folder. Then this script stops after rebuilding the model-copy in collect

# the remaining arguments to the script are assumed to be collections

# debugging
#for collection in "$@"; do
#    collection=collect/$collection
#    echo "Collection: $collection"
#done

# finished processing arguments


# report will contain the diff lines for files outside the archives and index folders
if [ -f report.txt ]; then
    rm report.txt
fi

# Need pdfbox for the PDFBox tutorial
if [ ! -e ext/pdf-box ]; then
    cd ext
    # after the cd above, the tarball path is relative to ext
    if [ ! -e pdf-box-java.tar.gz ]; then
        wget http://trac.greenstone.org/export/head/gs2-extensions/pdf-box/trunk/pdf-box-java.tar.gz
    fi
    tar -xvzf pdf-box-java.tar.gz
    cd ..
fi


# move the existing collect folder out of the way
if [ -e collect ] && [ ! -e collect_orig ] ; then
    mv collect collect_orig
fi


# get model-collect from svn
# if we already have it, svn update the entire model-collect folder if processing all collections
# or svn update just any collections specified in the model-collect folder
if test -e model-collect; then
    if [ "$1" == "" ]; then
        svn up model-collect
    else
        for collection in "$@"; do
            svn up "model-collect/$collection"
        done
    fi
else
    svn co http://svn.greenstone.org/other-projects/nightly-tasks/diffcol/trunk/model-collect
fi

# Make a copy of the model-collect named as the new collect
# (or if collections are specified in the cmdline arguments, copy just these over from model-collect into collect)
# Then remove the copy's .svn folders
# (-prune stops find from trying to descend into the .svn folders it has just deleted)
echo "***********************************************"
echo "Creating a copy of the model-collect folder as folder collect and removing the .svn subfolders from the copy:"
echo
if [ -e collect_orig ]; then
    if [ ! -e collect ]; then
        cp -r model-collect collect
        find collect -name ".svn" -type d -prune -exec rm -rf {} \;
    else
        if [ "$1" == "" ]; then
            rm -rf collect
            cp -r model-collect collect
            find collect -name ".svn" -type d -prune -exec rm -rf {} \;
        else
            for collection in "$@"; do
                if [ -e "collect/$collection" ]; then
                    rm -rf "collect/$collection"
                fi
                cp -r "model-collect/$collection" "collect/$collection"
                find "collect/$collection" -name ".svn" -type d -prune -exec rm -rf {} \;
            done
        fi
    fi
fi
echo "***********************************************"
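
# Illustrative sketch only, not called by this script: the .svn clean-up done
# above, wrapped in a function. The name strip_svn_dirs_sketch is hypothetical.
function strip_svn_dirs_sketch () {
    # -prune stops find from descending into a .svn folder once it has matched,
    # so rm -rf does not pull the folder out from under find
    find "$1" -name ".svn" -type d -prune -exec rm -rf {} \;
}
# Example: strip_svn_dirs_sketch collect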

# Set up the Greenstone environment for building
source setup.bash

# parse arguments
# http://stackoverflow.com/questions/12711786/bash-convert-command-line-arguments-into-array
# http://stackoverflow.com/questions/255898/how-to-iterate-over-arguments-in-bash-script

if [ "$1" == "" ]; then

    # all_collections
    # for each collection, import, build, move building to index
    for collection in collect/*; do
        build_single_collection "$collection";

        if [ "x$mode" != "x" ]; then
            #svn_process_single_collection $collection

            if [ "x$mode" == "xsvnupdate" ]; then
                update_single_collection "$collection"
            fi
        fi
    done

    # having rebuilt all the collections, just the processing for svndelete/update remains:
    if [ "x$mode" == "xsvndelete" ]; then
        svn_delete
    fi

else
    # Command-line args are a list of collections,
    # process each command-line arg, after confirming such a collection exists

    for collection in "$@"; do
        collection=collect/$collection
        if test -e "$collection"; then
            build_single_collection "$collection";

            if [ "x$mode" != "x" ]; then
                #svn_process_single_collection $collection

                if [ "x$mode" == "xsvnupdate" ]; then
                    update_single_collection "$collection"
                fi
            fi
        else
            echo "Can't find collection $collection. Skipping."
        fi
    done

    # having rebuilt the specified collections above, just the processing for svndelete/update remains
    if [ "x$mode" == "xsvndelete" ]; then
        svn_delete "$@"
    fi

fi


echo
echo "*****************************************"
echo
# NO LONGER NECESSARY: WE'RE DOING A DIFF BETWEEN collect AND model-collect AT THIS SCRIPT'S END
# if we were svn updating/deleting collections, then mode was set
# if in that case a report was generated with additional differences, point the user to it
#if [ -f report.txt ] && [ "x$mode" != "x" ]; then
#    echo "Some files or folders outside of archives and index directories were different. See report.txt"
#    echo
#fi

# if not svnupdating or svndeleting, then inform the user that model-collect is unchanged
# if svnupdating, then warn the user that model-collect still needs committing
# if svndeleting, then inform the user that model-collect has been changed and committed
if [ "x$mode" == "x" ]; then
    echo "* The model-collect folder has not been altered. Changes have only been made to collect"
elif [ "x$mode" == "xsvnupdate" ]; then
    echo "* TO DO: You still need to run svn status and then svn commit on the model-collect folder. Besides that:"
elif [ "x$mode" == "xsvndelete" ]; then
    echo "* The model-collect folder's archives and index subfolders have been updated and committed to svn."
fi
echo

if [ "x$mode" != "x" ]; then
    echo "* DIFFERENCES REMAINING BETWEEN model-collect AND collect (skipping .svn folders):"
    echo
    # print the start marker in both branches, so it always pairs with the end marker
    echo "---START DIFF---"
    if [ "$1" == "" ]; then
        diff -rq model-collect collect | grep -v "\.svn"
    else
        for collection in "$@"; do
            echo "--COLLECTION: $collection"
            diff -rq "model-collect/$collection" "collect/$collection" | grep -v "\.svn"
            echo "--"
        done
    fi
    echo "---END DIFF---"
    echo
fi

echo "* The original collect directory has been left renamed as collect_orig"
echo

if [ "$debug_mode" -eq "1" ]; then
    echo "* This script was run in DEBUG MODE, nothing has been changed in svn"
fi
echo
echo "*****************************************"
echo



# deletes empty dirs
# find collect/$collection/archives/HASH* -type d -empty -delete
# find collect/$collection/index/assoc/HASH* -type d -empty -delete

# To recursively delete all empty dirs in the copy of model-collect (since the dirs will not have .svn folders in them anymore)
# http://www.commandlinefu.com/commands/view/5131/recursively-remove-all-empty-directories
#find collect -type d -empty -delete
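
# Illustrative sketch only, not called by this script: listing the empty
# folders first gives a chance to review them before running the -delete
# variant above. The name list_empty_dirs_sketch is hypothetical; -empty is
# an extension found in GNU and BSD find rather than a POSIX option.
function list_empty_dirs_sketch () {
    find "$1" -type d -empty -print
}
# Example: list_empty_dirs_sketch collect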

# The following when put in a separate script file will delete all folders from model-collect that are
# empty in the copied collection (all folders which contain only a .svn subfolder in model-collect)
# ---------------------------------------------
#!/bin/bash

#for collection in collect/*; do
    #escape the filename (in case of space)
#    collection=`echo $collection | sed 's@ @\\\ @g'`

    #get just the basename
#    collection=`basename $collection`

    # HASH dirs that are empty in local collect's archives and index/assoc,
    # need to be removed from the svn in model-collect

#    for line in `find collect/$collection/archives/HASH* -type d -empty`; do
#        modelline="model-$line"
#        echo "LINE: $modelline"

        # remove from svn of model collect
#        svn rm $modelline
##        rm -rf $modelline
        # remove physically from local collect
#        rm -rf $line
#    done

#    for line in `find collect/$collection/index/assoc/HASH* -type d -empty`; do
#        modelline="model-$line"
#        echo "LINE: $modelline"

        # remove from svn of model collect
#        svn rm $modelline
##        rm -rf $modelline
        # remove physically from local collect
#        rm -rf $line
#    done

#done
# ---------------------------------------------