source: other-projects/nightly-tasks/diffcol/trunk/gen-model-colls.sh@ 28049

Last change on this file since 28049 was 28049, checked in by ak19, 11 years ago
  1. Running a diff between model-collect (svn) and rebuilt collect at the end of regenerating the model collections to show the differences. 2. Minor fixes. 3. Better display of what's going on. 4. Updated comments. May still want a debug/testing mode in the future to see what's going to happen without actually committing. Useful for when you add a new tutorial for building but don't want to commit it until you've tested that diffcol runs perfectly on it.
#!/bin/bash

# PURPOSE
# This is not a nightly script. Use it to regenerate the model-collections
# when Greenstone has changed fundamentally, such as in what HASH OIDs get assigned
# to documents, or in anything else that changes the contents of the index and
# archives folders. This happened with the commits
# http://trac.greenstone.org/changeset/28022 and
# http://trac.greenstone.org/changeset/28021
# These commits generate new stable HASH OIDs for the existing documents.


# USAGE
# Put this file in the toplevel of the Greenstone 2 binary/compiled SVN installation
# that you want to generate the model collections with.
# You can provide a list of collection names or none, in which case all the collections
# are processed.

# Pass in --svnupdate to copy across the contents of archives and index in the
# rebuilt collection, overwriting their equivalents in the svn model collection,
# but not removing any extraneous HASH folders already present.
# !!!!! IMPORTANT: if you pass in --svnupdate, it leaves you to do the final commit on
# the (svn) model-collect folder!

# Pass in --svndelete to remove the archives and index from svn in the model-collect
# and replace them with the rebuilt archives and index.
# The --svndelete flag is useful when the HASH directory naming has changed and everything
# in archives and index has to be wiped out and moved back in from the rebuilt collection.
# Passing in --svndelete will do the final commits on the model-collect folder.

# If neither flag is passed in, then the collections are rebuilt but neither the svn
# model-collect folder nor the repository is updated.

# Examples of usage:
#   ./gen-model-colls.sh
#   ./gen-model-colls.sh --svndelete
#   ./gen-model-colls.sh --svnupdate Tudor-Basic Tudor-Enhanced

# The first just rebuilds all the collections in a new folder called collect and stops there.

# The second rebuilds all the collections in collect and svn removes the archives and the index
# folders in model-collect. Then it copies across the rebuilt archives and index into model-collect
# and svn adds them.

# The third example checks out or updates the model-collections, but rebuilds only the 2 collections
# specified in the new collect folder. Then it copies across the *contents* of the archives and
# index folders of those 2 collections into their model-collect equivalents. You then still have to
# do the final svn commit on the model-collect folder after looking over the differences.

# Also valid examples:
#   ./gen-model-colls.sh Tudor-Basic Tudor-Enhanced
#   ./gen-model-colls.sh --svndelete Tudor-Basic Tudor-Enhanced
#   ./gen-model-colls.sh --svnupdate

# PSEUDOCODE
# This script:
# - Checks out the model-collections folder from SVN
# - Makes a copy
# - In the copy: gets rid of its .svn folders, and builds each collection in turn, moving building to index once done
# - If --svndelete was passed in: svn removes model-collect/archives and model-collect/index, copies over collect/index
#   and collect/archives into model-collect and svn adds model-collect/archives and model-collect/index. Then SVN COMMITS
#   model-collect/archives and model-collect/index.
# - If --svnupdate was passed in: copies collect/archives/* into model-collect/archives/*, and copies collect/index/*
#   into model-collect/index/*, overwriting files that already existed but have now been updated upon rebuild. However,
#   --svnupdate will leave untouched any files and folders unique to model-collect. No SVN commit, that's LEFT UP TO YOU.

# To svn remove what's unique to model-collect and svn add what's been rebuilt in index and archives,
# see an earlier version of this script and
# http://stackoverflow.com/questions/7502261/delete-folder-content-and-remove-from-version-control

# http://stackoverflow.com/questions/5044214/how-do-i-detect-and-or-delete-empty-subversion-directories
# http://stackoverflow.com/questions/1301203/removing-svn-files-from-all-directories


# DON'T ADD ANY FURTHER ECHO STATEMENTS IN FUNCTION get_col_basename
# "you have to be really careful on what you have in this function, as having any code which will eventually echo will mean that you get incorrect return string."
# see http://stackoverflow.com/questions/3236871/how-to-return-a-string-value-from-a-bash-function
function get_col_basename () {
    # get just the basename; quoting keeps names containing spaces intact,
    # so the filename does not need backslash-escaping first
    local collection
    collection=$(basename "$1")

    # bash's return can only pass back a numeric exit status, not a string,
    # so the result is echoed and the caller captures it with $(...)
    # see http://stackoverflow.com/questions/3236871/how-to-return-a-string-value-from-a-bash-function

    #return $collection
    echo "$collection"
}


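# ILLUSTRATIVE SKETCH (not called by this script): why get_col_basename must print only
# its result. A bash function hands a string back by writing it to stdout, where the
# caller captures it with $(...), so any extra echo on stdout ends up in the captured
# value. Diagnostics can go to stderr instead. The name col_basename_sketch is
# hypothetical and exists only for this example.
col_basename_sketch () {
    echo "col_basename_sketch: processing $1" >&2   # stderr: not captured by $(...)
    basename "$1"                                   # stdout: this is the "return value"
}
# Example: name=$(col_basename_sketch "collect/Tudor-Basic")
# leaves the diagnostic on the terminal and only the basename in $name.
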
# Function that handles the --svndelete flag (mode) of this script.
# With no arguments it processes every collection in collect, otherwise just those named.
function svn_delete () {

    # svn remove archives and index in each collection
    # commit them all
    # copy over newly rebuilt archives and index into each model-collection
    # svn add the new archives and index folders of each collection
    # commit them all

    if [ $# -eq 0 ]; then
        for collection in collect/*; do
            _del_col_archives_index "$collection"
        done
    else
        for collection in "$@"; do
            _del_col_archives_index "$collection"
        done
    fi

    # commit all the svn rm statements done above in one go:
    # don't do `svn up` here, as this will then retrieve all the folders that were svn-removed
    svn commit -m "AUTOCOMMIT by gen-model-colls.sh script. Clean rebuild of model collections 1/2. Clearing out deprecated archives and index." model-collect

    # do an svn up to locally delete what was svn-removed above, BEFORE copying from the rebuilt archives and index folders
    svn up model-collect

    if [ $# -eq 0 ]; then
        for collection in collect/*; do
            _add_col_archives_index "$collection"
        done
    else
        for collection in "$@"; do
            _add_col_archives_index "$collection"
        done
    fi

    # commit all the svn add statements done just above in one go
    svn commit -m "AUTOCOMMIT by gen-model-colls.sh script. Clean rebuild of model collections 2/2. Adding rebuilt archives and index." model-collect

    echo
    echo "*********************"
    echo "Done svn-deleting and re-adding the rebuilt model collections"
    echo "*********************"
    echo
}

# To undo the changes made by svndelete, run the following manually
#   svn revert --depth infinity model-collect/$collection/archives
#   svn revert --depth infinity model-collect/$collection/index
# then remove both the local archives and index, and do an svn up to get the original checkout back

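# ILLUSTRATIVE SKETCH (not called by this script): one possible shape for a debug/testing
# mode that shows what would happen without actually committing. Both the DRY_RUN
# variable and the run_or_show name are hypothetical. The svn calls in svn_delete could be
# routed through a wrapper like this one.
run_or_show () {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "WOULD RUN: $*"    # only report the command
    else
        "$@"                    # run the command exactly as given
    fi
}
# Example: DRY_RUN=1 run_or_show svn commit -m "test" model-collect
# prints the svn command instead of running it.
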
# svn delete this collection's archives and index folders
# (The commit will be done in one step for all collections on which this function was called)
function _del_col_archives_index () {
    # get just the basename of the collection
    collection=$(get_col_basename "$1")

    if [ ! -e "model-collect/$collection" ]; then
        echo "del_col_archives_index: $collection does not exist in model-collect, nothing to svn remove"
        return
    fi

    # remove the entire archives and index folders from svn
    svn rm "model-collect/$collection/archives"
    svn rm "model-collect/$collection/index"

    # for TESTING purposes:
    # rm -rf model-collect/$collection/archives
    # rm -rf model-collect/$collection/index

}


# copy and then svn add the collection's archives and index folders
function _add_col_archives_index () {
    # get just the basename of the collection
    collection=$(get_col_basename "$1")

    if [ ! -e "model-collect/$collection" ]; then
        # nothing is added here: a collection that is new to model-collect has to be added to SVN manually
        echo "add_col_archives_index: $collection does not exist in model-collect, skipping. Add this new collection to SVN manually"
        return
    fi

    # copy the rebuilt archives and index folders across into model-collect
    cp -r "collect/$collection/archives" "model-collect/$collection/."
    cp -r "collect/$collection/index" "model-collect/$collection/."

    svn add "model-collect/$collection/archives"
    svn add "model-collect/$collection/index"
}


# UNUSED, but useful for spotting differences between the collect and model-collect
# after rebuild, before svn updating/deleting, as opposed to at the end of the script
function svn_process_single_collection () {
    # get just the basename of the collection
    collection=$(get_col_basename "$1")

    if [ ! -e "model-collect/$collection" ]; then
        echo "svn_process_single_collection: $collection does not exist in model-collect, nothing to compare"
        return
    fi

    # return here if just deleting empty dirs
    #return

    # diff the svn model and rebuilt model collections
    diff_result=$(diff -rq "model-collect/$collection" "collect/$collection" | grep -v '\.svn')
    # echo "Diff result for collection $collection: $diff_result"

    # if no differences in the current collection, then we're done
    if [ -z "$diff_result" ]; then
        echo "No differences in collection $collection"
        return
    fi

    # check that none of the lines mention files outside the archives or index folders
    # http://en.gibney.org/tell_the_bash_to_split_by_newline_charac
    # http://forums.gentoo.org/viewtopic-p-3130541.html

    # http://wi-fizzle.com/article/276
    # http://stackoverflow.com/questions/918886/how-do-i-split-a-string-on-a-delimiter-in-bash
    # http://www.linuxquestions.org/questions/programming-9/split-a-string-on-newlines-bash-313206/
    # http://unix.stackexchange.com/questions/39473/command-substitution-splitting-on-newline-but-not-space

    # store backup of Internal Field Separator value, then set IFS to newline for splitting on newline
    IFS_BAK=$IFS
    # IFS='\n' does not work: inside single quotes it is a backslash and the letter n, not a newline
    IFS='
'
    # in the lines returned from the diff, test for archives or index
    # http://stackoverflow.com/questions/229551/string-contains-in-bash
    for line in $diff_result; do
        # echo "LINE: $line"
        if [[ "$line" != *archives* && "$line" != *index* ]]; then
            # the file that is different is neither in index nor in archives, send this diffline to the report
            echo "$line" >> report.txt
        fi
    done

    IFS=$IFS_BAK
    IFS_BAK=
}

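# ILLUSTRATIVE SKETCH (not called by this script): an alternative to backing up and
# restoring IFS when splitting diff output on newlines. Reading line by line with
# "while IFS= read -r" confines the IFS setting to the read command itself.
# The name lines_outside_archives_index is hypothetical. It reads diff lines on stdin
# and prints only those that mention neither archives nor index.
lines_outside_archives_index () {
    local line
    while IFS= read -r line; do
        case "$line" in
            *archives*|*index*) ;;           # expected to differ after a rebuild
            *) printf '%s\n' "$line" ;;      # anything else is worth reporting
        esac
    done
}
# Example: diff -rq model-collect/$collection collect/$collection | lines_outside_archives_index >> report.txt
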
# Function that takes care of the --svnupdate flag (mode) of this script for a single collection
function update_single_collection () {
    # take the collection from the argument instead of relying on the global variable
    # left behind by build_single_collection, and get just the basename
    collection=$(get_col_basename "$1")

    # copy across the contents of the rebuilt model-collection's index and archives to the svn model-collect
    cp -r "collect/$collection/archives/"* "model-collect/$collection/archives/."
    cp -r "collect/$collection/index/"* "model-collect/$collection/index/."

    echo "svn model-collect update process complete. CHECK AND COMMIT THE model-collect FOLDER!"

    # if etc/collect.cfg is different, copy it across too?

    echo
    echo "*********************"
    echo "Done updating the rebuilt LOCAL model-collection: model-collect/$collection"
    echo "*********************"
    echo
}


# rebuild a single collection in "collect", which is a copy of model-collect
function build_single_collection () {
    # get just the basename of the collection
    collection=$(get_col_basename "$1")

    import.pl -removeold "$collection"
    buildcol.pl -removeold "$collection"
    rm -rf "collect/$collection/index"
    mv "collect/$collection/building" "collect/$collection/index"

    echo
    echo "*********************"
    echo "Done rebuilding model collection: $collection"
    echo "*********************"
    echo
}


# http://stackoverflow.com/questions/16483119/example-of-how-to-use-getopt-in-bash
function usage() {
    # usage() { echo "Usage: $0 [-s <45|90>] [-p <string>]" 1>&2; exit 1; }

    echo "*******************************************"
    echo "Usage: $0 [--svnupdate|--svndelete] [col1 col2 col3 ...]"
    echo "If no collections are provided, all collections will be processed."
    echo "If neither --svnupdate nor --svndelete is provided, the collections are rebuilt but model-collect is left untouched."
    echo "*******************************************"
    exit 1
}


# The program starts here

# process optional command line arguments
# http://blog.onetechnical.com/2012/07/16/bash-getopt-versus-getopts/
# Execute getopt
ARGS=$(getopt -o ud -l "svnupdate,svndelete" -n "$0" -- "$@")

# Bad arguments (usage exits the script)
if [ $? -ne 0 ]; then
    usage
fi

eval set -- "$ARGS"

# mode can be svndelete or svnupdate
mode=

# -n: http://tldp.org/LDP/abs/html/testconstructs.html
while true; do
    case "$1" in
        -d|--svndelete)
            shift
            if [ "$mode" == "svnupdate" ]; then
                echo
                echo "Can't use both svndelete and svnupdate"
                usage
            else
                mode=svndelete
            fi
            ;;
        -u|--svnupdate)
            shift
            if [ "$mode" == "svndelete" ]; then
                echo
                echo "Can't use both svndelete and svnupdate"
                usage
            else
                mode=svnupdate
            fi
            ;;
        --)
            shift
            break
            ;;
        *)
            # getopt always emits --, so this is only a safeguard against looping forever
            usage
            ;;
    esac
done

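# ILLUSTRATIVE SKETCH (not called by this script): the rule that --svndelete and
# --svnupdate are mutually exclusive, written as a standalone function so that it can be
# tried in isolation. The name pick_mode_sketch is hypothetical. It prints the chosen
# mode (empty if no flag was given) and fails if both flags are present.
pick_mode_sketch () {
    local chosen="" wanted arg
    for arg in "$@"; do
        case "$arg" in
            -d|--svndelete) wanted=svndelete ;;
            -u|--svnupdate) wanted=svnupdate ;;
            *) continue ;;                    # collection names are ignored here
        esac
        if [ -n "$chosen" ] && [ "$chosen" != "$wanted" ]; then
            return 1                          # both modes were requested
        fi
        chosen=$wanted
    done
    echo "$chosen"
}
# Example: pick_mode_sketch --svnupdate Tudor-Basic Tudor-Enhanced
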
# If no mode (svndelete|svnupdate) is provided as a command line argument, then the svn
# model-collect folder is not modified: this script stops after rebuilding the model-copy in collect

# the remaining arguments to the script are assumed to be collections

# debugging
#for collection in "$@"; do
#    collection=collect/$collection
#    echo "Collection: $collection"
#done

# finished processing arguments


# report.txt will contain the diff output for files outside the archives and index folders;
# remove any report left over from a previous run
rm -f report.txt

# Need pdfbox for the PDFBox tutorial
if [ ! -e ext/pdf-box ]; then
    cd ext
    # we are inside ext now, so look for the tarball relative to this folder
    if [ ! -e pdf-box-java.tar.gz ]; then
        wget http://trac.greenstone.org/export/head/gs2-extensions/pdf-box/trunk/pdf-box-java.tar.gz
    fi
    # unpack whether the tarball was just downloaded or was already present
    tar -xvzf pdf-box-java.tar.gz
    cd ..
fi


# move the existing collect folder out of the way
if [ -e collect ] && [ ! -e collect_orig ]; then
    mv collect collect_orig
fi


# get model-collect from svn
# if we already have it, svn update the entire model-collect folder if processing all collections
# or svn update just the collections specified, if they are in the model-collect folder
if [ -e model-collect ]; then
    if [ $# -eq 0 ]; then
        svn up model-collect
    else
        for collection in "$@"; do
            if [ -e "model-collect/$collection" ]; then
                svn up "model-collect/$collection"
            else
                svn up model-collect
            fi
        done
    fi
else
    svn co http://svn.greenstone.org/other-projects/nightly-tasks/diffcol/trunk/model-collect
fi

# Make a copy of the model-collect named as the new collect
# (or if collections are specified in the cmdline arguments, copy just these over from model-collect into collect)
# Then remove the copy's .svn folders
# (-prune stops find from trying to descend into a .svn folder that is being removed)
echo "***********************************************"
echo "Creating a copy of the model-collect folder as folder collect and removing the .svn subfolders from the copy:"
echo
if [ -e collect_orig ]; then
    if [ ! -e collect ]; then
        cp -r model-collect collect
        find collect -name ".svn" -type d -prune -exec rm -rf {} \; #2>&1 > /dev/null
    else
        if [ $# -eq 0 ]; then
            rm -rf collect
            cp -r model-collect collect
            find collect -name ".svn" -type d -prune -exec rm -rf {} \;
        else
            for collection in "$@"; do
                rm -rf "collect/$collection"
                cp -r "model-collect/$collection" "collect/$collection"
                find "collect/$collection" -name ".svn" -type d -prune -exec rm -rf {} \;
            done
        fi
    fi
fi
echo "***********************************************"

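# ILLUSTRATIVE SKETCH (not called by this script): the copy-then-strip step above as a
# single reusable function. The name copy_without_svn is hypothetical. It replaces any
# existing destination with a fresh copy of the source and then removes every .svn
# folder from the copy only. -prune keeps find from descending into a folder that is
# about to be deleted.
copy_without_svn () {
    local src=$1 dest=$2
    rm -rf "$dest"
    cp -r "$src" "$dest"
    find "$dest" -name ".svn" -type d -prune -exec rm -rf {} +
}
# Example: copy_without_svn model-collect/Tudor-Basic collect/Tudor-Basic
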
# Set up the Greenstone environment for building
source setup.bash

# parse arguments
# http://stackoverflow.com/questions/12711786/bash-convert-command-line-arguments-into-array
# http://stackoverflow.com/questions/255898/how-to-iterate-over-arguments-in-bash-script

if [ $# -eq 0 ]; then

    # all_collections
    # for each collection, import, build, move building to index
    for collection in collect/*; do
        build_single_collection "$collection"

        #svn_process_single_collection "$collection"

        if [ "$mode" == "svnupdate" ]; then
            update_single_collection "$collection"
        fi
    done

    # having rebuilt all the collections, just the processing for svndelete remains:
    if [ "$mode" == "svndelete" ]; then
        svn_delete
    fi

else
    # Command-line args are a list of collections,
    # process each command-line arg, after confirming such a collection exists

    for collection in "$@"; do
        collection=collect/$collection
        if [ -e "$collection" ]; then
            build_single_collection "$collection"

            #svn_process_single_collection "$collection"

            if [ "$mode" == "svnupdate" ]; then
                update_single_collection "$collection"
            fi
        else
            echo "Can't find collection $collection. Skipping."
        fi
    done

    # having rebuilt the specified collections above, just the processing for svndelete remains
    if [ "$mode" == "svndelete" ]; then
        svn_delete "$@"
    fi

fi


echo
echo "*****************************************"
echo
# NO LONGER NECESSARY: WE'RE DOING A DIFF BETWEEN collect AND model-collect AT THIS SCRIPT'S END
# if we were svn updating/deleting collections, then mode was set
# if in that case a report was generated with additional differences, point the user to it
#if [ -f report.txt ] && [ "x$mode" != "x" ]; then
#    echo "Some files or folders outside of archives and index directories were different. See report.txt"
#    echo
#fi

# if not svnupdating or svndeleting, then inform the user that model-collect is unchanged
# if svnupdating, then warn the user that model-collect still needs committing
# if svndeleting, then inform the user that model-collect has been changed and committed
if [ "$mode" == "" ]; then
    echo "* The model-collect folder has not been altered. Changes have only been made to collect"
elif [ "$mode" == "svnupdate" ]; then
    echo "* TO DO: You still need to run svn status and then svn commit on the model-collect folder. Besides that:"
elif [ "$mode" == "svndelete" ]; then
    echo "* The model-collect folder's archives and index subfolders have been updated and committed to svn."
fi
echo

if [ "$mode" != "" ]; then
    echo "* DIFFERENCES REMAINING BETWEEN model-collect AND collect (skipping .svn folders):"
    echo
    echo "---START DIFF---"
    diff -rq model-collect collect | grep -v '\.svn'
    echo "---END DIFF---"
    echo
fi

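# ILLUSTRATIVE SKETCH (not called by this script): diff's own -x option excludes entries
# whose base name matches a pattern, so .svn folders can be skipped during the comparison
# itself rather than by filtering the output through grep afterwards.
# The name diff_skip_svn is hypothetical.
diff_skip_svn () {
    diff -rq -x .svn "$1" "$2"
}
# Example: diff_skip_svn model-collect collect
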
echo "* The original collect directory has been kept, renamed to collect_orig"
echo
echo "*****************************************"
echo


# deletes empty dirs
# find collect/$collection/archives/HASH* -type d -empty -delete
# find collect/$collection/index/assoc/HASH* -type d -empty -delete

# To recursively delete all empty dirs in the copy of model-collect (since the dirs will not have .svn folders in them anymore)
# http://www.commandlinefu.com/commands/view/5131/recursively-remove-all-empty-directories
#find collect -type d -empty -delete

# The following, when put in a separate script file, will delete all folders from model-collect that are
# empty in the copied collection (all folders which contain only a .svn subfolder in model-collect)
# ---------------------------------------------
#!/bin/bash

#for collection in collect/*; do
    #escape the filename (in case of space)
#    collection=`echo $collection | sed 's@ @\\\ @g'`

    #get just the basename
#    collection=`basename $collection`

    # HASH dirs that are empty in local collect's archives and index/assoc,
    # need to be removed from the svn in model-collect

#    for line in `find collect/$collection/archives/HASH* -type d -empty`; do
#        modelline="model-$line"
#        echo "LINE: $modelline"

        # remove from svn of model collect
#        svn rm $modelline
##        rm -rf $modelline
        # remove physically from local collect
#        rm -rf $line
#    done

#    for line in `find collect/$collection/index/assoc/HASH* -type d -empty`; do
#        modelline="model-$line"
#        echo "LINE: $modelline"

        # remove from svn of model collect
#        svn rm $modelline
##        rm -rf $modelline
        # remove physically from local collect
#        rm -rf $line
#    done

#done
# ---------------------------------------------