source: main/trunk/model-sites-dev/twso/README.txt@ 34297

Last change on this file since 34297 was 34297, checked in by kjdon, 4 years ago

added some more useful instructions

Updated 30 June 2020.

TWSO - runs on commdev, in /greenstone/greenstone3/web/sites/twso. All mods are contained in the site.

Setting up TWSO
###################

In a recent Greenstone 3 installation, check out twso into the sites folder:
svn co http://svn.greenstone.org/main/trunk/model-sites-dev/twso

cd into the collection:
cd twso/collect/twso

Populate the import folder by copying the contents of import/Programmes from either the existing collection on commdev (/greenstone/greenstone3/web/sites/twso/collect/twso) or from storage (/nzdl-storage/TWSO-Backup/twso-site/collect/twso).

Populate the videos folder from either of the same two places.

Updating TWSO collection
##########################

I have Greenstone 3 installed locally, with the twso site and collection.

Ian sends PDF files of the programmes, plus a text/Word doc with the list of players and the pieces metadata.

Players list: a text file, which should be in the following form (edit it if not):

conductor Lastname, Firstname
soloist Lastname, Firstname
Lastname, Firstname (all the players listed here)
Lastname, Firstname
....

(If you don't get this text file, see the legacy instructions at the end of this file for how to generate it.)
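
The layout above can be sanity-checked before running the scripts. The following is only a minimal sketch, assuming one entry per line with an optional "conductor" or "soloist" prefix. parse_name_list and badly_formed are hypothetical helpers, not part of masternamecreator2.py or metadatacreator.py, and the sample names are made up.

```python
def parse_name_list(text):
    """Split a players-list file into conductors, soloists and players."""
    roles = {"conductor": [], "soloist": [], "player": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        first, _, rest = line.partition(" ")
        key = first.lower()
        if key in ("conductor", "soloist") and rest.strip():
            roles[key].append(rest.strip())
        else:
            roles["player"].append(line)
    return roles


def badly_formed(names):
    """Return names that lack the comma of 'Lastname, Firstname'."""
    return [name for name in names if "," not in name]


# Made-up sample input in the layout described above.
sample = """conductor Smith, Jane
soloist Brown, Tom
Jones, Ann
Lee Sam
"""

roles = parse_name_list(sample)
print(roles["conductor"], roles["soloist"])
print("check these entries by hand:", badly_formed(roles["player"]))
```

Any entry reported by badly_formed probably needs editing into Lastname, Firstname form before the name list is used.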

Once you have the name list file (eg Aug_2019_name_list.txt), check its names against master_name_list.txt to make sure that we use consistent spelling and notation across all programmes.

The easiest way to do this:

Make a backup of master_name_list.txt - copy it to master_name_list.backup

Run

python masternamecreator2.py Aug_2019_name_list.txt

This will add any new names from Aug_2019_name_list.txt into master_name_list.txt.

Now, do a diff:
diff master_name_list.txt master_name_list.backup

The differences will be any new names. Check the master list for these new names and make sure that they are not alternate spellings of existing names. Also, some names include maiden or former names,
eg Oliver, Bev and Oliver (Formerly Nation), Beverly

NOTE: master_list_notes.txt has some info about people and different spellings etc. Look through this before you check the differences - it helps you to know what to do when you find some.

If a new name is a wrong variant of an existing name, remove the new variant from the master list, and change the name in the programme name list.
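
Spotting alternate spellings by eye is error-prone, so a fuzzy comparison can help narrow down which new names to look at. This is a rough sketch only, assuming the master list holds one name per line. new_names and possible_variants are hypothetical helpers (not part of the existing scripts), the sample names are made up, and the 0.8 cutoff is a guess.

```python
import difflib


def new_names(master_lines, backup_lines):
    """Names present in the updated master list but not in the backup."""
    backup = {line.strip() for line in backup_lines}
    return [line.strip() for line in master_lines
            if line.strip() and line.strip() not in backup]


def possible_variants(name, existing, cutoff=0.8):
    """Existing names that look similar to `name` (candidate alternate spellings)."""
    return difflib.get_close_matches(name, existing, n=5, cutoff=cutoff)


# Made-up sample data standing in for the two files.
backup = ["Garcia Gil, Ana", "Smith, Jo"]
master = ["Garcia Gil, Ana", "Garcia-Gil, Ana", "Smith, Jo"]

for name in new_names(master, backup):
    print(name, "->", possible_variants(name, backup))
```

Short forms and maiden-name variants (such as the Oliver example above) differ too much to be caught at this cutoff, so this does not replace checking master_list_notes.txt and the diff by hand.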

Once you have all the names listed correctly, generate the metadata.xml file:

python metadatacreator.py Aug_2019_name_list.txt Aug_2019
The last argument is the name of the PDF file, without the .pdf file extension.

This will create a metadata file called Aug_2019-metadata.xml.

It will list all the players, plus conductor and soloist if these were included in the name_list file.

Open up this metadata file and add the extra metadata:
Copy and paste a list of empty elements from metadata-skeleton.xml, to save on typing.

The first three may have been done for you, depending on how much of the programme text was in the text file.
pd.Player - format LastName, FirstName, or LastName (nee MaidenName), FirstName, or LastName (formerly PreviousName), FirstName (for a change of name that is not due to marriage).
pd.Soloist - same format as player name. Include orchestral and vocal soloists and narrators (but not the MC).
pd.Conductor - same format as player name.

pd.Location - format Location 1 & Location 2 & Location 3...
pd.Date - format yyyymmdd. Add multiple dates separately.
pd.formatDate - if there are multiple dates, then add this, formatted like 21/22 November 2014.
pd.Composer - surname only, unless there are composers with the same surname.
pd.Piece - format: composer - piece title, opus number
pd.Title - concert title
pd.SubTitle - if the concert has a subtitle
pd.CoPerformer - if other groups are part of the concert, eg Cantando Choir.
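
The two date fields are easy to get out of step, so here is a small sketch of how a pd.formatDate value could be derived from the pd.Date values. parse_pd_date and format_dates are hypothetical helpers, not part of metadatacreator.py. The fallback for dates that span more than one month is an assumption, since this file only gives a same-month example.

```python
from datetime import datetime

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def parse_pd_date(value):
    """Parse a pd.Date value (yyyymmdd); raise ValueError if it is malformed."""
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"expected yyyymmdd, got {value!r}")
    return datetime.strptime(value, "%Y%m%d").date()


def format_dates(values):
    """Build a pd.formatDate string such as '21/22 November 2014'."""
    dates = sorted(parse_pd_date(v) for v in values)
    if not dates:
        raise ValueError("no dates given")
    first = dates[0]
    if all((d.year, d.month) == (first.year, first.month) for d in dates):
        days = "/".join(str(d.day) for d in dates)
        return f"{days} {MONTHS[first.month - 1]} {first.year}"
    # Assumption: dates in different months are written out in full.
    return " / ".join(f"{d.day} {MONTHS[d.month - 1]} {d.year}" for d in dates)


print(format_dates(["20141121", "20141122"]))  # 21/22 November 2014
```

parse_pd_date also rejects values such as 2014-11-21, which is a quick way to catch a pd.Date entered in the wrong format.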

If the composer has done an arrangement, for Piece put
Composer (Arranged) - Piece name
Or if both composer and arranger are listed, put eg
"Narro arr. Isaac" for both composer and piece.

MCs are not added.

Add these two files (pdf and metadata.xml) to the import/Programmes/year folder. Create the year folder if it is a new year.
Add them into the collection using
incremental-rebuild.pl -site twso twso
Or you can rebuild the entire collection using
full-rebuild.pl -site twso twso


Notes:
* All the scripts are doing is trying to identify player names. If you can't get these to work properly, you can just manually create the metadata.xml file from scratch, and add all the players in by hand.
* The collection uses UnknownPlugin to import the PDF files, so no conversion is done. A full rebuild therefore doesn't take very long.


Uploading to commdev
##########################

 * ssh to commdev

ssh commdev.nzdl.org

 * sudo to nzdl-gs3 user

sudo su - nzdl-gs3

 * update import folder

cd /greenstone/greenstone3/web/sites/twso/collect/twso/import/Programmes

rsync -pavHt [email protected]:/Scratch/kjdon/gs3-pei-jones-plus-twso/web/sites/twso/collect/twso/import/Programmes/ .
(use appropriate user and paths)

 * update index

In the collection folder (cd ../.. from Programmes):
Delete the old backup index (eg index.jun2016).
Rename the current index to a backup, eg index.may2018

rsync the new one:

rsync -pavHt [email protected]:/Scratch/kjdon/gs3-pei-jones-plus-twso/web/sites/twso/collect/twso/index .

 * restart tomcat

Log out of the nzdl-gs3 user. As yourself, restart tomcat:
sudo systemctl restart greenstone3

 * back up the new import files

Back as the nzdl-gs3 user, back up the new import files to /nzdl-storage/TWSO-Backup/twso-site/collect/twso/import

If you need to go back and change all occurrences of a name
###########################################################
Sometimes it turns out that the master name variant you have been using is no longer correct - eg if someone changes their name, so all the old occurrences need to be changed to X (formerly Y), etc.

You can make a backup of the import folder first if you like (in case you muck up the commands).
cp -r import import.save

* cd into the Programmes folder:
cd import/Programmes

* find all the metadata files:
find . -name '*metadata.xml'

* test with a grep
find . -name '*metadata.xml' -exec grep Garcia-Gil {} \;
 - this should list all the lines that you want to change

* do the replacement
find . -name '*metadata.xml' -exec sed -i 's/Garcia-Gil/Garcia Gil/g' {} \;

* list all the tilde files
find . -name '*metadata.xml~'

* then remove them
find . -name '*metadata.xml~' -exec rm {} \;

* check that you have only changed the bits you wanted
cd ../../
diff -r import import.save
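
If you would rather not run sed directly, the same dry run and replacement can be sketched in Python. This is an illustrative alternative only, assuming the metadata files are UTF-8 text. find_occurrences and replace_name are hypothetical helpers, not existing scripts in the collection.

```python
from pathlib import Path


def find_occurrences(root, old):
    """Dry run: list (path, line number, line) for every line containing `old`."""
    hits = []
    for path in sorted(Path(root).rglob("*metadata.xml")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, 1):
            if old in line:
                hits.append((path, number, line.strip()))
    return hits


def replace_name(root, old, new):
    """Replace `old` with `new` in every *metadata.xml under root.

    Returns the list of files that were rewritten.
    """
    changed = []
    for path in sorted(Path(root).rglob("*metadata.xml")):
        text = path.read_text(encoding="utf-8")
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed.append(path)
    return changed


# Example use, run from the collection folder (check the dry run first):
#   for hit in find_occurrences("import/Programmes", "Garcia-Gil"):
#       print(*hit)
#   replace_name("import/Programmes", "Garcia-Gil", "Garcia Gil")
```

As with the sed version, keep the import.save backup and run the diff afterwards to confirm that only the intended lines changed.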

*****************************
Legacy instructions:
*****************************

Extracting the name list
###########################

If you don't get the players list in the right form, here's how to extract the names:

Generate a text file from the PDF. This is used to extract player names. The easiest way is to copy and paste the list of players from the programme into a text file.

If the PDF is old and you can't copy and paste, then you will either need to type the players' names out manually, or OCR the file and copy out the list of players.

 Scan the file using ABBYY FineReader on Katherine's laptop. PDF -> Word.
 Save the Word file, then in Word do a Save As plain text, Unicode encoding.
 Edit the text file so that just the section with players' names is left.
 Or, you can open up the Word doc, and copy and paste the list of players section into a text file.

You may need to modify the formatting.
It should look like
Instrument
player
player

Instrument
player
player

etc.

Notes on format:
* there must be a blank line between each section of instrument + players
* the actual instrument doesn't matter, as long as it is recognised as an instrument so that the list of names gets added correctly. In fact, if you are having trouble with the formatting, just put all the names under a single instrument name. This actually makes the manual part of the namefinder processing below much easier.
* it won't like things like harp / keyboard - just change to a single instrument
* if you get a new instrument name/format, you can add it into the roles file for next time.
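
To check the formatting before running namefinder, the section layout can be parsed with a few lines of Python. This is a sketch of the layout described above only, not the parser used by namefinder1.py. parse_sections and suspicious_headings are hypothetical helpers and the sample names are made up.

```python
def parse_sections(text):
    """Split text into (instrument, [players]) sections separated by blank lines."""
    sections = []
    current = []
    for raw in text.splitlines():
        line = raw.strip()
        if line:
            current.append(line)
        elif current:
            # A blank line closes the current section.
            sections.append((current[0], current[1:]))
            current = []
    if current:
        sections.append((current[0], current[1:]))
    return sections


def suspicious_headings(sections):
    """Headings such as 'harp / keyboard' that should be a single instrument."""
    return [instrument for instrument, _ in sections if "/" in instrument]


# Made-up sample in the layout described above.
sample = """Violin
Ann Jones
Sam Lee

Harp / Keyboard
Jo Smith
"""

sections = parse_sections(sample)
for instrument, players in sections:
    print(instrument, len(players))
print("fix these headings:", suspicious_headings(sections))
```

A section with no players, or a heading reported by suspicious_headings, usually means a blank line is missing or an instrument name needs simplifying.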

Once you have this text version of the players, in the OCRPrograms folder, run:

python namefinder1.py nameoftxtfile

This will prompt you to add names.
 * y = yes,
 * e = edit, if you need to change the format,
 * space = don't add + move to the next name. (Don't use n = no as it will ignore any more names in that section.) Have the PDF programme open as you go so you can check off names.
Keep an eye out for missed sections, eg if it doesn't recognise the instrument.
Names with more than two words will get processed wrongly.

The output is _name_list.txt. Rename it to match the input PDF file, eg Aug_2019_name_list.txt