source: main/trunk/model-sites-dev/twso/README.txt@ 34229

Last change on this file since 34229 was 34229, checked in by kjdon, 4 years ago

Updated 30 June 2020.

TWSO - runs on commdev, in /greenstone/greenstone3/web/sites/twso. All mods are contained in the site.

Setting up TWSO
###################

In a recent Greenstone 3 installation, check out twso in the sites folder:
svn co http://svn.greenstone.org/main/trunk/model-sites-dev/twso

cd into the collection:
cd twso/collect/twso

Populate the import folder by copying the contents of import/Programmes from either the existing collection on commdev (/greenstone/greenstone3/web/sites/twso/collect/twso) or from storage (/nzdl-storage/TWSO-Backup/twso-site/collect/twso).

Populate the videos folder from either of the same two places.

Updating TWSO collection
##########################

I have Greenstone 3 installed locally, with the twso site and collection.

Ian sends PDF files of the programmes, plus a text/Word doc for the list of players, and the pieces metadata.

Players list: This should be a text file in the following form - edit it if not.

conductor Lastname, Firstname
soloist Lastname, Firstname
Lastname, Firstname (all the players listed here)
Lastname, Firstname
....

(If you don't get this text file, see instructions at the end of this file for how to generate it.)
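
As a quick check before running the scripts, the expected layout of the players list can be sketched in Python. This is only an illustration: parse_name_list is a hypothetical helper, not one of the TWSO scripts, and it assumes that role lines start with the literal word conductor or soloist followed by a space, as in the form shown above.

```python
def parse_name_list(text):
    """Split a players list into conductor, soloist and player names.

    Hypothetical helper: assumes one entry per line, with an optional
    leading 'conductor ' or 'soloist ' before 'Lastname, Firstname'.
    """
    roles = {"conductor": [], "soloist": [], "player": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        first, _, rest = line.partition(" ")
        if first.lower() in ("conductor", "soloist") and rest.strip():
            role, name = first.lower(), rest.strip()
        else:
            role, name = "player", line
        # Every name is expected in 'Lastname, Firstname' form.
        if "," not in name:
            raise ValueError(f"expected 'Lastname, Firstname', got: {line!r}")
        roles[role].append(name)
    return roles
```

A line that raises ValueError here is one that probably needs editing by hand before the name list is passed to the scripts.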

Once you have the name list file (eg Aug_2019_name_list.txt), check its names against master_name_list.txt, to make sure that we use consistent spelling and notation across all programmes.

The easiest way to do this:

Make a backup of master_name_list.txt - copy it to master_name_list.backup

Run

python masternamecreator2.py Aug_2019_name_list.txt

This will add any new names from Aug_2019_name_list.txt into master_name_list.txt.

Now, do a diff:
diff master_name_list.txt master_name_list.backup

The differences will be any new names. Check the master list for these new names and make sure that they are not alternate spellings of existing names. Also, some names have maiden names.
eg Oliver, Bev and Oliver (Formerly Nation), Beverly
NOTE: master_list_notes.txt has some info about people and different spellings etc. Look through this first before you check the differences - it helps you to know what to do when you find some.

If there is a new variant which is wrong, remove the new variant from the master list, and change the name in the programme name list.
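
The manual check of new names against the master list could be assisted by a small Python sketch like the one below. It is not part of the TWSO scripts: the function names are hypothetical, and it assumes that master_name_list.txt and master_name_list.backup hold one name per line. It lists the names that were added, then suggests existing entries that share a surname or look similar, using difflib from the standard library.

```python
import difflib


def surname(name):
    """Lower-cased surname, ignoring any '(Formerly ...)' or '(nee ...)' part."""
    return name.split(",")[0].split("(")[0].strip().lower()


def new_names(master_lines, backup_lines):
    """Names in the updated master list that are not in the backup, in order."""
    old = {line.strip() for line in backup_lines}
    added = []
    for line in master_lines:
        name = line.strip()
        if name and name not in old and name not in added:
            added.append(name)
    return added


def possible_variants(name, existing, cutoff=0.8):
    """Existing names worth checking by eye against a new name."""
    others = [e for e in existing if e != name]
    same_surname = [e for e in others if surname(e) == surname(name)]
    # get_close_matches catches near-identical spellings of the whole name.
    close = difflib.get_close_matches(name, others, n=5, cutoff=cutoff)
    return sorted(set(same_surname) | set(close))
```

The suggestions are only hints. Whether two entries are really the same person still has to be decided by hand, with master_list_notes.txt as a guide.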

Once you have all the names listed correctly, generate the metadata.xml file:

python metadatacreator.py Aug_2019_name_list.txt Aug_2019
The last argument is the name of the PDF file, without the .pdf file extension.

This will create a metadata.xml file called Aug_2019-metadata.xml.

It will list all the players, plus the conductor and soloist if these were included in the name_list file.

Open up this metadata file and add the extra metadata:
Copy and paste a list of empty elements from metadata-skeleton.xml, to save on typing.

The first three may have been done for you, depending on how much of the programme text was in the text file.
pd.Player - format LastName, FirstName; or LastName (nee MaidenName), FirstName; or LastName (formerly PreviousName), FirstName (for a change of name that is not due to marriage).
pd.Soloist - same format. Include orchestral and vocal soloists, narrators (but not MC)
pd.Conductor - same format as player name

pd.Location - format Location 1 & Location 2 & Location 3...
pd.Date - format yyyymmdd. Add multiple dates separately
pd.formatDate - if there are multiple dates, then add this, format like 21/22 November 2014
pd.Composer - surname only, unless there are composers with the same surname.
pd.Piece - format: composer - piece title, opus number
pd.Title - concert title
pd.SubTitle - if concert has a subtitle
pd.CoPerformer - if other groups are part of the concert. eg Cantando Choir.

If the composer has done an arrangement, for Piece put
Composer (Arranged) - Piece name
Or if both composer and arranger are listed, put eg
"Narro arr. Isaac" for both composer and piece.

MCs are not added.
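
The two date fields are easy to get out of step, so here is a minimal Python sketch of how pd.formatDate relates to the pd.Date values. The helper names are hypothetical and not part of the TWSO scripts. It assumes all dates of a concert fall in the same month, as in the 21/22 November 2014 example, and refuses anything else.

```python
from datetime import datetime

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def parse_pd_date(value):
    """Parse one pd.Date value (yyyymmdd); raise ValueError if malformed."""
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"pd.Date must be 8 digits (yyyymmdd): {value!r}")
    return datetime.strptime(value, "%Y%m%d").date()


def format_date(values):
    """Build a pd.formatDate string such as '21/22 November 2014'."""
    dates = sorted(parse_pd_date(v) for v in values)
    if not dates:
        raise ValueError("no dates given")
    first = dates[0]
    # Assumption: a multi-month run is rare enough to be written by hand.
    if any((d.year, d.month) != (first.year, first.month) for d in dates):
        raise ValueError("dates span more than one month")
    days = "/".join(str(d.day) for d in dates)
    return f"{days} {MONTHS[first.month - 1]} {first.year}"
```

Each yyyymmdd value still goes into its own pd.Date element. The combined string is only for pd.formatDate.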

Add these two files (pdf and metadata.xml) to the import/Programmes/year folder. Create the year folder if it is a new year.
Add them into the collection using
incremental-rebuild.pl -site twso twso
Or you can rebuild the entire collection using
full-rebuild.pl -site twso twso


Notes:
* All the scripts are doing is trying to identify player names. If you can't get these to work properly, you can just manually create the metadata.xml file from scratch, and add all the players in by hand.
* The collection uses UnknownPlugin to import the PDF files, so no conversion is done. Therefore it doesn't take very long to do a full rebuild.


Uploading to commdev
##########################

On commdev, sudo to the nzdl-gs3 user.

 * update import folder:

cd /greenstone/greenstone3/web/sites/twso/collect/twso/import/Programmes

rsync -pavHt [email protected]:/Scratch/kjdon/gs3-pei-jones-plus-twso/web/sites/twso/collect/twso/import/Programmes/ .
(use appropriate user and paths)

Delete the old backup index (eg index.jun2016).
Rename the current index to a backup, eg index.may2018

rsync the new index:

rsync -pavHt [email protected]:/Scratch/kjdon/gs3-pei-jones-plus-twso/web/sites/twso/collect/twso/index .

Log out of the nzdl-gs3 user. As yourself:

restart tomcat:
sudo systemctl restart greenstone3

TODO: back up import to /nzdl-storage where???

***************************
Other instructions:

Extracting the name list
###########################

If you don't get the players list in the right form, here's how to extract it:

Generate a text file from the PDF. This is used to extract player names. The easiest way is to copy and paste the list of players from the programme into a text file.

If the PDF is old and you can't copy and paste, then you will either need to type the players' names out manually, or you can OCR the file and copy out the list of players.

 Scan the file using ABBYY FineReader on Katherine's laptop. PDF -> Word.
 Save the Word file, then in Word do a Save As plain text, Unicode encoding.
 Edit the text file so that just the section with players' names is left.
 Or, you can open up the Word doc, and copy and paste the list of players section into a text file.

You may need to modify the formatting.
It should look like
Instrument
player
player

Instrument
player
player

etc.

Notes on format:
* there must be a blank line between each section of instrument + players
* the actual instrument doesn't matter, as long as it is recognised as an instrument so that the list of names gets added correctly. In fact, if you are having trouble with the formatting, just put all the players under a single instrument name. This actually makes the manual part of namefinder processing below much easier.
* it won't like things like harp / keyboard - just change to a single instrument
* if you get a new instrument name/format, you can add it into the roles file for next time.
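
To check the formatting before running namefinder, the section layout described above can be sketched in Python. This is an illustration only and does not reproduce what namefinder1.py does: parse_sections and slash_headings are hypothetical helpers, and the sketch simply treats the first line of each blank-line-separated block as the instrument, without consulting the roles file.

```python
def parse_sections(text):
    """Split text into (instrument, [players]) blocks separated by blank lines."""
    sections = []
    block = []
    # The extra "" makes sure the final block is closed off.
    for raw in text.splitlines() + [""]:
        line = raw.strip()
        if line:
            block.append(line)
        elif block:
            sections.append((block[0], block[1:]))
            block = []
    return sections


def slash_headings(sections):
    """Instrument headings such as 'harp / keyboard' that need simplifying."""
    return [instrument for instrument, _ in sections if "/" in instrument]
```

A section with an empty player list usually means a blank line has crept in between an instrument and its players.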

Once you have this text version of the players, in the OCRPrograms folder, run:

python namefinder1.py nameoftxtfile

This will prompt you to add names.
 * y = yes,
 * e = edit, if you need to change the format,
 * space = don't add + move to the next name. (Don't use n = no as it will ignore any more names in that section.)
Have the PDF programme open as you go so you can check off names.
Keep an eye out for missed sections, eg if it doesn't recognise the instrument.
Names with more than two words will get processed wrongly.

The output is _name_list.txt. Rename it to match the input PDF file, eg Aug_2019_name_list.txt