source: main/trunk/model-sites-dev/twso/README.txt@ 34228

Last change on this file since 34228 was 34228, checked in by kjdon, 4 years ago

Updated 30 June 2020.

TWSO runs on commdev, in /greenstone/greenstone3/web/sites/twso. All modifications are contained in the site.

Updating TWSO.

I have Greenstone 3 installed locally, with the twso site and collection.

Ian sends PDF files of the programmes, plus a text or Word document with the list of players and the metadata for the pieces.

Players list: a text file, which should be in the following form (edit it if not):

conductor Lastname, Firstname
soloist Lastname, Firstname
Lastname, Firstname (all the players listed here)
Lastname, Firstname
....

(If you don't get this text file, see the instructions at the end of this file for how to generate it.)
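
A quick way to check a name list against the form above is sketched here. It is a hypothetical helper, not one of the TWSO scripts. It assumes that "conductor" and "soloist" are the only role prefixes, and that every other non-blank line is a name in "Lastname, Firstname" form, optionally with a bracketed former name after the last name.

```python
import re

# Assumed pattern: "Lastname, Firstname", optionally with a bracketed former
# name after the last name, eg "Oliver (Formerly Nation), Beverly".
NAME_RE = re.compile(r"^[^,()]+(\s*\([^()]+\))?,\s*\S.*$")
ROLES = ("conductor", "soloist")  # assumed to be the only role prefixes


def check_name_list(lines):
    """Return (line_number, text) for each line not in the expected form."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # blank lines are ignored
        # Drop a leading role keyword, if present.
        for role in ROLES:
            if text.lower().startswith(role + " "):
                text = text[len(role):].strip()
                break
        if not NAME_RE.match(text):
            problems.append((number, raw.strip()))
    return problems
```

Calling check_name_list(open("Aug_2019_name_list.txt", encoding="utf-8")) returns the lines that need editing before the next step.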

Once you have the name list file (eg Aug_2019_name_list.txt), check the names against master_name_list.txt to make sure that we use consistent spelling and notation across all programmes.

The easiest way to do this:

Make a backup of master_name_list.txt: copy it to master_name_list.backup

Run

python masternamecreator2.py Aug_2019_name_list.txt

This will add any new names from Aug_2019_name_list.txt into master_name_list.txt.

Now, do a diff:
diff master_name_list.txt master_name_list.backup

The differences will be any new names. Check the master list for each new name and make sure that it is not an alternate spelling of an existing name. Also, some people are listed with a maiden or former name,
eg Oliver, Bev and Oliver (Formerly Nation), Beverly
NOTE: master_list_notes.txt has some info about people and different spellings etc. Read through it before you check the differences, so that you know what to do when you find a variant.
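
The check for alternate spellings can be partly automated. The sketch below is a hypothetical helper, not one of the TWSO scripts. It assumes both files hold one name per line, and it uses difflib from the Python standard library to list existing names that look similar to each new one. The cutoff of 0.8 is a guess. It will not catch every variant (a bracketed former name makes two entries look quite different), so master_list_notes.txt still needs reading.

```python
import difflib


def read_names(path):
    """Read one name per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def new_names_with_lookalikes(master, backup, cutoff=0.8):
    """Map each name added to the master list to similar older names."""
    known = set(backup)
    added = [name for name in master if name not in known]
    return {
        name: difflib.get_close_matches(name, backup, n=3, cutoff=cutoff)
        for name in added
    }
```

For example, new_names_with_lookalikes(read_names("master_name_list.txt"), read_names("master_name_list.backup")) gives a dictionary whose keys are the new names. An empty list means no similar name was found in the backup.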

If a new variant is wrong, remove it from the master list and correct the name in the programme's name list.

Once you have all the names listed correctly, generate the metadata.xml file:

python metadatacreator.py Aug_2019_name_list.txt Aug_2019
The last argument is the name of the PDF file, without the .pdf file extension.

This will create a metadata.xml file called Aug_2019-metadata.xml.

It will list all the players, plus the conductor and soloist if these were included in the name_list file.

Open this metadata file and add the extra metadata.
To save on typing, copy and paste a list of empty elements from metadata-skeleton.xml.

The first three may have been done for you, depending on how much of the programme text was in the text file.
pd.Player - format LastName, FirstName; or LastName (nee MaidenName), FirstName; or LastName (formerly PreviousName), FirstName (for a change of name that is not due to marriage).
pd.Soloist - same format as player name. Include orchestral and vocal soloists and narrators (but not the MC).
pd.Conductor - same format as player name.

pd.Location - format: Location 1 & Location 2 & Location 3...
pd.Date - format yyyymmdd. Add multiple dates separately.
pd.formatDate - if there are multiple dates, add this as well, formatted like 21/22 November 2014.
pd.Composer - surname only, unless there are composers with the same surname.
pd.Piece - format: composer - piece title, opus number
pd.Title - concert title
pd.SubTitle - if the concert has a subtitle
pd.CoPerformer - if other groups are part of the concert, eg Cantando Choir.
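
The relationship between pd.Date and pd.formatDate can be sketched as follows. This is a hypothetical helper, not one of the TWSO scripts, and it assumes that all dates of a concert fall in the same month. Anything else should be written by hand.

```python
from datetime import datetime

# English month names, spelled out to avoid depending on the system locale.
MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def format_dates(dates):
    """Build a pd.formatDate string from yyyymmdd strings in one month."""
    parsed = sorted(datetime.strptime(d, "%Y%m%d") for d in dates)
    first = parsed[0]
    if any((p.year, p.month) != (first.year, first.month) for p in parsed):
        raise ValueError("dates span more than one month; format by hand")
    days = "/".join(str(p.day) for p in parsed)
    return f"{days} {MONTHS[first.month - 1]} {first.year}"
```

With pd.Date values of 20141121 and 20141122, format_dates(["20141121", "20141122"]) returns "21/22 November 2014".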

If the composer has done an arrangement, for Piece put
Composer (Arranged) - Piece name
Or, if both composer and arranger are listed, put eg
"Narro arr. Isaac" for both composer and piece.

MCs are not added.
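
If the scripts fail and the file has to be built from scratch, the sketch below shows one way to do it. It is not the author's metadatacreator.py. It assumes the standard Greenstone metadata.xml layout (DirectoryMetadata, FileSet, FileName, Description, Metadata with mode="accumulate"). Check it against a file produced by metadatacreator.py or against metadata-skeleton.xml, which are the authority if they differ.

```python
import re
import xml.etree.ElementTree as ET


def build_metadata(pdf_stem, fields):
    """Build a DirectoryMetadata tree for one programme PDF.

    fields is a list of (name, value) pairs, eg ("pd.Player", "Smith, Jane").
    Repeat a name to record several values for it.
    """
    root = ET.Element("DirectoryMetadata")
    fileset = ET.SubElement(root, "FileSet")
    # Assumption: FileName is read as a regular expression, so escape it.
    ET.SubElement(fileset, "FileName").text = re.escape(pdf_stem + ".pdf")
    description = ET.SubElement(fileset, "Description")
    for name, value in fields:
        item = ET.SubElement(
            description, "Metadata", name=name, mode="accumulate"
        )
        item.text = value
    return root


def to_xml(root):
    """Serialise the tree; characters such as & are escaped automatically."""
    return ET.tostring(root, encoding="unicode")
```

For example, to_xml(build_metadata("Aug_2019", [("pd.Player", "Lastname, Firstname"), ("pd.Location", "Location 1 & Location 2")])) gives the XML text to save next to the PDF.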

Add these two files (PDF and metadata.xml) to the import/Programmes/year folder. Create the year folder if it is a new year.
Add them into the collection using
incremental-rebuild.pl -site twso twso
Or you can rebuild the entire collection using
full-rebuild.pl -site twso twso


Notes:
* All the scripts are doing is trying to identify player names. If you can't get them to work properly, you can create the metadata.xml file manually from scratch and add all the players by hand.
* The collection uses UnknownPlugin to import the PDF files, so no conversion is done. A full rebuild therefore doesn't take very long.


Uploading to commdev.
##########################

On commdev, sudo to the nzdl-gs3 user.

 * Update the import folder:

cd /greenstone/greenstone3/web/sites/twso/collect/twso/import/Programmes

rsync -pavHt [email protected]:/Scratch/kjdon/gs3-pei-jones-plus-twso/web/sites/twso/collect/twso/import/Programmes/ .
(use appropriate user and paths)

 * Delete the old backup index (eg index.jun2016).
 * Rename the current index to a backup, eg index.may2018.

 * rsync the new one:

rsync -pavHt [email protected]:/Scratch/kjdon/gs3-pei-jones-plus-twso/web/sites/twso/collect/twso/index .

Log out of the nzdl-gs3 user. Then, as yourself, restart tomcat:
sudo systemctl restart greenstone3
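
The index rotation above (delete the old backup, rename the current index) can be sketched as follows. This is a hypothetical helper, not something that exists on commdev. It assumes the index and its backups are directories directly under the collection folder, and that the backup is named after the date you pass in. It deletes every existing index.* directory, so try it on a copy first.

```python
import shutil
from pathlib import Path

MONTHS = (
    "jan", "feb", "mar", "apr", "may", "jun",
    "jul", "aug", "sep", "oct", "nov", "dec",
)


def backup_name(day):
    """Return a name like index.may2018 for the given date."""
    return f"index.{MONTHS[day.month - 1]}{day.year}"


def rotate_index(collection_dir, day):
    """Delete old index.* backups, then rename index to a dated backup."""
    collection_dir = Path(collection_dir)
    for old in collection_dir.glob("index.*"):
        if old.is_dir():
            shutil.rmtree(old)
    target = collection_dir / backup_name(day)
    (collection_dir / "index").rename(target)
    return target
```

After this, the collection folder has no index directory, so the rsync of the new index still has to be run.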

TODO: back up import to /nzdl-storage where???

***************************
Other instructions:

Extracting the name list
###########################

If you don't get the players list in the right form, here's how to extract it:

Generate a text version of the PDF. This is used to extract the player names. The easiest way is to copy and paste the list of players from the programme into a text file.

If the PDF is old and you can't copy and paste from it, you will need to either type the players' names out manually, or OCR the file and copy out the list of players.

 Scan the file using ABBYY FineReader on Katherine's laptop. PDF -> Word.
 Save the Word file, then in Word do a Save As to plain text, Unicode encoding.
 Edit the text file so that just the section with the players' names is left.
 Or, you can open the Word doc, and copy and paste the list of players section into a text file.

You may need to modify the formatting.
It should look like
Instrument
player
player

Instrument
player
player

etc.

Notes on format:
* There must be a blank line between each section of instrument + players.
* The actual instrument doesn't matter, as long as it is recognised as an instrument so that the list of names gets added correctly. In fact, if you are having trouble with the formatting, just put all the players under a single instrument name. This actually makes the manual part of the namefinder processing below much easier.
* It won't like things like harp / keyboard - just change to a single instrument.
* If you get a new instrument name/format, you can add it to the roles file for next time.
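
The format rules above can be checked before running the name finder. The sketch below is a hypothetical helper, not part of namefinder1.py. The known_instruments argument stands in for whatever the roles file contains, whose format is not described here.

```python
def parse_sections(text):
    """Split text into (instrument, [players]) sections on blank lines."""
    sections = []
    current = []
    # The extra "" makes sure the final section is closed off.
    for raw in text.splitlines() + [""]:
        line = raw.strip()
        if line:
            current.append(line)
        elif current:
            sections.append((current[0], current[1:]))
            current = []
    return sections


def find_problems(sections, known_instruments):
    """List headings that are likely to confuse the name finder."""
    problems = []
    for instrument, players in sections:
        if "/" in instrument:
            problems.append(f"combined instrument heading: {instrument}")
        elif instrument.lower() not in known_instruments:
            problems.append(f"unrecognised instrument: {instrument}")
        if not players:
            problems.append(f"no players listed under: {instrument}")
    return problems
```

An "unrecognised instrument" message often means that a blank line is missing and a player has been read as a heading, or the reverse.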

Once you have this text version of the players, run the following in the OCRPrograms folder:

python namefinder1.py nameoftxtfile

This will prompt you to add names.
 * y = yes, add the name
 * e = edit, if you need to change the format
 * space = don't add, and move to the next name. (Don't use n = no, as it will ignore the rest of the names in that section.)
Have the PDF programme open as you go so you can check off names.
Keep an eye out for missed sections, eg if it doesn't recognise the instrument.
Also, names with more than two words will get processed wrongly.
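
To see in advance which names will need the edit option, the sketch below separates two-word names from the rest. It is a hypothetical helper, not part of namefinder1.py. It assumes the programme lists players as "Firstname Lastname" and that the name list wants "Lastname, Firstname".

```python
def split_by_word_count(names):
    """Separate two-word names from those that need checking by hand."""
    simple = []
    check_by_hand = []
    for name in names:
        words = name.split()
        if len(words) == 2:
            # Assumed input order: Firstname Lastname.
            simple.append(f"{words[1]}, {words[0]}")
        else:
            check_by_hand.append(name)
    return simple, check_by_hand
```

The second list holds the names to watch for while answering the prompts.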

The output is _name_list.txt. Rename it to match the input PDF file, eg Aug_2019_name_list.txt