source: main/trunk/greenstone3/web/sites/localsite/collect/gberg/README@ 32489

Last change on this file since 32489 was 32489, checked in by kjdon, 6 years ago

updated the README

The Sample XML texts collection.

This README provides directions on how to rebuild the collection if necessary,
and how to create a new collection using this method.

Building the gberg collection (Linux):

Go to the top-level Greenstone directory and run:
source gs3-setup.sh
Starting from the gberg directory (where this README file is located):
cd java
javac *.java
cd ..
java -classpath $CLASSPATH:./java BuildXMLColl $GSDL3HOME/sites/localsite gberg
mv building index
cp etc/buildConfig.xml index
cp import/*.dtd $GSDL3SRCHOME/resources/dtd/ (this step is already done)

The building process:

There are two stages, import and build.
Importing goes through the XML documents in the import directory and adds
gs3:id attributes to indexable nodes. Building makes another pass through the
documents, indexing any indexable nodes, and assigning them node ids. Node
ids are made up of several parts: document id, scope, tag name, gs3:id.
Document id refers to the work, and is generated from the original document
filename. For example, origin.xml becomes origin. The scope part is optional.
Tag name is the name of the node that is to be indexed, and gs3:id is its id.
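
As a rough illustration of how such a node id could be put together, here is
a minimal Java sketch. Only the order of the parts and the origin.xml example
come from this README. The "." separator, the class and method names, and the
sample tag names and id value are assumptions for illustration and may not
match what BuildXMLColl actually produces.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: the part order follows the README, but the "." separator
// and all names below are assumptions, not the actual BuildXMLColl code.
class NodeIdSketch {

    // Derive a document id from a filename by dropping the directory
    // and the extension, e.g. "origin.xml" -> "origin".
    static String documentId(String filename) {
        String name = filename.substring(filename.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    // Join the parts in the order document id, scope, tag name, gs3:id.
    // Pass null or "" for scope to omit it (the scope part is optional).
    static String nodeId(String docId, String scope, String tagName, String gs3Id) {
        List<String> parts = new ArrayList<>();
        parts.add(docId);
        if (scope != null && !scope.isEmpty()) {
            parts.add(scope);
        }
        parts.add(tagName);
        parts.add(gs3Id);
        return String.join(".", parts);
    }

    public static void main(String[] args) {
        String docId = documentId("import/origin.xml");
        // "chapter", "body" and "3" are hypothetical sample values.
        System.out.println(nodeId(docId, null, "chapter", "3"));
        System.out.println(nodeId(docId, "body", "chapter", "3"));
    }
}
```

With the assumed separator, this prints origin.chapter.3 and then
origin.body.chapter.3, showing the same node id without and with a scope.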

The class XMLTagInfo provides two methods, isIndexable and isScopable, and is
used to determine which tags provide scope, and which tags should be indexed.
This is the only part of the build code that is specific to the documents in
the collection.

Indexable nodes are those that should be indexed individually. When a search
is done, these are the units that will be returned. For instance, suppose a
collection contains books with chapters and sections of chapters. To have
searching and retrieval at chapter level, the chapter nodes would be made
indexable. Alternatively, making the section nodes indexable would provide
searching at the smaller section level instead.

Scopable nodes are not necessarily indexed, but are recorded as part of the
id as a scope. This is to speed up searching for the appropriate tag during
retrieval. Scopable tags should occur only once per document.
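
To make the indexable/scopable distinction concrete, the following is a
minimal Java sketch of a tag-info class. Only the method names isIndexable
and isScopable come from this README. The class name, the method signatures
and the tag names are assumptions for illustration, so check them against the
real XMLTagInfo.java before relying on them.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch only: the method names isIndexable and isScopable come from the
// README; the class name, signatures and tag names are assumptions.
class XMLTagInfoSketch {

    // Hypothetical tags to index individually (the units a search returns).
    private static final Set<String> INDEXABLE =
        new HashSet<>(Arrays.asList("chapter"));

    // Hypothetical tags that are recorded in node ids as scope.
    private static final Set<String> SCOPABLE =
        new HashSet<>(Arrays.asList("frontmatter", "body"));

    static boolean isIndexable(String tagName) {
        return INDEXABLE.contains(tagName);
    }

    static boolean isScopable(String tagName) {
        return SCOPABLE.contains(tagName);
    }

    public static void main(String[] args) {
        // Report how each sample tag would be treated.
        for (String tag : new String[] {"chapter", "body", "para"}) {
            System.out.println(tag + ": indexable=" + isIndexable(tag)
                + ", scopable=" + isScopable(tag));
        }
    }
}
```

In this sketch, replacing "chapter" with "section" in the indexable set would
move searching and retrieval to the smaller section level.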

Testing the index:

You can run the command-line Search program to query the index. From the
gberg folder:
java -classpath $CLASSPATH:./java Search ./index/idx

Building other collections:

To use the Java build code for other XML document collections, you can
just modify the XMLTagInfo.java file to include the appropriate tag names for
your XML documents. Then put import documents into the import directory, and
run the BuildXMLColl program as above. Configuration files will need to be
created for the collection (etc/collectionConfig.xml and
index/buildConfig.xml).

The Greenstone runtime software has problems locating DTDs, so any DTDs for
your collection should be placed in the collection's resources directory.
If DTD files are shared between collections, they can go into the
WEB-INF/classes directory (in gsdl3/web or tomcat/webapps/gsdl3, depending on
your setup).

NOTE (Sept 2018): Tomcat cannot find the gutbook DTD when trying to transform
the file to produce the table of contents :-(