The Sample XML texts collection.

This README provides directions on how to rebuild the collection if necessary,
and how to create a new collection using this method.

Building the gberg collection (Linux):

Go to the top-level Greenstone directory and run:
source gs3-setup.sh
Starting from the gberg directory (where this README file is located):
cd java
javac *.java
cd ..
java -classpath $CLASSPATH:./java BuildXMLColl $GSDL3HOME/sites/localsite gberg
mv building index
cp etc/buildConfig.xml index
cp import/*.dtd $GSDL3SRCHOME/resources/dtd/ (this step is already done)

The building process:

There are two stages, import and build.
Importing goes through the XML documents in the import directory and adds
gs3:id attributes to indexable nodes. Building makes another pass through the
documents, indexing any indexable nodes, and assigning them node ids. Node
ids are made up of several parts: document id, scope, tag name, gs3:id.
Document id refers to the work, and is generated from the original document
filename. For example, origin.xml becomes origin. The scope part is optional.
Tag name is the name of the node that is to be indexed, and gs3:id is its id.

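As an illustration of the node id parts described above, the following Java
sketch derives a document id from a filename and joins the parts into one id.
The class name NodeIdSketch, its method names and the "." separator are
assumptions made for this example only; the actual build code may combine the
parts differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the gberg build code.
class NodeIdSketch {

    // ASSUMPTION: the separator between id parts is illustrative only.
    static final String SEPARATOR = ".";

    // Derives the document id by dropping any directory prefix and the
    // file extension, so "import/origin.xml" gives "origin".
    static String documentId(String filename) {
        String name = filename.substring(filename.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    // Joins document id, optional scope, tag name and gs3:id.
    // Pass null or "" as scope when the node has no scope.
    static String nodeId(String docId, String scope, String tagName, String gs3Id) {
        List<String> parts = new ArrayList<>();
        parts.add(docId);
        if (scope != null && !scope.isEmpty()) {
            parts.add(scope);
        }
        parts.add(tagName);
        parts.add(gs3Id);
        return String.join(SEPARATOR, parts);
    }

    public static void main(String[] args) {
        String docId = documentId("import/origin.xml");
        // One id without a scope part, one with a (hypothetical) "body" scope.
        System.out.println(nodeId(docId, null, "chapter", "3"));
        System.out.println(nodeId(docId, "body", "chapter", "3"));
    }
}
```

Treating the scope as an optional middle part keeps ids short for documents
that do not use scopable tags.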
The class XMLTagInfo provides two methods, isIndexable and isScopable, and is
used to determine which tags provide scope, and which tags should be indexed.
This is the only part of the build code that is specific to the documents in
the collection.

Indexable nodes are those that should be indexed individually. When a search
is done, these are the units that will be returned. For instance, suppose a
collection contains books with chapters and sections of chapters. To have
searching and retrieval at chapter level, the chapter nodes would be made
indexable. Alternatively, making the section nodes indexable would provide
searching at the smaller section level instead.

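To make the chapter example concrete, here is a minimal sketch of an
import-style pass that adds a gs3:id attribute to every element with a given
tag name. It uses the standard Java DOM API. The class name ImportPassSketch,
the tag name "chapter" and the sequential id values are assumptions for
illustration; the real import stage may assign ids differently.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration; not the actual gberg import code.
class ImportPassSketch {

    // Parses the XML text and marks each element named indexableTag
    // with a gs3:id attribute.
    static Document tagIndexable(String xml, String indexableTag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(indexableTag);
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                // ASSUMPTION: ids are sequential numbers starting at 1.
                e.setAttribute("gs3:id", String.valueOf(i + 1));
            }
            return doc;
        } catch (Exception ex) {
            throw new RuntimeException("Could not process XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<book><chapter>One</chapter><chapter>Two</chapter></book>";
        Document doc = tagIndexable(xml, "chapter");
        NodeList chapters = doc.getElementsByTagName("chapter");
        for (int i = 0; i < chapters.getLength(); i++) {
            Element e = (Element) chapters.item(i);
            System.out.println(e.getTagName() + " gs3:id=" + e.getAttribute("gs3:id"));
        }
    }
}
```

Passing "section" instead of "chapter" would mark the smaller units, which
corresponds to the section-level alternative mentioned above.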
Scopable nodes are not necessarily indexed, but are recorded as part of the
id as a scope. This is to speed up searching for the appropriate tag during
retrieval. Scopable tags should only occur once per document.

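The two kinds of tag can be pictured with a small sketch of a tag-info class
offering isIndexable and isScopable. The class name TagInfoSketch, the method
signatures and the tag names "chapter", "frontmatter" and "body" are all
hypothetical; check XMLTagInfo.java for the real signatures and tags.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a collection-specific tag-info class.
class TagInfoSketch {

    // ASSUMPTION: chapters are the units returned by a search.
    private static final Set<String> INDEXABLE =
            new HashSet<>(Arrays.asList("chapter"));

    // ASSUMPTION: each of these tags occurs at most once per document.
    private static final Set<String> SCOPABLE =
            new HashSet<>(Arrays.asList("frontmatter", "body"));

    // True if nodes with this tag name should be indexed individually.
    static boolean isIndexable(String tagName) {
        return INDEXABLE.contains(tagName);
    }

    // True if this tag name should be recorded as the scope part of an id.
    static boolean isScopable(String tagName) {
        return SCOPABLE.contains(tagName);
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"chapter", "body", "title"}) {
            System.out.println(tag + ": indexable=" + isIndexable(tag)
                    + ", scopable=" + isScopable(tag));
        }
    }
}
```

Keeping the tag names in two small sets means that adapting the sketch to a
different document type only requires changing those sets.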
Testing the index:

You can run the command-line Search program to query the index. From the gberg folder:
java -classpath $CLASSPATH:./java Search ./index/idx

Building other collections:

To use the Java build code for other XML document collections, you can
just modify the XMLTagInfo.java file to include the appropriate tag names for
your XML documents. Then put the documents into the import directory, and
run the BuildXMLColl program as above. Configuration files will need to be
created for the collection (etc/collectionConfig.xml and
index/buildConfig.xml).

The Greenstone runtime software has problems locating DTDs, so any DTDs for
your collection should be placed in the collection's resources directory.
If DTD files are shared between collections, they can go into the
WEB-INF/classes directory (in gsdl3/web or tomcat/webapps/gsdl3, depending on
your setup).

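Files in WEB-INF/classes are on the web application's classpath, so one way
to picture a classpath-based DTD lookup is the sketch below. It is a generic
SAX EntityResolver, not Greenstone's own resolution code; the class name
ClasspathDtdResolverSketch is hypothetical, and matching on the bare file
name of the system id is an assumption made for this example.

```java
import java.io.InputStream;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical illustration; not how Greenstone itself resolves DTDs.
class ClasspathDtdResolverSketch implements EntityResolver {

    // Looks for the DTD on the classpath by its bare file name.
    // Returning null tells the parser to fall back to its default lookup.
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null) {
            return null;
        }
        // ASSUMPTION: the DTD is stored under its plain file name.
        String name = systemId.substring(systemId.lastIndexOf('/') + 1);
        InputStream in = getClass().getClassLoader().getResourceAsStream(name);
        if (in == null) {
            return null;
        }
        InputSource source = new InputSource(in);
        source.setSystemId(systemId);
        return source;
    }

    public static void main(String[] args) {
        ClasspathDtdResolverSketch resolver = new ClasspathDtdResolverSketch();
        // Reports whether a DTD with this (example) name is visible on the classpath.
        InputSource found = resolver.resolveEntity(null, "file:///some/path/example.dtd");
        System.out.println(found == null ? "not on classpath" : "found on classpath");
    }
}
```

Such a resolver would be registered on a parser with
DocumentBuilder.setEntityResolver or XMLReader.setEntityResolver.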
NOTE: Sept 2018. Tomcat cannot find the gutbook DTD when trying to transform the file to produce a table of contents :-(