The Sample XML texts collection.

This README explains how to rebuild the collection if necessary, and how to
create a new collection using the same method.

Building the gberg collection (Linux):

Starting from the gberg directory (where this README file is located):
cd java
javac *.java
cd ..
java -classpath $CLASSPATH:./java BuildXMLColl $GSDL3HOME/web/sites/localsite gberg
mv building index
cp etc/buildConfig.xml index
cp import/*.dtd $GSDL3HOME/resources/dtd/

The building process:

There are two stages, import and build.
Importing goes through the XML documents in the import directory and adds
gs3:id attributes to indexable nodes. Building makes another pass through the
documents, indexing any indexable nodes and assigning them node ids. Node
ids are made up of several parts: document id, scope, tag name, and gs3:id.
The document id refers to the work and is generated from the original document
filename. For example, origin.xml becomes origin. The scope part is optional.
The tag name is the name of the node to be indexed, and gs3:id is its id.

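The way the parts of a node id fit together can be sketched roughly as
follows. This is an illustration only, not the code used by BuildXMLColl: the
class name NodeIdSketch, the "." separator, the part order, and the scope
name "body" are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the gberg build code.
class NodeIdSketch {

    // Assumed separator; the real build code may join the parts differently.
    static final String SEP = ".";

    // Derive the document id from the file name: "origin.xml" -> "origin".
    static String docId(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    // Join document id, optional scope, tag name and gs3:id into one node id.
    static String nodeId(String docId, String scope, String tagName, String gs3Id) {
        List<String> parts = new ArrayList<>();
        parts.add(docId);
        if (scope != null && !scope.isEmpty()) {
            parts.add(scope); // the scope part is optional
        }
        parts.add(tagName);
        parts.add(gs3Id);
        return String.join(SEP, parts);
    }

    public static void main(String[] args) {
        String doc = docId("origin.xml");
        System.out.println(nodeId(doc, null, "chapter", "3"));   // origin.chapter.3
        System.out.println(nodeId(doc, "body", "chapter", "3")); // origin.body.chapter.3
    }
}
```
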
The class XMLTagInfo provides two methods, isIndexable and isScopable, which
determine which tags should be indexed and which tags provide scope.
This is the only part of the build code that is specific to the documents in
the collection.

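As a rough idea of what such a class involves, here is a minimal sketch. It
is not the XMLTagInfo shipped in the java directory: the class name
TagInfoSketch, the static methods taking a tag name string, and the tag names
"chapter", "frontmatter" and "body" are assumptions chosen for illustration.
Check the real XMLTagInfo.java for the actual signatures.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a collection-specific tag info class.
class TagInfoSketch {

    // Tags whose nodes are indexed and returned as search results (assumed names).
    private static final Set<String> INDEXABLE =
            new HashSet<>(Arrays.asList("chapter"));

    // Tags recorded in the node id as a scope (assumed names).
    private static final Set<String> SCOPABLE =
            new HashSet<>(Arrays.asList("frontmatter", "body"));

    static boolean isIndexable(String tagName) {
        return INDEXABLE.contains(tagName);
    }

    static boolean isScopable(String tagName) {
        return SCOPABLE.contains(tagName);
    }

    public static void main(String[] args) {
        System.out.println(isIndexable("chapter")); // true
        System.out.println(isIndexable("section")); // false
        System.out.println(isScopable("body"));     // true
    }
}
```
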
Indexable nodes are those that should be indexed individually. When a search
is done, these are the units that will be returned. For instance, suppose a
collection contains books with chapters and sections of chapters. To have
searching and retrieval at chapter level, the chapter nodes would be made
indexable. Alternatively, making the section nodes indexable would provide
searching at the smaller section level instead.

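To make the chapter example concrete, the following sketch shows how an
import-style pass might add gs3:id attributes to the indexable nodes of a
small book document using the standard Java DOM API. It is only an
illustration: the class name ImportPassSketch, the sequential numbering from
1, and the sample XML are assumptions, and the real import code may assign
ids differently.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration of an import pass; not the gberg import code.
class ImportPassSketch {

    // Walk the tree in document order and number each indexable element.
    // Returns the next unused counter value.
    static int addIds(Element element, Set<String> indexableTags, int next) {
        if (indexableTags.contains(element.getTagName())) {
            element.setAttribute("gs3:id", String.valueOf(next));
            next++;
        }
        for (Node child = element.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child instanceof Element) {
                next = addIds((Element) child, indexableTags, next);
            }
        }
        return next;
    }

    // Parse an XML string into a DOM document (no namespace awareness).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<book><chapter><section/><section/></chapter><chapter/></book>");
        // Chapter-level retrieval: only chapter nodes are indexable.
        int next = addIds(doc.getDocumentElement(),
                Collections.singleton("chapter"), 1);
        System.out.println((next - 1) + " chapter nodes numbered");
    }
}
```

Passing Collections.singleton("section") instead would number the section
nodes, giving retrieval at the smaller section level.
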
Scopable nodes are not necessarily indexed, but are recorded as part of the
id as a scope. This is to speed up searching for the appropriate tag during
retrieval. Scopable tags should only occur once per document.

Building other collections:

To use the Java build code for other XML document collections, modify the
XMLTagInfo.java file to include the appropriate tag names for your XML
documents. Then put the documents to be imported into the import directory,
and run the BuildXMLColl program as above. Configuration files will need to be
created for the collection (etc/collectionConfig.xml and
index/buildConfig.xml).

The Greenstone runtime software has problems locating DTDs, so any DTDs for
your collection should be placed in the collection's resources directory.
If DTD files are shared between collections, they can go into the
WEB-INF/classes directory (in gsdl3/web or tomcat/webapps/gsdl3, depending on
your setup).