source: trunk/gsdl3/web/sites/localsite/collect/gberg/README@ 10085

Last change on this file since 10085 was 10085, checked in by kjdon, 19 years ago

modified the info about where to put dtd files

The Sample XML texts collection.

This README provides directions on how to rebuild the collection if necessary,
and how to create a new collection using this method.

Building the gberg collection (linux):

Starting from the gberg directory (where this README file is located):
cd java
javac *.java
cd ..
java -classpath $CLASSPATH:./java BuildXMLColl $GSDL3HOME/web/sites/localsite gberg
mv building index
cp etc/buildConfig.xml index
cp import/*.dtd $GSDL3HOME/resources/dtd/

The building process:

There are two stages, import and build.
Importing goes through the XML documents in the import directory and adds
gs3:id attributes to indexable nodes. Building makes another pass through the
documents, indexing each indexable node and assigning it a node id. Node ids
are made up of several parts: document id, scope, tag name, and gs3:id.
The document id refers to the work and is generated from the original
document filename; for example, origin.xml becomes origin. The scope part is
optional. The tag name is the name of the node being indexed, and gs3:id is
its id.
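
A minimal Java sketch of how such a node id could be assembled from its parts.
The '.' separator, the class and method names, and the example tag and scope
names ("chapter", "body") are assumptions for illustration; the exact format
produced by BuildXMLColl is not described in this README.

```java
// Hypothetical sketch: the '.' separator and all names here are assumptions,
// not taken from the BuildXMLColl source.
public class NodeIdSketch {

    // Derive the document id from the source filename by dropping the
    // extension, e.g. "origin.xml" -> "origin".
    static String documentId(String filename) {
        int dot = filename.lastIndexOf('.');
        return dot > 0 ? filename.substring(0, dot) : filename;
    }

    // Join the parts in order: document id, optional scope, tag name, gs3:id.
    static String nodeId(String docId, String scope, String tagName, String gs3Id) {
        StringBuilder sb = new StringBuilder(docId);
        if (scope != null && !scope.isEmpty()) {
            sb.append('.').append(scope);
        }
        sb.append('.').append(tagName).append('.').append(gs3Id);
        return sb.toString();
    }

    public static void main(String[] args) {
        String doc = documentId("origin.xml");
        System.out.println(nodeId(doc, null, "chapter", "3"));   // origin.chapter.3
        System.out.println(nodeId(doc, "body", "chapter", "3")); // origin.body.chapter.3
    }
}
```

Because the scope is optional, the same tag and gs3:id give a shorter id when
no scopable ancestor is recorded.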

The class XMLTagInfo provides two methods, isIndexable and isScopable, which
determine which tags should be indexed and which tags provide scope. This is
the only part of the build code that is specific to the documents in the
collection.
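
A hypothetical sketch of what an XMLTagInfo-style class might look like. Only
the two method names come from this README; the String parameter, the static
methods and the tag sets ("chapter" indexable, "body" scopable) are assumptions.

```java
import java.util.Set;

// Hypothetical XMLTagInfo-style class. The signatures and tag names are
// assumptions for illustration, not the real XMLTagInfo.java.
public class XMLTagInfoSketch {

    // Tags whose nodes are indexed individually and returned as search results.
    private static final Set<String> INDEXABLE = Set.of("chapter");

    // Tags recorded in node ids as a scope (each should occur once per document).
    private static final Set<String> SCOPABLE = Set.of("body");

    public static boolean isIndexable(String tagName) {
        return INDEXABLE.contains(tagName);
    }

    public static boolean isScopable(String tagName) {
        return SCOPABLE.contains(tagName);
    }
}
```

Keeping the tag names in two sets means that adapting the class to another
collection only involves editing those sets.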

Indexable nodes are those that should be indexed individually. When a search
is done, these are the units that will be returned. For instance, suppose a
collection contains books divided into chapters, which are in turn divided
into sections. For searching and retrieval at chapter level, the chapter nodes
would be made indexable. Alternatively, making the section nodes indexable
would provide searching at the smaller section level instead.
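
The import stage can be pictured with a minimal Java DOM sketch that numbers
every indexable element with a gs3:id attribute. Sequential numbering in
document order and "chapter" as the only indexable tag are assumptions; this
is not the BuildXMLColl implementation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch of the import pass; the tag set and numbering scheme
// are assumptions for illustration.
public class ImportPassSketch {

    static final Set<String> INDEXABLE = Set.of("chapter");

    static Document parse(String xml) throws Exception {
        // Not namespace-aware, so "gs3:id" is handled as a plain attribute name.
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Depth-first walk in document order; returns the next unused id.
    static int assignIds(Node node, int nextId) {
        if (node instanceof Element) {
            Element el = (Element) node;
            if (INDEXABLE.contains(el.getTagName())) {
                el.setAttribute("gs3:id", Integer.toString(nextId++));
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            nextId = assignIds(children.item(i), nextId);
        }
        return nextId;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<book><chapter><section/><section/></chapter>"
                + "<chapter><section/></chapter></book>");
        assignIds(doc.getDocumentElement(), 1);
        NodeList chapters = doc.getElementsByTagName("chapter");
        for (int i = 0; i < chapters.getLength(); i++) {
            // prints 1, then 2
            System.out.println(((Element) chapters.item(i)).getAttribute("gs3:id"));
        }
    }
}
```

Changing INDEXABLE to Set.of("section") would instead number the three
sections, moving search results down to section level.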

Scopable nodes are not necessarily indexed, but are recorded in the node id
as a scope. This speeds up finding the appropriate tag during retrieval.
Scopable tags should occur only once per document.
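
To illustrate why a scope helps, here is a hypothetical Java sketch that
resolves the parts of a node id back to an element: when a scope is given it
first finds the single scope element, then looks for the tag with the matching
gs3:id only within that subtree. The tag and scope names and the DOM-based
lookup are assumptions; the Greenstone runtime may locate nodes differently.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch of scoped retrieval; not the Greenstone runtime code.
public class ScopedLookupSketch {

    static Document parse(String xml) throws Exception {
        // Not namespace-aware, so "gs3:id" is handled as a plain attribute name.
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Return the tagName element with the given gs3:id, searching only inside
    // the scope element when a scope is given; null if nothing matches.
    static Element find(Document doc, String scope, String tagName, String gs3Id) {
        NodeList candidates;
        if (scope == null) {
            candidates = doc.getElementsByTagName(tagName);
        } else {
            // A scopable tag occurs once per document, so the first match is it.
            NodeList scopes = doc.getElementsByTagName(scope);
            if (scopes.getLength() == 0) {
                return null;
            }
            candidates = ((Element) scopes.item(0)).getElementsByTagName(tagName);
        }
        for (int i = 0; i < candidates.getLength(); i++) {
            Element el = (Element) candidates.item(i);
            if (gs3Id.equals(el.getAttribute("gs3:id"))) {
                return el;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<text><front><chapter gs3:id='1'/></front>"
                + "<body><chapter gs3:id='2'/><chapter gs3:id='3'/></body></text>");
        System.out.println(find(doc, "body", "chapter", "3") != null); // true
        System.out.println(find(doc, "body", "chapter", "1") != null); // false
    }
}
```

Because the scope element is unique, only the nodes beneath it need to be
compared, rather than every matching tag in the document.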

Building other collections:

To use the Java build code for other XML document collections, modify the
XMLTagInfo.java file to include the appropriate tag names for your XML
documents. Then put the import documents into the import directory and run
the BuildXMLColl program as above. Configuration files will also need to be
created for the collection (etc/collectionConfig.xml and
index/buildConfig.xml).

The Greenstone runtime software has trouble locating DTDs, so any DTDs for
your collection should be placed in the collection's resources directory.
If DTD files are shared between collections, they can go into the
WEB-INF/classes directory (in gsdl3/web or tomcat/webapps/gsdl3, depending on
your setup).