README@ 32489

Last change on this file since 32489 was 32489, checked in by kjdon, 6 years ago
updated the README
Property svn:keywords set to `Author Date Id Revision`
File size: 3.0 KB

The Sample XML texts collection.

This README provides directions on how to rebuild the collection if necessary,
and how to create a new collection using this method.

Building the gberg collection (linux):

Go to the top-level Greenstone directory and run:
source gs3-setup.sh
Starting from the gberg directory (where this README file is located):
cd java
javac *.java
cd ..
java -classpath $CLASSPATH:./java BuildXMLColl $GSDL3HOME/sites/localsite gberg
mv building index
cp etc/buildConfig.xml index
cp import/*.dtd $GSDL3SRCHOME/resources/dtd/ (this step is already done)

The building process:

There are two stages, import and build.
Importing goes through the XML documents in the import directory and adds
gs3:id attributes to indexable nodes. Building makes another pass through the
documents, indexing any indexable nodes, and assigning them node ids. Node
ids are made up of several parts: document id, scope, tag name, gs3:id.
Document id refers to the work, and is generated from the original document
filename. For example, origin.xml becomes origin. The scope part is optional.
Tag name is the name of the node that is to be indexed, and gs3:id is its id.
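
The node id layout described above can be sketched roughly as follows. This
is only an illustration: the "." separator, the class and method names, and
the sample tag names (chapter, body) and id value are assumptions made for
the example, not taken from the BuildXMLColl code.

```java
// Hypothetical sketch: the "." separator and all names here are assumptions,
// not taken from the Greenstone build code.
final class NodeIdSketch {

    // Derive the document id by dropping the file extension,
    // so that origin.xml becomes origin.
    static String documentId(String filename) {
        int dot = filename.lastIndexOf('.');
        return dot > 0 ? filename.substring(0, dot) : filename;
    }

    // Join the parts in the order the README lists them:
    // document id, scope, tag name, gs3:id.
    // The scope is optional, so a null or empty scope is left out.
    static String nodeId(String documentId, String scope, String tagName, String gs3Id) {
        StringBuilder id = new StringBuilder(documentId);
        if (scope != null && !scope.isEmpty()) {
            id.append('.').append(scope);
        }
        id.append('.').append(tagName).append('.').append(gs3Id);
        return id.toString();
    }

    public static void main(String[] args) {
        String doc = documentId("origin.xml");
        System.out.println(nodeId(doc, null, "chapter", "3"));   // without a scope
        System.out.println(nodeId(doc, "body", "chapter", "3")); // with a scope
    }
}
```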

The class XMLTagInfo provides two methods, isIndexable and isScopable, and is
used to determine which tags provide scope, and which tags should be indexed.
This is the only part of the build code that is specific to the documents in
the collection.

Indexable nodes are those that should be indexed individually. When a search
is done, these are the units that will be returned. For instance, suppose a
collection contains books with chapters and sections of chapters. To have
searching and retrieval at chapter level, the chapter nodes would be made
indexable. Alternatively, making the section nodes indexable would provide
searching at the smaller section level instead.

Scopable nodes are not necessarily indexed, but are recorded as part of the
id as a scope. This is to speed up searching for the appropriate tag during
retrieval. Scopable tags should only occur once per document.
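
As a rough illustration of the book example above, a tag-info class and an
import-style pass might look like the sketch below. It is not the real
XMLTagInfo: the class name TagInfoSketch, the method signatures, the tag
names (book, body, chapter, section) and the use of sequential integers for
gs3:id are all assumptions made for the example.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical stand-in for XMLTagInfo; the real class's signatures may differ.
final class TagInfoSketch {
    // Chapter-level retrieval: chapters are indexed, sections are not.
    private static final Set<String> INDEXABLE = new HashSet<>(Arrays.asList("chapter"));
    // Assumed scope tag; it should occur only once per document.
    private static final Set<String> SCOPABLE = new HashSet<>(Arrays.asList("body"));

    static boolean isIndexable(String tagName) { return INDEXABLE.contains(tagName); }
    static boolean isScopable(String tagName) { return SCOPABLE.contains(tagName); }

    // Import-style pass: walk the tree in document order and give every
    // indexable element a gs3:id attribute. Sequential integers are an
    // assumption, and the gs3 prefix is not declared as a namespace here.
    // Returns the next unused id value.
    static int addIds(Element element, int nextId) {
        if (isIndexable(element.getTagName())) {
            element.setAttribute("gs3:id", String.valueOf(nextId++));
        }
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                nextId = addIds((Element) child, nextId);
            }
        }
        return nextId;
    }

    // Parse an XML string into a DOM document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<book><body><chapter><section/></chapter><chapter/></body></book>");
        int next = addIds(doc.getDocumentElement(), 1);
        System.out.println("indexable nodes tagged: " + (next - 1));
    }
}
```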

Testing the index:

You can run the command-line Search program to query the index. From the gberg directory:
java -classpath $CLASSPATH:./java Search ./index/idx

Building other collections:

To use the Java build code for other XML document collections, modify the
XMLTagInfo.java file to include the appropriate tag names for your XML
documents. Then put the import documents into the import directory, and
run the BuildXMLColl program as above. Configuration files will need to be
created for the collection (etc/collectionConfig.xml and
index/buildConfig.xml).

The Greenstone runtime software has problems locating DTDs, so any DTDs for
your collection should be placed in the collection's resources directory.
If DTD files are shared between collections, they can go into the
WEB-INF/classes directory (in gsdl3/web or tomcat/webapps/gsdl3, depending on
your setup).

NOTE: Sept 2018. Tomcat cannot find the gutbook DTD when trying to transform the file to produce the table of contents :-(

source: main/trunk/greenstone3/web/sites/localsite/collect/gberg/README@ 32489
