Changeset 6226
Timestamp: 2003-12-12T11:08:06+13:00
File: 1 edited
trunk/gsdl3/web/sites/localsite/collect/gberg/README
--- r5999
+++ r6226
@@ -1,8 +1,10 @@
-This collection is a demonstration collection for Greenstone3 and lucene. The aim is to demonstrate that you can have pretty much whatever you want as a collection as long as you have suitable services for it. The collection has not been built using Greenstone2 or Greenstone3 style building. Instead, simple java programs using the Lucene API have been created to do the indexing, and document retrieval is simply done from the original (actually slightly modified) XML documents.
+The Sample XML texts collection.
 
-BUILDING
+This README provides directions on how to rebuild the collection if necessary,
+and how to create a new collection using this method.
 
-To build this collection (assuming you are starting from the gberg directory where this file is located):
+Building the gberg collection (linux):
 
+Starting from the gberg directory (where this README file is located):
 cd java
 javac *.java
@@ -13,26 +15,34 @@
 cp import/*.dtd $GSDL3HOME/resources/dtd/
 
-ABOUT THIS COLLECTION
-
-This collection uses some custom xslt files which are found in the transform directory. Because document display is handled differently, it uses xd for the document action. In the interface config file, the xd action has been added, mapping to XMLDocumentAction, with separate xsl stylesheets for toc and text subactions.
-
-<action name='xd' class='XMLDocumentAction'>
-<subaction name='toc' xslt='document-toc.xsl'/>
-<subaction name='text' xslt='document-text.xsl'/>
-</action>
-
-The build process:
+The building process:
 
 There are two stages, import and build.
-Importing goes through the xml documents and adds gs3:id attributes to indexable nodes.
-Node ids are made up of several parts:
-document id, scope, tag name, gs3:id
+Importing goes through the xml documents in the import directory and adds
+gs3:id attributes to indexable nodes. Building makes another pass though the
+documents, indexing any indexable nodes, and assigning them node ids. Node
+ids are made up of several parts: document id, scope, tag name, gs3:id.
+Document id refers to the work, and is generated from the original document
+filename. For example, origin.xml becomes origin. The scope part is optional.
+Tag name is the name of the node that is to be indexed, and gs3:id is its id.
 
-document id refers to the work, and is generated from the original document filename. eg origin.xml becomes origin.
+The class XMLTagInfo provides two methods, isIndexable and isScopable, and is
+used to determine which tags provide scope, and which tags should be indexed.
+This is the only part of the build code that is specific to the documents in
+the collection.
 
-XMLTagInfo provides lists of tags that are scopeable and indexable. Scopable tags are not necessarily indexed, but are recorded as part of the id as a scope. This is to speed up searching for the appropriate tag during retrieval. Scopeable tags should only occur once per document.
-Indexable tags are the divisions that should be indexed individually. When you do a search, these are the units that will be returned.
-For instance, we want each chapter to be indexed separately, but not each paragraph.
+Indexable nodes are those that should be indexed individually. When a search
+is done, these are the units that will be returned. For instance, suppose a
+collection contains books with chapters and sections of chapters. To have
+searching and retrieval at chapter level, the chapter nodes would be made
+indexable. Alternatively, making the section nodes indexable would provide
+searching at the smaller section level instead.
 
+Scopable nodes are not necessarily indexed, but are recorded as part of the
+id as a scope. This is to speed up searching for the appropriate tag during
+retrieval. Scopeable tags should only occur once per document.
 
+Building other collections:
 
+To use the java building stuff for other XML document collections, you can
+just modify the XMLTagInfo.java file to include the appropriate tag names for
+your XML documents.
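
For readers adapting the collection, the role the revised README gives to
XMLTagInfo and to node ids might be sketched roughly as below. This is not the
code from the changeset: the class name XMLTagInfoSketch, the tag names
("chapter", "frontmatter", "body"), the helper methods documentId and nodeId,
and the "." separator between id parts are all assumptions made for
illustration. Only the method names isIndexable and isScopable, the four id
parts, the optional scope, and the origin.xml example come from the README.

```java
import java.util.Set;

// Hypothetical stand-in for the collection's XMLTagInfo class.
// Tag names and the "." separator are illustrative assumptions only.
final class XMLTagInfoSketch {
    // Tags whose nodes are indexed and returned as search results (assumed).
    private static final Set<String> INDEXABLE = Set.of("chapter");
    // Tags recorded in the node id as a scope; not necessarily indexed (assumed).
    private static final Set<String> SCOPABLE = Set.of("frontmatter", "body");

    static boolean isIndexable(String tagName) {
        return INDEXABLE.contains(tagName);
    }

    static boolean isScopable(String tagName) {
        return SCOPABLE.contains(tagName);
    }

    // Derives the document id from the filename, e.g. origin.xml becomes origin.
    static String documentId(String filename) {
        int dot = filename.lastIndexOf('.');
        return dot > 0 ? filename.substring(0, dot) : filename;
    }

    // Joins document id, optional scope, tag name and gs3:id into one node id.
    static String nodeId(String documentId, String scope, String tagName, String gs3Id) {
        StringBuilder id = new StringBuilder(documentId);
        if (scope != null && !scope.isEmpty()) {
            id.append('.').append(scope);
        }
        return id.append('.').append(tagName).append('.').append(gs3Id).toString();
    }

    public static void main(String[] args) {
        String doc = documentId("origin.xml");
        System.out.println(nodeId(doc, "body", "chapter", "3")); // origin.body.chapter.3
        System.out.println(nodeId(doc, null, "chapter", "3"));   // origin.chapter.3
    }
}
```

Under these assumptions, switching a collection from chapter-level to
section-level retrieval would only mean changing the contents of the
indexable set, which matches the README's remark that XMLTagInfo is the only
document-specific part of the build code.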