Changeset 6226

Timestamp:

2003-12-12T11:08:06+13:00

Author:

kjdon

Message:

more info

File:

1 edited

trunk/gsdl3/web/sites/localsite/collect/gberg/README (modified) (2 diffs)


trunk/gsdl3/web/sites/localsite/collect/gberg/README

-              r5999
+              r6226
 This collection is a demonstration collection for Greenstone3 and lucene. The aim is to demonstrate that you can have pretty much whatever you want as a collection as long as you have suitable services for it. The collection has not been built using Greenstone2 or Greenstone3 style building. Instead, simple java programs using the Lucene API have been created to do the indexing, and document retrieval is simply done from the original (actually slightly modified) XML documents.
+The Sample XML texts collection.
+BUILDING
+This README provides directions on how to rebuild the collection if necessary,
+and how to create a new collection using this method.
 To build this collection (assuming you are starting from the gberg directory where this file is located):
+Building the gberg collection (linux):
+Starting from the gberg directory (where this README file is located):
 cd java
 javac *.java
 …
 cp import/*.dtd $GSDL3HOME/resources/dtd/
+ABOUT THIS COLLECTION
+This collection uses some custom xslt files which are found in the transform directory. Because document display is handled differently, it uses xd for the document action. In the interface config file, the xd action has been added, mapping to XMLDocumentAction, with separate xsl stylesheets for toc and text subactions.
+<action name='xd' class='XMLDocumentAction'>
+  <subaction name='toc' xslt='document-toc.xsl'/>
+  <subaction name='text' xslt='document-text.xsl'/>
+</action>
+The build process:
+The building process:
 There are two stages, import and build.
+Importing goes through the xml documents and adds gs3:id attributes to indexable nodes.
+Node ids are made up of several parts:
+document id, scope, tag name, gs3:id
+Importing goes through the xml documents in the import directory and adds
+gs3:id attributes to indexable nodes. Building makes another pass through the
+documents, indexing any indexable nodes, and assigning them node ids. Node
+ids are made up of several parts: document id, scope, tag name, gs3:id.
+Document id refers to the work, and is generated from the original document
+filename. For example, origin.xml becomes origin. The scope part is optional.
+Tag name is the name of the node that is to be indexed, and gs3:id is its id.
+document id refers to the work, and is generated from the original document filename. eg origin.xml becomes origin.
+The class XMLTagInfo provides two methods, isIndexable and isScopable, and is
+used to determine which tags provide scope, and which tags should be indexed.
+This is the only part of the build code that is specific to the documents in
+the collection.
+XMLTagInfo provides lists of tags that are scopable and indexable. Scopable tags are not necessarily indexed, but are recorded as part of the id as a scope. This is to speed up searching for the appropriate tag during retrieval. Scopable tags should only occur once per document.
+Indexable tags are the divisions that should be indexed individually. When you do a search, these are the units that will be returned.
+For instance, we want each chapter to be indexed separately, but not each paragraph.
+Indexable nodes are those that should be indexed individually. When a search
+is done, these are the units that will be returned. For instance, suppose a
+collection contains books with chapters and sections of chapters. To have
+searching and retrieval at chapter level, the chapter nodes would be made
+indexable. Alternatively, making the section nodes indexable would provide
+searching at the smaller section level instead.
+Scopable nodes are not necessarily indexed, but are recorded as part of the
+id as a scope. This is to speed up searching for the appropriate tag during
+retrieval. Scopable tags should only occur once per document.
+Building other collections:
+To use the java building stuff for other XML document collections, you can
+just modify the XMLTagInfo.java file to include the appropriate tag names for
+your XML documents.
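
The following Java sketch is not part of this changeset. It illustrates the roles the README describes for XMLTagInfo (isIndexable and isScopable) and how a node id could be assembled from document id, optional scope, tag name and gs3:id. The class name XMLTagInfoSketch, the tag names, the static method signatures, the documentId and makeNodeId helpers, and the "." separator are all assumptions made for illustration; the README does not specify them.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class XMLTagInfoSketch {

    // Hypothetical tag names; replace with the tags used in your own documents.
    private static final Set<String> INDEXABLE_TAGS =
        new HashSet<>(Arrays.asList("chapter"));
    private static final Set<String> SCOPABLE_TAGS =
        new HashSet<>(Arrays.asList("frontmatter", "body"));

    // Assumed separator between id parts; the README does not name one.
    private static final String SEPARATOR = ".";

    // True if nodes with this tag should be indexed as individual units.
    public static boolean isIndexable(String tagName) {
        return INDEXABLE_TAGS.contains(tagName);
    }

    // True if this tag is recorded in node ids as a scope.
    public static boolean isScopable(String tagName) {
        return SCOPABLE_TAGS.contains(tagName);
    }

    // Derives a document id from a filename by dropping the extension,
    // so that origin.xml becomes origin.
    public static String documentId(String filename) {
        int dot = filename.lastIndexOf('.');
        return (dot > 0) ? filename.substring(0, dot) : filename;
    }

    // Joins the id parts in the order given in the README.
    // The scope part is optional and is skipped when null or empty.
    public static String makeNodeId(String docId, String scope,
                                    String tagName, String gs3Id) {
        StringBuilder id = new StringBuilder(docId);
        if (scope != null && !scope.isEmpty()) {
            id.append(SEPARATOR).append(scope);
        }
        id.append(SEPARATOR).append(tagName).append(SEPARATOR).append(gs3Id);
        return id.toString();
    }

    public static void main(String[] args) {
        String docId = documentId("origin.xml");
        System.out.println(makeNodeId(docId, null, "chapter", "3"));
        System.out.println(makeNodeId(docId, "body", "chapter", "3"));
    }
}
```

To index at section level instead of chapter level, the only change in this sketch would be the contents of INDEXABLE_TAGS, which mirrors the README's point that XMLTagInfo is the only collection-specific part of the build code.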
