Changeset 6226


Timestamp:
2003-12-12T11:08:06+13:00
Author:
kjdon
Message:

more info

File:
1 edited

  • trunk/gsdl3/web/sites/localsite/collect/gberg/README

    r5999 r6226  
    1 This collection is a demonstration collection for Greenstone3 and lucene. The aim is to demonstrate that you can have pretty much whatever you want as a collection as long as you have suitable services for it. The collection has not been built using Greenstone2 or Greenstone3 style building. Instead, simple java programs using the Lucene API have been created to do the indexing, and document retrieval is simply done from the original (actually slightly modified) XML documents.
     1The Sample XML texts collection.
    22
    3 BUILDING
     3This README provides directions on how to rebuild the collection if necessary,
     4and how to create a new collection using this method.
    45
    5 To build this collection (assuming you are starting from the gberg directory where this file is located):
     6Building the gberg collection (linux):
    67
     8Starting from the gberg directory (where this README file is located):
    79cd java
    810javac *.java
     
    1315cp import/*.dtd $GSDL3HOME/resources/dtd/
    1416
    15 ABOUT THIS COLLECTION
    16 
    17 This collection uses some custom xslt files which are found in the transform directory. Because document display is handled differently, it uses xd for the document action. In the interface config file, the xd action has been added, mapping to XMLDocumentAction, with separate xsl stylesheets for toc and text subactions.
    18 
    19 <action name='xd' class='XMLDocumentAction'>
    20   <subaction name='toc' xslt='document-toc.xsl'/>
    21   <subaction name='text' xslt='document-text.xsl'/>
    22 </action>
    23 
    24 The build process:
     17The building process:
    2518
    2619There are two stages, import and build.
    27 Importing goes through the xml documents and adds gs3:id attributes to indexable nodes.
    28 Node ids are made up of several parts:
    29 document id, scope, tag name, gs3:id
     20Importing goes through the xml documents in the import directory and adds
     21gs3:id attributes to indexable nodes. Building makes another pass through the
     22documents, indexing any indexable nodes, and assigning them node ids. Node
     23ids are made up of several parts: document id, scope, tag name, gs3:id.
     24Document id refers to the work, and is generated from the original document
     25filename. For example, origin.xml becomes origin. The scope part is optional.
     26Tag name is the name of the node that is to be indexed, and gs3:id is its id.
    3027
    31 document id refers to the work, and is generated from the original document filename. eg origin.xml becomes origin.
     28The class XMLTagInfo provides two methods, isIndexable and isScopable, and is
     29used to determine which tags provide scope, and which tags should be indexed.
     30This is the only part of the build code that is specific to the documents in
     31the collection.
    3232
    33 XMLTagInfo provides lists of tags that are scopeable and indexable. Scopable tags are not necessarily indexed, but are recorded as part of the id as a scope. This is to speed up searching for the appropriate tag during retrieval. Scopeable tags should only occur once per document.
    34 Indexable tags are the divisions that should be indexed individually. When you do a search, these are the units that will be returned.
    35 For instance, we want each chapter to be indexed separately, but not each paragraph.
     33Indexable nodes are those that should be indexed individually. When a search
     34is done, these are the units that will be returned. For instance, suppose a
     35collection contains books with chapters and sections of chapters. To have
     36searching and retrieval at chapter level, the chapter nodes would be made
     37indexable. Alternatively, making the section nodes indexable would provide
     38searching at the smaller section level instead.
    3639
     40Scopable nodes are not necessarily indexed, but are recorded as part of the
     41id as a scope. This is to speed up searching for the appropriate tag during
     42retrieval. Scopable tags should only occur once per document.
    3743
     44Building other collections:
    3845
     46To use the java building stuff for other XML document collections, you can
     47just modify the XMLTagInfo.java file to include the appropriate tag names for
     48your XML documents.
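
The README describes XMLTagInfo only by its two method names, isIndexable and isScopable, so the following is a minimal sketch of what such a class might look like, not the file from the repository. The class name TagInfoSketch, the static String-based signatures, and the tag names (chapter, frontmatter, body) are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the collection's XMLTagInfo class. The tag names
// and the static, String-based method signatures are assumptions.
class TagInfoSketch {

    // Tags indexed individually; these are the units a search returns.
    private static final Set<String> INDEXABLE =
        new HashSet<>(Arrays.asList("chapter"));

    // Tags recorded in the node id as a scope; expected once per document.
    private static final Set<String> SCOPABLE =
        new HashSet<>(Arrays.asList("frontmatter", "body"));

    // True if nodes with this tag name should be indexed on their own.
    static boolean isIndexable(String tagName) {
        return INDEXABLE.contains(tagName);
    }

    // True if this tag name should be recorded as the scope part of a node id.
    static boolean isScopable(String tagName) {
        return SCOPABLE.contains(tagName);
    }

    // Small demonstration: report how a few tag names are classified.
    public static void main(String[] args) {
        for (String tag : new String[] {"body", "chapter", "section"}) {
            System.out.println(tag + ": indexable=" + isIndexable(tag)
                + ", scopable=" + isScopable(tag));
        }
    }
}
```

Under these assumptions, moving from chapter-level to section-level searching would only mean replacing "chapter" with "section" in the INDEXABLE set, which matches the README's point that this class is the only collection-specific part of the build code.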