BUILDING

Greenstone can build collections using mg or mgpp. The default is mg, but you can use mgpp by editing the collection configuration file.

First, add the line

    buildtype mgpp

Second, the way indexes are described is different. mg uses a line like:

    indexes document:text section:text,Title

This builds two indexes. The first contains all the text, at document level; the second contains all the text plus Title metadata, at section level. The document and section tags determine the granularity of the results of a search: the first index returns document numbers, while the second returns section numbers.

mgpp does things differently. It always builds a word-level index, and you then specify the levels at which you want results returned. For example, from a single index you might want to be able to retrieve both whole documents and sections. The Greenstone building code builds a word-level index with Document-level granularity. To add other levels (Section and Paragraph are permitted), you add a line like

    levels Section Paragraph

Note that Paragraph-level indexes can be used for searching, but you can't retrieve Paragraph-level documents, only Section and Document.

To specify what goes into the index, we use an indexes line, similar to mg but without the level information (which is specified separately by the levels line), e.g.:

    indexes text

This will index all the text at word level. To add metadata fields to the index, you can write, for example,

    indexes text,Title,Subject

or

    indexes text,metadata

The first builds one index, with tagged entries for Title and Subject metadata. Unlike levels, metadata names can be anything, though obviously they should match the names in your documents. The second builds one index with tagged entries for all the metadata it finds; this is useful if you don't know in advance what metadata are available, or want all of it indexed anyway.
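To make the difference between the two index descriptions concrete, here is a minimal Python sketch that writes the mgpp lines for a collection configuration file. The helpers `parse_mg_index` and `mgpp_config_lines` are hypothetical, not part of Greenstone; the check that only Section and Paragraph may be listed simply encodes the rule stated above.

```python
# Hypothetical helpers illustrating mg vs mgpp index descriptions in collect.cfg.
# Neither function is part of Greenstone.

# Document level is always built; these are the extra levels mgpp permits.
ALLOWED_LEVELS = ("Section", "Paragraph")


def parse_mg_index(spec):
    """Split an mg index spec such as 'section:text,Title' into (level, fields)."""
    level, _, contents = spec.partition(":")
    return level, contents.split(",")


def mgpp_config_lines(fields, levels=()):
    """Return the collect.cfg lines for an mgpp build.

    fields: index contents, e.g. ["text", "Title", "Subject"] or ["text", "metadata"]
    levels: result levels in addition to Document, taken from ALLOWED_LEVELS
    """
    unknown = [lvl for lvl in levels if lvl not in ALLOWED_LEVELS]
    if unknown:
        raise ValueError(f"unsupported levels: {unknown}")
    lines = ["buildtype mgpp"]
    if levels:
        lines.append("levels " + " ".join(levels))
    # Unlike mg, the mgpp indexes line carries no level prefix.
    lines.append("indexes " + ",".join(fields))
    return lines


if __name__ == "__main__":
    print(parse_mg_index("section:text,Title"))  # ('section', ['text', 'Title'])
    for line in mgpp_config_lines(["text", "Title", "Subject"], ["Section", "Paragraph"]):
        print(line)
    # buildtype mgpp
    # levels Section Paragraph
    # indexes text,Title,Subject
```

The mg spec bundles level and contents into one token, whereas the mgpp output keeps them on separate levels and indexes lines.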
After building has finished, the build.cfg file in the building directory lists the metadata that was found and indexed, for example:

    indexfields Subject TextOnly Title
    indexfieldmap TextOnly->TX Subject->SU Title->TI

The metadata names are passed to mgpp during building as two-letter codes; indexfieldmap records which code was used for each name.

By default, only the text is compressed, not the metadata. To change this, you can add a line to the config file like

    textcompress text,Title

This adds Title metadata to the text that gets passed to the compressor.

QUERYING

A collection built with mgpp can be searched in the usual way through Greenstone. Search terms can be combined with & and |, and phrases are specified using "". Because mgpp uses a word-level index, it has some extended searching capabilities over mg. If metadata has been specified in the index, fielded searching can also be done. The current query syntax is as follows:

boolean operators:
    & (AND)
    | (OR)
    ! (NOT)
    with () for precedence

term modifiers: #icus /x
    These control stemming, casefolding and weighting, as in gsdl.
    #i = case insensitive, #c = case sensitive
    #u = unstemmed, #s = stemmed
    /x = term weight (default = 1)
    e.g. computer#is/10 is computer, stemmed and casefolded, with a weight of 10 relative to other terms in the same query.

proximity searching: NEARx
    This specifies the maximum distance two words can be apart and still match,
    e.g. dog NEAR4 cat: cat must be within 4 words either side of dog.
    NEAR by itself appears to default to 20, though this is unconfirmed.

fielded searching: [terms]:Field
    e.g. [Witten]:CR
    The field names need to be the names of the metadata elements in your collection. If the collection was built with Greenstone, these names are the two-letter codes found in the build.cfg file. Multiple terms inside the [] are ANDed together. Different fields can be combined using the normal boolean operators, e.g. [Witten]:CR & [Gigabytes]:TI. Term modifiers can be included inside the [].
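Because the query syntax is plain text, query strings can be assembled programmatically, with field codes looked up from build.cfg. The following Python sketch is illustrative only: the helper names are hypothetical, it assumes each build.cfg directive sits on a single line as in the example earlier, and the Creator->CR entry in the sample build.cfg is an assumption added so that the [Witten]:CR example resolves.

```python
# Hypothetical helpers for composing mgpp query strings; not a Greenstone API.
# Assumes each build.cfg directive is on one line, e.g.
#   indexfieldmap TextOnly->TX Subject->SU Title->TI


def read_field_codes(build_cfg_text):
    """Map metadata names to the two-letter codes listed in indexfieldmap."""
    codes = {}
    for line in build_cfg_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "indexfieldmap":
            for pair in parts[1:]:
                name, _, code = pair.partition("->")
                codes[name] = code
    return codes


def term(word, modifiers="", weight=None):
    """Attach #modifiers (i, c, u, s) and an optional /weight to a term."""
    if set(modifiers) - set("icus"):
        raise ValueError(f"unknown modifier in {modifiers!r}")
    query = word
    if modifiers:
        query += "#" + modifiers
    if weight is not None:
        query += f"/{weight}"
    return query


def near(a, b, distance=None):
    """Proximity: b must occur within `distance` words either side of a."""
    op = "NEAR" if distance is None else f"NEAR{distance}"
    return f"{a} {op} {b}"


def fielded(terms, field, codes):
    """Fielded search; mgpp ANDs multiple terms inside the brackets."""
    # Fall back to the raw name if the field has no code in build.cfg.
    return "[" + " ".join(terms) + "]:" + codes.get(field, field)


if __name__ == "__main__":
    # Sample build.cfg; the Creator->CR entry is an assumption for illustration.
    build_cfg = (
        "indexfields Creator Subject TextOnly Title\n"
        "indexfieldmap Creator->CR TextOnly->TX Subject->SU Title->TI\n"
    )
    codes = read_field_codes(build_cfg)
    print(term("computer", "is", 10))  # computer#is/10
    print(near("dog", "cat", 4))  # dog NEAR4 cat
    print(fielded(["Witten"], "Creator", codes) + " & " + fielded(["Gigabytes"], "Title", codes))
    # [Witten]:CR & [Gigabytes]:TI
```

Reading the codes from build.cfg, rather than hard-coding them, keeps queries in step with whatever codes the build actually assigned.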
This syntax can be entered into the standard Greenstone search box. For mgpp collections, however, there are additional query pages using forms. These can be accessed through the preferences page: select form query, then simple or advanced. Hopefully the forms are fairly self-explanatory.