source: trunk/mgpp/docs/mgpp_in_greenstone.txt@ 3365

Last change on this file since 3365 was 3365, checked in by kjdon, 22 years ago

Initial revision

BUILDING

Greenstone can build collections using mg or mgpp. The default is mg, but you
can use mgpp by editing the collection configuration file.

First, add the line 'buildtype mgpp'.

Second, the way indexes are described is different.

mg uses a line like:

indexes document:text section:text,Title

This builds two indexes: the first of all the text, at document level; the
second of all the text and Title metadata, at section level.

The document and section tags determine the granularity of the results of a
search. The first index returns document numbers, while the second index
returns section numbers.
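
To make the structure of the mg indexes line concrete, here is a minimal
Python sketch that splits such a line into (level, fields) pairs. The function
name parse_mg_indexes is hypothetical and is not part of Greenstone; it only
illustrates the level:field,field layout described above.

```python
def parse_mg_indexes(line):
    """Split an mg-style 'indexes' line into (level, fields) pairs.

    Hypothetical helper for illustration only; not Greenstone code.
    """
    keyword, *specs = line.split()
    if keyword != "indexes":
        raise ValueError("expected a line starting with 'indexes'")
    parsed = []
    for spec in specs:
        # Each spec looks like 'section:text,Title'.
        level, _, fields = spec.partition(":")
        parsed.append((level, fields.split(",")))
    return parsed


for level, fields in parse_mg_indexes("indexes document:text section:text,Title"):
    print(level, fields)
```

Each pair corresponds to one index that mg builds: the level gives the
granularity of the results, and the fields say what is indexed.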

mgpp does things differently. By default it builds a word level index. You
then specify the levels at which you want results returned. For example, from
a single index you might want to be able to retrieve both whole documents and
sections.

The Greenstone building code builds a word level index with Document level
granularity. To add other levels (Section and Paragraph are permitted), you add
a line like

levels Section Paragraph

Note that Paragraph level indexes can be used for searching, but you can't
retrieve Paragraph level documents, only Section and Document.

To specify what goes into the index, we use an indexes line, similar to mg but
without the level information (that is specified separately by the levels
line). For example:

indexes text

This will index all the text at word level.

To add metadata fields to the index, you can say, for example,

indexes text,Title,Subject

or

indexes text,metadata

The first one builds one index, with tagged entries for Title and Subject
metadata. Unlike levels, metadata names can be anything, though they
should match the names in your documents.

The second one builds one index with tagged entries for all the metadata it
finds. This is useful if you don't know in advance what metadata are available,
or want all of it indexed anyway.
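
The mgpp settings described above can be pulled together in a short Python
sketch that writes out the three configuration lines. The helper
mgpp_config_lines and the constant ALLOWED_EXTRA_LEVELS are hypothetical names
used only for illustration; the sketch assumes nothing beyond the buildtype,
levels and indexes lines described in this document.

```python
# Levels that may be added on top of the default Document level.
ALLOWED_EXTRA_LEVELS = ("Section", "Paragraph")


def mgpp_config_lines(fields, extra_levels=()):
    """Return the config lines for an mgpp collection (illustrative only)."""
    for level in extra_levels:
        if level not in ALLOWED_EXTRA_LEVELS:
            raise ValueError(f"unsupported level: {level}")
    lines = ["buildtype mgpp"]
    if extra_levels:
        lines.append("levels " + " ".join(extra_levels))
    # One index; metadata fields become tagged entries within it.
    lines.append("indexes " + ",".join(fields))
    return lines


print("\n".join(mgpp_config_lines(["text", "Title", "Subject"], ["Section"])))
```

Note that, unlike the mg form, the level names do not appear in the indexes
line at all.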

After the building has finished, the build.cfg file in the building directory
lists the metadata that was found and indexed, for example

indexfields Subject TextOnly Title
indexfieldmap TextOnly->TX Subject->SU Title->TI

The metadata names are passed to mgpp during building as two letter codes;
indexfieldmap specifies which codes were used.
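
If you need the two letter codes in a script, the indexfieldmap line is easy
to read back. The following Python sketch assumes the build.cfg layout shown
in the example above (space separated Name->CODE entries); the function name
parse_indexfieldmap is hypothetical.

```python
def parse_indexfieldmap(cfg_text):
    """Return {metadata name: two letter code} from build.cfg text.

    Assumes the 'indexfieldmap Name->CODE ...' layout shown above.
    """
    for line in cfg_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "indexfieldmap":
            return dict(entry.split("->", 1) for entry in parts[1:])
    return {}  # no indexfieldmap line present


sample = """indexfields Subject TextOnly Title
indexfieldmap TextOnly->TX Subject->SU Title->TI
"""
codes = parse_indexfieldmap(sample)
print(codes["Title"])
```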

By default, only the text is compressed, not the metadata. To change this, you
can add a line to the config file like

textcompress text,Title

This adds Title metadata to the text that gets passed to the compressor.

QUERYING

A collection built with mgpp can be searched in the usual way through
Greenstone. Search terms can be combined with & and |, and phrases are
specified using "". Because mgpp uses a word level index, it has some extended
searching capability over mg. If metadata has been specified in the index,
fielded search can also be done.

The current query syntax involves the following:

boolean operators: & = AND, | = OR, ! = NOT, with () for precedence

term modifiers: #icus /x - these control stemming, casefolding and weighting,
as in gsdl

 #i = case insensitive, #c = case sensitive
 #u = unstemmed, #s = stemmed
 /x = term weight (default = 1).

e.g. computer#is/10 is computer, stemmed and casefolded, with a weight of 10
relative to other terms in the same query
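
As an illustration of how the modifiers attach to a term, here is a small
Python sketch that assembles a term string from its options. The helper
format_term and its parameter names are hypothetical; only the #i, #c, #u, #s
and /x notation comes from the syntax above.

```python
def format_term(word, casefold=None, stem=None, weight=None):
    """Build a query term such as 'computer#is/10' (illustrative helper).

    casefold: True -> #i, False -> #c, None -> leave unspecified
    stem:     True -> #s, False -> #u, None -> leave unspecified
    weight:   appended as /x when given
    """
    flags = ""
    if casefold is not None:
        flags += "i" if casefold else "c"
    if stem is not None:
        flags += "s" if stem else "u"
    term = word
    if flags:
        term += "#" + flags
    if weight is not None:
        term += f"/{weight}"
    return term


print(format_term("computer", casefold=True, stem=True, weight=10))
```

Leaving an option as None omits the corresponding modifier, so the
collection's own settings apply to that term.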

proximity searching: NEARx
This specifies the maximum distance, in words, that two terms may be apart and
still match.
e.g. dog NEAR4 cat - cat must be within 4 words either side of dog.
NEAR by itself uses a default distance, possibly 20 (unconfirmed).
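
The idea behind NEARx can be sketched in Python over a plain list of words.
This is not how mgpp evaluates the query internally; it only illustrates the
"within x words either side" rule, under the assumption that distance is the
difference between word positions. The function name near is hypothetical.

```python
def near(tokens, first, second, distance):
    """True if some occurrence of `second` lies within `distance` word
    positions of some occurrence of `first`, on either side.

    Assumption: distance is measured as the difference in word positions.
    """
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    return any(
        abs(i - j) <= distance
        for i in first_positions
        for j in second_positions
    )


words = "the dog chased a very fast cat".split()
print(near(words, "dog", "cat", 4))  # positions 1 and 6 are 5 apart
print(near(words, "dog", "cat", 5))
```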

fielded searching: [terms]:Field

e.g. [Witten]:CR

The field names need to be the names of the metadata elements in your
collection. If the collection was built with Greenstone, these names are the
two letter codes found in the build.cfg file.

Multiple terms inside the [] are ANDed together.

Different fields can be combined using the normal boolean operators, e.g.

[Witten]:CR & [Gigabytes]:TI

Term modifiers can be included inside the [].
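
A fielded query string can be assembled mechanically. The Python sketch below
builds the example query above; the helpers fielded and and_all are
hypothetical names, and the field codes CR and TI are simply the ones used in
the example (check your own build.cfg for the codes in your collection).

```python
def fielded(terms, field_code):
    """Wrap one or more terms as '[terms]:Field' (illustrative helper)."""
    return f"[{' '.join(terms)}]:{field_code}"


def and_all(clauses):
    """Combine clauses with the & (AND) operator."""
    return " & ".join(clauses)


query = and_all([fielded(["Witten"], "CR"), fielded(["Gigabytes"], "TI")])
print(query)
```

Terms carrying modifiers, such as computer#is/10, can be passed to fielded()
unchanged, since modifiers are allowed inside the [].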


This syntax can be entered into the standard Greenstone search box. For mgpp
collections, however, there are additional query pages using forms. These can
be accessed through the preferences page: select form query, then simple or
advanced. The forms should be fairly self-explanatory.
