BUILDING

Greenstone can build collections using mg or mgpp. The default is mg, but you
can use mgpp by editing the collection configuration file.

First, add the line 'buildtype mgpp' to the collection configuration file.

Second, the way indexes are described is different.

mg uses a line like:

  indexes document:text section:text,Title

This builds two indexes: the first covers all the text at document level, and
the second covers all the text and the Title metadata at section level.

The document and section tags determine the granularity of the results of a
search. The first index returns document numbers, while the second index
returns section numbers.

mgpp does things differently. By default it builds a word level index. You
then specify the levels at which you want results returned. For example, from
a single index you might want to be able to retrieve both whole documents and
sections.

The Greenstone building code builds a word level index with Document level
granularity. To add other levels (Section and Paragraph are permitted), add
a line like

  levels Section Paragraph

Note that Paragraph level indexes can be used for searching, but you can't
retrieve Paragraph level documents, only Section and Document level ones.

To specify what goes into the index, use an indexes line, similar to the mg
one but without the level information (which is specified separately by the
levels line). For example:

  indexes text

This will index all the text at word level.

To add metadata fields to the index, you can say, for example,

  indexes text,Title,Subject

or

  indexes text,metadata

The first one builds one index, with tagged entries for Title and Subject
metadata. Unlike levels, metadata names can be anything, although they should
match the names used in your documents.

The second one builds one index with tagged entries for all the metadata it
finds. This is useful if you don't know in advance what metadata is available,
or want all of it indexed anyway.

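As a rough illustration of how the two styles relate, the sketch below turns
an mg indexes line into the corresponding mgpp lines. It is not part of
Greenstone: the function name mg_to_mgpp is made up for this example, and the
conversion is lossy, because the separate mg indexes are merged into one mgpp
index that covers every field mentioned.

```python
def mg_to_mgpp(mg_indexes_line):
    """Turn an mg 'indexes' line into mgpp-style config lines (rough sketch)."""
    keyword, *specs = mg_indexes_line.split()
    if keyword != "indexes":
        raise ValueError("expected a line starting with 'indexes'")
    levels = []  # extra levels beyond Document, in first-seen order
    fields = []  # everything that gets indexed, in first-seen order
    for spec in specs:
        level, _, field_list = spec.partition(":")
        level = level.capitalize()
        # Document level is always built, so it needs no 'levels' entry.
        if level != "Document" and level not in levels:
            levels.append(level)
        for field in field_list.split(","):
            if field and field not in fields:
                fields.append(field)
    lines = ["buildtype mgpp"]
    if levels:
        lines.append("levels " + " ".join(levels))
    lines.append("indexes " + ",".join(fields))
    return lines


print("\n".join(mg_to_mgpp("indexes document:text section:text,Title")))
```

For the mg line shown earlier this prints 'buildtype mgpp', 'levels Section'
and 'indexes text,Title', one per line.
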
After the building has finished, the build.cfg file in the building directory
lists the metadata that was found and indexed, for example

  indexfields Subject TextOnly Title
  indexfieldmap TextOnly->TX Subject->SU Title->TI

The metadata names are passed to mgpp during building as two letter codes;
indexfieldmap specifies which codes were used.

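If you want to look the codes up from a script, something like the following
might do. It is only a sketch: parse_indexfieldmap is a hypothetical helper,
and it assumes the indexfieldmap entries sit on one line, separated by
whitespace, in the Name->CODE form shown above.

```python
def parse_indexfieldmap(build_cfg_text):
    """Return {metadata name: two letter code} from build.cfg contents."""
    mapping = {}
    for line in build_cfg_text.splitlines():
        parts = line.split()
        # Only the indexfieldmap line carries the name-to-code pairs.
        if parts and parts[0] == "indexfieldmap":
            for entry in parts[1:]:
                name, _, code = entry.partition("->")
                mapping[name] = code
    return mapping


sample = """indexfields Subject TextOnly Title
indexfieldmap TextOnly->TX Subject->SU Title->TI
"""
print(parse_indexfieldmap(sample))
```

With the sample above, Title maps to TI, Subject to SU and TextOnly to TX.
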
By default, only the text is compressed, not the metadata. To change this, you
can add a line to the config file like

  textcompress text,Title

This will add Title metadata to the text that gets passed to the compressor.

QUERYING

A collection built with mgpp can be searched in the usual way through
Greenstone. Search terms can be combined with & and |, and phrases are
specified using "". Because mgpp uses a word level index, it has some extended
searching capability over mg. If metadata has been specified in the index,
fielded search can also be done.

The current query syntax involves the following:

boolean operators: & (AND), | (OR), ! (NOT), with () for precedence

term modifiers: #icus /x - these control stemming, casefolding and weighting,
as in gsdl

  #i = case insensitive, #c = case sensitive
  #u = unstemmed, #s = stemmed
  /x = term weight (default = 1).

eg computer#is/10 is computer, stemmed and casefolded, with a weight of 10
compared to other terms in the same query

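The modifiers can be put together mechanically. The helper below is a sketch
only (format_term and its parameters are invented for this example); it emits
the flags in the same order as the computer#is/10 example, and leaves out
anything you don't set so that the defaults apply.

```python
def format_term(word, stem=None, casefold=None, weight=None):
    """Build a term such as computer#is/10. None means 'leave at default'."""
    flags = ""
    if casefold is not None:
        flags += "i" if casefold else "c"  # #i case insensitive, #c sensitive
    if stem is not None:
        flags += "s" if stem else "u"      # #s stemmed, #u unstemmed
    term = word
    if flags:
        term += "#" + flags
    if weight is not None:
        term += "/" + str(weight)          # /x term weight
    return term


print(format_term("computer", stem=True, casefold=True, weight=10))
```

This prints computer#is/10.
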
Proximity searching: NEARx
This specifies the maximum distance, in words, allowed between two terms for
them to match,
eg dog NEAR4 cat - cat must be within 4 words either side of dog.
NEAR by itself uses a default distance, possibly 20 (unconfirmed).

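To make "within x words either side" concrete, here is a small sketch of the
matching rule. It is one reading of the description above, not mgpp's actual
code: it assumes the distance is the difference between word positions, and
the near function is hypothetical.

```python
def near(words, first, second, distance):
    """True if `second` occurs within `distance` words either side of `first`."""
    first_pos = [i for i, w in enumerate(words) if w == first]
    second_pos = [i for i, w in enumerate(words) if w == second]
    # A match needs at least one pair of occurrences that is close enough.
    return any(abs(a - b) <= distance for a in first_pos for b in second_pos)


text = "the dog ran past the sleeping cat".split()
print(near(text, "dog", "cat", 4))
print(near(text, "dog", "cat", 5))
```

Here cat is five positions after dog, so under this reading the first call
does not match and the second one does.
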
fielded searching: [terms]:Field

eg [Witten]:CR

The field names need to be the names of the metadata elements in your
collection. If the collection was built with Greenstone, these names are the
two letter codes found in the build.cfg file.

Multiple terms inside the [] are ANDed together.

Different fields can be combined using the normal boolean operators, eg

  [Witten]:CR & [Gigabytes]:TI

Term modifiers can be included inside the [].

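A fielded query string can also be built up from pairs of field code and
terms. The following is a sketch under assumptions: fielded_query is a made-up
helper, and it assumes that multiple terms inside the [] are separated by
spaces.

```python
def fielded_query(field_terms, operator="&"):
    """Build e.g. [Witten]:CR & [Gigabytes]:TI from (code, terms) pairs."""
    clauses = []
    for code, terms in field_terms:
        # Terms inside one [] are ANDed together by mgpp.
        clauses.append("[" + " ".join(terms) + "]:" + code)
    return (" " + operator + " ").join(clauses)


print(fielded_query([("CR", ["Witten"]), ("TI", ["Gigabytes"])]))
```

This prints [Witten]:CR & [Gigabytes]:TI. The codes must be the ones listed in
your own build.cfg file.
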
This syntax can be entered into the standard Greenstone search box. For mgpp
collections, however, there are additional query pages using forms. These can
be accessed through the preferences page: select form query, then simple or
advanced. Hopefully the forms are fairly self-explanatory.