source: trunk/indexers/mgpp/docs/mgpp_in_greenstone.txt@ 3365

Last change on this file since 3365 was 3365, checked in by kjdon, 22 years ago
Initial revision
Property svn:keywords set to `Author Date Id Revision`
File size: 4.1 KB

BUILDING

Greenstone can build collections using either mg or mgpp. The default is mg,
but you can switch to mgpp by editing the collection configuration file.

First, add the line

buildtype mgpp

Second, describe the indexes differently, because mgpp specifies them in
another way.

mg uses a line like:

indexes document:text section:text,Title

This builds two indexes: the first covers all the text at document level, and
the second covers all the text plus Title metadata at section level.

The document and section tags determine the granularity of the search
results. The first index returns document numbers, while the second index
returns section numbers.
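As an illustration of how such a line breaks down, here is a minimal Python
sketch that splits an mg-style indexes line into (level, fields) pairs. The
function name parse_mg_indexes is made up for this note and is not part of
Greenstone.

```python
def parse_mg_indexes(line):
    """Split an mg-style 'indexes' line into (level, fields) pairs."""
    keyword, *specs = line.split()
    if keyword != "indexes":
        raise ValueError("not an indexes line: %r" % line)
    parsed = []
    for spec in specs:
        # Each spec looks like level:field1,field2
        level, _, fields = spec.partition(":")
        parsed.append((level, fields.split(",")))
    return parsed


print(parse_mg_indexes("indexes document:text section:text,Title"))
```

Each pair corresponds to one index that mg builds: the level gives the
granularity of the results, and the field list gives what is indexed.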

mgpp does things differently. By default it builds a word level index, and
you then specify the levels at which you want results returned. For example,
from a single index you might want to retrieve both whole documents and
sections.

The Greenstone building code builds a word level index with Document level
granularity. To add other levels (Section and Paragraph are permitted), add
a line like

levels Section Paragraph

Note that Paragraph level indexes can be used for searching, but you can't
retrieve Paragraph level documents, only Section and Document.

To specify what goes into the index, use an indexes line. It is similar to
the mg one but carries no level information, which is given separately by the
levels line. For example:

indexes text

This will index all the text at word level.

To add metadata fields to the index, you can use, for example,

indexes text,Title,Subject

or

indexes text,metadata

The first line builds one index with tagged entries for Title and Subject
metadata. Unlike levels, metadata names can be anything, although they
should match the names used in your documents.

The second line builds one index with tagged entries for all the metadata it
finds. This is useful if you don't know in advance what metadata is
available, or if you want all of it indexed anyway.
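To pull the mgpp settings together, the following Python sketch assembles
the three configuration lines described so far. The helper name
mgpp_config_lines is hypothetical, and the check that only Section and
Paragraph may be added is an assumption based on the wording above rather
than on the Greenstone source.

```python
# Levels that may be added on top of the default Document level
# (assumption taken from the text of this note).
ALLOWED_LEVELS = ("Section", "Paragraph")


def mgpp_config_lines(fields, extra_levels=()):
    """Return the collection configuration lines for an mgpp build."""
    for level in extra_levels:
        if level not in ALLOWED_LEVELS:
            raise ValueError("unsupported level: %r" % level)
    lines = ["buildtype mgpp"]
    if extra_levels:
        lines.append("levels " + " ".join(extra_levels))
    # One index; metadata fields become tagged entries within it.
    lines.append("indexes " + ",".join(fields))
    return lines


for line in mgpp_config_lines(["text", "Title", "Subject"],
                              ["Section", "Paragraph"]):
    print(line)
```

Passing ["text", "metadata"] as the field list would give the second form,
which indexes whatever metadata is found.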

After building has finished, the build.cfg file in the building directory
lists the metadata that was found and indexed, for example

indexfields Subject TextOnly Title
indexfieldmap TextOnly->TX Subject->SU Title->TI

The metadata names are passed to mgpp during building as two letter codes;
indexfieldmap records which codes were used.
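If you need the codes in a script, a line of this shape is easy to read
back. The Python sketch below turns an indexfieldmap line into a dictionary
from metadata name to two letter code. The function name parse_indexfieldmap
is invented for this example, and it assumes the line looks exactly like the
one shown above.

```python
def parse_indexfieldmap(line):
    """Map metadata names to their two letter codes."""
    keyword, *pairs = line.split()
    if keyword != "indexfieldmap":
        raise ValueError("not an indexfieldmap line: %r" % line)
    mapping = {}
    for pair in pairs:
        # Each pair looks like Name->CODE
        name, sep, code = pair.partition("->")
        if not sep:
            raise ValueError("malformed entry: %r" % pair)
        mapping[name] = code
    return mapping


codes = parse_indexfieldmap("indexfieldmap TextOnly->TX Subject->SU Title->TI")
print(codes["Title"])
```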

By default, only the text is compressed, not the metadata. To change this,
add a line like the following to the config file:

textcompress text,Title

This adds Title metadata to the text that gets passed to the compressor.

QUERYING

A collection built with mgpp can be searched in the usual way through
Greenstone. Search terms can be combined with & and |, and phrases are
specified using "". Because mgpp uses a word level index, it has some
extended searching capability over mg. If metadata has been specified in the
index, fielded searching is also available.

The current query syntax involves the following:

Boolean operators: & (AND), | (OR), ! (NOT), with () for precedence

Term modifiers: #icus /x - these control stemming, casefolding and weighting,
as in gsdl

#i = case insensitive, #c = case sensitive
#u = unstemmed, #s = stemmed
/x = term weight (default = 1)

eg computer#is/10 is computer, stemmed and casefolded, with a weight of 10
compared to other terms in the same query
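To make the modifier syntax concrete, here is a small Python sketch that
splits a single term into its word, modifiers and weight. It is not the mgpp
parser. The name parse_term, the assumption that modifiers come before the
weight, the rejection of conflicting modifiers, and the treatment of the
weight as a number that may have a decimal part are all choices made for
this illustration. A value of None means the modifier was not given.

```python
import re

# word, optional #flags, optional /weight (this ordering is an assumption)
TERM_RE = re.compile(
    r"^(?P<word>[^#/\s]+)"
    r"(?:#(?P<flags>[icus]+))?"
    r"(?:/(?P<weight>\d+(?:\.\d+)?))?$"
)


def parse_term(term):
    """Split a query term such as computer#is/10 into its parts."""
    match = TERM_RE.match(term)
    if match is None:
        raise ValueError("unrecognised term: %r" % term)
    flags = match.group("flags") or ""
    if ("i" in flags and "c" in flags) or ("u" in flags and "s" in flags):
        raise ValueError("conflicting modifiers in %r" % term)
    return {
        "word": match.group("word"),
        # True = case insensitive, False = case sensitive, None = not given
        "casefold": True if "i" in flags else False if "c" in flags else None,
        # True = stemmed, False = unstemmed, None = not given
        "stem": True if "s" in flags else False if "u" in flags else None,
        # The weight defaults to 1, as stated above
        "weight": float(match.group("weight") or 1),
    }


print(parse_term("computer#is/10"))
```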

Proximity searching: NEARx
This specifies the maximum distance apart that two words may be and still
match.
eg dog NEAR4 cat - cat must be within 4 words either side of dog.
NEAR by itself defaults to 20(??).
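The meaning of NEARx can be pictured with a few lines of Python. This is
only a sketch of the idea and says nothing about how mgpp evaluates the
operator internally. In particular, it assumes that "within x words" means
the two word positions differ by at most x, and the function name near is
made up for this example.

```python
def near(words, first, second, max_distance):
    """Return True if first and second occur within max_distance words."""
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    # Either side counts, so compare the absolute difference in position.
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )


words = "the dog chased a very fast cat".split()
print(near(words, "dog", "cat", 4))
print(near(words, "dog", "cat", 5))
```

Here dog and cat are five positions apart, so under this assumption the
first call reports no match and the second reports a match.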

Fielded searching: [ terms]:Field

eg [Witten]:CR

The field names need to be the names of the metadata elements in your
collection. If the collection was built with Greenstone, these names are the
two letter codes found in the build.cfg file.

Multiple terms inside the [] are ANDed together.

Different fields can be combined using the normal boolean operators, eg

[Witten]:CR & [Gigabytes]:TI

Term modifiers can be included inside the [].
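A fielded query can also be put together programmatically. The Python sketch
below builds one from metadata names and a name-to-code table of the kind
recorded in build.cfg. The function name fielded_query, the search term
compression, and the use of spaces to separate terms inside the [] are
assumptions made for this example.

```python
def fielded_query(field_terms, field_codes):
    """Build a fielded query, ANDing the clauses for each field together."""
    clauses = []
    for name, terms in field_terms:
        # Look up the two letter code that was used at build time.
        code = field_codes[name]
        clauses.append("[%s]:%s" % (" ".join(terms), code))
    return " & ".join(clauses)


# Codes as they might appear in an indexfieldmap line.
codes = {"Title": "TI", "Subject": "SU"}
print(fielded_query([("Title", ["Gigabytes"]),
                     ("Subject", ["compression"])], codes))
```

Terms passed in may carry modifiers, for example Gigabytes#i, since these
are allowed inside the [].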

This syntax can be entered into the standard Greenstone search box. For mgpp
collections, however, there are additional form-based query pages. These can
be accessed through the preferences page: select form query, then simple or
advanced. Hopefully the forms are fairly self-explanatory.
