source: main/trunk/greenstone2/common-src/indexers/mg/src/text/help.mg.src@ 30203

Last change on this file since 30203 was 16583, checked in by davidb, 16 years ago
Undoing change commited in r16582
Property svn:executable set to `*`
Property svn:keywords set to `Author Date Id Revision`
File size: 14.1 KB

###########################################################################
#
# help.mg.src -- Source for the help command
# Copyright (C) 1994 Neil Sharman
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
# @(#)help.mg.src 1.8 21 Mar 1994
#
###########################################################################
#
# The help file for mgquery.
# Lines starting with '#' are treated as comments and are discarded when
# the 'help.mg.h' file is produced.
#
###########################################################################

HELP for mgquery
================================

This text is a summary of the information in the "mgquery" manual pages.

The input to 'mgquery' consists of a series of input lines. A backslash
character ("\") at the end of a line indicates that the input continues
on the next line.

Input lines whose first character is a dot (".") are commands to the
mgquery program. Input lines that do not start with a dot are queries.
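The two input rules above can be sketched in Python as follows. The function names logical_lines and classify are hypothetical. The sketch assumes that the trailing backslash is simply dropped and that the next line is appended with no separator. mgquery may treat the join differently.

```python
def logical_lines(physical_lines):
    """Join each line ending in a backslash with the line that follows.

    Assumption: the backslash is dropped and the next line is appended
    unchanged (no separator is inserted).
    """
    buffered = ""
    for line in physical_lines:
        line = line.rstrip("\n")
        if line.endswith("\\"):
            buffered += line[:-1]
            continue
        yield buffered + line
        buffered = ""
    if buffered:
        yield buffered


def classify(line):
    """Return 'command' for lines starting with a dot, else 'query'."""
    return "command" if line.startswith(".") else "query"


for logical in logical_lines(["cat & \\\n", "dog\n", ".set mode docnums\n"]):
    print(classify(logical), repr(logical))
```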

A query consists of two parts. One part is a boolean or ranked query that
identifies documents. The second part is a post-processing pattern matching
operation. Any text between the first speech mark (") and the last speech
mark is considered to be a post-processing pattern.
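The split between the two parts of a query can be sketched as follows. split_query is a hypothetical name. The sketch makes two assumptions: text after the last speech mark stays part of the query, and a line with fewer than two speech marks has no pattern. Neither behaviour is stated above.

```python
def split_query(line):
    """Split a query line into (query, pattern).

    The pattern is whatever lies between the first and the last '"'.
    Assumptions: text after the last '"' is kept in the query, and a
    line with fewer than two '"' characters has no pattern (None).
    """
    first = line.find('"')
    last = line.rfind('"')
    if first == -1 or first == last:
        return line.strip(), None
    query = (line[:first] + line[last + 1:]).strip()
    return query, line[first + 1:last]


print(split_query('cat dog "black cat"'))
```

Because the pattern runs to the last speech mark, it may itself contain speech marks.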

The following commands are available :-

.help            - displays this text.
.quit            - quits the program.
.set name value  - sets parameter "name" to "value". If the parameter
                   is a boolean parameter and value is omitted, the
                   parameter will be inverted (i.e. if it is true it
                   will change to false, and if it is false it will
                   change to true).
.unset name      - deletes parameter "name".
.reset           - sets all the parameters to their initial state.
.display         - displays the values of all the current parameters.
.push            - pushes the current parameters on to a stack.
.pop             - destroys the current parameters and pops a new set
                   of parameters off the stack.
.output arg      - specifies where to send the documents. Arg may be
                   one of the following:
                     > filename  : Send output to the specified file.
                     >> filename : Append output to the specified file.
                     | command   : The output is piped into command,
                                   which is executed by sh.
.input arg       - specifies where input comes from. Arg may be one of
                   the following:
                     < filename  : Get the input from the specified
                                   file.
                     | command   : The input comes from the standard
                                   output of command, which is
                                   executed by sh.
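The parameter commands above behave like a dictionary plus a stack. The following sketch uses a hypothetical Params class. It assumes that parameters are stored as strings and that a parameter counts as boolean when its current value is one of the accepted boolean spellings. It is also an assumption that an inverted value is written back as `on' or `off'.

```python
class Params:
    """Hypothetical model of .set, .unset, .reset, .push and .pop."""

    TRUE = {"yes", "true", "on"}
    FALSE = {"no", "false", "off"}

    def __init__(self, defaults):
        self.defaults = dict(defaults)
        self.current = dict(defaults)
        self.stack = []

    def set(self, name, value=None):
        if value is None:
            # Value omitted: invert a boolean parameter.
            old = self.current.get(name, "")
            if old in self.TRUE:
                value = "off"
            elif old in self.FALSE:
                value = "on"
            else:
                raise ValueError(f"{name} is not a boolean parameter")
        self.current[name] = value

    def unset(self, name):
        self.current.pop(name, None)

    def reset(self):
        self.current = dict(self.defaults)

    def push(self):
        self.stack.append(dict(self.current))

    def pop(self):
        # The current parameters are discarded and replaced by the saved set.
        self.current = self.stack.pop()


p = Params({"mode": "text", "expert": "false"})
p.push()
p.set("mode", "docnums")
p.set("expert")
print(p.current)
p.pop()
print(p.current)
```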

On startup the mgquery program reads a sequence of commands from the file
.mgrc (NOTE: the .mgrc file may not contain any queries). mgquery first
looks for .mgrc in the current directory and then in the user's home
directory. Lines starting with a '#' in the .mgrc file are considered to
be comments and are ignored.
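The startup lookup can be sketched as follows. find_mgrc and mgrc_commands are hypothetical names. The sketch makes three assumptions: only the first .mgrc found is used, the home directory is taken from the user's home as Python reports it, and a query line in .mgrc is rejected with an error. The text above only says that queries are not allowed.

```python
import os


def find_mgrc(cwd=".", home=None):
    """Return the path of the first .mgrc found, or None.

    Assumption: the current directory is checked before the home
    directory, and only the first file found is used.
    """
    home = os.path.expanduser("~") if home is None else home
    for directory in (cwd, home):
        path = os.path.join(directory, ".mgrc")
        if os.path.isfile(path):
            return path
    return None


def mgrc_commands(text):
    """Yield the command lines of an .mgrc file, skipping '#' comments."""
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue
        command = line.strip()
        if not command.startswith("."):
            raise ValueError(f"queries are not allowed in .mgrc: {command!r}")
        yield command
```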

The following parameters (used in the .set and .unset commands) are
predefined and have special significance :-

accumulator_method = `array'
        This parameter is used during ranking, and specifies how the
        weight for each document should be accumulated. The following
        methods are available: `array', `splay_tree', `hash_table' and
        `list'.

briefstats = `off'
        This is a boolean parameter that determines whether the
        totals for disk, memory and time usage statistics will be
        displayed at the end of each query.
        NOTE: this takes precedence over the parameters "diskstats",
        "memstats" and "timestats". This parameter may take the values
        `yes', `no', `true', `false', `on' or `off'.

buffer = `1048576'
        When the documents are being read in they are read into a
        buffer of this size and then displayed from this buffer. If
        the documents are larger than this buffer, the buffer is
        expanded automatically. Having a large buffer gives a very
        slight performance improvement because it allows the order of
        disk operations to be optimised. The buffer size is measured
        in bytes.

diskstats = `off'
        This is a boolean parameter that determines whether the disk
        usage statistics for the preceding query will be displayed
        after each query. This parameter may take the values `yes',
        `no', `true', `false', `on' or `off'.

doc_sepstr = `---------------------------------- %n\n'
        This specifies the string that will be used to separate
        documents when they are displayed for `boolean' or `docnums'
        queries. The standard C escape character sequences (see the
        man page) may be used to place special characters in the
        string. For example, a newline would be `\n'. To include a `%'
        use the sequence `%%'. To include the MG document number use
        the sequence `%n'.
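The template rules for doc_sepstr can be sketched as follows. The same rules also apply to the related para_sepstr, para_start and ranked_doc_sepstr templates described below. expand_sepstr is a hypothetical name. It handles only a subset of the C escapes (\n, \t, \\). Formatting %w with four decimal places is an assumption made for illustration.

```python
def expand_sepstr(template, doc_num=0, weight=0.0):
    """Expand a separator template such as doc_sepstr.

    Handles the escapes \\n, \\t and \\\\ plus %%, %n and %w.
    Assumptions: the escape subset and the %w number format.
    """
    escapes = {"n": "\n", "t": "\t", "\\": "\\"}
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        nxt = template[i + 1] if i + 1 < len(template) else ""
        if ch == "\\" and nxt in escapes:
            out.append(escapes[nxt])
            i += 2
        elif ch == "%" and nxt == "%":
            out.append("%")
            i += 2
        elif ch == "%" and nxt == "n":
            out.append(str(doc_num))
            i += 2
        elif ch == "%" and nxt == "w":
            out.append(f"{weight:.4f}")
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(expand_sepstr(r"---- %n\n", doc_num=42), end="")
```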

expert = `false'
        If this is true then a lot of the waffle that the program
        spits out is suppressed. This parameter may take the values
        `yes', `no', `true', `false', `on' or `off'.

hash_tbl_size = `1000'
        One of the options during ranked queries is to use a hash
        table to accumulate the weights for each document. The hash
        table is a simple chained type. This parameter specifies the
        size of the hash table and may take any value between 8 and
        268435456.
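A simple chained hash table of the kind described for hash_tbl_size can be sketched as follows. ChainedAccumulator is a hypothetical name. Choosing the bucket as the document number modulo the table size is an assumption, not necessarily how mgquery hashes.

```python
class ChainedAccumulator:
    """Chained hash table mapping document number -> accumulated weight."""

    def __init__(self, size=1000):
        if not 8 <= size <= 268435456:
            raise ValueError("hash_tbl_size must be between 8 and 268435456")
        self.buckets = [[] for _ in range(size)]

    def add(self, doc_num, weight):
        # Assumed hash function: document number modulo table size.
        chain = self.buckets[doc_num % len(self.buckets)]
        for entry in chain:
            if entry[0] == doc_num:
                entry[1] += weight
                return
        chain.append([doc_num, weight])

    def items(self):
        for chain in self.buckets:
            for doc_num, weight in chain:
                yield doc_num, weight


acc = ChainedAccumulator(size=8)
for doc, w in [(3, 0.5), (11, 0.25), (3, 0.25)]:
    acc.add(doc, w)
print(sorted(acc.items()))
```

Documents 3 and 11 fall into the same bucket here, so they share one chain.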

heads_length = `50'
        When the mode is `heads' this specifies the number of
        characters that will be output for each document.

maxdocs = `all'
        The maximum number of documents to display in response to a
        query. This parameter may take on a numeric value between 1
        and 429467295 or the word `all'.

maxparas = `1000'
        The maximum number of paragraphs to identify during a ranked
        query with paragraph indexing. After the paragraphs have been
        identified they are converted into documents, and because
        some of the paragraphs may refer to the same document, the
        final number of answers may be less than maxparas. The
        maxdocs parameter is then applied. This parameter may take
        on a numeric value between 1 and 429467295.
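The maxparas, then documents, then maxdocs pipeline can be sketched as follows. paragraphs_to_documents and the para_to_doc mapping are hypothetical. The sketch assumes that a document keeps the rank of its best-ranked paragraph. maxdocs=None stands for `all'.

```python
def paragraphs_to_documents(ranked_paras, para_to_doc, maxparas=1000, maxdocs=None):
    """Convert ranked paragraph numbers (best first) into ranked documents.

    Assumption: a document keeps the rank of its best paragraph.
    maxdocs=None means `all'.
    """
    docs = []
    seen = set()
    for para in ranked_paras[:maxparas]:
        doc = para_to_doc[para]
        if doc not in seen:  # several paragraphs may share one document
            seen.add(doc)
            docs.append(doc)
    return docs if maxdocs is None else docs[:maxdocs]


mapping = {7: 3, 2: 1, 8: 3, 5: 2}
print(paragraphs_to_documents([7, 2, 8, 5], mapping, maxparas=3))
```

With maxparas=3, three paragraphs yield only two documents, because paragraphs 7 and 8 both belong to document 3.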

max_accumulators = `50000'
        This parameter limits the number of different paragraph/document
        numbers to be accumulated during ranked queries when the
        parameter `accumulator_method' is set to `splay_tree',
        `hash_table' or `list'. This parameter may take any value
        between 8 and 268435456.

max_terms = `all'
        This parameter limits the number of terms that will actually
        be used during a ranked query. If more terms than the number
        specified by max_terms are entered, then the extra terms will
        be discarded. If `sorted_terms' is on then the limiting will
        be done after the terms have been sorted. This parameter may
        take any value between 1 and 429467295 or the word `all'.

memstats = `off'
        This is a boolean parameter that determines whether the memory
        usage statistics for the preceding query will be displayed
        after each query. This parameter may take the values `yes',
        `no', `true', `false', `on' or `off'.

mgdir = `.'
        This specifies the directory where the MG files may be found.
        If the environment variable `MGDATA' is set then `mgdir' is
        initialised to the value in `MGDATA'.

mgname = `'
        This specifies the name of the MG database to process.

mode = `text'
        This specifies how documents should be displayed when they
        are retrieved. It may take five different values: `text',
        `docnums', `silent', `heads' or `count'. `text' displays
        the contents of the document. `docnums' displays only the
        document numbers. `silent' retrieves all the documents but
        displays nothing except how many documents were retrieved;
        this mode is intended to be used in timing experiments.
        `heads' is used to print out the head of each document (see
        `heads_length'). `count' does the minimum amount of work
        required to determine how many documents would be retrieved,
        but does not retrieve them.

pager = `more'
        This is the name of the program that will be used to display
        the help and the retrieved documents. If the environment
        variable "PAGER" is defined then `pager' takes on that value.
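The environment-derived defaults of `mgdir' and `pager' can be sketched as follows. initial_defaults is a hypothetical name. The sketch assumes that a variable which is set but empty still overrides the built-in default.

```python
import os


def initial_defaults(environ=None):
    """Initial values of mgdir and pager, honouring MGDATA and PAGER.

    Assumption: a variable that is set but empty still overrides the
    built-in default.
    """
    environ = os.environ if environ is None else environ
    return {
        "mgdir": environ.get("MGDATA", "."),
        "pager": environ.get("PAGER", "more"),
    }


print(initial_defaults({"PAGER": "less"}))
```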

para_sepstr = `\n######## PARAGRAPH %n ########\n'
        This specifies the string that will be used to separate
        paragraphs. The standard C escape character sequences (see the
        man page) may be used to place special characters in the
        string. For example, a newline would be `\n'. To include a `%'
        use the sequence `%%'. To include the paragraph number within
        the document use the sequence `%n'.

para_start = `*** Weight = %w ***\n'
        This specifies the string that will be used at the head of
        paragraphs for a paragraph level index following a ranked query.
        The standard C escape character sequences (see the man page)
        may be used to place special characters in the string. For
        example, a newline would be `\n'. To include a `%' use the
        sequence `%%'. To include the paragraph weight use the
        sequence `%w'.

qfreq = `true'
        This determines whether ranked queries will take into
        account the number of times each query term is specified.
        When this is `true' the number of times a term appears in
        the query is used in the ranking. When this is `false' all
        query terms are assumed to occur only once. This parameter
        may take the values `yes', `no', `true', `false', `on' or
        `off'.
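The effect of qfreq on the within-query term counts can be sketched as follows. query_term_counts is a hypothetical name. The sketch assumes the terms have already been case-folded and stemmed. It does not show how the counts then enter the ranking weight.

```python
from collections import Counter


def query_term_counts(terms, qfreq=True):
    """Within-query frequency of each distinct query term.

    With qfreq on, a repeated term counts once per occurrence; with
    qfreq off, every distinct term counts exactly once.
    """
    counts = Counter(terms)
    if not qfreq:
        return {term: 1 for term in counts}
    return dict(counts)


print(query_term_counts(["cat", "dog", "cat"]))
print(query_term_counts(["cat", "dog", "cat"], qfreq=False))
```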

query = `boolean'
        This specifies the type of query that will be entered. It can
        take four different values: `boolean', `ranked', `docnums' or
        `approx-ranked'.

        `boolean' is for boolean queries.
        The yacc grammar for boolean queries is as follows :-

            query : or ;

            or    : or '|' and
                  | and ;

            and   : and '&' not
                  | and not
                  | not ;

            not   : term
                  | '!' not ;

            term  : TERM
                  | '(' or ')' ;
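The boolean grammar above can be turned into a recursive-descent parser. In this sketch the left-recursive `or' and `and' rules become loops, and a term followed directly by another term (the `and not' rule) is treated as an implicit AND. BooleanParser is a hypothetical name. Its definition of a TERM (any run of characters other than whitespace and the operators | & ! ( )) is an assumption. mgquery's own term rules may differ.

```python
import re


class BooleanParser:
    """Recursive-descent parser for the boolean query grammar."""

    def __init__(self, text):
        # Assumed TERM: any run of non-space, non-operator characters.
        self.tokens = re.findall(r"[|&!()]|[^\s|&!()]+", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a term'}, got {tok}")
        self.pos += 1
        return tok

    def parse(self):
        tree = self.parse_or()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()}")
        return tree

    def parse_or(self):
        node = self.parse_and()
        while self.peek() == "|":
            self.take("|")
            node = ("or", node, self.parse_and())
        return node

    def parse_and(self):
        node = self.parse_not()
        while self.peek() not in (None, "|", ")"):
            if self.peek() == "&":
                self.take("&")  # explicit AND; otherwise it is implicit
            node = ("and", node, self.parse_not())
        return node

    def parse_not(self):
        if self.peek() == "!":
            self.take("!")
            return ("not", self.parse_not())
        return self.parse_term()

    def parse_term(self):
        if self.peek() == "(":
            self.take("(")
            node = self.parse_or()
            self.take(")")
            return node
        tok = self.take()
        if tok in ("|", "&", ")"):
            raise SyntaxError(f"unexpected {tok}")
        return ("term", tok)


print(BooleanParser("cat & (dog | mouse) !bird").parse())
```

As in the grammar, `&' (and juxtaposition) binds more tightly than `|', and `!' binds most tightly.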

        `ranked' and `approx-ranked' are for queries ranked by the
        cosine measure. `approx-ranked' uses only the low-precision
        document lengths, and therefore only produces an
        approximation to full cosine ranking. The grammar for ranked
        queries is as follows :-

            query : TERM
                  | query TERM ;
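The cosine measure can be sketched as a generic computation over precomputed term weights. cosine_rank is a hypothetical name. How the weights are derived is deliberately left out, so this does not reproduce MG's own weighting formula. It also does not model the low-precision document lengths used by `approx-ranked'.

```python
import math


def cosine_rank(query_weights, doc_vectors):
    """Rank documents by the cosine between query and document vectors.

    query_weights: {term: weight}; doc_vectors: {doc_num: {term: weight}}.
    Documents sharing no term with the query are omitted.
    """
    q_len = math.sqrt(sum(w * w for w in query_weights.values()))
    scores = []
    for doc, vec in doc_vectors.items():
        dot = sum(w * vec.get(t, 0.0) for t, w in query_weights.items())
        d_len = math.sqrt(sum(w * w for w in vec.values()))
        if dot > 0 and d_len > 0 and q_len > 0:
            scores.append((dot / (q_len * d_len), doc))
    scores.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return [(doc, score) for score, doc in scores]


docs = {1: {"cat": 1.0}, 2: {"cat": 1.0, "dog": 1.0}, 3: {"fish": 1.0}}
print(cosine_rank({"cat": 1.0, "dog": 1.0}, docs))
```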

        `docnums' allows the entry of document numbers. Several
        numbers separated by spaces may be specified, and ranges may
        be given as two numbers separated by a hyphen. The grammar is
        as follows :-

            query : range
                  | query range ;

            range : num
                  | num '-' num ;
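The `docnums' grammar can be expanded into a list of document numbers as follows. parse_docnums is a hypothetical name. Two behaviours are assumptions: whitespace around the hyphen is accepted, and a descending range such as 9-7 is rejected.

```python
import re


def parse_docnums(text):
    """Expand a docnums query such as "3 7-9 12" into document numbers.

    A query is a sequence of ranges; a range is a number or two numbers
    joined by '-'. Assumptions: spaces around '-' are allowed and a
    descending range is an error.
    """
    numbers = []
    for m in re.finditer(r"(\d+)\s*(?:-\s*(\d+))?|\S", text):
        first, last = m.group(1), m.group(2)
        if first is None:
            raise ValueError(f"unexpected {m.group(0)!r}")
        lo = int(first)
        hi = int(last) if last is not None else lo
        if hi < lo:
            raise ValueError(f"bad range {m.group(0)!r}")
        numbers.extend(range(lo, hi + 1))
    return numbers


print(parse_docnums("3 7-9 12"))
```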

ranked_doc_sepstr = `---------------------------------- %n %w\n'
        This specifies the string that will be used to separate
        documents when they are displayed for `ranked' or
        `approx-ranked' queries. The standard C escape character
        sequences (see the man page) may be used to place special
        characters in the string. For example, a newline would be
        `\n'. To include a `%' use the sequence `%%'. To include the
        MG document number use the sequence `%n'. To include the
        document weight use the sequence `%w'.

sizestats = `false'
        If this is true then various numbers are output at the end
        of each query indicating what went on during the query.
        This parameter may take the values `yes', `no', `true',
        `false', `on' or `off'.

skip_dump = `skips.%d'
        If this parameter is set then, during ranked queries on skipped
        inverted files when `accumulator_method' is set to `splay_tree',
        `hash_table' or `list', a file will be produced in the current
        directory. The name of the file is the value of this parameter;
        a `%d' in the file name will be replaced with the process id of
        mgquery. This file will contain information about the usage of
        skips during the query processing. This option is expensive;
        use `.unset skip_dump' to obtain optimal performance.

sorted_terms = `on'
        This specifies whether or not the terms should be sorted by
        increasing frequency of occurrence in documents, so that the
        least often occurring terms are processed first when ranked
        queries are being done. When this is true the terms are
        sorted. When this is false the terms are not sorted and are
        instead processed in the order in which they appear in the
        query. This parameter may take the values `yes', `no',
        `true', `false', `on' or `off'.
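The interaction between sorted_terms and max_terms can be sketched as follows. select_terms and the doc_freq mapping are hypothetical. The sketch assumes that a term missing from doc_freq has a frequency of 0 and so sorts first. max_terms=None stands for `all'.

```python
def select_terms(terms, doc_freq, sorted_terms=True, max_terms=None):
    """Choose the query terms to process, in processing order.

    terms: query terms in the order entered; doc_freq: {term: number of
    documents containing it}. max_terms=None means `all'.
    """
    if sorted_terms:
        # Rarest terms first; sorted() is stable, so ties keep query order.
        terms = sorted(terms, key=lambda t: doc_freq.get(t, 0))
    if max_terms is not None:
        terms = terms[:max_terms]  # extra terms are discarded
    return list(terms)


df = {"the": 900, "cat": 40, "sat": 75}
print(select_terms(["the", "cat", "sat"], df, max_terms=2))
```

Because the limit is applied after sorting, it is the most common terms that get discarded.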

stop_at_max_accum = `on'
        This specifies what should happen when the maximum number of
        accumulators set by `max_accumulators' is reached. When this
        is true the processing of terms is stopped at the completion
        of the current term. When this is false processing continues,
        but no new accumulators are created. This parameter may take
        the values `yes', `no', `true', `false', `on' or `off'.
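The two behaviours of stop_at_max_accum can be sketched as follows. accumulate is a hypothetical name, and the postings are supplied as plain Python lists. The text above does not say whether new accumulators may still be created during the rest of the term in which the limit is reached. This sketch assumes they may not.

```python
def accumulate(postings_by_term, max_accumulators=50000, stop_at_max_accum=True):
    """Accumulate document weights term by term under an accumulator limit.

    postings_by_term: [(term, [(doc_num, weight), ...]), ...] in
    processing order. Existing accumulators keep growing in both modes.
    """
    acc = {}
    for _term, postings in postings_by_term:
        for doc, weight in postings:
            if doc in acc:
                acc[doc] += weight
            elif len(acc) < max_accumulators:
                acc[doc] = weight
            # else: limit reached, so no new accumulator is created
        if stop_at_max_accum and len(acc) >= max_accumulators:
            break  # stop at the completion of the current term
    return acc


postings = [("a", [(1, 1.0), (2, 1.0)]),
            ("b", [(2, 0.5), (3, 0.5)]),
            ("c", [(1, 0.25), (2, 0.25)])]
print(accumulate(postings, max_accumulators=2))
print(accumulate(postings, max_accumulators=2, stop_at_max_accum=False))
```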

terminator = `'
        This specifies the string that will be output after the last
        document from the previous query has been output. The standard
        C escape character sequences (see the man page) may be used to
        place special characters in the string. For example, a newline
        would be `\n'. To include a `%' use the sequence `%%'.

timestats = `false'
        If this is true then the time to process a query is displayed
        in both real time and CPU time. This parameter may take the
        values `yes', `no', `true', `false', `on' or `off'.

verbatim = `off'
        This is a boolean parameter that determines how the
        post-processing string given with a query is matched against
        the retrieved text. If a post-processing string is specified
        with the query, it is searched for in the documents just
        before they are displayed: if it is found the document is
        displayed, and if not the document is not displayed. If
        verbatim is `on' the post-processing string is matched
        literally. If verbatim is `off' it is treated as a regular
        expression, as in `vi' or `egrep'.
        E.g. if verbatim is `on', "and.*the" will look for the 8
        character sequence "and.*the". If verbatim is `off',
        "and.*the" will look for the sequence "and" followed somewhere
        later in the document by the sequence "the".
        This parameter may take the values `yes', `no', `true',
        `false', `on' or `off'.
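The post-processing filter and the effect of verbatim can be sketched as follows. keep_document is a hypothetical name. Python's `re' syntax only approximates vi/egrep regular expressions. Letting `.*' span line breaks (re.DOTALL) is an assumption chosen to match "somewhere later in the document".

```python
import re


def keep_document(text, pattern, verbatim=False):
    """Decide whether a retrieved document should be displayed.

    verbatim=True: the pattern is matched as a literal string.
    verbatim=False: the pattern is a regular expression (Python syntax).
    """
    if pattern is None:
        return True  # no post-processing string: display everything
    if verbatim:
        return pattern in text
    # DOTALL lets '.*' run across line breaks; whether mgquery does the
    # same is an assumption.
    return re.search(pattern, text, re.DOTALL) is not None


doc = "and so\nthe end"
print(keep_document(doc, "and.*the"), keep_document(doc, "and.*the", verbatim=True))
```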