source: gsdl/trunk/trunk/mg/src/text/help.mg.src@ 16583

Last change on this file since 16583 was 16583, checked in by davidb, 16 years ago

Undoing change commited in r16582

  • Property svn:executable set to *
  • Property svn:keywords set to Author Date Id Revision
File size: 14.1 KB
###########################################################################
#
# help.mg.src -- Source for the help command
# Copyright (C) 1994 Neil Sharman
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
# @(#)help.mg.src 1.8 21 Mar 1994
#
###########################################################################
#
# The help file for mgquery.
# Lines starting with '#' are treated as comments and are discarded when
# the 'help.mg.h' file is produced
#
###########################################################################

        HELP for mgquery
        ================================

This text is a summary of the information in the "mgquery" manual pages.

The input to 'mgquery' consists of a series of input lines. A backslash
character ("\") at the end of a line indicates that the input continues
on the next line.

Input lines whose first character is a dot (".") are commands to the
mgquery program. Input lines that do not start with a dot are queries.
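
The line handling described above can be sketched in C. This is a hypothetical illustration rather than mgquery's own input code: the function names are invented here, and the sketch assumes that a trailing backslash is simply removed and the next physical line appended with nothing in between.

```c
#include <stddef.h>
#include <string.h>

/* If LINE ends with a backslash, remove it and return 1 to signal that
   the input continues on the next line; otherwise return 0. */
int strip_continuation(char *line)
{
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\\') {
        line[len - 1] = '\0';
        return 1;
    }
    return 0;
}

/* A logical line whose first character is '.' is a command;
   anything else is a query. */
int is_command(const char *line)
{
    return line[0] == '.';
}

/* Join physical lines into one logical line in OUT (CAP > 0), stopping
   after the first line that does not end with a backslash. LINES are
   modified in place. Returns the number of physical lines consumed. */
size_t read_logical_line(char **lines, size_t n, char *out, size_t cap)
{
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int more = strip_continuation(lines[i]);
        size_t len = strlen(lines[i]);
        if (used + len + 1 > cap)
            len = cap - used - 1;    /* truncate rather than overflow */
        memcpy(out + used, lines[i], len);
        used += len;
        out[used] = '\0';
        if (!more)
            return i + 1;
    }
    return n;
}
```

For example, the physical lines `mode \` and `text` join into the single query `mode text`.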

A query consists of two parts. The first is a boolean or ranked query
that identifies documents. The second is a post-processing
pattern-matching operation. Any text between the first double quote (")
and the last double quote is treated as the post-processing pattern.
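
A minimal C sketch of the split between the query and the post-processing pattern follows. The function `split_query` is hypothetical; it assumes that any text after the last double quote is ignored, which the description above does not say either way.

```c
#include <stddef.h>
#include <string.h>

/* Split QUERY in place. *PATTERN is set to the text between the first
   and last double quote, or to NULL if there are fewer than two quotes.
   QUERY is cut at the first quote, leaving the boolean/ranked part.
   Any text after the last quote is dropped (an assumption). */
void split_query(char *query, char **pattern)
{
    char *first = strchr(query, '"');
    char *last = strrchr(query, '"');

    *pattern = NULL;
    if (first == NULL || first == last)
        return;
    *last = '\0';
    *first = '\0';
    *pattern = first + 1;
}
```

Because the last quote, not the second, closes the pattern, a pattern may itself contain double quotes.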

The following commands are available :-

    .help           - displays this text.
    .quit           - quits the program.
    .set name value - sets parameter "name" to "value". If the parameter
                      is a boolean parameter and value is omitted, the
                      parameter is inverted (i.e. if it is true it
                      changes to false, and if it is false it changes
                      to true).
    .unset name     - deletes parameter "name".
    .reset          - sets all the parameters to their initial state.
    .display        - displays the values of all the current parameters.
    .push           - pushes the current parameters on to a stack.
    .pop            - destroys the current parameters and pops a new set
                      of parameters off the stack.
    .output arg     - specifies where to send the documents.
                      Arg may be one of the following:
                          > filename  : Send output to the specified file.
                          >> filename : Append output to the specified file.
                          | command   : Pipe the output into command,
                                        which is executed by sh.
    .input arg      - specifies where input comes from.
                      Arg may be one of the following:
                          < filename  : Get the input from the specified
                                        file.
                          | command   : Take the input from the standard
                                        output of command, which is
                                        executed by sh.
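
The `.set', `.push' and `.pop' behaviour can be modelled with a small stack of parameter tables. The sketch below is hypothetical (the table sizes and the names `param_set`, `param_get`, `param_push` and `param_pop` are invented for illustration). It shows only the inversion of a boolean when the value is omitted and the save/restore semantics of the stack; `.unset', `.reset' and `.display' are left out.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PARAMS 16
#define MAX_DEPTH 8

/* Hypothetical fixed-size parameter table (not mgquery's real layout). */
struct param { char name[32]; char value[64]; };
struct param_table { struct param p[MAX_PARAMS]; int n; };

static struct param_table stack[MAX_DEPTH];
static int depth = 0;               /* stack[depth] is the current set */

static struct param *find_param(const char *name)
{
    struct param_table *cur = &stack[depth];
    for (int i = 0; i < cur->n; i++)
        if (strcmp(cur->p[i].name, name) == 0)
            return &cur->p[i];
    return NULL;
}

static void copy_str(char *dst, size_t cap, const char *src)
{
    strncpy(dst, src, cap - 1);
    dst[cap - 1] = '\0';
}

/* .set name value: a NULL value inverts an existing boolean. */
int param_set(const char *name, const char *value)
{
    struct param *p = find_param(name);
    if (p == NULL) {
        if (value == NULL || stack[depth].n == MAX_PARAMS)
            return -1;
        p = &stack[depth].p[stack[depth].n++];
        copy_str(p->name, sizeof p->name, name);
    }
    if (value == NULL) {
        int on = strcmp(p->value, "yes") == 0 || strcmp(p->value, "true") == 0
                 || strcmp(p->value, "on") == 0;
        value = on ? "false" : "true";
    }
    copy_str(p->value, sizeof p->value, value);
    return 0;
}

const char *param_get(const char *name)
{
    struct param *p = find_param(name);
    return p != NULL ? p->value : NULL;
}

/* .push: copy the current set so that later changes can be undone. */
int param_push(void)
{
    if (depth + 1 == MAX_DEPTH)
        return -1;
    stack[depth + 1] = stack[depth];
    depth++;
    return 0;
}

/* .pop: discard the current set and restore the previous one. */
int param_pop(void)
{
    if (depth == 0)
        return -1;
    depth--;
    return 0;
}
```

Copying the whole table on `.push' keeps `.pop' trivial at the cost of some memory per stack level.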

On startup, mgquery reads a sequence of commands from the file .mgrc
(NOTE: the .mgrc file may not contain any queries). mgquery first looks
for .mgrc in the current directory and then in the user's home
directory. Lines in .mgrc that start with a '#' are treated as comments
and ignored.
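
A sketch of the start-up search for .mgrc follows, assuming POSIX-style paths and the `HOME` environment variable for the user's home directory. `open_mgrc` and `run_mgrc` are hypothetical names. How mgquery actually reacts to a query found in .mgrc is not stated above, so this sketch simply reports such a line on stderr and skips it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Open .mgrc from the current directory, falling back to $HOME/.mgrc.
   Returns NULL if neither file can be opened. */
FILE *open_mgrc(void)
{
    char path[4096];
    const char *home;
    FILE *f = fopen(".mgrc", "r");

    if (f != NULL)
        return f;
    home = getenv("HOME");
    if (home == NULL)
        return NULL;
    snprintf(path, sizeof path, "%s/.mgrc", home);
    return fopen(path, "r");
}

/* Pass each command line in F to HANDLE (which may be NULL).
   Comment lines ('#') and blank lines are skipped; lines that are not
   commands are reported on stderr and skipped. Returns the number of
   commands found. */
int run_mgrc(FILE *f, void (*handle)(const char *cmd))
{
    char line[1024];
    int count = 0;

    while (fgets(line, sizeof line, f) != NULL) {
        line[strcspn(line, "\n")] = '\0';
        if (line[0] == '#' || line[0] == '\0')
            continue;
        if (line[0] != '.') {
            fprintf(stderr, "ignoring query in .mgrc: %s\n", line);
            continue;
        }
        if (handle != NULL)
            handle(line);
        count++;
    }
    return count;
}
```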

The following parameters (used in the .set and .unset commands) are
predefined and have special significance :-

accumulator_method = `array'
    This parameter is used during ranking, and specifies how the
    weight for each document is accumulated. The following methods
    are available: `array', `splay_tree', `hash_table', and `list'.

briefstats = `off'
    This is a boolean parameter that determines whether the totals
    for disk, memory and time usage statistics are displayed at the
    end of each query.
    NOTE: this takes precedence over the parameters "diskstats",
    "memstats" and "timestats". This parameter may take the values
    `yes', `no', `true', `false', `on' or `off'.
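
Boolean parameters accept six spellings. A hypothetical parser for them (`parse_bool`) might look like this; it assumes the match is case-sensitive, which the text above does not specify.

```c
#include <string.h>

/* Returns 1 for `yes', `true' or `on', 0 for `no', `false' or `off',
   and -1 for anything else. Matching is case-sensitive (assumed). */
int parse_bool(const char *v)
{
    static const char *const truthy[] = { "yes", "true", "on" };
    static const char *const falsy[] = { "no", "false", "off" };

    for (int i = 0; i < 3; i++) {
        if (strcmp(v, truthy[i]) == 0)
            return 1;
        if (strcmp(v, falsy[i]) == 0)
            return 0;
    }
    return -1;
}
```

Returning -1 for an unrecognised value lets the caller reject it instead of silently treating it as false.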

buffer = `1048576'
    When the documents are being read in, they are read into a
    buffer of this size and then displayed from this buffer. If
    the documents are larger than this buffer, the buffer is
    expanded automatically. A large buffer gives a very slight
    performance improvement because it allows the order of disk
    operations to be optimised. The buffer size is measured in
    bytes.

diskstats = `off'
    This is a boolean parameter that determines whether the disk
    usage statistics for the preceding query are displayed after
    each query. This parameter may take the values `yes', `no',
    `true', `false', `on' or `off'.

doc_sepstr = `---------------------------------- %n\n'
    This specifies the string that is used to separate documents
    when they are displayed for `boolean' or `docnums' queries. The
    standard C escape character sequences (see the man page) may be
    used to place special characters in the string. For example, a
    newline would be `\n'. To include a `%' use the sequence `%%'.
    To include the MG document number use the sequence `%n'.
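
The `%' sequences in doc_sepstr (and likewise in para_sepstr, para_start, ranked_doc_sepstr and terminator) could be expanded with a helper like the one below. `expand_sep` is a hypothetical sketch: it handles only a few common C escapes, and the `%.4f` formatting of the weight is an assumption rather than mgquery's actual output format.

```c
#include <stdio.h>
#include <string.h>

/* Expand a separator template such as "---- %n\n" into OUT (CAP > 0).
   Handles the escapes \n, \t and \\ (any other escaped character is
   emitted without its backslash) and the sequences %n, %w and %%.
   A '%' followed by anything else is copied unchanged. */
void expand_sep(const char *tmpl, unsigned long num, double weight,
                char *out, size_t cap)
{
    size_t used = 0;
    char buf[64];

    for (const char *p = tmpl; *p != '\0' && used + 1 < cap; p++) {
        buf[0] = *p;
        buf[1] = '\0';
        if (p[0] == '\\' && p[1] != '\0') {
            p++;
            buf[0] = *p == 'n' ? '\n' : *p == 't' ? '\t' : *p;
        } else if (p[0] == '%' && p[1] == 'n') {
            p++;
            snprintf(buf, sizeof buf, "%lu", num);
        } else if (p[0] == '%' && p[1] == 'w') {
            p++;
            snprintf(buf, sizeof buf, "%.4f", weight);  /* precision is a guess */
        } else if (p[0] == '%' && p[1] == '%') {
            p++;                                       /* buf already holds '%' */
        }
        size_t len = strlen(buf);
        if (used + len >= cap)
            len = cap - used - 1;                      /* truncate */
        memcpy(out + used, buf, len);
        used += len;
    }
    out[used] = '\0';
}
```

With the default doc_sepstr and document 42, this produces a line of dashes, ` 42` and a newline.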

expert = `false'
    If this is true, a lot of the waffle that the program prints is
    suppressed. This parameter may take the values `yes', `no',
    `true', `false', `on' or `off'.

hash_tbl_size = `1000'
    One of the options during ranked queries is to use a hash table
    to accumulate the weights for each document. The hash table is a
    simple chained type. This parameter specifies the size of the
    hash table and may take any value between 8 and 268435456.
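
A chained hash table of document accumulators, as described for hash_tbl_size, might be sketched as follows. The structure and function names are hypothetical, and the sketch assumes the document number modulo the table size is used as the hash.

```c
#include <stdlib.h>

/* One accumulator: the running weight of a single document. */
struct acc { unsigned long doc; double weight; struct acc *next; };
struct acc_table { struct acc **bucket; size_t size; };

int acc_init(struct acc_table *t, size_t size)
{
    if (size < 8 || size > 268435456)   /* range given for hash_tbl_size */
        return -1;
    t->bucket = calloc(size, sizeof *t->bucket);
    t->size = size;
    return t->bucket == NULL ? -1 : 0;
}

/* Add W to DOC's accumulator, creating it at the head of its chain. */
int acc_add(struct acc_table *t, unsigned long doc, double w)
{
    struct acc **head = &t->bucket[doc % t->size];
    for (struct acc *a = *head; a != NULL; a = a->next)
        if (a->doc == doc) {
            a->weight += w;
            return 0;
        }
    struct acc *a = malloc(sizeof *a);
    if (a == NULL)
        return -1;
    a->doc = doc;
    a->weight = w;
    a->next = *head;
    *head = a;
    return 0;
}

double acc_get(const struct acc_table *t, unsigned long doc)
{
    for (const struct acc *a = t->bucket[doc % t->size]; a != NULL; a = a->next)
        if (a->doc == doc)
            return a->weight;
    return 0.0;
}

void acc_free(struct acc_table *t)
{
    for (size_t i = 0; i < t->size; i++) {
        struct acc *a = t->bucket[i];
        while (a != NULL) {
            struct acc *next = a->next;
            free(a);
            a = next;
        }
    }
    free(t->bucket);
}
```

Documents that hash to the same bucket (for example 3 and 11 in a table of size 8) share one chain, so a table that is small relative to the number of accumulators makes lookups slower.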

heads_length = `50'
    When the mode is `heads', this specifies the number of
    characters that are output for each document.

maxdocs = `all'
    The maximum number of documents to display in response to a
    query. This parameter may take a numeric value between 1 and
    429467295 or the word `all'.
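
Parameters such as maxdocs and max_terms take either a number or the word `all'. A hypothetical parser (`parse_limit`) that represents `all' as `ULONG_MAX` could look like this:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Parse "all" or a decimal number in [lo, hi]; "all" is stored as
   ULONG_MAX. Returns 0 on success, -1 if the value is rejected. */
int parse_limit(const char *v, unsigned long lo, unsigned long hi,
                unsigned long *out)
{
    char *end;
    unsigned long n;

    if (strcmp(v, "all") == 0) {
        *out = ULONG_MAX;
        return 0;
    }
    if (*v < '0' || *v > '9')       /* no sign, no leading blanks */
        return -1;
    errno = 0;
    n = strtoul(v, &end, 10);
    if (errno != 0 || *end != '\0' || n < lo || n > hi)
        return -1;
    *out = n;
    return 0;
}
```

Checking the first character before calling `strtoul` avoids its habit of silently accepting and negating a leading minus sign.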

maxparas = `1000'
    The maximum number of paragraphs to identify during a ranked
    query with paragraph indexing. After the paragraphs have been
    identified they are converted into documents, and because
    several paragraphs may belong to the same document, the final
    number of answers may be less than maxparas. The maxdocs
    parameter is then applied. This parameter may take a numeric
    value between 1 and 429467295.
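
The conversion from ranked paragraphs to documents can be sketched as a de-duplication pass that keeps each document at the rank of its best paragraph and then applies maxdocs. This is an assumed reading of the text above, not mgquery's code. `paras_to_docs` is a hypothetical name, and the linear search is chosen for brevity.

```c
#include <stddef.h>

/* PARA_DOC[i] is the document containing the i-th ranked paragraph,
   best first. Each document is copied to DOCS once, at the rank of its
   best paragraph, until MAXDOCS documents have been collected.
   Returns the number of documents written. */
size_t paras_to_docs(const unsigned long *para_doc, size_t nparas,
                     size_t maxdocs, unsigned long *docs)
{
    size_t ndocs = 0;
    for (size_t i = 0; i < nparas && ndocs < maxdocs; i++) {
        size_t j = 0;
        while (j < ndocs && docs[j] != para_doc[i])
            j++;
        if (j == ndocs)              /* first paragraph from this document */
            docs[ndocs++] = para_doc[i];
    }
    return ndocs;
}
```

Five paragraphs from documents 5, 2, 5, 9 and 2 therefore yield only three answers: 5, 2 and 9.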

max_accumulators = `50000'
    This parameter limits the number of different paragraph/document
    numbers accumulated during ranked queries when the parameter
    `accumulator_method' is set to `splay_tree', `hash_table', or
    `list'. This parameter may take any value between 8 and
    268435456.

max_terms = `all'
    This parameter limits the number of terms that are actually used
    during a ranked query. If more terms than the number specified by
    max_terms are entered, the extra terms are discarded. If
    `sorted_terms' is on, the limiting is done after the terms have
    been sorted. This parameter may take any value between 1 and
    429467295 or the word `all'.

memstats = `off'
    This is a boolean parameter that determines whether the memory
    usage statistics for the preceding query are displayed after
    each query. This parameter may take the values `yes', `no',
    `true', `false', `on' or `off'.

mgdir = `.'
    This specifies the directory where the MG files may be found.
    If the environment variable `MGDATA' is set, then `mgdir' is
    initialised to the value of `MGDATA'.
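
Both mgdir (from `MGDATA') and pager (from "PAGER") take their initial value from the environment when it is set. A one-function sketch, using the hypothetical name `env_or_default`:

```c
#include <stdlib.h>

/* Return the value of environment variable VAR if it is set,
   otherwise FALLBACK. */
const char *env_or_default(const char *var, const char *fallback)
{
    const char *v = getenv(var);
    return v != NULL ? v : fallback;
}
```

With it, mgdir would start as `env_or_default("MGDATA", ".")` and pager as `env_or_default("PAGER", "more")`.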

mgname = `'
    This specifies the name of the MG database to process.

mode = `text'
    This specifies how documents are displayed when they are
    retrieved. It may take five different values: `text', `docnums',
    `silent', `heads' or `count'. `text' displays the contents of
    the document. `docnums' displays only the document numbers.
    `silent' retrieves all the documents but displays nothing except
    how many documents were retrieved; this mode is intended for
    timing experiments. `heads' prints the head of each document
    (see heads_length). `count' does the minimum amount of work
    required to determine how many documents would be retrieved,
    but does not retrieve them.

pager = `more'
    This is the name of the program used to display the help and
    the retrieved documents. If the environment variable "PAGER" is
    defined, then `pager' takes on that value.

para_sepstr = `\n######## PARAGRAPH %n ########\n'
    This specifies the string that is used to separate paragraphs.
    The standard C escape character sequences (see the man page) may
    be used to place special characters in the string. For example,
    a newline would be `\n'. To include a `%' use the sequence `%%'.
    To include the paragraph number within the document use the
    sequence `%n'.

para_start = `***** Weight = %w *****\n'
    This specifies the string that is output at the head of each
    paragraph when a paragraph-level index is used with a ranked
    query. The standard C escape character sequences (see the man
    page) may be used to place special characters in the string. For
    example, a newline would be `\n'. To include a `%' use the
    sequence `%%'. To include the paragraph weight use the sequence
    `%w'.

qfreq = `true'
    This determines whether ranked queries take into account the
    number of times each query term is specified. When this is
    `true', the number of times a term appears in the query is used
    in the ranking. When this is `false', all query terms are
    assumed to occur only once. This parameter may take the values
    `yes', `no', `true', `false', `on' or `off'.
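
The effect of qfreq on the query side can be illustrated by counting repeated terms. The sketch is hypothetical: `query_term_freqs` assumes the query has already been split into terms (no stemming or case folding), and how the resulting count is folded into the term weight is not shown.

```c
#include <stddef.h>
#include <string.h>

/* Count the occurrences of each distinct term in a tokenised query.
   Distinct terms go to UNIQ and their counts to FREQ (both sized for
   at least N entries). With QFREQ false every count stays at 1.
   Returns the number of distinct terms. */
size_t query_term_freqs(const char *const *terms, size_t n, int qfreq,
                        const char **uniq, unsigned *freq)
{
    size_t nu = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < nu && strcmp(uniq[j], terms[i]) != 0)
            j++;
        if (j == nu) {               /* first occurrence */
            uniq[nu] = terms[i];
            freq[nu++] = 1;
        } else if (qfreq) {          /* repeat: counted only when qfreq is on */
            freq[j]++;
        }
    }
    return nu;
}
```

For the query `cat dog cat cat`, the counts are 3 and 1 with qfreq on, and 1 and 1 with it off.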

query = `boolean'
    This specifies the type of query to be entered. It can take
    four different values: `boolean', `ranked', `docnums' or
    `approx-ranked'.

    `boolean' is for boolean queries.
    The yacc grammar for boolean queries is as follows :-

        query : or ;

        or    : or '|' and
              | and ;

        and   : and '&' not
              | and not
              | not ;

        not   : term
              | '!' not ;

        term  : TERM
              | '(' or ')' ;
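
The boolean grammar above can be turned into a recursive-descent evaluator, with the left-recursive rules written as loops (the usual way to handle left recursion in a hand-written parser). The sketch is hypothetical. It uses a toy in-memory index (the `posting` table) in which each term maps to a 64-bit mask of document numbers, it does no stemming or case folding, and it complements `!` against all 64 bit positions rather than against the documents that actually exist.

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Toy index: bit d of DOCS is set if document d contains TERM. */
typedef uint64_t docset;
struct posting { const char *term; docset docs; };

static const struct posting *index_tbl;
static size_t index_len;
static const char *pos;                  /* current position in the query */

static void skip_ws(void) { while (isspace((unsigned char)*pos)) pos++; }

static docset lookup(const char *t, size_t len)
{
    for (size_t i = 0; i < index_len; i++)
        if (strlen(index_tbl[i].term) == len && strncmp(index_tbl[i].term, t, len) == 0)
            return index_tbl[i].docs;
    return 0;                            /* unknown term: empty set */
}

static docset parse_or(void);

/* term : TERM | '(' or ')' */
static docset parse_term(void)
{
    skip_ws();
    if (*pos == '(') {
        pos++;
        docset d = parse_or();
        skip_ws();
        if (*pos == ')')
            pos++;
        return d;
    }
    const char *start = pos;
    while (isalnum((unsigned char)*pos))
        pos++;
    return lookup(start, (size_t)(pos - start));
}

/* not : term | '!' not */
static docset parse_not(void)
{
    skip_ws();
    if (*pos == '!') {
        pos++;
        return ~parse_not();
    }
    return parse_term();
}

/* and : and '&' not | and not | not   (juxtaposition also means AND) */
static docset parse_and(void)
{
    docset d = parse_not();
    for (;;) {
        skip_ws();
        if (*pos == '&') {
            pos++;
            d &= parse_not();
        } else if (*pos == '!' || *pos == '(' || isalnum((unsigned char)*pos)) {
            d &= parse_not();
        } else {
            return d;
        }
    }
}

/* or : or '|' and | and */
static docset parse_or(void)
{
    docset d = parse_and();
    for (;;) {
        skip_ws();
        if (*pos != '|')
            return d;
        pos++;
        d |= parse_and();
    }
}

docset eval_boolean(const char *query, const struct posting *idx, size_t n)
{
    index_tbl = idx;
    index_len = n;
    pos = query;
    return parse_or();
}
```

Because `and` binds more tightly than `or` in the grammar, `parse_or` calls `parse_and`, which calls `parse_not`, so precedence falls out of the call structure.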

    `ranked' and `approx-ranked' are for queries ranked by the
    cosine measure. `approx-ranked' uses only the low-precision
    document lengths, and therefore only produces an approximation
    to full cosine ranking. The grammar for ranked queries is:

        query : TERM
              | query TERM ;
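
A minimal sketch of cosine scoring follows, assuming each query and document is already represented as a vector of per-term weights. How mgquery computes those weights is not covered here, and `cosine_score` is a hypothetical name.

```c
#include <stddef.h>

/* Score document vector D against query vector Q (one weight per term).
   DOC_LEN is the precomputed Euclidean length of D. The query length is
   the same for every document, so it is left out: dividing by it would
   not change the ranking. Passing a coarsely rounded DOC_LEN instead of
   the exact one mirrors what approx-ranked is described as doing. */
double cosine_score(const double *q, const double *d, size_t nterms,
                    double doc_len)
{
    double dot = 0.0;
    for (size_t t = 0; t < nterms; t++)
        dot += q[t] * d[t];
    return doc_len > 0.0 ? dot / doc_len : 0.0;
}
```

Dividing by the document length is what stops long documents from winning simply because they contain more terms.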

    `docnums' allows the entry of document numbers. Multiple numbers
    separated by spaces may be specified, as may ranges given as two
    numbers separated by a hyphen:

        query : range
              | query range ;

        range : num
              | num '-' num ;
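
The docnums grammar is simple enough to parse by hand. The hypothetical `parse_docnums` below expands ranges into individual numbers. It assumes that spaces may appear around the hyphen (as a yacc lexer would normally allow) and that a range whose first number is larger than its second selects nothing.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Expand a docnums query such as "3 5-7" into OUT (at most CAP numbers).
   Returns the count written, or -1 on a syntax error. */
int parse_docnums(const char *s, unsigned long *out, size_t cap)
{
    size_t n = 0;
    char *end;

    for (;;) {
        while (isspace((unsigned char)*s))
            s++;
        if (*s == '\0')
            return (int)n;
        if (!isdigit((unsigned char)*s))
            return -1;
        unsigned long lo = strtoul(s, &end, 10), hi = lo;   /* num */
        const char *p = end;
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '-') {                                    /* num '-' num */
            p++;
            while (isspace((unsigned char)*p))
                p++;
            if (!isdigit((unsigned char)*p))
                return -1;
            hi = strtoul(p, &end, 10);
        }
        s = end;
        for (unsigned long d = lo; d <= hi && n < cap; d++)
            out[n++] = d;
    }
}
```

For example, `3 5-7 10` expands to the five document numbers 3, 5, 6, 7 and 10.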

ranked_doc_sepstr = `---------------------------------- %n %w\n'
    This specifies the string that is used to separate documents
    when they are displayed for `ranked' or `approx-ranked' queries.
    The standard C escape character sequences (see the man page) may
    be used to place special characters in the string. For example,
    a newline would be `\n'. To include a `%' use the sequence `%%'.
    To include the MG document number use the sequence `%n'. To
    include the document weight use the sequence `%w'.

sizestats = `false'
    If this is true, various counts are output at the end of each
    query indicating what happened during the query. This parameter
    may take the values `yes', `no', `true', `false', `on' or `off'.

skip_dump = `skips.%d'
    If this parameter is set, then during ranked queries on skipped
    inverted files, when `accumulator_method' is set to `splay_tree',
    `hash_table', or `list', a file is produced in the current
    directory. The name of the file is the value of this parameter;
    a `%d' in the file name is replaced with the process id of
    mgquery. This file contains information about the usage of skips
    during query processing. This option is expensive; use
    `.unset skip_dump' to obtain optimal performance.
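
The `%d' substitution in skip_dump could be done as below. `skip_dump_name` is a hypothetical helper; in a real program the `pid` argument would come from the POSIX `getpid()` call. It replaces every `%d', which is an assumption, since the text above mentions only one.

```c
#include <stdio.h>
#include <string.h>

/* Build the skip_dump file name from TMPL, replacing each "%d" with PID.
   OUT must have room for at least one byte (CAP > 0). */
void skip_dump_name(const char *tmpl, long pid, char *out, size_t cap)
{
    size_t used = 0;

    out[0] = '\0';
    for (const char *p = tmpl; *p != '\0' && used + 1 < cap; p++) {
        if (p[0] == '%' && p[1] == 'd') {
            int w = snprintf(out + used, cap - used, "%ld", pid);
            if (w < 0)
                break;
            /* snprintf truncates safely; advance to the real end. */
            used += (size_t)w < cap - used ? (size_t)w : cap - used - 1;
            p++;
        } else {
            out[used++] = *p;
            out[used] = '\0';
        }
    }
}
```

Handling `%d' explicitly, rather than passing the parameter value to `printf` as a format string, keeps other `%' characters in the name from being misinterpreted.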

sorted_terms = `on'
    This specifies whether or not the terms of a ranked query are
    sorted so that the terms occurring in the fewest documents are
    processed first. When this is true the terms are sorted. When
    this is false the terms are not sorted and are instead processed
    in the order in which they occur in the query. This parameter may
    take the values `yes', `no', `true', `false', `on' or `off'.

stop_at_max_accum = `on'
    This specifies what happens when the maximum number of
    accumulators set by `max_accumulators' is reached. When this is
    true, the processing of terms stops at the completion of the
    current term. When this is false, processing continues but no
    new accumulators are created. This parameter may take the values
    `yes', `no', `true', `false', `on' or `off'.
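
The interaction of max_terms, sorted_terms, max_accumulators and stop_at_max_accum can be sketched with a simple accumulator array. This is a hypothetical model, not mgquery's implementation. `struct term`, `rank_terms` and the in-memory postings are invented for illustration, terms are sorted by ascending document frequency, and once the limit is hit no further accumulators are created, even within the current term.

```c
#include <stddef.h>
#include <stdlib.h>

struct term {
    size_t ndocs;                 /* document frequency of the term */
    const unsigned long *doc;     /* documents containing the term */
    const double *w;              /* per-document weight contributions */
};

static int by_df(const void *a, const void *b)
{
    size_t x = ((const struct term *)a)->ndocs;
    size_t y = ((const struct term *)b)->ndocs;
    return (x > y) - (x < y);
}

/* Accumulate weights into ACC_DOC/ACC_W (capacity MAX_ACC).
   Returns the number of accumulators created. */
size_t rank_terms(struct term *terms, size_t nterms, int sorted,
                  size_t max_terms, size_t max_acc, int stop_at_max,
                  unsigned long *acc_doc, double *acc_w)
{
    size_t nacc = 0;

    if (sorted)                                  /* sorted_terms */
        qsort(terms, nterms, sizeof *terms, by_df);
    if (nterms > max_terms)                      /* max_terms, after sorting */
        nterms = max_terms;

    for (size_t t = 0; t < nterms; t++) {
        if (stop_at_max && nacc >= max_acc)      /* stop_at_max_accum on */
            break;
        for (size_t i = 0; i < terms[t].ndocs; i++) {
            size_t j = 0;
            while (j < nacc && acc_doc[j] != terms[t].doc[i])
                j++;
            if (j < nacc) {
                acc_w[j] += terms[t].w[i];       /* existing accumulator */
            } else if (nacc < max_acc) {
                acc_doc[nacc] = terms[t].doc[i]; /* new accumulator */
                acc_w[nacc++] = terms[t].w[i];
            }
            /* else: limit reached, no new accumulator for this document */
        }
    }
    return nacc;
}
```

Processing the rarest terms first means that, when the accumulator limit cuts processing short, the terms left unprocessed are the common ones that contribute least to the ranking.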

terminator = `'
    This specifies the string that is output after the last document
    from the previous query has been output. The standard C escape
    character sequences (see the man page) may be used to place
    special characters in the string. For example, a newline would
    be `\n'. To include a `%' use the sequence `%%'.

timestats = `false'
    If this is true, the time taken to process a query is displayed
    in both real time and CPU time. This parameter may take the
    values `yes', `no', `true', `false', `on' or `off'.

verbatim = `off'
    This is a boolean parameter that determines how the
    post-processing string given with a query is matched against the
    retrieved text. If a post-processing string is specified, it is
    searched for in each document just before the document is
    displayed; if it is found the document is displayed, and if not
    the document is not displayed. If verbatim is `on', the string is
    searched for literally. If verbatim is `off', the string is
    treated as a regular expression, as in `vi' or `egrep'.
    For example, if verbatim is `on', "and.*the" looks for the 8
    character sequence "and.*the". If verbatim is `off', "and.*the"
    looks for the sequence "and" followed somewhere later in the
    document by the sequence "the".
    This parameter may take the values `yes', `no', `true', `false',
    `on' or `off'.
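
The difference between the two verbatim settings can be sketched with `strstr` for the literal case and the POSIX `regcomp`/`regexec` interface for the regular-expression case. This is a stand-in, not the regular-expression code mgquery actually uses. `post_match` is a hypothetical name, and `REG_EXTENDED` is chosen to approximate egrep syntax.

```c
#include <regex.h>
#include <string.h>

/* Return 1 if DOC should be displayed for post-processing PATTERN:
   a literal substring search when VERBATIM is set, otherwise an
   extended (egrep-style) regular-expression search. */
int post_match(const char *doc, const char *pattern, int verbatim)
{
    regex_t re;
    int hit;

    if (verbatim)
        return strstr(doc, pattern) != NULL;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;                    /* treat an invalid pattern as no match */
    hit = regexec(&re, doc, 0, NULL, 0) == 0;
    regfree(&re);
    return hit;
}
```

`REG_NOSUB` is used because only a yes/no answer is needed, not the position of the match.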