mg_passes.1@ 856

Last change on this file since 856 was 856, checked in by sjboddie, 24 years ago
Rodgers new C++ mg
.\"------------------------------------------------------------
.\" Id - set Rv,revision, and Dt, Date using rcs-Id tag.
.de Id
.ds Rv \\$3
.ds Dt \\$4
..
.Id $Id: mg_passes.1 856 2000-01-14 02:26:25Z sjboddie $
.\"------------------------------------------------------------
.TH mg_passes 1 \*(Dt CITRI
.SH NAME
mg_passes \- builds mg databases
.SH SYNOPSIS
.B mg_passes
[
.BI \-J " doc-tag"
]
[
.BI \-K " level-tag"
]
.if n .ti +10n
[
.BI \-L " index-level"
]
[
.BI \-m " invf-mem-buffer"
]
.if n .ti +10n
[
.B \-T1
]
[
.B \-T2
]
[
.B \-I1
]
[
.B \-I2
]
[
.B \-S
]
[
.B \-C
]
.if n .ti +10n
[
.B \-h
]
[
.BI \-d " directory"
]
.BI \-f " name"
[
.I filename(s)
]
.SH DESCRIPTION
.B mg_passes
is the program that does most of the work when building mg
database systems. The input documents can come from either
.I stdin
or from a list of files on the command line. In general,
.B mg_passes
must be run twice to build a database, first with the
.B \-T1
and
.B \-I1
options, and second with the
.B \-T2
and
.B \-I2
options. Several other programs must be run in order to get an
mg database. The
.SB EXAMPLE
section below gives an example of how to build a complete
mg database.
.SH OPTIONS
Options may appear in any order, but the
.IR filename(s) ,
if specified, must be last.
.TP "\w'\fB\-C\fP \fIcompstatpointt\fP'u+2n"
.BI \-J " doc-tag"
Specifies the SGML tag that encloses each document. Text appearing
outside this tag is ignored. The document tag defines the highest
level document that can be queried and printed. The default document
tag is 'Document'.
.TP
.BI \-K " level-tag"
Specifies the SGML tag of a subdocument level. A level tag must
enclose all text enclosed by the document tag. Levels can be
queried and printed as if they were separate documents. Multiple
document levels can be specified (the document tag is always
added as a document level).
.TP
.BI \-L " index-level"
Specifies the SGML tag enclosing the smallest indexed element. The
index level should be no larger than the smallest document
level. An empty string can be used to specify a word level index
(which is the default).
.TP
.BI \-m " invf-mem-buffer"
Maximum amount of memory, in megabytes, to use for the pass-2 file
inversion. This option only has an effect when used together with
the option
.BR \-I1 .
The larger this value, the faster the pass-2 inversion will proceed.
The default value is 5 MB.
.TP
.B \-T1
Generate the
.I *.text.stats
file.
.TP
.B \-T2
Generate the
.IR *.text ,
.IR *.text.idx ,
.IR *.text.level ,
and possibly the
.I *.text.dict.aux
files. Using this option requires that the
.I *.text.dict
file be present.
.TP
.B \-I1
Generate the
.IR *.invf.dict ,
.IR *.invf.level ,
.IR *.invf.chunk ,
and
.I *.invf.chunk.trans
files.
.TP
.B \-I2
Generate the
.I *.invf
and
.I *.invf.idx
files. Using this option requires
that the
.IR *.invf.dict.hash ,
.IR *.invf.level ,
.IR *.invf.chunk ,
and
.I *.invf.chunk.trans
files be present. The
.I *.invf.dict.hash
file is generated by
.BR mg_perf_hash_build (1)
from the
.I *.invf.dict
file.
.TP
.B \-S
This option causes a special pass to be executed. It is up to the user
to modify
.I mg.special.c
in the source code to do something with the documents it is given.
.TP
.B \-C
This activates the compatibility parsing mode. In this mode,
documents are separated by control-B and paragraphs are separated
by control-C. Internally these are converted to documents surrounded
by 'Document' tags and paragraphs surrounded by 'Paragraph' tags.
.TP
.B \-h
This displays a usage line on
.IR stderr .
.TP
.BI \-d " directory"
This specifies the directory where the document collection is to be
written.
.TP
.BI \-f " name"
This specifies the base name of the document collection that will be
created.
.TP
.I filename(s)
This specifies the source text. If this is not specified, then the
program expects the source text from
.IR stdin .
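.PP
As an illustrative sketch of how these options combine, the first-pass
command below declares 'Paragraph' as an additional document level and
raises the inversion memory limit to 20 MB. The collection name
.I demo
and the input file
.I docs.sgml
are hypothetical, and the input is assumed to mark up its text with
\&'Document' and 'Paragraph' SGML tags.
.LP
.nf
.ft B
mg_passes -T1 -I1 -J Document -K Paragraph -m 20 -f demo docs.sgml
.ft R
.fi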
.SH EXAMPLE
The following UNIX
.BR csh (1)
script shows how to build an mg document collection.
.LP
.nf
.DT
.ft B
.I #! /bin/csh
.I
# The first argument on the command line specifies the
.I
# source of the text
set source = ($1)
.PP
.I
# The second argument is the name of the collection
set text = ($2)
.PP
.I
# Create *.text.stats, *.invf.dict, *.invf.level,
.I
# *.invf.chunk and *.invf.chunk.trans
${source} | mg_passes -T1 -I1 -f ${text}
.PP
.I
# Create *.text.dict
mg_compression_dict -f ${text}
.PP
.I
# Create *.invf.dict.hash
mg_perf_hash_build -f ${text}
.PP
.I
# Create *.text, *.text.idx, *.text.level,
.I
# *.invf and *.invf.idx
${source} | mg_passes -T2 -I2 -f ${text}
.PP
.I
# Create *.weight and *.weight.approx
mg_weights_build -f ${text}
.PP
.I
# Create *.invf.dict.blocked
mg_invf_dict -f ${text}
.PP
.I
# Create *.invf.dict.blocked.1
mg_stem_idx -s 1 -f ${text}
.PP
.I
# Create *.invf.dict.blocked.2
mg_stem_idx -s 2 -f ${text}
.PP
.I
# Create *.invf.dict.blocked.3
mg_stem_idx -s 3 -f ${text}
.PP
.I
# Create *.text.dict.fast
mg_fast_comp_dict -f ${text}
.ft R
.fi
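.PP
Because the script runs
.B ${source}
as a command and feeds its output to
.BR mg_passes ,
the first argument should be a command that writes the source text to
.IR stdout .
A possible invocation is sketched here, assuming the script has been
saved under the hypothetical name
.I mgbuild.csh
and the documents are in a hypothetical file
.IR docs.sgml :
.LP
.nf
.ft B
.I
# Build a collection called "demo" from docs.sgml
csh mgbuild.csh "cat docs.sgml" demo
.ft R
.fi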
.SH ENVIRONMENT
.TP "\w'\fBMGDATA\fP'u+2n"
.SB MGDATA
If this environment variable exists, then its value is used as the
default directory for the mg
collection files. If this variable does not exist, then the
directory \(lq\fB.\fP\(rq is used by default. The command line
option
.BI \-d " directory"
overrides the directory in
.BR MGDATA .
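.PP
A short
.BR csh (1)
sketch of this precedence follows. The directory names, the collection
name
.I demo
and the input file
.I docs.sgml
are hypothetical.
.LP
.nf
.ft B
.I
# No -d given: the collection files go to the MGDATA directory
setenv MGDATA /data/mg
mg_passes -T1 -I1 -f demo docs.sgml
.PP
.I
# -d given: it overrides MGDATA for this run
mg_passes -T1 -I1 -d /tmp/mgtest -f demo docs.sgml
.ft R
.fi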
.SH FILES
.TP 22
.B *.invf
Inverted file.
.TP
.B *.invf.chunk
Inverted file chunk descriptor file. The inverted file is
built in chunks, each of which uses no more than a set amount of
memory. This file describes those chunks.
.TP
.B *.invf.chunk.trans
Word-occurrence-order to lexical-order translation file. The
.B *.invf.chunk
file is written in word-occurrence order but is required by
.B \-I2
to be in lexical order.
.TP
.B *.invf.dict
Compressed stemmed dictionary.
.TP
.B *.invf.dict.blocked
Compressed stemmed dictionary with index into the dictionary.
.TP
.B *.invf.dict.blocked.n
Transformation dictionary from words stemmed with method
.B n
to unstemmed words.
.TP
.B *.invf.dict.hash
Data for an order-preserving perfect hash function.
.TP
.B *.invf.idx
The index into the inverted file.
.TP
.B *.invf.level
Information about the document levels needed for querying.
.TP
.B *.text
Compressed text.
.TP
.B *.text.dict
Compressed compression dictionary.
.TP
.B *.text.dict.fast
A fast loading version of the compressed compression dictionary.
.TP
.B *.text.idx
Index into the compressed documents.
.TP
.B *.text.level
Information about the document levels needed for text decompression.
.TP
.B *.text.stats
Statistics about the text.
.TP
.B *.weight
The exact weights file.
.TP
.B *.weight.approx
The approximate weights file.
.SH "SEE ALSO"
.na
.BR mg_compression_dict (1),
.BR mg_fast_comp_dict (1),
.BR mg_invf_dict (1),
.BR mg_perf_hash_build (1),
.BR mg_stem_idx (1),
.BR mg_weights_build (1)
