mgpp_passes.1@ 28439

Last change on this file since 28439 was 16583, checked in by davidb, 16 years ago
Undoing change committed in r16582
Property svn:keywords set to `Author Date Id Revision`
File size: 6.7 KB

Line
1	.\"------------------------------------------------------------
2	.\" Id - set Rv,revision, and Dt, Date using rcs-Id tag.
3	.de Id
4	.ds Rv \\$3
5	.ds Dt \\$4
6	..
7	.\"------------------------------------------------------------
8	.TH mgpp_passes 1 \*(Dt CITRI
9	.SH NAME
10	mgpp_passes \- builds mgpp databases
11	.SH SYNOPSIS
12	.B mgpp_passes
13	[
14	.BI \-J " doc-tag"
15	]
16	[
17	.BI \-K " level-tag"
18	]
19	.if n .ti +10n
20	[
21	.BI \-L " index-level"
22	]
23	[
24	.BI \-m " invf-mem-buffer"
25	]
26	.if n .ti +10n
27	[
28	.B \-T1
29	]
30	[
31	.B \-T2
32	]
33	[
34	.B \-I1
35	]
36	[
37	.B \-I2
38	]
39	[
40	.B \-S
41	]
42	[
43	.B \-C
44	]
45	.if n .ti +10n
46	[
47	.B \-h
48	]
49	[
50	.BI \-d " directory"
51	]
52	.BI \-f " name"
53	[
54	.I filename(s)
55	]
56	.SH DESCRIPTION
57	.B mgpp_passes
58	is the program that does most of the work when building mgpp
59	database systems. The input documents can come from either
60	.I stdin
61	or from a list of files on the command line. In general,
62	.B mgpp_passes
63	must be run twice to build a database, first with the
64	.B \-T1
65	and
66	.B \-I1
67	options, and second with the
68	.B \-T2
69	and
70	.B \-I2
71	options. Several other programs must also be run to produce a complete
72	mgpp database. The
73	.SB EXAMPLE
74	section below gives an example of how to build a complete
75	mgpp database.
76	.SH OPTIONS
77	Options may appear in any order, but the
78	.IR filename(s) ,
79	if specified, must be last.
80	.TP "\w'\fB\-C\fP \fIcompstatpointt\fP'u+2n"
81	.BI \-J " doc-tag"
82	Specifies the SGML tag that encloses each document. Text appearing
83	outside this tag is ignored. The document tag defines the highest
84	level document that can be queried and printed. The default document
85	tag is 'Document'.
86	.TP
87	.BI \-K " level-tag"
88	Specifies the SGML tag of a subdocument level. A level tag must
89	enclose all text enclosed by the document tag. Levels can be
90	queried and printed as if they were separate documents. Multiple
91	document levels can be specified (the document tag is always
92	added as a document level).
93	.TP
94	.BI \-L " index-level"
95	Specifies the SGML tag enclosing the smallest indexed element. The
96	index level should be no larger than the smallest document
97	level. An empty string can be used to specify a word level index
98	(which is the default).
99	.TP
100	.BI \-m " invf-mem-buffer"
101	Maximum amount of memory, in megabytes, to use for the pass-2 file
102	inversion. This option is only useful when used in conjunction with
103	the option
104	.BR \-I1 .
105	The larger this value, the faster the pass-2 inversion will proceed.
106	The default value is 5 MB.
107	.TP
108	.B \-T1
109	Generate the
110	.I *.text.stats
111	file.
112	.TP
113	.B \-T2
114	Generate the
115	.IR *.text ,
116	.IR *.text.idx ,
117	.IR *.text.level ,
118	and possibly the
119	.I *.text.dict.aux
120	files. Using this option requires that the
121	.I *.text.dict
122	file be present.
123	.TP
124	.B \-I1
125	Generate the
126	.IR *.invf.dict ,
127	.IR *.invf.level ,
128	.IR *.invf.chunk ,
129	and
130	.I *.invf.chunk.trans
131	files.
132	.TP
133	.B \-I2
134	Generate the
135	.I *.invf
136	and
137	.I *.invf.idx
138	files. Using this option requires
139	that the
140	.IR *.invf.dict.hash ,
141	.IR *.invf.level ,
142	.IR *.invf.chunk ,
143	and
144	.I *.invf.chunk.trans
145	files be present. The
146	.I *.invf.dict.hash
147	file is generated by
148	.BR mgpp_perf_hash_build (1)
149	from the
150	.I *.invf.dict
151	file.
152	.TP
153	.B \-S
154	This option causes a special pass to be executed. It is up to a user
155	to modify
156	.I mg.special.c
157	in the source code to do something with the documents it is given.
158	.TP
159	.B \-C
160	This activates the compatibility parsing mode. When using this
161	mode documents are separated by control-B and paragraphs are separated
162	by control-C. Internally these are converted to documents surrounded
163	by 'Document' tags and paragraphs surrounded by 'Paragraph' tags.
164	.TP
165	.B \-h
166	This displays a usage line on
167	.IR stderr .
168	.TP
169	.BI \-d " directory"
170	This specifies the directory where the document collection is to be
171	written.
172	.TP
173	.BI \-f " name"
174	This specifies the base name of the document collection that will be
175	created.
176	.TP
177	.I filename(s)
178	This specifies the source text. If this is not specified, then the
179	program expects the source text from
180	.IR stdin .
181	.SH EXAMPLE
182	The following UNIX
183	.BR csh (1)
184	script shows how to build an mgpp document collection.
185	.LP
186	.nf
187	.DT
188	.ft B
189	.I #! /bin/csh
190	.I
191	# The first argument on the command line specifies the
192	.I
193	# source of the text
194	set source = ($1)
195	.PP
196	.I
197	# The second argument is the name of the collection
198	set text = ($2)
199	.PP
200	.I
201	# Create *.text.stats, *.invf.dict, *.invf.level,
202	.I
203	# *.invf.chunk and *.invf.chunk.trans
204	${source} | mgpp_passes -T1 -I1 -f ${text}
205	.PP
206	.I
207	# Create *.text.dict
208	mgpp_compression_dict -f ${text}
209	.PP
210	.I
211	# Create *.invf.dict.hash
212	mgpp_perf_hash_build -f ${text}
213	.PP
214	.I
215	# Create *.text, *.text.idx, *.text.level,
216	.I
217	# *.invf and *.invf.idx
218	${source} | mgpp_passes -T2 -I2 -f ${text}
219	.PP
220	.I
221	# Create *.weight and *.weight.approx
222	mgpp_weights_build -f ${text}
223	.PP
224	.I
225	# Create *.invf.dict.blocked
226	mgpp_invf_dict -f ${text}
227	.PP
228	.I
229	# Create *.invf.dict.blocked.1
230	mgpp_stem_idx -s 1 -f ${text}
231	.PP
232	.I
233	# Create *.invf.dict.blocked.2
234	mgpp_stem_idx -s 2 -f ${text}
235	.PP
236	.I
237	# Create *.invf.dict.blocked.3
238	mgpp_stem_idx -s 3 -f ${text}
239	.PP
240	.I
241	# Create *.text.dict.fast
242	mgpp_fast_comp_dict -f ${text}
243	.ft R
244	.fi
245	.SH ENVIRONMENT
246	.TP "\w'\fBMGDATA\fP'u+2n"
247	.SB MGDATA
248	If this environment variable exists, then its value is used as the
249	default directory for the mgpp
250	collection files. If this variable does not exist, then the
251	directory \(lq\fB.\fP\(rq is used by default. The command line
252	option
253	.BI \-d " directory"
254	overrides the directory in
255	.BR MGDATA .
256	.SH FILES
257	.TP 22
258	.B *.invf
259	Inverted file.
260	.TP
261	.B *.invf.chunk
262	Inverted file chunk descriptor file. When the inverted file is
263	created it is created in chunks that use no more than a set amount of
264	memory. This file describes those chunks.
265	.TP
266	.B *.invf.chunk.trans
267	Word-occurrence-order to lexical-order translation file. The
268	.B *.invf.chunk
269	file is written in word-occurrence order but is required by
270	.B \-I2
271	to be in lexical order.
272	.TP
273	.B *.invf.dict
274	Compressed stemmed dictionary.
275	.TP
276	.B *.invf.dict.blocked
277	Compressed stemmed dictionary with index into the dictionary.
278	.TP
279	.B *.invf.dict.blocked.n
280	Transformation dictionary from words stemmed with method
281	.B n
282	to unstemmed words.
283	.TP
284	.B *.invf.dict.hash
285	Data for an order-preserving perfect hash function.
286	.TP
287	.B *.invf.idx
288	The index into the inverted file.
289	.TP
290	.B *.invf.level
291	Information about the document levels needed for querying.
292	.TP
293	.B *.text
294	Compressed text.
295	.TP
296	.B *.text.dict
297	Compressed compression dictionary.
298	.TP
299	.B *.text.dict.fast
300	A fast loading version of the compressed compression dictionary.
301	.TP
302	.B *.text.idx
303	Index into the compressed documents.
304	.TP
305	.B *.text.level
306	Information about the document levels needed for text decompression.
307	.TP
308	.B *.text.stats
309	Statistics about the text.
310	.TP
311	.B *.weight
312	The exact weights file.
313	.TP
314	.B *.weight.approx
315	The approximate weights file.
316	.SH "SEE ALSO"
317	.na
318	.BR mgpp_compression_dict (1),
319	.BR mgpp_fast_comp_dict (1),
320	.BR mgpp_invf_dict (1),
321	.BR mgpp_perf_hash_build (1),
322	.BR mgpp_stem_idx (1),
323	.BR mgpp_weights_build (1)
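
The build order described in the EXAMPLE section of this man page can also be sketched outside csh. The Python snippet below is only an illustration: it assembles the same command lines, in the same order, without running them. The function name `build_steps`, the source command `cat docs.sgml`, and the collection name `demo` are hypothetical placeholders; only the program names and options that appear in the man page itself are used.

```python
import shlex
from typing import List


def build_steps(source_cmd: str, name: str) -> List[str]:
    """Return the shell commands of the EXAMPLE section, in build order.

    source_cmd is a command that writes the source text to stdout
    (an assumption mirroring ${source} in the csh script); name is the
    collection base name passed to -f.
    """
    n = shlex.quote(name)  # protect names containing shell metacharacters
    return [
        # Pass 1: text statistics and inverted-file dictionary/chunk data
        f"{source_cmd} | mgpp_passes -T1 -I1 -f {n}",
        # Build *.text.dict, needed by -T2
        f"mgpp_compression_dict -f {n}",
        # Build *.invf.dict.hash, needed by -I2
        f"mgpp_perf_hash_build -f {n}",
        # Pass 2: compressed text and inverted file
        f"{source_cmd} | mgpp_passes -T2 -I2 -f {n}",
        f"mgpp_weights_build -f {n}",
        f"mgpp_invf_dict -f {n}",
        # One stem index per stemming method 1..3
        *(f"mgpp_stem_idx -s {level} -f {n}" for level in (1, 2, 3)),
        f"mgpp_fast_comp_dict -f {n}",
    ]


if __name__ == "__main__":
    # Print the plan instead of executing it (dry run).
    for step in build_steps("cat docs.sgml", "demo"):
        print(step)
```

Printing the steps rather than executing them keeps the sketch safe to run on a machine without mgpp installed; the two `mgpp_passes` invocations make the two-pass requirement explicit, with the dictionary and hash builds between them.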
