mgpp_passes.1@ 13620

Last change on this file since 13620 was 3365, checked in by kjdon, 22 years ago
Initial revision
Property svn:keywords set to `Author Date Id Revision`
File size: 6.7 KB

Rev	Line
[3365]	1	.\"------------------------------------------------------------
	2	.\" Id - set Rv,revision, and Dt, Date using rcs-Id tag.
	3	.de Id
	4	.ds Rv \\$3
	5	.ds Dt \\$4
	6	..
	7	.\"------------------------------------------------------------
	8	.TH mgpp_passes 1 \*(Dt CITRI
	9	.SH NAME
	10	mgpp_passes \- builds mgpp databases
	11	.SH SYNOPSIS
	12	.B mgpp_passes
	13	[
	14	.BI \-J " doc-tag"
	15	]
	16	[
	17	.BI \-K " level-tag"
	18	]
	19	.if n .ti +10n
	20	[
	21	.BI \-L " index-level"
	22	]
	23	[
	24	.BI \-m " invf-mem-buffer"
	25	]
	26	.if n .ti +10n
	27	[
	28	.B \-T1
	29	]
	30	[
	31	.B \-T2
	32	]
	33	[
	34	.B \-I1
	35	]
	36	[
	37	.B \-I2
	38	]
	39	[
	40	.B \-S
	41	]
	42	[
	43	.B \-C
	44	]
	45	.if n .ti +10n
	46	[
	47	.B \-h
	48	]
	49	[
	50	.BI \-d " directory"
	51	]
	52	.BI \-f " name"
	53	[
	54	.I filename(s)
	55	]
	56	.SH DESCRIPTION
	57	.B mgpp_passes
	58	is the program that does most of the work when building mgpp
	59	database systems. The input documents can come from either
	60	.I stdin
	61	or from a list of files on the command line. In general,
	62	.B mgpp_passes
	63	must be run twice to build a database, first with the
	64	.B \-T1
	65	and
	66	.B \-I1
	67	options, and second with the
	68	.B \-T2
	69	and
	70	.B \-I2
	71	options. Several other programs must also be run to produce a complete
	72	mgpp database. The
	73	.SB EXAMPLE
	74	section below gives an example of how to build a complete
	75	mgpp database.
	76	.SH OPTIONS
	77	Options may appear in any order, but the
	78	.IR filename(s) ,
	79	if specified, must be last.
	80	.TP "\w'\fB\-C\fP \fIcompstatpointt\fP'u+2n"
	81	.BI \-J " doc-tag"
	82	Specifies the SGML tag that encloses each document. Text appearing
	83	outside this tag is ignored. The document tag defines the highest
	84	level document that can be queried and printed. The default document
	85	tag is 'Document'.
	86	.TP
	87	.BI \-K " level-tag"
	88	Specifies the SGML tag of a sub document level. A level tag must
	89	enclose all text enclosed by the document tag. Levels can be
	90	queried and printed as if they were separate documents. Multiple
	91	document levels can be specified (the document tag is always
	92	added as a document level).
	93	.TP
	94	.BI \-L " index-level"
	95	Specifies the SGML tag enclosing the smallest indexed element. The
	96	index level should be no larger than the smallest document
	97	level. An empty string can be used to specify a word level index
	98	(which is the default).
	99	.TP
	100	.BI \-m " invf-mem-buffer"
	101	Maximum amount of memory to use for the pass-2 file inversion in
	102	megabytes. This option is only useful when used in conjunction with
	103	the option
	104	.BR \-I1 .
	105	The larger this value, the faster the pass-2 inversion will proceed.
	106	The default value is 5 MB.
	107	.TP
	108	.B \-T1
	109	Generate the
	110	.I *.text.stats
	111	file.
	112	.TP
	113	.B \-T2
	114	Generate the
	115	.IR *.text ,
	116	.IR *.text.idx ,
	117	.IR *.text.level ,
	118	and possibly the
	119	.I *.text.dict.aux
	120	files. Using this option requires that the
	121	.I *.text.dict
	122	file be present.
	123	.TP
	124	.B \-I1
	125	Generate the
	126	.IR *.invf.dict ,
	127	.IR *.invf.level ,
	128	.IR *.invf.chunk ,
	129	and
	130	.I *.invf.chunk.trans
	131	files.
	132	.TP
	133	.B \-I2
	134	Generate the
	135	.I *.invf
	136	and
	137	.I *.invf.idx
	138	files. Using this option requires
	139	that the
	140	.IR *.invf.dict.hash ,
	141	.IR *.invf.level ,
	142	.IR *.invf.chunk ,
	143	and
	144	.I *.invf.chunk.trans
	145	files be present. The
	146	.I *.invf.dict.hash
	147	file is generated by
	148	.BR mgpp_perf_hash_build (1)
	149	from the
	150	.I *.invf.dict
	151	file.
	152	.TP
	153	.B \-S
	154	This option causes a special pass to be executed. It is up to a user
	155	to modify
	156	.I mg.special.c
	157	in the source code to do something with the documents it is given.
	158	.TP
	159	.B \-C
	160	This activates the compatibility parsing mode. When using this
	161	mode documents are separated by control-B and paragraphs are separated
	162	by control-C. Internally these are converted to documents surrounded
	163	by 'Document' tags and paragraphs surrounded by 'Paragraph' tags.
	164	.TP
	165	.B \-h
	166	This displays a usage line on
	167	.IR stderr .
	168	.TP
	169	.BI \-d " directory"
	170	This specifies the directory where the document collection is to be
	171	written.
	172	.TP
	173	.BI \-f " name"
	174	This specifies the base name of the document collection that will be
	175	created.
	176	.TP
	177	.I filename(s)
	178	This specifies the source text. If this is not specified, then the
	179	program expects the source text from
	180	.IR stdin .
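The file requirements of the second pass, described under -T2 and -I2 above, can be checked before that pass is run. The following is a minimal POSIX sh sketch, not part of mgpp: the helper name `mgpp_pass2_missing` is hypothetical, and it assumes that collection files are named by appending each suffix to the directory and base name (for example `./demo.text.dict`).

```shell
#!/bin/sh
# mgpp_pass2_missing is a hypothetical helper, not part of mgpp.
# It prints every file that the -T2 and -I2 options require but that
# does not exist yet. It prints nothing when all of them are present.
# Assumption: files are named <base>.<suffix>, e.g. ./demo.text.dict
mgpp_pass2_missing() {
    base=$1   # directory plus collection base name, e.g. ./demo
    for suffix in text.dict invf.dict.hash invf.level \
                  invf.chunk invf.chunk.trans; do
        [ -f "$base.$suffix" ] || printf '%s\n' "$base.$suffix"
    done
}

# Example: list what is still missing for a collection called "demo".
mgpp_pass2_missing ./demo
```

An empty result suggests that pass 1, mgpp_compression_dict and mgpp_perf_hash_build have all been run for that collection.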
	181	.SH EXAMPLE
	182	The following UNIX
	183	.BR csh (1)
	184	script shows how to build an mgpp document collection.
	185	.LP
	186	.nf
	187	.DT
	188	.ft B
	189	.I #! /bin/csh
	190	.I
	191	# The first argument on the command line specifies the
	192	.I
	193	# source of the text
	194	set source = ($1)
	195	.PP
	196	.I
	197	# The second argument is the name of the collection
	198	set text = ($2)
	199	.PP
	200	.I
	201	# Create *.text.stats, *.invf.dict, *.invf.level,
	202	.I
	203	# *.invf.chunk and *.invf.chunk.trans
	204	${source} | mgpp_passes -T1 -I1 -f ${text}
	205	.PP
	206	.I
	207	# Create *.text.dict
	208	mgpp_compression_dict -f ${text}
	209	.PP
	210	.I
	211	# Create *.invf.dict.hash
	212	mgpp_perf_hash_build -f ${text}
	213	.PP
	214	.I
	215	# Create *.text, *.text.idx, *.text.level,
	216	.I
	217	# *.invf and *.invf.idx
	218	${source} | mgpp_passes -T2 -I2 -f ${text}
	219	.PP
	220	.I
	221	# Create *.weight and *.weight.approx
	222	mgpp_weights_build -f ${text}
	223	.PP
	224	.I
	225	# Create *.invf.dict.blocked
	226	mgpp_invf_dict -f ${text}
	227	.PP
	228	.I
	229	# Create *.invf.dict.blocked.1
	230	mgpp_stem_idx -s 1 -f ${text}
	231	.PP
	232	.I
	233	# Create *.invf.dict.blocked.2
	234	mgpp_stem_idx -s 2 -f ${text}
	235	.PP
	236	.I
	237	# Create *.invf.dict.blocked.3
	238	mgpp_stem_idx -s 3 -f ${text}
	239	.PP
	240	.I
	241	# Create *.text.dict.fast
	242	mgpp_fast_comp_dict -f ${text}
	243	.ft R
	244	.fi
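The build order in the example above can be reviewed before anything is run. The following is a minimal POSIX sh sketch that prints the same commands as a dry run rather than executing them. The helper name `mgpp_build_plan` and the sample arguments are hypothetical. The commands and options are those used in the example.

```shell
#!/bin/sh
# mgpp_build_plan is a hypothetical helper, not part of mgpp.
# It prints the build commands in order and does not run them.
mgpp_build_plan() {
    src=$1    # command that writes the source text to stdout
    name=$2   # base name of the collection (passed to -f)

    # Pass 1: text statistics and the first inversion pass.
    printf '%s\n' "$src | mgpp_passes -T1 -I1 -f $name"
    # Build *.text.dict and *.invf.dict.hash, which pass 2 requires.
    printf '%s\n' "mgpp_compression_dict -f $name"
    printf '%s\n' "mgpp_perf_hash_build -f $name"
    # Pass 2: compressed text and the inverted file.
    printf '%s\n' "$src | mgpp_passes -T2 -I2 -f $name"
    # Weights, blocked dictionary, stem indexes, fast-loading dictionary.
    printf '%s\n' "mgpp_weights_build -f $name"
    printf '%s\n' "mgpp_invf_dict -f $name"
    for method in 1 2 3; do
        printf '%s\n' "mgpp_stem_idx -s $method -f $name"
    done
    printf '%s\n' "mgpp_fast_comp_dict -f $name"
}

# Example: print the plan for a collection named "demo" whose source
# text is produced by the (assumed) command "cat docs.txt".
mgpp_build_plan "cat docs.txt" demo
```

Printing the plan first makes it easy to confirm that both passes receive the same source text and the same collection name.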
	245	.SH ENVIRONMENT
	246	.TP "\w'\fBMGDATA\fP'u+2n"
	247	.SB MGDATA
	248	If this environment variable exists, then its value is used as the
	249	default directory where the mgpp
	250	collection files are located. If this variable does not exist, then the
	251	directory \(lq\fB.\fP\(rq is used by default. The command line
	252	option
	253	.BI \-d " directory"
	254	overrides the directory in
	255	.BR MGDATA .
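The precedence described above (the -d option first, then MGDATA, then the current directory) can be mirrored in a wrapper script. The following is a minimal POSIX sh sketch. The helper name `mgpp_collection_dir` is hypothetical, and the sketch only illustrates the documented rule; it is not how mgpp_passes itself is implemented.

```shell
#!/bin/sh
# mgpp_collection_dir is a hypothetical helper, not part of mgpp.
# Usage: mgpp_collection_dir [directory-given-with--d]
mgpp_collection_dir() {
    if [ -n "${1:-}" ]; then
        # A directory given with -d overrides MGDATA.
        printf '%s\n' "$1"
    else
        # Use MGDATA if the variable exists, otherwise ".".
        printf '%s\n' "${MGDATA-.}"
    fi
}

# Example: resolve the directory when no -d option was given.
mgpp_collection_dir
```

The `${MGDATA-.}` form falls back to "." only when the variable is unset, which matches the wording "if this environment variable exists".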
	256	.SH FILES
	257	.TP 22
	258	.B *.invf
	259	Inverted file.
	260	.TP
	261	.B *.invf.chunk
	262	Inverted file chunk descriptor file. When the inverted file is
	263	created it is created in chunks that use no more than a set amount of
	264	memory. This file describes those chunks.
	265	.TP
	266	.B *.invf.chunk.trans
	267	Word-occurrence-order to lexical-order translation file. The
	268	.B *.invf.chunk
	269	file is written in word-occurrence order but is required by
	270	.B \-I2
	271	to be in lexical order.
	272	.TP
	273	.B *.invf.dict
	274	Compressed stemmed dictionary.
	275	.TP
	276	.B *.invf.dict.blocked
	277	Compressed stemmed dictionary with index into the dictionary.
	278	.TP
	279	.B *.invf.dict.blocked.n
	280	Transformation dictionary from words stemmed with method
	281	.B n
	282	to unstemmed words.
	283	.TP
	284	.B *.invf.dict.hash
	285	Data for an order-preserving perfect hash function.
	286	.TP
	287	.B *.invf.idx
	288	The index into the inverted file.
	289	.TP
	290	.B *.invf.level
	291	Information about the document levels needed for querying.
	292	.TP
	293	.B *.text
	294	Compressed text.
	295	.TP
	296	.B *.text.dict
	297	Compressed compression dictionary.
	298	.TP
	299	.B *.text.dict.fast
	300	A fast loading version of the compressed compression dictionary.
	301	.TP
	302	.B *.text.idx
	303	Index into the compressed documents.
	304	.TP
	305	.B *.text.level
	306	Information about the document levels needed for text decompression.
	307	.TP
	308	.B *.text.stats
	309	Statistics about the text.
	310	.TP
	311	.B *.weight
	312	The exact weights file.
	313	.TP
	314	.B *.weight.approx
	315	The approximate weights file.
	316	.SH "SEE ALSO"
	317	.na
	318	.BR mgpp_compression_dict (1),
	319	.BR mgpp_fast_comp_dict (1),
	320	.BR mgpp_invf_dict (1),
	321	.BR mgpp_perf_hash_build (1),
	322	.BR mgpp_stem_idx (1),
	323	.BR mgpp_weights_build (1)
