source: gsdl/trunk/trunk/mgpp/text/mgpp_stem_idx.1@ 16583

Last change on this file since 16583 was 16583, checked in by davidb, 16 years ago

Undoing change committed in r16582

.\"------------------------------------------------------------
.\" Id - set Rv,revision, and Dt, Date using rcs-Id tag.
.de Id
.ds Rv \\$3
.ds Dt \\$4
..
.\"------------------------------------------------------------
.TH mgpp_stem_idx 1 \*(Dt CITRI
.SH NAME
mgpp_stem_idx \- builds a stem index file
.SH SYNOPSIS
.B mgpp_stem_idx
[
.B \-h
]
[
.BI \-b " entries-per-block"
]
.if n .ti +12n
[
.BI \-a " stemmer"
]
[
.BI \-d " directory"
]
.if n .ti +12n
.B \-s 1|2|3
.BI \-f " name"
.SH DESCRIPTION
.B mgpp_stem_idx
generates a stem index file for a collection.
It should be run three times, once for each value of the
.B \-s
option. It reads the stemmed dictionary and creates the stem index, which
contains pointers into that dictionary.
.SH OPTIONS
Options may appear in any order.
.TP "\w'\fB\-d\fP \fIdirectoryyyyyyyy\fP'u+2n"
.B \-h
This displays a usage line on
.IR stderr .
.TP
.BI \-b " entries-per-block"
The dictionary is stored in blocks on disk; this option is used to set
the number of entries per block. The default is 16.
.TP
.BI \-a " stemmer"
The name of the stemmer to use. The default is the Lovins stemmer.
.TP
.B \-s 1|2|3
The stem method to apply for the stem index.
.br
1 = casefolded and non-stemmed
.br
2 = non-casefolded and stemmed
.br
3 = casefolded and stemmed
.TP
.BI \-d " directory"
This specifies the directory where the document collection can be found.
.TP
.BI \-f " name"
This specifies the base name of the document collection.
.SH ENVIRONMENT
.TP "\w'\fBMGDATA\fP'u+2n"
.SB MGDATA
If this environment variable is set, its value is used as the
default directory for the mgpp collection files.
If it is not set, the directory
\*(lq\fB.\fP\*(rq is used by default. The command line
option
.BI \-d " directory"
overrides the directory in
.BR MGDATA .
.SH FILES
.TP 22
.B *.invf.dict
Compressed stemmed dictionary.
.TP
.B *.invf.dict.blocked.1
Stem index with stem index method 1.
.TP
.B *.invf.dict.blocked.2
Stem index with stem index method 2.
.TP
.B *.invf.dict.blocked.3
Stem index with stem index method 3.
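.SH EXAMPLES
.\" Illustrative sketch only. The collection name "demo" and the directory
.\" "/tmp/collections" are hypothetical and do not come from the original page.
.\" Only options documented under OPTIONS are used.
As a sketch of the three runs described under
.BR DESCRIPTION ,
assuming a hypothetical collection with base name
.B demo
in the hypothetical directory
.BR /tmp/collections ,
the invocations might look like this:
.PP
.nf
mgpp_stem_idx \-d /tmp/collections \-f demo \-s 1
mgpp_stem_idx \-d /tmp/collections \-f demo \-s 2
mgpp_stem_idx \-d /tmp/collections \-f demo \-s 3
.fi
.PP
Each run is expected to produce the
.B *.invf.dict.blocked
file whose numeric suffix matches the
.B \-s
value, as listed under
.BR FILES .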
.SH "SEE ALSO"
.na
.BR mgpp_compression_dict (1),
.BR mgpp_fast_comp_dict (1),
.BR mgpp_invf_dict (1),
.BR mgpp_passes (1),
.BR mgpp_perf_hash_build (1),
.BR mgpp_weights_build (1)