.\"------------------------------------------------------------
.\" Id - set Rv,revision, and Dt, Date using rcs-Id tag.
.de Id
.ds Rv \\$3
.ds Dt \\$4
..
.Id $Id: mgstat.1 3745 2003-02-20 21:20:24Z mdewsnip $
.\"------------------------------------------------------------
.TH mgstat 1 \*(Dt CITRI
.SH NAME
mgstat \- print out statistics about a document collection
.SH SYNOPSIS
.B mgstat
[
.B \-h
]
[
.B \-E
]
[
.BI \-d " directory"
]
.BI \-f " name"
.SH DESCRIPTION
.B mgstat
prints out various statistics about an existing
.BR mg (1)
document collection.  Depending on the size of the collection, sizes
will be printed in either kilobytes or megabytes.
.SH OPTIONS
Options may appear in any order.
.TP "\w'\fB\-d\fP \fIdirectory\fP'u+2n"
.B \-h
This displays a usage line on
.IR stdout .
.TP
.B \-E
This option forces sizes to be printed in bytes rather than kilobytes or
megabytes.
.TP
.BI \-d " directory"
This specifies the directory where the document collection can be found.
.TP
.BI \-f " name"
This specifies the base name of the document collection.
.SH ENVIRONMENT
.TP "\w'\fBMGDATA\fP'u+2n"
.SB MGDATA
If this environment variable exists, then its value is used as the
default directory where the
.BR mg (1)
collection files are.  If this variable does not exist, then the
directory \*(lq\fB.\fP\*(rq is used by default.  The command line
option
.BI \-d " directory"
overrides the directory in
.BR MGDATA .
.SH FILES
.TP 20
.B *.text
Compressed documents.
.TP
.B *.invf
Inverted file.
.TP
.B *.text.idx.wgt
Interleaved index into the compressed documents and document weights.
.TP
.B *.weight.approx
Approximate document weights.
.TP
.B *.invf.dict.blocked
Compressed stemmed dictionary and index into the inverted file merged
into an inverted file.
.TP
.B *.text.dict.fast
Fast loading compression dictionary.
.TP
.B *.text.dict
Compressed compression dictionary.
.TP
.B *.invf.dict
Compressed stemmed dictionary.
.TP
.B *.invf.idx
The index into the inverted file.
.TP
.B *.text.stats
Statistics about the text.
.TP
.B *.text.dict.aux
Auxiliary compression dictionary.
.TP
.B *.text.idx
Index into the compressed documents.
.TP
.B *.weight
The exact weights file.
.TP
.B *.invf.chunk
Maps stemmed terms from occurrence order to lexical order.
.TP
.B *.invf.chunk.trans
Describes where the source text is broken up into chunks for the
inversion pass.
.TP
.B *.invf.dict.hash
A perfect hash function for the terms in the stemmed dictionary.
.SH "SEE ALSO"
.na
.BR mg (1),
.BR mg_compression_dict (1),
.BR mg_fast_comp_dict (1),
.BR mg_get (1),
.BR mg_invf_dict (1),
.BR mg_invf_dump (1),
.BR mg_invf_rebuild (1),
.BR mg_passes (1),
.BR mg_perf_hash_build (1),
.BR mg_text_estimate (1),
.BR mg_weights_build (1),
.BR mgbilevel (1),
.BR mgbuild (1),
.BR mgdictlist (1),
.BR mgfelics (1),
.BR mgquery (1),
.BR mgtic (1),
.BR mgticbuild (1),
.BR mgticdump (1),
.BR mgticprune (1),
.BR mgticstat (1).