.\"------------------------------------------------------------ .\" Id - set Rv,revision, and Dt, Date using rcs-Id tag. .de Id .ds Rv \\$3 .ds Dt \\$4 .. .Id $Id: mgbuild.1 16583 2008-07-29 10:20:36Z davidb $ .\"------------------------------------------------------------ .TH mgbuild 1 \*(Dt CITRI .SH NAME mgbuild \- build an mg system database .SH SYNOPSIS .B mgbuild [ .B \-c ] [ .BI \-g " get" ] [ .BI \-s " source" ] [ .BI \-d " mgdata dir" ] .I collection-name .SH DESCRIPTION .B mgbuild is a .B csh script that executes all the appropriate programs in the correct order to completely build an .BR mg (1) system database ready for queries to be made by .BR mgquery. This program makes use of the .BR mg_get (1) script to obtain the text of the collection. .SH OPTIONS Options can occur in any order, but the collection name must be last. .TP "\w'\fIcollection-name\fP'u+2n" .BI \-c This specifies whether the .I get program is \*(lqcomplex\*(rq. If a .I get program is \*(lqcomplex\*(rq, then it requires initialisation and cleanup with the .B \-i and .B \-c options. .TP .BI \-g " get" This specifies the program to use for getting the source text for the build. If no .B \-g option is given, the default program .BR mg_get (1) is used. .TP .BI \-s " source" The .B mgbuild program consists of two parts. The first part initializes variables to default values. The second part uses these variables to control how the .BR mg (1) database is built. This option specifies a program to execute between the first and second parts. The details of what the variables are, and how they may be changed, are in comments in the .B mgbuild program. .TP .BI \-d " mgdata dir" Setting this option allows one to override the MGDATA environment variable. .TP .I collection-name This is the collection name, as required by the .BR mg_get (1) program. It serves both as a .I case statement selector, and as the name of a subdirectory that holds the indexing files. .SH ENVIRONMENT .TP "\w'\fBMGDATA\fP'u+2n" .SB MGDATA If this environment variable exists, then its value is used as the default directory where the .BR mg (1) collection files are. If this variable does not exist, then the directory \*(lq\fB.\fP\*(rq is used by default. The command line option .BI \-d " directory" overrides the directory in .BR MGDATA . .SH FILES .TP 20 .B *.invf Inverted file. .TP .B *.invf.chunk Inverted file chunk descriptor file. When the inverted file is created, it is written in chunks that use no more than a set amount of memory. This file describes those chunks. .TP .B *.invf.chunk.trans Word-occurrence-order to lexical-order translation file. The .B *.invf.chunk file is written in word-occurrence order but is required by .B \-N2 to be in lexical order. .TP .B *.invf.dict Compressed stemmed dictionary. .TP .B *.invf.dict.blocked The `on-disk' stemmed dictionary. .TP .B *.invf.dict.hash Data for an order-preserving perfect hash function. .TP .B *.invf.idx The index into the inverted file. .TP .B *.weight The exact weights file. .TP .B *.text Compressed documents. .TP .B *.text.stats Text statistics. .TP .B *.text.dict Compressed compression dictionary. .TP .B *.text.idx Index into the compressed documents. .TP .B *.text.idx.wgt Interleaved index into the compressed documents and document weights. .TP .B *.weight.approx Approximate document weights. .SH "SEE ALSO" .na .BR mg (1), .BR mg_compression_dict (1), .BR mg_fast_comp_dict (1), .BR mg_get (1), .BR mg_invf_dict (1), .BR mg_invf_dump (1), .BR mg_invf_rebuild (1), .BR mg_passes (1), .BR mg_perf_hash_build (1), .BR mg_text_estimate (1), .BR mg_weights_build (1), .BR mgbilevel (1), .BR mgdictlist (1), .BR mgfelics (1), .BR mgquery (1), .BR mgstat (1), .BR mgtic (1), .BR mgticbuild (1), .BR mgticdump (1), .BR mgticprune (1), .BR mgticstat (1).