source: trunk/indexers/mg/docs/mgintro++.1@ 3745

Last change on this file since 3745 was 3745, checked in by mdewsnip, 21 years ago

Addition of MG package for search and retrieval

.\"------------------------------------------------------------
.\" Id - set Rv,revision, and Dt, Date using rcs-Id tag.
.de Id
.ds Rv \\$3
.ds Dt \\$4
..
.Id $Id: mgintro++.1 3745 2003-02-20 21:20:24Z mdewsnip $
.\"------------------------------------------------------------
.ds r \&\s-1MG\s0
.if n .ds - \%--
.if t .ds - \(em
.\"------------------------------------------------------------
.am SS
.LP
..
.\"------------------------------------------------------------
.TH MGINTRO++ 1 \*(Dt CITRI
.\"--------------------------------------------------------------
.SH NAME
mgintro++ \- extended introduction to the MG system
.\"-------------------------------------------------------------
.SH DESCRIPTION
This manual page assumes that the reader has already read
.BR mgintro (1).
.\"-------------------------------------------------------------
.SS Creating Different Databases
To build a database other than the predefined ones, such as
"alice", "davinci", "mailfiles" and "allfiles",
a user has two choices.
Either way, the input must be text in which each document is
terminated by a control-B.
The user can either prepare one or more such text files, or write a
"get" program (typically a script or C program) that produces the text.
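.PP
As an illustration of the input format only, the Python sketch below
joins a list of documents into control-B terminated text and splits
such text back into documents.
It is not part of MG: the names join_documents and split_documents are
hypothetical, and control-B is taken to be the ASCII character with code 2.
.nf
```python
# Hypothetical helpers illustrating MG's input format: each document
# is followed by a control-B (ASCII code 2) terminator.
CONTROL_B = chr(2)


def join_documents(docs):
    """Return one string in which every document ends with control-B."""
    return "".join(doc + CONTROL_B for doc in docs)


def split_documents(text):
    """Split control-B terminated text back into a list of documents.

    Text after the final control-B (if any) is kept as one more,
    unterminated document rather than being silently dropped.
    """
    parts = text.split(CONTROL_B)
    if parts and parts[-1] == "":
        parts.pop()  # the text ended with a terminator, as it should
    return parts


if __name__ == "__main__":
    data = join_documents(["First document.", "Second document."])
    print(split_documents(data))
```
.fi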
.\"-------------------------------------------------------------
.SS Using Input Files for mgbuild
If you do not want to write a "get" program and just want to use
one or more text files as input, you must first mark the end of
each document in those files with a control-B.
For a simple example, take any text files, such as "test1.txt" and
"test2.txt", and use
.BR vi (1)
to insert the control-Bs: in insert mode, type control-V followed by
control-B.
Next, create a file containing "set" statements
of the following form:
.IP
\fBset pipe = 0 # do not use pipe - use file instead\fR
.br
\fBset input_files = 'test1.txt test2.txt'\fR
.LP
Call this file "build_options".
Now issue the command:
.IP
.B mgbuild -s build_options test
.LP
This should build a database called "test" in the $MGDATA directory
from the documents in "test1.txt" and "test2.txt".
.BR mgbuild (1)
simply sources the build_options file
after it has set up its own variables,
so any settings made in build_options override the standard settings.
See
.BR mgbuild (1)
for more information.
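.PP
Instead of inserting control-Bs by hand, a short script can do it.
The Python sketch below is hypothetical and assumes that each input file
holds exactly one document: terminate_file appends a control-B to a file
unless it already ends with one, and write_build_options writes the two
"set" statements shown above.
It does not run mgbuild itself.
.nf
```python
CONTROL_B = bytes([2])  # MG's document terminator (ASCII code 2)


def terminate_file(path):
    """Append a control-B to path unless it already ends with one.

    Hypothetical helper: assumes the whole file is a single document.
    """
    with open(path, "rb") as f:
        data = f.read()
    if not data.endswith(CONTROL_B):
        with open(path, "ab") as f:
            f.write(CONTROL_B)


def write_build_options(path, input_files):
    """Write the csh "set" statements that mgbuild -s will source."""
    for name in input_files:
        # All names go into one quoted word, so spaces and single
        # quotes inside a name are not supported by this sketch.
        if any(ch.isspace() for ch in name) or "'" in name:
            raise ValueError("unsupported file name: " + name)
    lines = [
        "set pipe = 0 # do not use pipe - use file instead",
        "set input_files = '" + " ".join(input_files) + "'",
    ]
    with open(path, "w") as f:
        for line in lines:
            print(line, file=f)


if __name__ == "__main__":
    files = ["test1.txt", "test2.txt"]
    for name in files:
        terminate_file(name)  # raises FileNotFoundError if missing
    write_build_options("build_options", files)
    # Then run:  mgbuild -s build_options test
```
.fi
.PP
Treating each whole file as one document keeps the sketch simple; files
that hold several documents still need a control-B after each of them.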
.\"-------------------------------------------------------------
.SS Writing A Get Program
Instead of using files as input, it is often more convenient to
write a "get" program.
.BR mgbuild (1)
calls this program to obtain the text data, with control-Bs as
document terminators.
It should accept three options:
.br
(i) -init; (ii) -text; (iii) -cleanup.
.br
.BR mgbuild (1)
calls the get program with -init first and with -cleanup at the end.
When it needs the text, it calls the get program with -text, and the
program should then write the text to standard output.
.br
See
.BR mg_get (1)
for an example.
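.PP
The Python sketch below shows the overall shape of such a program.
It is hypothetical in two respects: it looks for -init, -text or -cleanup
anywhere among its arguments (check
.BR mg_get (1)
for the exact calling convention), and it reads its documents from
DOC_FILES, a made-up tuple of file names, writing each file as one
document followed by a control-B.
.nf
```python
import sys

CONTROL_B = bytes([2])  # document terminator expected by mgbuild

# Hypothetical: the files that make up the collection, one document each.
DOC_FILES = ("test1.txt", "test2.txt")


def do_init():
    """Prepare for a build (nothing to do in this sketch)."""
    return 0


def do_text(out, files=DOC_FILES):
    """Copy each document's bytes to the binary stream out, then control-B."""
    for name in files:
        with open(name, "rb") as f:
            out.write(f.read())
        out.write(CONTROL_B)
    out.flush()
    return 0


def do_cleanup():
    """Remove any temporary state created by do_init (none here)."""
    return 0


def main(argv):
    if "-init" in argv:
        return do_init()
    if "-text" in argv:
        return do_text(sys.stdout.buffer)
    if "-cleanup" in argv:
        return do_cleanup()
    print("usage: get [name] -init | -text | -cleanup", file=sys.stderr)
    return 2


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```
.fi
.PP
Keeping -init and -cleanup separate lets a real get program, for example,
unpack an archive into a temporary directory during -init and remove it
during -cleanup.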
.\"-------------------------------------------------------------
.SS Regular Builds
The MG system provides a static database;
there are no update commands.
To keep a database reasonably up to date, have it rebuilt
automatically on a regular basis by
.BR cron (1).
A crontab file can be created or edited using:
.IP
.B crontab -e
.LP
Each line of a crontab file has the form:
.IP
.nf
\fBminute hour day-of-month month day-of-week shell-command\fR
.fi
.LP
See
.BR crontab (1)
and
.BR crontab (5)
for more information.
An example crontab entry is:
.IP
.nf
\fB15 02 * * * mgbuild allfiles >$MGDATA/allfiles/allfiles.log 2>&1\fR
.fi
.LP
This rebuilds the MG database for "allfiles" (your mail in the
folders) every morning at 2:15am, writing the output of
.BR mgbuild (1)
and any error messages to allfiles.log.
Because
.BR cron (1)
runs commands with a minimal environment, MGDATA (and a PATH that
includes mgbuild) may need to be set explicitly for the cron job.
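.PP
To check when an entry such as the one above fires, the Python sketch below
matches the five time fields of a crontab line against a date and time.
It is a simplified, hypothetical helper rather than a description of any
particular cron: each field must be * or a single number (ranges, lists and
steps are not handled), day-of-week counts Sunday as 0, and, following the
common Vixie cron rule, when both day fields are restricted a match on
either one is enough.
.nf
```python
from datetime import datetime


def field_matches(field, value):
    """True if a crontab field ('*' or a plain number) matches value."""
    return field == "*" or int(field) == value


def cron_matches(entry, when):
    """Check whether the first five fields of a crontab line match when.

    Simplified sketch: each field must be '*' or a single number.
    """
    minute, hour, dom, month, dow = entry.split()[:5]
    cron_dow = when.isoweekday() % 7  # cron counts Sunday as 0
    if not (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(month, when.month)):
        return False
    dom_ok = field_matches(dom, when.day)
    dow_ok = field_matches(dow, cron_dow)
    if dom != "*" and dow != "*":
        return dom_ok or dow_ok  # either restricted day field suffices
    return dom_ok and dow_ok


if __name__ == "__main__":
    entry = "15 02 * * * mgbuild allfiles"
    print(cron_matches(entry, datetime(2003, 2, 20, 2, 15)))   # True
    print(cron_matches(entry, datetime(2003, 2, 20, 14, 15)))  # False
```
.fi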
.\"
.\"-------------------------------------------------------------
.SS Command Structure
There are 22 commands that make up the MG system. However,
a user may only need to be aware of a few:
.BR mgbuild (1),
.BR mgquery (1),
and perhaps
.BR mg_get (1).
Many of the commands are called by
.BR mgbuild (1).
The commands can be arranged in the following hierarchy:
.PP
.nf
--------------------------------------
MG--+--image compression
    |  |
    |  +--mgbilevel
    |  |
    |  +--mgfelics
    |  |
    |  +--mgtic
    |  |
    |  +--mgticbuild
    |  |
    |  +--mgticdump
    |  |
    |  +--mgticprune
    |  |
    |  +--mgticstat
    |
    +--text
       |
       +--compression
       |  |
       |  +--mg_passes -T1
       |  |
       |  +--mg_passes -T2
       |  |
       |  +--mg_compression_dict
       |  |
       |  +--mg_fast_comp_dict
       |
       +--indexing
       |  |
       |  +--mg_passes -N1
       |  |
       |  +--mg_passes -N2
       |  |
       |  +--mg_perf_hash_build
       |  |
       |  +--mg_invf_dict
       |  |
       |  +--mg_invf_rebuild
       |
       +--weights
       |  |
       |  +--mg_weights_build
       |
       +--query
       |  |
       |  +--mgquery
       |
       +--tools
          |
          +--mg_invf_dump
          |
          +--mg_text_estimate
          |
          +--mgdictlist
          |
          +--mgstat
--------------------------------------
.fi
.PP
.BR mgbuild (1)
calls the following commands:
.RS
.nf
.BR mg_passes "(1), " mg_compression_dict (1)
.BR mg_perf_hash_build "(1), " mg_invf_dict "(1), " mg_invf_rebuild (1)
.BR mg_weights_build (1)
.fi
.RE
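.PP
The hierarchy can also be written down as a nested data structure, which
makes it easy to list every command in one branch.
The Python sketch below is a hypothetical transcription of the tree above;
MG_COMMANDS and leaves are illustrative names, not part of MG, and
mg_passes appears four times because the tree lists each of its passes
separately.
.nf
```python
# Transcription of the command tree: dicts are branches and lists hold
# the leaf entries (commands, some shown with their pass flags).
MG_COMMANDS = {
    "image compression": [
        "mgbilevel", "mgfelics", "mgtic", "mgticbuild",
        "mgticdump", "mgticprune", "mgticstat",
    ],
    "text": {
        "compression": [
            "mg_passes -T1", "mg_passes -T2",
            "mg_compression_dict", "mg_fast_comp_dict",
        ],
        "indexing": [
            "mg_passes -N1", "mg_passes -N2", "mg_perf_hash_build",
            "mg_invf_dict", "mg_invf_rebuild",
        ],
        "weights": ["mg_weights_build"],
        "query": ["mgquery"],
        "tools": ["mg_invf_dump", "mg_text_estimate", "mgdictlist", "mgstat"],
    },
}


def leaves(tree, path=()):
    """Yield (branch path, entry) pairs for every leaf in the tree."""
    if isinstance(tree, dict):
        for name, subtree in tree.items():
            yield from leaves(subtree, path + (name,))
    else:
        for entry in tree:
            yield path, entry


if __name__ == "__main__":
    # List the commands in the "text" branch, with their sub-branch.
    for path, entry in leaves(MG_COMMANDS["text"]):
        print("/".join(path) + ": " + entry)
```
.fi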
.\"--------------------------------------------
.SH SEE ALSO
.BR mgintro (1),
.BR mgbuild (1),
.BR mg_get (1)
.br
"Guide To The \*r System", in Appendix A of the book:
.PP
.RS
.nf
Ian H. Witten, Alistair Moffat, and Timothy C. Bell
.I "Managing Gigabytes: Compressing and Indexing Documents and Images"
Van Nostrand Reinhold
1994
xiv + 429 pages
US$54.95
ISBN 0-442-01863-0
Library of Congress catalog number TA1637 .W58 1994.
.fi
.RE
212