Last change on this file since 654 was 654, checked in by cs025, 25 years ago

Base install of MGPP

MG INFORMATION RETRIEVAL SYSTEM
===============================

The MG system is a suite of programs for compressing and indexing text
and images. Most of the functionality implemented in the suite is as
described in the book ``Managing Gigabytes: Compressing and Indexing
Documents and Images'', second edition, I.H. Witten, A. Moffat, and
T.C. Bell, Morgan Kaufmann, San Francisco, 1999, ISBN 1-55860-570-3;
US $54.95. See also the web page <http://www.cs.mu.oz.au/mg/>.

These features include:

-- text compression using a Huffman-coded semi-static word-based scheme
-- two-level context-based compression of bi-level images
-- FELICS lossless compression of gray-scale images
-- combined lossy/lossless compression for textual images
-- indexing algorithms for large volumes of text in limited main memory
-- index compression
-- a retrieval system that processes Boolean and ranked queries
-- an X windows interface to the retrieval system
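
As an illustration of the first feature, a semi-static word-based
scheme can be sketched in Python as below. This is a simplified sketch,
not MG's implementation: it codes lower-cased words only and discards
the separators between them, and the function names
(build_huffman_code, compress, decompress) are invented for this
example. ``Semi-static'' here means two passes over the text: one to
count word frequencies, and one to code the text with a Huffman code
built from those counts.

```python
import heapq
import re
from collections import Counter


def build_huffman_code(freqs):
    """Return {symbol: bitstring} for a {symbol: count} table."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A lone symbol still needs a one-bit code.
        return {next(iter(freqs)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial_code}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing a bit to each side.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def compress(text):
    """Two passes: count the words, then code them."""
    words = re.findall(r"\w+", text.lower())
    code = build_huffman_code(Counter(words))  # pass 1: build the model
    bits = "".join(code[w] for w in words)     # pass 2: code the text
    return code, bits


def decompress(code, bits):
    """Decode a bitstring back into the list of words."""
    decode = {c: w for w, c in code.items()}
    words, current = [], ""
    for bit in bits:
        current += bit
        if current in decode:  # safe because Huffman codes are prefix-free
            words.append(decode[current])
            current = ""
    return words
```

Calling compress("to be or not to be") returns the code table and a
bitstring; passing both to decompress gives back the original sequence
of words. A real system must also store the code table and code the
punctuation and white space between words.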

As one example, a collection of 2 Gb of text (1,700,000 documents) can
be indexed (on a 266 MHz Pentium II) in about one hour and compressed
in a further one hour to make a database that in total occupies less
than 800 Mb, or 40% of the original size. This includes a full index
to every word and number in the original text. Boolean queries such as
``managing AND gigabytes'' run in a few tenths of a second on the same
hardware, and ranked queries of 30--50 terms are evaluated in 1--3
seconds.
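
A Boolean query such as ``managing AND gigabytes'' can be answered
from an inverted index by intersecting the lists of documents that
contain each term. The following Python sketch shows the idea on a toy
in-memory index. It assumes nothing about MG's file formats or query
engine, and the names build_index, intersect and boolean_and are
hypothetical.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lower-cased word to a sorted list of document numbers."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in re.findall(r"\w+", text.lower()):
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def intersect(a, b):
    """Merge-style intersection of two sorted lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def boolean_and(index, *terms):
    """Return the documents that contain every term."""
    # Start from the shortest list so intermediate results stay small.
    lists = sorted((index.get(t.lower(), []) for t in terms), key=len)
    if not lists:
        return []
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result


docs = [
    "Managing Gigabytes of text",
    "managing files",
    "gigabytes and gigabytes",
]
index = build_index(docs)
matches = boolean_and(index, "managing", "gigabytes")
```

Here only document 0 contains both terms, so matches holds that single
document number. Keeping each list sorted is what allows the linear
merge in intersect.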

Details of these methods and further performance results appear in the
MG book.

The MG system comes with ABSOLUTELY NO WARRANTY; for details see the
file COPYING.

Instructions on how to build and install mg are in the files INSTALL.mg
and INSTALL.


** CHANGES FROM BOOK

For copyright reasons the stemmer used in this distribution of MG is not
the same as the one illustrated in Figure 3.9 on page 146 of the MG
book. (However, the numbers generated by the command ``mgstat alice''
will match the numbers in Figure A.1 on page 454, unlike in the first
edition of the MG book.) Another stemmer was initially written as a
simple stopgap for version 1.0. That stemmer has now been replaced by a
stemmer based on the Lovins stemming algorithm.

The output format of ``mgstat'' has changed slightly since Figure A.1
(page 454) was prepared. The same information is displayed but formatted
differently.

Some of the on-disk data structures have changed slightly, to accommodate
larger databases. This is reflected in some increased file sizes -- in
most cases, the increase is just 8 bytes. This also means that databases
built with older versions of mg are not compatible with this version.


** MG VERSIONS

The current version is mg-1.2.1, August 1999. The changes from earlier
versions are listed in the file MODIFICATIONS. This can be accessed
with mg by building a database using ``mgbuild mods'' and can also be
accessed from the mg web page (see below).

The mg-1.2.1 extensions include:

-- Fixes to compile under Linux (various distributions).
-- Fixes to avoid some 32-bit integer overflows when building large
   (> 2Gb) databases.
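
The general problem behind such overflows can be illustrated as
follows. This is a sketch in Python, not code taken from MG, and it
assumes the affected values were byte counts or offsets held in signed
32-bit integers. Such an integer holds values up to 2^31 - 1
(2,147,483,647), so an offset beyond about 2 Gb no longer fits. The
helper names as_int32 and fits_int32 are hypothetical; as_int32 models
two's-complement wraparound.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_int32(value):
    """True if value can be stored in a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


def as_int32(value):
    """Wrap value the way a two's-complement 32-bit field would hold it."""
    return ((value + 2**31) % 2**32) - 2**31


# A byte offset 3 * 2^30 bytes into a large database file.
offset = 3 * 1024**3
wrapped = as_int32(offset)

print(fits_int32(offset))  # False: the offset needs more than 32 bits
print(wrapped)             # a negative number, useless as a file position
```

Storing such quantities in a wider integer type avoids the wraparound.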

The mg-1.2 extensions include:

-- Source modifications for use of GNU's autoconf.

The mg-1.1 extensions include:

-- A new highlighting mode.
   The output mode ``hilite'' will highlight the query terms in the
   retrieved text documents. The variable ``hilite_style'' can be set
   to ``bold'' or ``underline''. It works best with the pager
   ``less''. A .mgrc to use would include:
        .set pager less
        .set mode hilite
        .set hilite_style bold

-- A web site containing manual pages and documentation is at
   <http://www.mds.rmit.edu.au/mg/>
   One of these pages, ``about_mg.html'', is included in this
   distribution.

-- A revised mg_get script which uses a .mg_getrc file to map
   specific collection names to filter types. (Modifications by Bruce
   McKenzie). See mg_get.1 for more details.

-- Code to perform merging of existing databases. This code was
   written by Shane Hudson (Canterbury) and is documented in the
   mgmerge.README file found in the docs subdirectory.

-- Revised man pages, including some new entries (thanks to Nelson
   Beebe). See mg.1, mgintro.1, mgintro++.1.

-- A real (rather than toy) stemmer.

** PORTABILITY

Please refer to "README.port".

** CREDITS

The MG development is largely the result of research collaboration
between:

        Tim C. Bell <[email protected]>
        Ian Witten <[email protected]>
        Alistair Moffat <[email protected]>
        Justin Zobel <[email protected]>

The bulk of the programming work has been carried out by:

        Stuart Inglis (Waikato)
        Craig Nevill-Manning (Waikato)
        Neil Sharman (Melbourne and RMIT)
        Tim Shimmin (RMIT)

In addition to these, the following people have contributed to the
development of the MG software:

        Lachlan Andrew (RMIT)
        Tim A.H. Bell (Melbourne)
        Owen de Kretser (Melbourne)
        Gary Eddy (Melbourne)
        Hugh Emberson (Canterbury)
        Kerry Guise (Waikato)
        Shane Hudson (Canterbury)
        Linh Huynh (Melbourne and RMIT)
        Bohdan S. Majewski (Queensland)
        Bruce McKenzie (Canterbury)
        William Weber (RMIT)

In addition to these, the following people have submitted bug reports
and suggestions/fixes:

        Rex Barzee
        Nelson Beebe
        Tim A.H. Bell
        Tim C. Bell
        Rodney Brown
        Rok Sosic
        Carl Staelin

Development of the MG system was supported by the Australian Research
Council; the Universities of Melbourne, Waikato, Canterbury, and
Calgary; RMIT; and the Collaborative Information Technology Research
Institute (Melbourne).

** BUG REPORTS

Send bug reports to <[email protected]>. But do please be aware
that there is little likelihood of any immediate response apart from a
"thank you for letting me know", as we have no funded support for MG,
and any software development is voluntary on the part of my students.
What I do guarantee is that your mail will be retained against the
eventuality that one day someone does give us $50,000 for further
software development. And if you have $50,000, and thought MG was
wonderful, well, think of us...


** FURTHER READING

The bibliography of the MG book lists a wide range of relevant papers.
Other recent work relevant to the project is listed at
<http://www.cs.mu.oz.au/~alistair/abstracts/> and at
<http://www.cs.rmit.edu.au/~jz/Papers.html>. The NZDL project home
page is at <http://www.nzdl.org>.