source: trunk/indexers/mg/README@ 13663

Last change on this file since 13663 was 3745, checked in by mdewsnip, 21 years ago

Addition of MG package for search and retrieval

MG INFORMATION RETRIEVAL SYSTEM
===============================

The MG system is a suite of programs for compressing and indexing text
and images. Most of the functionality in the suite is described in the
book ``Managing Gigabytes: Compressing and Indexing Documents and
Images'', I.H. Witten, A. Moffat, and T.C. Bell; Van Nostrand Reinhold,
New York, 1994, ISBN 0-442-01863-0; US $54.95; call 1 (800) 544-0550
to order.

Its features include:

-- text compression using a semi-static, word-based Huffman coding scheme
-- two-level context-based compression of bi-level images
-- FELICS lossless compression of gray-scale images
-- combined lossy/lossless compression for textual images
-- indexing algorithms for large volumes of text in limited main memory
-- index compression
-- a retrieval system that processes Boolean and ranked queries
-- an X Window System interface to the retrieval system
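
The word-based text compression in the first feature can be illustrated with a minimal Python sketch. It is not MG's C implementation and rests on several assumptions. A "word" is taken to be a run of ASCII letters and digits, so the text splits into strictly alternating words and non-words. Each class gets its own Huffman code, built from frequencies gathered in a first pass over the text (hence "semi-static"), and the codes are applied in a second pass. Codewords are assigned canonically, so a code can be rebuilt from its lengths alone. All function names here (tokenize, compress, decompress and helpers) are hypothetical.

```python
import heapq
import re
from collections import Counter
from itertools import count

# Assumption: a "word" is a maximal run of ASCII letters/digits and a
# "non-word" is the run of other characters between words. MG's own
# parsing rules are more detailed than this.
TOKEN = re.compile(r"([A-Za-z0-9]+)|([^A-Za-z0-9]+)")


def tokenize(text):
    """Return [(token, is_word), ...]; words and non-words strictly alternate."""
    return [(m.group(), m.group(1) is not None) for m in TOKEN.finditer(text)]


def huffman_code_lengths(freqs):
    """Return {symbol: codeword length} for a Huffman code over freqs."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    tie = count()  # tie-breaker so heapq never has to compare the lists
    heap = [(f, next(tie), [sym]) for sym, f in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        f1, _, group1 = heapq.heappop(heap)
        f2, _, group2 = heapq.heappop(heap)
        for sym in group1 + group2:
            lengths[sym] += 1  # each merge moves these symbols one level deeper
        heapq.heappush(heap, (f1 + f2, next(tie), group1 + group2))
    return lengths


def canonical_codes(lengths):
    """Assign canonical codewords (as bit strings) from the code lengths."""
    codes, code, prev_len = {}, 0, 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len
        codes[sym] = format(code, f"0{length}b")
        code += 1
        prev_len = length
    return codes


def build_model(symbols):
    """First pass for one model: count symbols, then derive a canonical code."""
    freqs = Counter(symbols)
    return canonical_codes(huffman_code_lengths(freqs)) if freqs else {}


def compress(text):
    tokens = tokenize(text)
    # Pass 1 (semi-static): gather statistics for the two separate models.
    word_codes = build_model(t for t, is_word in tokens if is_word)
    nonword_codes = build_model(t for t, is_word in tokens if not is_word)
    # Pass 2: emit one codeword per token, using the model for its type.
    bits = "".join((word_codes if w else nonword_codes)[t] for t, w in tokens)
    first_is_word = bool(tokens) and tokens[0][1]
    return bits, first_is_word, word_codes, nonword_codes


def decompress(bits, first_is_word, word_codes, nonword_codes):
    """Decode by alternating between the two prefix-free code tables."""
    tables = {True: {c: s for s, c in word_codes.items()},
              False: {c: s for s, c in nonword_codes.items()}}
    out, current, want_word = [], "", first_is_word
    for bit in bits:
        current += bit
        if current in tables[want_word]:
            out.append(tables[want_word][current])
            current, want_word = "", not want_word
    return "".join(out)
```

For example, consider compressing "the cat and the hat, the end." The most frequent word, "the", receives a one-bit codeword, and decompress() restores the original text exactly. Keeping separate models for words and non-words is what lets the common single space between words cost only about one bit.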

As one example, a collection of 2 GB of text (1,700,000 documents) can
be indexed (on a SPARC 10 Model 512) in about four hours and compressed
in a further four hours to make a database that in total occupies less
than 800 MB, or 40% of the original size. This includes a full index to
every word and number in the original text. Boolean queries such as
``managing AND gigabytes'' run in a few seconds, and ranked queries of
30--50 terms are evaluated in 10--30 seconds.
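
A full index of this kind is an inverted file: for each term, a sorted list of the documents that contain it. The minimal Python sketch below shows one standard way to keep such lists small and how a Boolean AND query uses them. Each list is stored as gaps between successive document numbers (d-gaps), and each gap is written in the Elias gamma code, so that lists with many small gaps cost few bits. A query such as ``managing AND gigabytes'' then decodes the lists and intersects them. Several assumptions apply:

-- the function names and the toy index are hypothetical
-- document numbers are assumed to start at 1
-- the sketch does not claim to reproduce the exact codes MG uses for its inverted file

```python
def gamma_encode(n):
    """Elias gamma code of a positive integer n, as a string of bits."""
    if n < 1:
        raise ValueError("gamma codes are defined only for n >= 1")
    binary = format(n, "b")
    # floor(log2 n) zeros, then n in binary (which always starts with a 1).
    return "0" * (len(binary) - 1) + binary


def gamma_decode_all(bits):
    """Decode a concatenation of gamma codes back into a list of integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the unary length prefix
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def compress_postings(doc_numbers):
    """Encode strictly increasing document numbers (from 1) as gamma-coded d-gaps."""
    bits, prev = [], 0
    for d in doc_numbers:
        bits.append(gamma_encode(d - prev))  # the first gap is the number itself
        prev = d
    return "".join(bits)


def decompress_postings(bits):
    """Rebuild document numbers by summing the decoded d-gaps."""
    docs, total = [], 0
    for gap in gamma_decode_all(bits):
        total += gap
        docs.append(total)
    return docs


def intersect(a, b):
    """Linear merge of two sorted lists, keeping numbers present in both."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def boolean_and(*compressed_lists):
    """Answer term1 AND term2 AND ...: decode each list, intersect shortest first."""
    lists = sorted((decompress_postings(b) for b in compressed_lists), key=len)
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result


# Hypothetical toy index: term -> compressed list of document numbers.
index = {
    "managing": compress_postings([3, 8, 9, 20]),
    "gigabytes": compress_postings([2, 8, 20, 21]),
}
print(boolean_and(index["managing"], index["gigabytes"]))  # [8, 20]
```

Intersecting the shortest list first keeps the intermediate result no larger than the rarest term's list. That is one reason conjunctive queries over even a very large collection can be answered quickly.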

Details of these methods and further performance results appear in the
MG book.

The MG system comes with ABSOLUTELY NO WARRANTY; for details see the
file COPYING.

Instructions on how to build and install mg are in the file INSTALL.


** CHANGES FROM BOOK

For copyright reasons the stemmer used in this distribution of MG is
not the same as the one illustrated in Figure 3.8 on page 108 of the MG
book. As a result, the numbers generated by the command ``mgstat
alice'' will not match those in Figure A.1 on page 394. A simple
stopgap stemmer was written for version 1.0; in mg-1.1 it was replaced
by a stemmer based on the Lovins stemming algorithm.
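
A Lovins-style stemmer works by single-pass, longest-match suffix removal. It tries the known endings from longest to shortest and strips the first one that leaves an acceptable stem. The Python sketch below shows only that principle and is not the stemmer shipped with mg:

-- its ending list is a tiny, hypothetical sample
-- the only condition it checks is that at least two letters remain
-- it omits Lovins's per-ending conditions and the recoding step that tidies stems afterwards

```python
# Hypothetical, deliberately tiny ending list. The real Lovins algorithm uses
# a much longer list in which each ending carries its own condition, and it
# follows suffix removal with a recoding step; both are omitted here.
ENDINGS = ["ational", "ations", "ation", "ness", "ings", "ing", "ed", "es", "s", "e"]
LONGEST_FIRST = sorted(ENDINGS, key=len, reverse=True)
MIN_STEM = 2  # the only condition checked here: keep at least two letters


def stem(word):
    """Remove the longest listed ending that leaves a long-enough stem."""
    word = word.lower()
    for ending in LONGEST_FIRST:
        if word.endswith(ending) and len(word) - len(ending) >= MIN_STEM:
            return word[: -len(ending)]
    return word  # no acceptable ending: the word is its own stem


for w in ["kindness", "nations", "is", "Cats"]:
    print(w, "->", stem(w))
```

For example, stem("nations") skips the ending "ations" because it would leave a one-letter stem, and removes "s" instead, giving "nation". Because the choice of endings and conditions changes which words collapse together, different stemmers yield different index statistics. This is why ``mgstat alice'' differs from the book.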

The output format of ``mgstat'' has changed since Figure A.1 (page 394)
was prepared. The same information is displayed, but formatted
differently.

** MG VERSIONS

The current version is mg-1.2, September 1995. The changes from earlier
versions are listed in the file MODIFICATIONS. This file can be searched
with mg itself by building a database with ``mgbuild mods'', and it is
also available from the mg web page (see below).

The mg-1.2 extensions include:

-- Source modifications to support GNU autoconf.

The mg-1.1 extensions include:

-- A new highlighting mode.
   The output mode ``hilite'' highlights the query terms in the
   retrieved text documents. The variable ``hilite_style'' can be set
   to ``bold'' or ``underline''. It works best with the pager
   ``less''. A suitable .mgrc would include:
       .set pager less
       .set mode hilite
       .set hilite_style bold

-- A web site containing manual pages, documentation, and an
   mgquery demo page (using CGI scripts).
   See: http://www.kbs.citri.edu.au/mg
   One of these pages, ``about_mg.html'', is included in this
   distribution.

-- A revised mg_get script that uses a .mg_getrc file to map
   specific collection names to filter types (modifications by Bruce
   McKenzie). See mg_get.1 for more details.

-- Code to merge existing databases. This code was created by
   Shane Hudson and is documented in the mgmerge.README file in the
   docs subdirectory. This code is maintained by Shane Hudson
   ([email protected]).

-- Revised man pages, including some new entries (thanks to Nelson
   Beebe). See mg.1, mgintro.1, mgintro++.1.

-- A real (rather than toy) stemmer, based on the Lovins algorithm.
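
The database-merging code itself is documented in mgmerge.README. As a rough, hypothetical illustration of one core step, the Python sketch below combines two inverted indexes held as dictionaries from term to sorted document numbers. It assumes that each database numbers its documents from 1. Under that assumption, database B's numbers are offset by A's document count, and each merged list can simply be appended. A real database merge presumably involves more than this (for example, the compressed text and its models), which this sketch ignores. The function name merge_indexes is an invention for illustration.

```python
def merge_indexes(index_a, num_docs_a, index_b):
    """Merge two inverted indexes of the form {term: sorted document numbers}.

    Assumes each database numbers its documents 1..N, so every document of
    database B is renumbered to follow the num_docs_a documents of A.
    """
    merged = {term: list(docs) for term, docs in index_a.items()}  # copy A
    for term, docs in index_b.items():
        shifted = [d + num_docs_a for d in docs]
        # B's shifted numbers all exceed A's, so appending keeps lists sorted.
        merged.setdefault(term, []).extend(shifted)
    return merged


a = {"mg": [1, 3], "index": [2]}
b = {"mg": [2], "merge": [1]}
print(merge_indexes(a, 3, b))  # {'mg': [1, 3, 5], 'index': [2], 'merge': [4]}
```

Renumbering B after A means no existing list has to be re-sorted, which keeps the merge linear in the size of the two indexes.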

** PORTABILITY

Please refer to "README.port".

** CREDITS

MG development is largely the result of a research collaboration
between:

    Tim C. Bell <[email protected]>
    Ian Witten <[email protected]>
    Alistair Moffat <[email protected]>
    Justin Zobel <[email protected]>

The bulk of the programming work has been carried out by:

    Stuart Inglis <[email protected]>
    Craig Nevill-Manning <[email protected]>
    Neil Sharman <[email protected]>
    Tim Shimmin <[email protected]>

The following people have also contributed to the development of the
MG software:

    Lachlan Andrew <[email protected]>
    Gary Eddy <[email protected]>
    Hugh Emberson <[email protected]>
    Kerry Guise <[email protected]>
    Shane Hudson <[email protected]>
    Linh Huynh <[email protected]>
    Bohdan S. Majewski <[email protected]>
    Bruce McKenzie <[email protected]>

In addition, the following people have submitted bug reports,
suggestions, and fixes:

    Rex Barzee <[email protected]>
    Tim A.H. Bell <[email protected]>
    Tim C. Bell <[email protected]>
    Nelson Beebe <[email protected]>
    Rodney Brown <[email protected]>
    Rok Sosic <[email protected]>
    Carl Staelin <[email protected]>

Development of the MG system was supported by the Australian Research
Council; the Universities of Melbourne, Waikato, Canterbury, and
Calgary; RMIT; and the Collaborative Information Technology Research
Institute (Melbourne).

** BUG REPORTS

Send bug reports to <[email protected]> and <[email protected]>.
Back-traces from gdb are always welcome but not mandatory :-)

** FURTHER READING

A bibliography of MG-related research work appears in the files
MG.Bibliography.ps and MG.Bibliography.bib on the ftp site, and is
also accessible through the mg web page.