.TH mg_get 1 "31 January 1994"
.SH NAME
mg_get \- output source texts for processing
.SH SYNOPSIS

.B mg_get
.I collection-name
.RB [ \-init | \-i | \-text\fP|\fB\-t\fP|\fB\-cleanup\fP|\fB\-c\fP]
.SH DESCRIPTION
.LP
This program is the default one used by mgbuild to generate the source text for the MG system. Any program may be used to generate the source text for mgbuild as long as it conforms to the interface specified here.
.SH OPTIONS
.LP
The
.I collection\-name
must appear before any other option. Only the first option has any significance. If no option is specified
.B \-text
is assumed.
.RS
.TP
.BR \-init " and " \-i
The program is called once with this flag at the start of building a collection.
.TP
.BR \-text " and " \-t
The program is called with this flag multiple times during the building of a collection. The program outputs (on stdout) the text of the collection. \*(lq\fBDocuments\fP\*(rq within the collection are separated by ctrl-B's (ASCII code 2). \*(lq\fBParagraphs\fP\*(rq within the collection are separated by ctrl-C's (ASCII code 3). A collection need not contain paragraphs unless a level 3 inverted file is being generated (see
.B mgbuild
and
.BR mg_passes ).
.TP
.BR \-cleanup " and " \-c
The program is called once with this flag at the completion of building a collection.
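.LP
As a rough illustration of the interface above, a replacement source program could be sketched in Python as follows. The names emit_text and main and the sample documents are hypothetical and not part of MG; only the argument order, the flag names and the ctrl-B and ctrl-C separators come from this page. This page does not say whether a separator must also follow the last document, so the sketch only places separators between items.

```python
import sys

DOC_SEP = chr(2)   # ctrl-B separates documents
PARA_SEP = chr(3)  # ctrl-C separates paragraphs


def emit_text(documents):
    """Join paragraphs with ctrl-C and documents with ctrl-B."""
    return DOC_SEP.join(PARA_SEP.join(paras) for paras in documents)


def main(argv):
    """Handle: prog collection-name [-init|-i|-text|-t|-cleanup|-c]."""
    if len(argv) < 2:
        print("usage: prog collection-name [option]", file=sys.stderr)
        return 1
    # Only the first option is significant; -text is assumed if none is given.
    option = argv[2] if len(argv) > 2 else "-text"
    if option in ("-init", "-i"):
        return 0  # one-time setup would go here
    if option in ("-cleanup", "-c"):
        return 0  # removal of temporary files would go here
    if option in ("-text", "-t"):
        # Hypothetical sample data: two documents, the first with two paragraphs.
        docs = [["First paragraph.", "Second paragraph."], ["Another document."]]
        sys.stdout.write(emit_text(docs))
        return 0
    print("unknown option: " + option, file=sys.stderr)
    return 1
```

A script built on this sketch would finish with a call such as sys.exit(main(sys.argv)), so that mgbuild sees a non-zero exit status when the arguments are not understood.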
.SH ENVIRONMENT
.TP
.SB MGDATA
If this environment variable exists then its value is used as the default
directory where the mg collection files are. If this variable does not exist
then the directory \*(lq\fB.\fP\*(rq is used by default. The command line
option
.BI \-d " directory"
overrides the directory in
.BR MGDATA .
.TP
.SB MG_GETRC
This environment variable specifies the location of the file containing the user's
mg source configurations. If not set, mg_get uses a default of
.BR ~/.mg_getrc .

This file contains TAB delimited lines of the form
.RS
.RS
.I CollectionName
.I CollectionType
.I files or directory
.RE

.I CollectionName
is the name of the collection supplied to mg_get.

.I CollectionType
specifies how mg_get should process the
named files and directories; the available types are described below.

.I files or directory
is either a list of files separated by blanks or a single directory.
Some of the
.I CollectionTypes
deal with files and others with just a single directory.
Any files with names ending in .gz or .Z are decompressed with gzip before
processing. References to '~' expand to the user's HOME directory.
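.LP
The line format described above can be read with a few lines of Python. This is a minimal sketch and not the code mg_get itself uses: the function name parse_mg_getrc is hypothetical, blank lines are assumed to be ignored, only a leading '~' on its own or followed by '/' is expanded, and wildcard patterns are left unexpanded.

```python
import os

TAB = chr(9)  # the field delimiter named in the file format


def parse_mg_getrc(text, home=None):
    """Parse TAB delimited lines into (name, type, targets) tuples."""
    if home is None:
        home = os.environ.get("HOME", "")
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split(TAB, 2)
        if len(fields) != 3:
            raise ValueError("expected 3 TAB delimited fields: " + repr(line))
        name, ctype, rest = fields
        targets = []
        # The third field is a blank-separated file list or a single directory.
        for item in rest.split():
            if item == "~" or item.startswith("~/"):
                item = home + item[1:]  # '~' stands for the HOME directory
            targets.append(item)
        entries.append((name, ctype, targets))
    return entries
```

Each returned tuple holds the collection name, the collection type and the list of files (or the single directory) to process.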

.I CollectionType
is one of
.RS
.TP
.BR PARA
For text based documents. The list of files specified is treated as
a series of paragraphs separated by blank lines. Each paragraph becomes
a separate document in the indexed collection.
.TP
.BR MAIL
For mail files. The list of files specified is treated as UNIX mail
files, in which messages are separated by lines starting with 'From'. Each mail message becomes
a separate document in the indexed collection. As an extra feature, any
embedded tarmail encoded contents (enclosed by an 'xbtoa Begin' and 'xbtoa End'
pair) are removed.
.TP
.BR DIR " and " DIR2
For a single directory of files. Each file in the directory (and in any of
its subdirectories) is treated as a single document. With
.I DIR
the pathname of the file is prefixed to the contents of each file as an
extra line, while this is not done for
.I DIR2
collections.
.TP
.BR BIB
For bibliography files. The list of files specified is treated as
a series of bibliography files (e.g. BIBTEX or TROFF), in which references are separated by lines
starting with '@'. Each reference becomes a separate document in the indexed
collection.
.TP
.BR TXTIMG
For integrated collections of text and images. The single directory should
contain files that have the same prefix if they are related. For example,
monaLisa.pgm might be a gray-level image, and monaLisa.txt would be a textual
file describing the image. The suffixes recognised are:
.br
.BR .txt
for ASCII text
.br
.BR .ptm
for scanned text stored as a bilevel image
.br
.BR .pbm
for a black and white image (typically a line drawing)
.br
.BR .pgm
for a gray-scale image
.br
In addition, if no corresponding ASCII text file is found for
a .pbm or .pgm file, then one is created with suffix .tmp.txt,
and it stores the name of the image file (in principle it could
store the OCR of a .txt.pbm file). At present the .tmp.txt files
are deleted by the \fB\-cleanup\fP option.
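.LP
To make the document boundaries for the PARA and MAIL types concrete, the splitting rules can be sketched in Python as below. The function names are hypothetical and this is not the code mg_get runs. The MAIL sketch assumes the usual mailbox convention that a separator line starts with 'From' followed by a space, so that 'From:' header lines do not split a message, and it does not attempt the removal of tarmail encoded contents.

```python
NEWLINE = chr(10)


def split_para(text):
    """PARA: each paragraph delimited by blank lines becomes one document."""
    docs, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the paragraph collected so far.
            docs.append(NEWLINE.join(current))
            current = []
    if current:
        docs.append(NEWLINE.join(current))
    return docs


def split_mail(text):
    """MAIL: a line starting with 'From ' begins a new message."""
    docs, current = [], []
    for line in text.splitlines():
        if line.startswith("From ") and current:
            docs.append(NEWLINE.join(current))
            current = []
        current.append(line)
    if current:
        docs.append(NEWLINE.join(current))
    return docs
```

Each string in the returned lists corresponds to one document; a source program would then write the documents to stdout separated by ctrl-B.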

.SH "EXAMPLE"

An example ~/.mg_getrc file might look like this (in the actual file the three fields on each line are separated by TABs):
.nf
alice PARA /users/alice13a.txt.Z
mail95 MAIL ~/MAIL/1995/*
doc DIR ~/documents
bibs BIB ~/etc/REFS/*.bib ~/etc/REFS/CC/*.bib
letters DIR ~/LETTERS
davinci TXTIMG ~/images/davinci
.fi

.SH "SEE ALSO"
.LP
.BR mg_compression_dict (1),
.BR mg_invf_dict (1),
.BR mg_invf_dump (1),
.BR mg_invf_rebuild (1),
.BR mg_make_fast_dict (1),
.BR mg_passes (1),
.BR mg_perf_hash_build (1),
.BR mg_text_estimate (1),
.BR mg_weights_build (1),
.BR mgbilevel (1),
.BR mgbuild (1),
.BR mgdictlist (1),
.BR mgfelics (1),
.BR mgquery (1),
.BR mgstat (1),
.BR mgtic (1),
.BR mgticbuild (1),
.BR mgticdump (1),
.BR mgticprune (1),
.BR mgticstat (1)