source: gsdl/trunk/trunk/mg/src/scripts/mg_get.1@ 16583

Last change on this file since 16583 was 16583, checked in by davidb, 16 years ago

Undoing change commited in r16582

.TH mg_get 1 "31 January 1994"
.SH NAME
mg_get \- output source texts for processing
.SH SYNOPSIS

.B mg_get
.I collection-name
.RB [ \-init | \-i | \-text\fP|\fB\-t\fP|\fB\-cleanup\fP|\fB\-c\fP]
.SH DESCRIPTION
.LP
This program is the default one used by mgbuild to generate the source text for the MG system. Any program may be used to generate the source text for mgbuild as long as it conforms to the interface specified here.
.SH OPTIONS
.LP
The
.I collection\-name
must appear before any other option. Only the first option has any significance. If no option is specified,
.B \-text
is assumed.
.RS
.TP
.BR \-init " and " \-i
The program is called once with this flag at the start of building a collection.
.TP
.BR \-text " and " \-t
The program is called with this flag multiple times during the building of a collection. The program outputs (on stdout) the text of the collection. \*(lq\fBDocuments\fP\*(rq within the collection are separated by ctrl-B's (ASCII code 2). \*(lq\fBParagraphs\fP\*(rq within the collection are separated by ctrl-C's (ASCII code 3). A collection need not contain paragraphs unless a level 3 inverted file is being generated (see
.B mgbuild
and
.BR mg_passes ).
.TP
.BR \-cleanup " and " \-c
The program is called once with this flag at the completion of building a collection.
.RE
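.LP
As a rough illustration of the interface described above, the following
Python sketch shows what a replacement source-text generator might look like.
The function names and the sample documents are assumptions made for this
example and are not part of mg_get; only the three flags and the ctrl-B and
ctrl-C separators are taken from the description above.
.nf
```python
import sys

DOC_SEP = chr(2)   # ctrl-B separates documents
PARA_SEP = chr(3)  # ctrl-C separates paragraphs

# Hypothetical sample data: each document is a list of paragraphs.
SAMPLE_DOCS = [
    ["First document, first paragraph.", "First document, second paragraph."],
    ["Second document."],
]

MODES = {"-init": "init", "-i": "init",
         "-text": "text", "-t": "text",
         "-cleanup": "cleanup", "-c": "cleanup"}


def render(docs):
    """Join paragraphs with ctrl-C and documents with ctrl-B."""
    return DOC_SEP.join(PARA_SEP.join(paras) for paras in docs)


def pick_mode(args):
    """Map the arguments that follow the collection name to a mode."""
    # Only the first option is significant; -text is assumed if none is given.
    if not args:
        return "text"
    if args[0] not in MODES:
        raise ValueError("unknown option: " + args[0])
    return MODES[args[0]]


def main(argv):
    """argv[1] is the collection name, argv[2] the optional flag."""
    if len(argv) < 2:
        raise SystemExit("usage: generator collection-name [-init|-text|-cleanup]")
    mode = pick_mode(argv[2:])
    if mode == "text":
        sys.stdout.write(render(SAMPLE_DOCS))
    # In this sketch -init and -cleanup have nothing to set up or remove.
```
.fi
.LP
A complete script would end by calling main(sys.argv). Unknown options are
rejected here, which is a design choice of the sketch; the behaviour of
mg_get itself in that case is not specified on this page.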
.SH ENVIRONMENT
.TP
.SB MGDATA
If this environment variable exists then its value is used as the default
directory where the mg collection files are. If this variable does not exist
then the directory \*(lq\fB.\fP\*(rq is used by default. The command line
option
.BI \-d " directory"
overrides the directory in
.BR MGDATA .
.TP
.SB MG_GETRC
This environment variable specifies where the file containing the user's
mg source configurations is. If not set, mg_get uses a default of
.BR ~/.mg_getrc .

This file contains TAB delimited lines of the form
.RS
.RS
.I CollectionName
.I CollectionType
.I files or directory
.RE

.I CollectionName
is the name of the collection supplied to mg_get.

.I CollectionType
specifies how mg_get should process the
named files and directories; the types are described below.

.I files or directory
is either a list of files separated by blanks or a single directory.
Some of the
.I CollectionTypes
deal with files and others with just a single directory.
Any files ending with .gz or .Z are decompressed with gzip before
processing. References to '~' expand to the user's HOME directory.

.I CollectionType
is one of
.RS
.TP
.BR PARA
For text-based documents. The files specified are treated as
a series of paragraphs separated by blank lines. Each paragraph becomes
a separate document in the indexed collection.
.TP
.BR MAIL
For mail files. The files specified are treated as UNIX mail
files, with messages separated by lines starting with 'From'. Each mail
message becomes
a separate document in the indexed collection. As an extra feature, any
embedded tarmail-encoded contents (enclosed by an 'xbtoa Begin' and 'xbtoa End'
pair) are removed.
.TP
.BR DIR " (and " DIR2 )
For a single directory of files. Each file in the directory (and in any of
its subdirectories) is treated as a single document. With
.I DIR
the pathname of the file is prefixed to the contents of each file as an
extra line; this is not done for
.I DIR2
collections.
.TP
.BR BIB
For bibliography files. The files specified are treated as
a series of bibliography files (e.g. BIBTEX or TROFF) separated by lines
starting with '@'. Each reference becomes a separate document in the indexed
collection.
.TP
.BR TXTIMG
For integrated collections of text and images. The single directory should
contain files that have the same prefix if they are related. For example,
monaLisa.pgm might be a gray-level image, and monaLisa.txt would be a textual
file describing the image. The suffixes recognised are:
.br
.B .txt
for ASCII text
.br
.B .ptm
for scanned text stored as a bilevel image
.br
.B .pbm
for a black and white image (typically a line drawing)
.br
.B .pgm
for a gray-scale image
.IP
In addition, if no corresponding ASCII text file is found for
a .pbm or .pgm file, then one is created with suffix .tmp.txt,
and it stores the name of the image file (in principle it could
store the OCR of a .txt.pbm file). At present the .tmp.txt files
are deleted by the '\-cleanup' option.
.RE
.RE
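.LP
The splitting rules for the PARA and MAIL types can be sketched in Python as
follows. This is only an illustration of the rules as described above, not
the code mg_get uses. The function names are hypothetical, and the MAIL
separator is assumed to be the usual mbox one: a line beginning with 'From'
followed by a space, so that 'From:' header lines do not start a new message.
.nf
```python
NEWLINE = chr(10)


def split_para(text):
    """PARA: each run of non-blank lines becomes one document."""
    docs, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the paragraph collected so far.
            docs.append(NEWLINE.join(current))
            current = []
    if current:
        docs.append(NEWLINE.join(current))
    return docs


def split_mail(text):
    """MAIL: a line starting with "From " begins a new message (assumed)."""
    docs, current = [], []
    for line in text.splitlines():
        if line.startswith("From ") and current:
            docs.append(NEWLINE.join(current))
            current = []
        current.append(line)
    if current:
        docs.append(NEWLINE.join(current))
    return docs
```
.fi
.LP
Each returned string would then be written out as one document. Removal of
embedded xbtoa sections is left out of the sketch.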

.SH "EXAMPLE"

An example ~/.mg_getrc file might look like:
 alice	PARA	/users/alice13a.txt.Z
 mail95	MAIL	~/MAIL/1995/*
 doc	DIR	~/documents
 bibs	BIB	~/etc/REFS/*.bib ~/etc/REFS/CC/*.bib
 letters	DIR	~/LETTERS
 davinci	TXTIMG	~/images/davinci
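.LP
To make the file format concrete, the Python sketch below parses one line of
such a file. The function name and its home parameter are hypothetical, and
the handling of repeated TABs is an assumption; the sketch expands a leading
'~' but does not expand wildcards or decompress files.
.nf
```python
import os

TAB = chr(9)


def parse_rc_line(line, home=None):
    """Split one TAB delimited line into (name, type, targets)."""
    if home is None:
        home = os.path.expanduser("~")
    # Assumption: empty fields caused by repeated TABs are ignored.
    fields = [f.strip() for f in line.split(TAB) if f.strip()]
    if len(fields) != 3:
        raise ValueError("expected three TAB delimited fields")
    name, ctype, rest = fields
    targets = []
    # Files are separated by blanks; a leading ~ stands for HOME.
    for item in rest.split():
        if item == "~" or item.startswith("~/"):
            item = home + item[1:]
        targets.append(item)
    return name, ctype, targets
```
.fi
.LP
For the bibs line above, this would return the name bibs, the type BIB and
the two patterns with '~' replaced by the home directory.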

.SH "SEE ALSO"
.LP
.BR mg_compression_dict (1),
.BR mg_invf_dict (1),
.BR mg_invf_dump (1),
.BR mg_invf_rebuild (1),
.BR mg_make_fast_dict (1),
.BR mg_passes (1),
.BR mg_perf_hash_build (1),
.BR mg_text_estimate (1),
.BR mg_weights_build (1),
.BR mgbilevel (1),
.BR mgbuild (1),
.BR mgdictlist (1),
.BR mgfelics (1),
.BR mgquery (1),
.BR mgstat (1),
.BR mgtic (1),
.BR mgticbuild (1),
.BR mgticdump (1),
.BR mgticprune (1),
.BR mgticstat (1)
