source: gsdl/trunk/trunk/mg/man/man1/mg_text_estimate.1@ 16583

Last change on this file since 16583 was 16583, checked in by davidb, 16 years ago

Undoing change committed in r16582

.\"------------------------------------------------------------
.\" Id - set Rv,revision, and Dt, Date using rcs-Id tag.
.de Id
.ds Rv \\$3
.ds Dt \\$4
..
.Id $Id: mg_text_estimate.1 16583 2008-07-29 10:20:36Z davidb $
.\"------------------------------------------------------------
.TH mg_text_estimate 1 \*(Dt CITRI
.SH NAME
mg_text_estimate \- Estimate the size of the compressed text based on the text statistics and a compression dictionary.
.SH SYNOPSIS
.B mg_text_estimate
[
.B \-h
]
[
.BR \-H " |"
.BR \-B " |"
.BR \-D " |"
.BR \-Y " |"
.B \-M
]
.if n .ti +9n
.I stats-dict
.I compression-dict
.SH DESCRIPTION
This program estimates the size of the compressed text that would be
generated if text with certain statistics were compressed with a
particular dictionary. This program has no real use in the
.BR mg (1)
system, except for experimenting with the
.BR mg_compression_dict (1)
program.
.SH OPTIONS
Options may appear in any order.
.TP "\w'\fB\-m\fP'u+2n"
.B \-h
This displays a usage line on
.IR stderr .
.TP
.B \-H
This specifies that novel words will be coded character by character
using Huffman codes.
.TP
.B \-B
This specifies that the size of an auxiliary dictionary will be
estimated. Each novel word found will be placed at the end of the
auxiliary dictionary. Novel words will be coded in the compressed text
using binary codes. The binary code represents their occurrence
position in the auxiliary dictionary.
.TP
.B \-D
This specifies that the size of an auxiliary dictionary will be
estimated. Each novel word found will be placed at the end of the
auxiliary dictionary. Novel words will be coded in the compressed text
using delta codes. The delta code represents their occurrence position
in the auxiliary dictionary.
.TP
.B \-Y
This specifies that the size of an auxiliary dictionary will be
estimated. Each novel word found will be placed at the end of the
auxiliary dictionary. Novel words will be coded in the compressed text
using a combination of gamma and binary codes. The code represents
their occurrence position in the auxiliary dictionary. This generally
produces better compression than
.B \-B
or
.BR \-D .
.TP
.B \-M
This specifies that the size of an auxiliary dictionary will be
estimated. Each novel word found will be placed at the end of the
auxiliary dictionary. Novel words will be coded in the compressed text
using a combination of gamma and binary codes. The code represents
their occurrence position in the auxiliary dictionary. This method is
adaptive within documents, and generally produces better compression
than
.BR \-B ,
.B \-D
or
.BR \-Y .
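.SH EXAMPLES
The following is an illustrative sketch only. The collection name
.I example
is hypothetical, and the two file names are assumed to follow the
patterns listed under
.BR FILES .
The argument order is taken from the synopsis above. To estimate the
compressed text size with novel words coded using the
.B \-Y
method:
.PP
.nf
.B mg_text_estimate \-Y example.text.stats example.text.dict
.fi
.PP
Running the same command with
.B \-M
in place of
.B \-Y
allows the two estimates to be compared.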
.SH FILES
.TP 20
.B *.text.stats
Statistics about the text.
.TP
.B *.text.dict
Compressed compression dictionary.
.SH "SEE ALSO"
.na
.BR mg (1),
.BR mg_compression_dict (1),
.BR mg_fast_comp_dict (1),
.BR mg_get (1),
.BR mg_invf_dict (1),
.BR mg_invf_dump (1),
.BR mg_invf_rebuild (1),
.BR mg_passes (1),
.BR mg_perf_hash_build (1),
.BR mg_weights_build (1),
.BR mgbilevel (1),
.BR mgbuild (1),
.BR mgdictlist (1),
.BR mgfelics (1),
.BR mgquery (1),
.BR mgstat (1),
.BR mgtic (1),
.BR mgticbuild (1),
.BR mgticdump (1),
.BR mgticprune (1),
.BR mgticstat (1).