.\"------------------------------------------------------------
.\" Id - set Rv,revision, and Dt, Date using rcs-Id tag.
.de Id
.ds Rv \\$3
.ds Dt \\$4
..
.Id $Id: mg_text_estimate.1 3745 2003-02-20 21:20:24Z mdewsnip $
.\"------------------------------------------------------------
.TH mg_text_estimate 1 \*(Dt CITRI
.SH NAME
mg_text_estimate \- estimate the size of compressed text from text statistics and a compression dictionary
.SH SYNOPSIS
.B mg_text_estimate
[
.B \-h
]
[
.BR \-H " |"
.BR \-B " |"
.BR \-D " |"
.BR \-Y " |"
.B \-M
]
.if n .ti +9n
.I stats-dict
.I compression-dict
.SH DESCRIPTION
This program estimates the size of the compressed text that would be
produced if text with the given statistics were compressed with a
particular compression dictionary.  It plays no part in normal use of the
.BR mg (1)
system and exists mainly for experimenting with the
.BR mg_compression_dict (1)
program.
.SH OPTIONS
Options may appear in any order.
.TP "\w'\fB\-m\fP'u+2n"
.B \-h
This displays a usage line on
.IR stderr .
.TP
.B \-H
This specifies that novel words (words that do not appear in the
compression dictionary) will be coded character by character
using Huffman codes.
.TP
.B \-B
This specifies that the size of an auxiliary dictionary will be
estimated.  Each novel word found will be placed at the end of the
auxiliary dictionary.  Novel words will be coded in the compressed text
using binary codes.  Each binary code gives the position of the word
in the auxiliary dictionary.
.TP
.B \-D
This specifies that the size of an auxiliary dictionary will be
estimated.  Each novel word found will be placed at the end of the
auxiliary dictionary.  Novel words will be coded in the compressed text
using delta codes.  Each delta code gives the position of the word
in the auxiliary dictionary.
.TP
.B \-Y
This specifies that the size of an auxiliary dictionary will be
estimated.  Each novel word found will be placed at the end of the
auxiliary dictionary.  Novel words will be coded in the compressed text
using a combination of gamma and binary codes.  Each code gives the
position of the word in the auxiliary dictionary.  This generally
produces better compression than
.B \-B
or
.BR \-D .
.TP
.B \-M
This specifies that the size of an auxiliary dictionary will be
estimated.  Each novel word found will be placed at the end of the
auxiliary dictionary.  Novel words will be coded in the compressed text
using a combination of gamma and binary codes.  Each code gives the
position of the word in the auxiliary dictionary.  This method is
adaptive within documents, and generally produces better compression
than
.BR \-B ,
.B \-D
or
.BR \-Y .
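.PP
The coding choices above differ mainly in how many bits are spent on
each auxiliary-dictionary position.  The following Python sketch is
illustrative only and is not taken from the mg sources: it assumes that
gamma and delta refer to the standard Elias codes and that a binary code
is a fixed-width code, and all function names are hypothetical.
.PP
.nf
```python
# Code lengths, in bits, for a position x >= 1.
# Assumption: standard Elias gamma and delta codes; not mg source code.

def floor_log2(x):
    # floor(log2(x)) for x >= 1, using integer arithmetic only
    return x.bit_length() - 1

def binary_bits(n):
    # width of a fixed-length binary code addressing n positions
    return (n - 1).bit_length()

def gamma_bits(x):
    # Elias gamma: unary length prefix plus the binary remainder
    return 2 * floor_log2(x) + 1

def delta_bits(x):
    # Elias delta: gamma-coded length plus the binary remainder
    return floor_log2(x) + gamma_bits(floor_log2(x) + 1)

for x in (1, 2, 10, 1000):
    print(x, gamma_bits(x), delta_bits(x))
```
.fi
.PP
Under these assumptions, position 1000 costs 19 bits with a gamma code
and 16 bits with a delta code, while a fixed-width binary code over a
1000-entry dictionary costs 10 bits per position.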
.SH FILES
.TP 20
.B *.text.stats
Statistics about the text.
.TP
.B *.text.dict
Compressed compression dictionary.
.SH "SEE ALSO"
.na
.BR mg (1),
.BR mg_compression_dict (1),
.BR mg_fast_comp_dict (1),
.BR mg_get (1),
.BR mg_invf_dict (1),
.BR mg_invf_dump (1),
.BR mg_invf_rebuild (1),
.BR mg_passes (1),
.BR mg_perf_hash_build (1),
.BR mg_weights_build (1),
.BR mgbilevel (1),
.BR mgbuild (1),
.BR mgdictlist (1),
.BR mgfelics (1),
.BR mgquery (1),
.BR mgstat (1),
.BR mgtic (1),
.BR mgticbuild (1),
.BR mgticdump (1),
.BR mgticprune (1),
.BR mgticstat (1).