| 1 | FILE(1) BSD General Commands Manual FILE(1)
|
---|
| 2 |
|
---|
| 3 | NAME
|
---|
| 4 | file -- determine file type
|
---|
| 5 |
|
---|
| 6 | SYNOPSIS
|
---|
| 7 | file [-bchikLnNprsvz] [--mime-type] [--mime-encoding]
|
---|
| 8 | [-f namefile] [-F separator] [-m magicfiles] file
|
---|
| 9 | file -C [-m magicfile]
|
---|
| 10 | file [--help]
|
---|
| 11 |
|
---|
| 12 | DESCRIPTION
|
---|
| 13 | This manual page documents version 5.03 of the file com-
|
---|
| 14 | mand.
|
---|
| 15 |
|
---|
| 16 | file tests each argument in an attempt to classify it.
|
---|
| 17 | There are three sets of tests, performed in this order:
|
---|
| 18 | filesystem tests, magic tests, and language tests. The
|
---|
| 19 | first test that succeeds causes the file type to be
|
---|
| 20 | printed.
|
---|
| 21 |
|
---|
| 22 | The type printed will usually contain one of the words
|
---|
| 23 | text (the file contains only printing characters and a few
|
---|
| 24 | common control characters and is probably safe to read on
|
---|
| 25 | an ASCII terminal), executable (the file contains the
|
---|
| 26 | result of compiling a program in a form understandable to
|
---|
| 27 | some UNIX kernel or another), or data meaning anything
|
---|
| 28 | else (data is usually `binary' or non-printable). Excep-
|
---|
| 29 | tions are well-known file formats (core files, tar ar-
|
---|
| 30 | chives) that are known to contain binary data. When modi-
|
---|
| 31 | fying magic files or the program itself, make sure to
|
---|
| 32 | preserve these keywords. Users depend on knowing that all
|
---|
| 33 | the readable files in a directory have the word `text'
|
---|
| 34 | printed. Don't do as Berkeley did and change `shell
|
---|
| 35 | commands text' to `shell script'.
|
---|
| 36 |
|
---|
| 37 | The filesystem tests are based on examining the return
|
---|
| 38 | from a stat(2) system call. The program checks to see if
|
---|
| 39 | the file is empty, or if it's some sort of special file.
|
---|
| 40 | Any known file types appropriate to the system you are
|
---|
| 41 | running on (sockets, symbolic links, or named pipes
|
---|
| 42 | (FIFOs) on those systems that implement them) are intuited
|
---|
| 43 | if they are defined in the system header file
|
---|
| 44 | <sys/stat.h>.
|
---|
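The filesystem tests described above can be sketched as follows. This is a minimal illustration in Python, not the C code that file itself uses. The function name classify_by_stat and the strings it returns are assumptions made for this example, not the program's actual output.

```python
import os
import stat


def classify_by_stat(path):
    """Return a rough type from lstat(2) data, or None if later tests are needed."""
    st = os.lstat(path)  # lstat, so a symbolic link is reported as itself
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    return None  # non-empty regular file: fall through to the magic tests
```

A result of None here stands for "no filesystem test matched", which is the point at which the magic tests would take over.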
| 45 |
|
---|
| 46 | The magic tests are used to check for files with data in
|
---|
| 47 | particular fixed formats. The canonical example of this
|
---|
| 48 | is a binary executable (compiled program) a.out file,
|
---|
| 49 | whose format is defined in <elf.h>, <a.out.h> and possibly
|
---|
| 50 | <exec.h> in the standard include directory. These files
|
---|
| 51 | have a `magic number' stored in a particular place near
|
---|
| 52 | the beginning of the file that tells the UNIX operating
|
---|
| 53 | system that the file is a binary executable, and which of
|
---|
| 54 | several types thereof. The concept of a `magic number' has been
|
---|
| 55 | applied by extension to data files. Any file with some
|
---|
| 56 | invariant identifier at a small fixed offset into the file
|
---|
| 57 | can usually be described in this way. The information
|
---|
| 58 | identifying these files is read from the compiled magic
|
---|
| 59 | file c:/progra~1/file/share/misc/magic.mgc, or the files
|
---|
| 60 | in the directory c:/progra~1/file/share/misc/magic if the
|
---|
| 61 | compiled file does not exist. In addition, if
|
---|
| 62 | $HOME/.magic.mgc or $HOME/.magic exists, it will be used
|
---|
| 63 | in preference to the system magic files.
|
---|
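The idea of an invariant identifier at a small fixed offset can be illustrated with a short Python sketch. The table below is a tiny hypothetical stand-in for the magic file, holding a few widely documented signatures; the names MAGIC_TABLE and match_magic are assumptions for this example only.

```python
# (offset, magic bytes, description): a small illustrative table,
# not the contents of the real magic file.
MAGIC_TABLE = [
    (0, b"\x7fELF", "ELF"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (0, b"\x89PNG\r\n\x1a\n", "PNG image data"),
    (257, b"ustar", "POSIX tar archive"),
]


def match_magic(data):
    """Return the description of the first entry whose bytes sit at its offset."""
    for offset, magic, description in MAGIC_TABLE:
        if data[offset:offset + len(magic)] == magic:
            return description
    return None  # no entry matched
```

As in the real magic file, entries are tried in order and the first match wins, so the order of the table matters.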
| 64 |
|
---|
| 65 | If a file does not match any of the entries in the magic
|
---|
| 66 | file, it is examined to see if it seems to be a text file.
|
---|
| 67 | ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character
|
---|
| 68 | sets (such as those used on Macintosh and IBM PC systems),
|
---|
| 69 | UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
|
---|
| 70 | character sets can be distinguished by the different
|
---|
| 71 | ranges and sequences of bytes that constitute printable
|
---|
| 72 | text in each set. If a file passes any of these tests,
|
---|
| 73 | its character set is reported. ASCII, ISO-8859-x, UTF-8,
|
---|
| 74 | and extended-ASCII files are identified as `text' because
|
---|
| 75 | they will be mostly readable on nearly any terminal;
|
---|
| 76 | UTF-16 and EBCDIC are only `character data' because, while
|
---|
| 77 | they contain text, it is text that will require transla-
|
---|
| 78 | tion before it can be read. In addition, file will
|
---|
| 79 | attempt to determine other characteristics of text-type
|
---|
| 80 | files. If the lines of a file are terminated by CR, CRLF,
|
---|
| 81 | or NEL, instead of the Unix-standard LF, this will be
|
---|
| 82 | reported. Files that contain embedded escape sequences or
|
---|
| 83 | overstriking will also be identified.
|
---|
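A much-simplified sketch of the text tests follows, in Python. It only distinguishes ASCII from UTF-8 and only looks for CR and CRLF line terminators; the other character sets named above are left out. The function name describe_text, the set of control bytes it tolerates, and the wording of its results are assumptions for illustration.

```python
def describe_text(data):
    """Guess a character set and line terminator for a bytes object."""
    try:
        data.decode("ascii")
        charset = "ASCII"
    except UnicodeDecodeError:
        try:
            data.decode("utf-8")
            charset = "UTF-8 Unicode"
        except UnicodeDecodeError:
            return "data"  # not valid in either character set
    # Control bytes other than a few common ones suggest non-text content.
    if any(b < 0x20 and b not in b"\t\n\r\f\x1b\x08" for b in data):
        return "data"
    description = charset + " text"
    if b"\r\n" in data:
        description += ", with CRLF line terminators"
    elif b"\r" in data:
        description += ", with CR line terminators"
    return description
```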
| 84 |
|
---|
| 85 | Once file has determined the character set used in a text-
|
---|
| 86 | type file, it will attempt to determine in what language
|
---|
| 87 | the file is written. The language tests look for
|
---|
| 88 | particular strings (cf. <names.h>) that can appear anywhere in
|
---|
| 89 | the first few blocks of a file. For example, the keyword
|
---|
| 90 | .br indicates that the file is most likely a troff(1)
|
---|
| 91 | input file, just as the keyword struct indicates a C pro-
|
---|
| 92 | gram. These tests are less reliable than the previous two
|
---|
| 93 | groups, so they are performed last. The language test
|
---|
| 94 | routines also test for some miscellany (such as tar(1) ar-
|
---|
| 95 | chives).
|
---|
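The keyword lookup performed by the language tests can be sketched in Python as below. The two-entry table mirrors the examples in the text; the real list is much longer. The names KEYWORDS and guess_language, and the 8192-character limit standing in for "the first few blocks", are assumptions for this example.

```python
# Hypothetical keyword table built from the two examples in the text.
KEYWORDS = [
    (".br", "troff input"),
    ("struct", "C program"),
]


def guess_language(text, limit=8192):
    """Return a language guess from keywords found near the start of text."""
    head = text[:limit]  # only the beginning of the file is examined
    for keyword, language in KEYWORDS:
        if keyword in head:
            return language
    return None
```

A plain substring search like this one is easy to fool, which illustrates why these tests are the least reliable of the three groups.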
| 96 |
|
---|
| 97 | Any file that cannot be identified as having been written
|
---|
| 98 | in any of the character sets listed above is simply said
|
---|
| 99 | to be `data'.
|
---|
| 100 |
|
---|
| 101 | OPTIONS
|
---|
| 102 | -b, --brief
|
---|
| 103 | Do not prepend filenames to output lines (brief
|
---|
| 104 | mode).
|
---|
| 105 |
|
---|
| 106 | -c, --checking-printout
|
---|
| 107 | Cause a checking printout of the parsed form of
|
---|
| 108 | the magic file. This is usually used in conjunc-
|
---|
| 109 | tion with the -m flag to debug a new magic file
|
---|
| 110 | before installing it.
|
---|
| 111 |
|
---|
| 112 | -C, --compile
|
---|
| 113 | Write a magic.mgc output file that contains a pre-
|
---|
| 114 | parsed version of the magic file or directory.
|
---|
| 115 |
|
---|
| 116 | -e, --exclude testname
|
---|
| 117 | Exclude the test named in testname from the list
|
---|
| 118 | of tests made to determine the file type. Valid
|
---|
| 119 | test names are:
|
---|
| 120 |
|
---|
| 121 | apptype
|
---|
| 122 | EMX application type (only on EMX).
|
---|
| 123 |
|
---|
| 124 | text
|
---|
| 125 | Various types of text files (this test will try
|
---|
| 126 | to guess the text encoding, irrespective of the
|
---|
| 127 | setting of the `encoding' option).
|
---|
| 128 |
|
---|
| 129 | encoding
|
---|
| 130 | Different text encodings for soft magic tests.
|
---|
| 131 |
|
---|
| 132 | tokens
|
---|
| 133 | Looks for known tokens inside text files.
|
---|
| 134 |
|
---|
| 135 | cdf
|
---|
| 136 | Prints details of Compound Document Files.
|
---|
| 137 |
|
---|
| 138 | compress
|
---|
| 139 | Checks for, and looks inside, compressed files.
|
---|
| 140 |
|
---|
| 141 | elf
|
---|
| 142 | Prints ELF file details.
|
---|
| 143 |
|
---|
| 144 | soft
|
---|
| 145 | Consults magic files.
|
---|
| 146 |
|
---|
| 147 | tar
|
---|
| 148 | Examines tar files.
|
---|
| 149 |
|
---|
| 150 | -f, --files-from namefile
|
---|
| 151 | Read the names of the files to be examined from
|
---|
| 152 | namefile (one per line) before the argument list.
|
---|
| 153 | Either namefile or at least one filename argument
|
---|
| 154 | must be present; to test the standard input, use
|
---|
| 155 | `-' as a filename argument.
|
---|
| 156 |
|
---|
| 157 | -F, --separator separator
|
---|
| 158 | Use the specified string as the separator between
|
---|
| 159 | the filename and the file result returned.
|
---|
| 160 | Defaults to `:'.
|
---|
| 161 |
|
---|
| 162 | -h, --no-dereference
|
---|
| 163 | This option causes symlinks not to be followed (on
|
---|
| 164 | systems that support symbolic links). This is the
|
---|
| 165 | default if the environment variable
|
---|
| 166 | POSIXLY_CORRECT is not defined.
|
---|
| 167 |
|
---|
| 168 | -i, --mime
|
---|
| 169 | Causes the file command to output mime type
|
---|
| 170 | strings rather than the more traditional human
|
---|
| 171 | readable ones. Thus it may say `text/plain;
|
---|
| 172 | charset=us-ascii' rather than `ASCII text'. In
|
---|
| 173 | order for this option to work, file changes the
|
---|
| 174 | way it handles files recognized by the command
|
---|
| 175 | itself (such as many of the text file types,
|
---|
| 176 | directories etc), and makes use of an alternative
|
---|
| 177 | `magic' file. (See the FILES section, below).
|
---|
| 178 |
|
---|
| 179 | --mime-type, --mime-encoding
|
---|
| 180 | Like -i, but print only the specified element(s).
|
---|
| 181 |
|
---|
| 182 | -k, --keep-going
|
---|
| 183 | Don't stop at the first match, keep going.
|
---|
| 184 | Subsequent matches will have the string `\012- '
|
---|
| 185 | prepended. (If you want a newline, see the `-r'
|
---|
| 186 | option.)
|
---|
| 187 |
|
---|
| 188 | -L, --dereference
|
---|
| 189 | This option causes symlinks to be followed, as does the
|
---|
| 190 | like-named option in ls(1) (on systems that
|
---|
| 191 | support symbolic links). This is the default if the
|
---|
| 192 | environment variable POSIXLY_CORRECT is defined.
|
---|
| 193 |
|
---|
| 194 | -m, --magic-file list
|
---|
| 195 | Specify an alternate list of files and directories
|
---|
| 196 | containing magic. This can be a single item, or a
|
---|
| 197 | colon-separated list. If a compiled magic file is
|
---|
| 198 | found alongside a file or directory, it will be
|
---|
| 199 | used instead.
|
---|
| 200 |
|
---|
| 201 | -n, --no-buffer
|
---|
| 202 | Force stdout to be flushed after checking each
|
---|
| 203 | file. This is only useful if checking a list of
|
---|
| 204 | files. It is intended to be used by programs that
|
---|
| 205 | want filetype output from a pipe.
|
---|
| 206 |
|
---|
| 207 | -N, --no-pad
|
---|
| 208 | Don't pad filenames so that they align in the out-
|
---|
| 209 | put.
|
---|
| 210 |
|
---|
| 211 | -p, --preserve-date
|
---|
| 212 | On systems that support utime(2) or utimes(2),
|
---|
| 213 | attempt to preserve the access time of files ana-
|
---|
| 214 | lyzed, to pretend that file never read them.
|
---|
| 215 |
|
---|
| 216 | -r, --raw
|
---|
| 217 | Don't translate unprintable characters to \ooo.
|
---|
| 218 | Normally file translates unprintable characters to
|
---|
| 219 | their octal representation.
|
---|
| 220 |
|
---|
| 221 | -s, --special-files
|
---|
| 222 | Normally, file only attempts to read and determine
|
---|
| 223 | the type of argument files which stat(2) reports
|
---|
| 224 | are ordinary files. This prevents problems,
|
---|
| 225 | because reading special files may have peculiar
|
---|
| 226 | consequences. Specifying the -s option causes
|
---|
| 227 | file to also read argument files which are block
|
---|
| 228 | or character special files. This is useful for
|
---|
| 229 | determining the filesystem types of the data in
|
---|
| 230 | raw disk partitions, which are block special
|
---|
| 231 | files. This option also causes file to disregard
|
---|
| 232 | the file size as reported by stat(2) since on some
|
---|
| 233 | systems it reports a zero size for raw disk parti-
|
---|
| 234 | tions.
|
---|
| 235 |
|
---|
| 236 | -v, --version
|
---|
| 237 | Print the version of the program and exit.
|
---|
| 238 |
|
---|
| 239 | -z, --uncompress
|
---|
| 240 | Try to look inside compressed files.
|
---|
| 241 |
|
---|
| 242 | -0, --print0
|
---|
| 243 | Output a null character `\0' after the end of the
|
---|
| 244 | filename. Useful when passing the output to cut(1). This does not
|
---|
| 245 | affect the separator which is still printed.
|
---|
| 246 |
|
---|
| 247 | --help Print a help message and exit.
|
---|
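Two of the output conventions described under -r and -0 can be sketched in Python. The first function mimics the default translation of unprintable bytes to \ooo octal escapes; which bytes count as printable is an assumption here (ASCII 0x20 through 0x7e). The second splits a line of the kind -0 is described as producing, assuming the layout is filename, null character, separator, description. Both function names are hypothetical.

```python
def escape_unprintable(data):
    """Render bytes as text, replacing unprintable bytes with \\ooo escapes."""
    out = []
    for byte in data:
        if 0x20 <= byte < 0x7f:
            out.append(chr(byte))
        else:
            out.append("\\%03o" % byte)  # three-digit octal, e.g. \012
    return "".join(out)


def split_print0_line(line, separator=":"):
    """Split 'name NUL separator description' into (name, description)."""
    name, _, rest = line.partition("\0")  # the filename ends at the null
    rest = rest.lstrip()
    if rest.startswith(separator):
        rest = rest[len(separator):]
    return name, rest.strip()
```

Splitting at the null character rather than at the separator keeps filenames that themselves contain `:' intact.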
| 248 |
|
---|
| 249 | FILES
|
---|
| 250 | c:/progra~1/file/share/misc/magic.mgc Default compiled
|
---|
| 251 | list of magic.
|
---|
| 252 | c:/progra~1/file/share/misc/magic Directory contain-
|
---|
| 253 | ing default magic
|
---|
| 254 | files.
|
---|
| 255 |
|
---|
| 256 | ENVIRONMENT
|
---|
| 257 | The environment variable MAGIC can be used to set the
|
---|
| 258 | default magic file name. If that variable is set, then
|
---|
| 259 | file will not attempt to open $HOME/.magic. file adds
|
---|
| 260 | `.mgc' to the value of this variable as appropriate. The
|
---|
| 261 | environment variable POSIXLY_CORRECT controls (on systems
|
---|
| 262 | that support symbolic links), whether file will attempt to
|
---|
| 263 | follow symlinks or not. If set, then file follows symlinks;
|
---|
| 264 | otherwise it does not. This is also controlled by the -L
|
---|
| 265 | and -h options.
|
---|
| 266 |
|
---|
| 267 | SEE ALSO
|
---|
| 268 | magic(5), strings(1), od(1), hexdump(1), file(1posix)
|
---|
| 269 |
|
---|
| 270 | STANDARDS CONFORMANCE
|
---|
| 271 | This program is believed to exceed the System V Interface
|
---|
| 272 | Definition of FILE(CMD), as near as one can determine from
|
---|
| 273 | the vague language contained therein. Its behavior is
|
---|
| 274 | mostly compatible with the System V program of the same
|
---|
| 275 | name. This version knows more magic, however, so it will
|
---|
| 276 | produce different (albeit more accurate) output in many
|
---|
| 277 | cases.
|
---|
| 278 |
|
---|
| 279 | The one significant difference between this version and
|
---|
| 280 | System V is that this version treats any white space as a
|
---|
| 281 | delimiter, so that spaces in pattern strings must be
|
---|
| 282 | escaped. For example,
|
---|
| 283 |
|
---|
| 284 | >10 string language impress (imPRESS data)
|
---|
| 285 |
|
---|
| 286 | in an existing magic file would have to be changed to
|
---|
| 287 |
|
---|
| 288 | >10 string language\ impress (imPRESS data)
|
---|
| 289 |
|
---|
| 290 | In addition, in this version, if a pattern string contains
|
---|
| 291 | a backslash, it must be escaped. For example
|
---|
| 292 |
|
---|
| 293 | 0 string \begindata Andrew Toolkit document
|
---|
| 294 |
|
---|
| 295 | in an existing magic file would have to be changed to
|
---|
| 296 |
|
---|
| 297 | 0 string \\begindata Andrew Toolkit document
|
---|
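The two escaping rules above (spaces and backslashes in pattern strings) can be applied mechanically when converting an old magic file. A minimal Python sketch follows; the function name escape_pattern is hypothetical, and only the two cases named in the text are handled.

```python
def escape_pattern(pattern):
    """Escape backslashes and spaces in a magic pattern string."""
    # Backslashes go first, so the ones added for spaces are not doubled.
    escaped = pattern.replace("\\", "\\\\")
    return escaped.replace(" ", "\\ ")
```

Applied to the examples above, `language impress' becomes `language\ impress' and `\begindata' becomes `\\begindata'.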
| 298 |
|
---|
| 299 | SunOS releases 3.2 and later from Sun Microsystems include
|
---|
| 300 | a file command derived from the System V one, but with
|
---|
| 301 | some extensions. My version differs from Sun's only in
|
---|
| 302 | minor ways. It includes the extension of the `&' opera-
|
---|
| 303 | tor, used as, for example,
|
---|
| 304 |
|
---|
| 305 | >16 long&0x7fffffff >0 not stripped
|
---|
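The entry above reads a long at offset 16, ANDs it with 0x7fffffff, and matches if the result is greater than zero. A Python sketch of that evaluation is given below. It assumes a 4-byte long in the host's byte order, and the function name masked_long_positive is hypothetical.

```python
import struct


def masked_long_positive(data, offset=16, mask=0x7FFFFFFF):
    """Evaluate a test of the form '>16 long&0x7fffffff >0' against data."""
    if len(data) < offset + 4:
        return False  # not enough bytes to hold the value
    # "=l": signed 4-byte integer in native byte order (an assumption).
    (value,) = struct.unpack_from("=l", data, offset)
    return (value & mask) > 0
```

The mask clears the top bit before the comparison, so a value whose only set bit is the sign bit does not match.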
| 306 |
|
---|
| 307 | MAGIC DIRECTORY
|
---|
| 308 | The magic file entries have been collected from various
|
---|
| 309 | sources, mainly USENET, and contributed by various
|
---|
| 310 | authors. Christos Zoulas (address below) will collect
|
---|
| 311 | additional or corrected magic file entries. A consolida-
|
---|
| 312 | tion of magic file entries will be distributed periodi-
|
---|
| 313 | cally.
|
---|
| 314 |
|
---|
| 315 | The order of entries in the magic file is significant.
|
---|
| 316 | Depending on what system you are using, the order that
|
---|
| 317 | they are put together may be incorrect. If your old file
|
---|
| 318 | command uses a magic file, keep the old magic file around
|
---|
| 319 | for comparison purposes (rename it to
|
---|
| 320 | c:/progra~1/file/share/misc/magic.orig).
|
---|
| 321 |
|
---|
| 322 | EXAMPLES
|
---|
| 323 | $ file file.c file /dev/{wd0a,hda}
|
---|
| 324 | file.c: C program text
|
---|
| 325 | file: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
|
---|
| 326 | dynamically linked (uses shared libs), stripped
|
---|
| 327 | /dev/wd0a: block special (0/0)
|
---|
| 328 | /dev/hda: block special (3/0)
|
---|
| 329 |
|
---|
| 330 | $ file -s /dev/wd0{b,d}
|
---|
| 331 | /dev/wd0b: data
|
---|
| 332 | /dev/wd0d: x86 boot sector
|
---|
| 333 |
|
---|
| 334 | $ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
|
---|
| 335 | /dev/hda: x86 boot sector
|
---|
| 336 | /dev/hda1: Linux/i386 ext2 filesystem
|
---|
| 337 | /dev/hda2: x86 boot sector
|
---|
| 338 | /dev/hda3: x86 boot sector, extended partition table
|
---|
| 339 | /dev/hda4: Linux/i386 ext2 filesystem
|
---|
| 340 | /dev/hda5: Linux/i386 swap file
|
---|
| 341 | /dev/hda6: Linux/i386 swap file
|
---|
| 342 | /dev/hda7: Linux/i386 swap file
|
---|
| 343 | /dev/hda8: Linux/i386 swap file
|
---|
| 344 | /dev/hda9: empty
|
---|
| 345 | /dev/hda10: empty
|
---|
| 346 |
|
---|
| 347 | $ file -i file.c file /dev/{wd0a,hda}
|
---|
| 348 | file.c: text/x-c
|
---|
| 349 | file: application/x-executable
|
---|
| 350 | /dev/hda: application/x-not-regular-file
|
---|
| 351 | /dev/wd0a: application/x-not-regular-file
|
---|
| 352 |
|
---|
| 353 |
|
---|
| 354 | HISTORY
|
---|
| 355 | There has been a file command in every UNIX since at least
|
---|
| 356 | Research Version 4 (man page dated November, 1973). The
|
---|
| 357 | System V version introduced one significant change:
|
---|
| 358 | the external list of magic types. This slowed the program
|
---|
| 359 | down slightly but made it a lot more flexible.
|
---|
| 360 |
|
---|
| 361 | This program, based on the System V version, was written
|
---|
| 362 | by Ian Darwin <[email protected]> without looking at any-
|
---|
| 363 | body else's source code.
|
---|
| 364 |
|
---|
| 365 | John Gilmore revised the code extensively, making it bet-
|
---|
| 366 | ter than the first version. Geoff Collyer found several
|
---|
| 367 | inadequacies and provided some magic file entries.
|
---|
| 368 | Contributions of the `&' operator by Rob McMahon,
|
---|
| 369 | cudcv@warwick.ac.uk, 1989.
|
---|
| 370 |
|
---|
| 371 | Guy Harris, [email protected], made many changes from 1993 to
|
---|
| 372 | the present.
|
---|
| 373 |
|
---|
| 374 | Primary development and maintenance from 1990 to the
|
---|
| 375 | present by Christos Zoulas ([email protected]).
|
---|
| 376 |
|
---|
| 377 | Altered by Chris Lowth, [email protected], 2000: Handle the
|
---|
| 378 | -i option to output mime type strings, using an alterna-
|
---|
| 379 | tive magic file and internal logic.
|
---|
| 380 |
|
---|
| 381 | Altered by Eric Fischer ([email protected]), July, 2000, to
|
---|
| 382 | identify character codes and attempt to identify the lan-
|
---|
| 383 | guages of non-ASCII files.
|
---|
| 384 |
|
---|
| 385 | Altered by Reuben Thomas ([email protected]), 2007 to 2008, to
|
---|
| 386 | improve MIME support and merge MIME and non-MIME magic,
|
---|
| 387 | support directories as well as files of magic, apply many
|
---|
| 388 | bug fixes and improve the build system.
|
---|
| 389 |
|
---|
| 390 | The list of contributors to the `magic' directory (magic
|
---|
| 391 | files) is too long to include here. You know who you are;
|
---|
| 392 | thank you. Many contributors are listed in the source
|
---|
| 393 | files.
|
---|
| 394 |
|
---|
| 395 | LEGAL NOTICE
|
---|
| 396 | Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
|
---|
| 397 | Covered by the standard Berkeley Software Distribution
|
---|
| 398 | copyright; see the file LEGAL.NOTICE in the source distri-
|
---|
| 399 | bution.
|
---|
| 400 |
|
---|
| 401 | The files tar.h and is_tar.c were written by John Gilmore
|
---|
| 402 | from his public-domain tar(1) program, and are not covered
|
---|
| 403 | by the above license.
|
---|
| 404 |
|
---|
| 405 | BUGS
|
---|
| 406 | There must be a better way to automate the construction of
|
---|
| 407 | the Magic file from all the glop in Magdir. What is it?
|
---|
| 408 |
|
---|
| 409 | file uses several algorithms that favor speed over accu-
|
---|
| 410 | racy, thus it can be misled about the contents of text
|
---|
| 411 | files.
|
---|
| 412 |
|
---|
| 413 | The support for text files (primarily for programming lan-
|
---|
| 414 | guages) is simplistic, inefficient and requires recompila-
|
---|
| 415 | tion to update.
|
---|
| 416 |
|
---|
| 417 | The list of keywords in ascmagic probably belongs in the
|
---|
| 418 | Magic file. This could be done by using some keyword like
|
---|
| 419 | `*' for the offset value.
|
---|
| 420 |
|
---|
| 421 | Complain about conflicts in the magic file entries. Make
|
---|
| 422 | a rule that the magic entries sort based on file offset
|
---|
| 423 | rather than position within the magic file?
|
---|
| 424 |
|
---|
| 425 | The program should provide a way to give an estimate of
|
---|
| 426 | `how good' a guess is. We end up removing guesses (e.g.
|
---|
| 427 | `From ' as first 5 chars of file) because they are not as
|
---|
| 428 | good as other guesses (e.g. `Newsgroups:' versus
|
---|
| 429 | `Return-Path:' ). Still, if the others don't pan out, it
|
---|
| 430 | should be possible to use the first guess.
|
---|
| 431 |
|
---|
| 432 | This manual page, and particularly this section, is too
|
---|
| 433 | long.
|
---|
| 434 |
|
---|
| 435 | RETURN CODE
|
---|
| 436 | file returns 0 on success, and non-zero on error.
|
---|
| 437 |
|
---|
| 438 | AVAILABILITY
|
---|
| 439 | You can obtain the original author's latest version by
|
---|
| 440 | anonymous FTP on ftp.astron.com in the directory
|
---|
| 441 | /pub/file/file-X.YZ.tar.gz
|
---|
| 442 |
|
---|
| 443 | BSD October 9, 2008 BSD
|
---|