source: main/trunk/binaries/windows/bin/GNUfile/man/cat5/magic.5.txt@ 31442

Last change on this file since 31442 was 31442, checked in by ak19, 7 years ago

Adding the GNUFile windows port of linux file utility to detect bitness of an executable. Link to license included and added a GS_README text file with some basic explanations.

File size: 20.6 KB
1MAGIC(5) BSD File Formats Manual MAGIC(5)
2
3NAME
4 magic -- file command's magic pattern file
5
6DESCRIPTION
7 This manual page documents the format of the magic file as
8 used by the file(1) command, version 5.03. The file(1)
9 command identifies the type of a file using, among other
10 tests, a test for whether the file contains certain
11 ``magic patterns''. The file
12 c:/progra~1/file/share/misc/magic specifies what patterns
13 are to be tested for, what message or MIME type to print
14 if a particular pattern is found, and additional informa-
15 tion to extract from the file.
16
17 Each line of the file specifies a test to be performed. A
18 test compares the data starting at a particular offset in
19 the file with a byte value, a string or a numeric value.
20 If the test succeeds, a message is printed. The line con-
21 sists of the following fields:
22
23 offset A number specifying the offset, in bytes, into
24 the file of the data which is to be tested.
25
26 type The type of the data to be tested. The possible
27 values are:
28
29 byte A one-byte value.
30
31 short A two-byte value in this machine's
32 native byte order.
33
34 long A four-byte value in this machine's
35 native byte order.
36
37 quad An eight-byte value in this machine's
38 native byte order.
39
40 float A 32-bit single precision IEEE float-
41 ing point number in this machine's
42 native byte order.
43
44 double A 64-bit double precision IEEE float-
45 ing point number in this machine's
46 native byte order.
47
48 string A string of bytes. The string type
49 specification can be optionally fol-
50 lowed by /[Bbc]*. The ``B'' flag
51 compacts whitespace in the target,
52 which must contain at least one
53 whitespace character. If the magic
54 has n consecutive blanks, the target
55 needs at least n consecutive blanks
56 to match. The ``b'' flag treats
57 every blank in the target as an
58 optional blank. Finally, the ``c''
59 flag specifies case-insensitive
60 matching: lowercase characters in the
61 magic match both lower and upper case
62 characters in the target, whereas
63 upper case characters in the magic
64 only match uppercase characters in
65 the target.
66
67 pstring A Pascal-style string where the first
68 byte is interpreted as an
69 unsigned length. The string is not
70 NUL terminated.
71
72 date A four-byte value interpreted as a
73 UNIX date.
74
75 qdate An eight-byte value interpreted as a
76 UNIX date.
77
78 ldate A four-byte value interpreted as a
79 UNIX-style date, but interpreted as
80 local time rather than UTC.
81
82 qldate An eight-byte value interpreted as a
83 UNIX-style date, but interpreted as
84 local time rather than UTC.
85
86 beid3 A 32-bit ID3 length in big-endian
87 byte order.
88
89 beshort A two-byte value in big-endian byte
90 order.
91
92 belong A four-byte value in big-endian byte
93 order.
94
95 bequad An eight-byte value in big-endian
96 byte order.
97
98 befloat A 32-bit single precision IEEE float-
99 ing point number in big-endian byte
100 order.
101
102 bedouble A 64-bit double precision IEEE float-
103 ing point number in big-endian byte
104 order.
105
106 bedate A four-byte value in big-endian byte
107 order, interpreted as a UNIX date.
108
109 beqdate An eight-byte value in big-endian
110 byte order, interpreted as a UNIX
111 date.
112
113 beldate A four-byte value in big-endian byte
114 order, interpreted as a UNIX-style
115 date, but interpreted as local time
116 rather than UTC.
117
118 beqldate An eight-byte value in big-endian
119 byte order, interpreted as a UNIX-
120 style date, but interpreted as local
121 time rather than UTC.
122
123 bestring16 A two-byte unicode (UCS16) string in
124 big-endian byte order.
125
126 leid3 A 32-bit ID3 length in little-endian
127 byte order.
128
129 leshort A two-byte value in little-endian
130 byte order.
131
132 lelong A four-byte value in little-endian
133 byte order.
134
135 lequad An eight-byte value in little-endian
136 byte order.
137
138 lefloat A 32-bit single precision IEEE float-
139 ing point number in little-endian
140 byte order.
141
142 ledouble A 64-bit double precision IEEE float-
143 ing point number in little-endian
144 byte order.
145
146 ledate A four-byte value in little-endian
147 byte order, interpreted as a UNIX
148 date.
149
150 leqdate An eight-byte value in little-endian
151 byte order, interpreted as a UNIX
152 date.
153
154 leldate A four-byte value in little-endian
155 byte order, interpreted as a UNIX-
156 style date, but interpreted as local
157 time rather than UTC.
158
159 leqldate An eight-byte value in little-endian
160 byte order, interpreted as a UNIX-
161 style date, but interpreted as local
162 time rather than UTC.
163
164 lestring16 A two-byte unicode (UCS16) string in
165 little-endian byte order.
166
167 melong A four-byte value in middle-endian
168 (PDP-11) byte order.
169
170 medate A four-byte value in middle-endian
171 (PDP-11) byte order, interpreted as a
172 UNIX date.
173
174 meldate A four-byte value in middle-endian
175 (PDP-11) byte order, interpreted as a
176 UNIX-style date, but interpreted as
177 local time rather than UTC.
178
179 indirect Starting at the given offset, consult
180 the magic database again.
181
182 regex A regular expression match in
183 extended POSIX regular expression
184 syntax (like egrep). Regular expres-
185 sions can take exponential time to
186 process, and their performance is
187 hard to predict, so their use is dis-
188 couraged. When used in production
189 environments, their performance
190 should be carefully checked. The type
191 specification can be optionally fol-
192 lowed by /[c][s]. The ``c'' flag
193 makes the match case insensitive,
194 while the ``s'' flag updates the off-
195 set to the start offset of the match,
196 rather than the end. The regular
197 expression is tested against line N +
198 1 onwards, where N is the given off-
199 set. Line endings are assumed to be
200 in the machine's native format. ^
201 and $ match the beginning and end of
202 individual lines, respectively, not
203 beginning and end of file.
204
205 search A literal string search starting at
206 the given offset. The same modifier
207 flags can be used as for string pat-
208 terns. The modifier flags (if any)
209 must be followed by /number, the
210 range, that is, the number of posi-
211 tions at which the match will be
212 attempted, starting from the start
213 offset. This is suitable for search-
214 ing larger binary expressions with
215 variable offsets, using \ escapes for
216 special characters. The offset works
217 as for regex.
218
219 default This is intended to be used with the
220 test x (which is always true) and a
221 message that is to be used if there
222 are no other matches.
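
The byte-order variants listed above can be illustrated with a short Python sketch. It is not how file(1) is implemented; the helper names `read_value` and `read_pstring` are hypothetical, only a handful of types are covered, and every value is decoded as unsigned for simplicity.

```python
import struct

# struct format strings for a few fixed-byte-order magic types
# (hypothetical table, covering only part of the list above)
FORMATS = {
    "byte": "B",
    "leshort": "<H", "beshort": ">H",
    "lelong": "<I", "belong": ">I",
    "lequad": "<Q", "bequad": ">Q",
}

def read_value(data, offset, kind):
    """Decode an unsigned integer of the given magic type at `offset`."""
    if kind == "melong":
        # Middle-endian (PDP-11): two little-endian 16-bit words,
        # with the most significant word stored first.
        high, low = struct.unpack_from("<HH", data, offset)
        return (high << 16) | low
    (value,) = struct.unpack_from(FORMATS[kind], data, offset)
    return value

def read_pstring(data, offset):
    """Decode a Pascal-style string: one unsigned length byte, then the bytes."""
    length = data[offset]
    return data[offset + 1 : offset + 1 + length]

sample = bytes([0x01, 0x02, 0x03, 0x04])
print(hex(read_value(sample, 0, "belong")))  # 0x1020304
print(hex(read_value(sample, 0, "lelong")))  # 0x4030201
print(hex(read_value(sample, 0, "melong")))  # 0x2010403
print(read_pstring(b"\x03abcXYZ", 0))        # b'abc'
```

The same four bytes yield three different numbers depending on the declared byte order, which is why a magic entry should use an explicit be/le/me type whenever the file format fixes the byte order.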
223
224 Each top-level magic pattern (see below for an
225 explanation of levels) is classified as text or
226 binary according to the types used. Types
227 ``regex'' and ``search'' are classified as text
228 tests, unless non-printable characters are used
229 in the pattern. All other tests are classified as
230 binary. A top-level pattern is considered to be a
231 text test when all its patterns are text pat-
232 terns; otherwise, it is considered to be a binary
233 pattern. When matching a file, binary patterns
234 are tried first; if no match is found, and the
235 file looks like text, then its encoding is deter-
236 mined and the text patterns are tried.
237
238 The numeric types may optionally be followed by &
239 and a numeric value, to specify that the value is
240 to be AND'ed with the numeric value before any
241 comparisons are done. Prepending a u to the type
242 indicates that ordered comparisons should be
243 unsigned.
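
A rough Python sketch of the mask and the `u` prefix follows. The function names are hypothetical, and the assumption that both the file value and the magic value are reinterpreted as signed for a non-`u` type is a simplification, not a statement about the file(1) source.

```python
def to_signed(value, bits):
    """Reinterpret an unsigned `bits`-wide integer as two's complement."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def greater_than(raw, operand, bits, unsigned=False, mask=None):
    """Sketch of a `>` test on a numeric type, with optional `&mask`."""
    if mask is not None:
        raw &= mask              # AND with the mask before comparing
    if unsigned:
        return raw > operand     # u-prefixed type: unsigned ordering
    return to_signed(raw, bits) > to_signed(operand, bits)

# A byte holding 0x80 is -128 when signed but 128 when unsigned.
print(greater_than(0x80, 0x7F, bits=8))                 # False
print(greater_than(0x80, 0x7F, bits=8, unsigned=True))  # True
# byte&0x0f keeps only the low nibble: 0xF3 & 0x0F == 3
print(greater_than(0xF3, 0x02, bits=8, mask=0x0F))      # True
```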
244
245 test The value to be compared with the value from the
246 file. If the type is numeric, this value is
247 specified in C form; if it is a string, it is
248 specified as a C string with the usual escapes
249 permitted (e.g. \n for new-line).
250
251 Numeric values may be preceded by a character
252 indicating the operation to be performed. It may
253 be =, to specify that the value from the file
254 must equal the specified value, <, to specify
255 that the value from the file must be less than
256 the specified value, >, to specify that the value
257 from the file must be greater than the specified
258 value, &, to specify that the value from the file
259 must have set all of the bits that are set in the
260 specified value, ^, to specify that the value
261 from the file must have at least one of the bits
262 that are set in the specified value clear, ~, to specify
263 that the value given after it is negated before being tested, or
264 x, to specify that any value will match. If the
265 character is omitted, it is assumed to be =.
266 Operators &, ^, and ~ don't work with floats and
267 doubles. The operator ! specifies that the line
268 matches if the test does not succeed.
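
The operators can be summarised in a small Python sketch. `numeric_test` is a hypothetical name; `!` is modelled here as "not equal", which is one reading of the sentence above, and `~` is left out because its exact semantics are not spelled out in enough detail to model safely.

```python
def numeric_test(op, file_value, magic_value):
    """Evaluate one numeric comparison in the spirit of the list above."""
    if op == "x":
        return True                                        # always matches
    if op == "=":
        return file_value == magic_value
    if op == "<":
        return file_value < magic_value
    if op == ">":
        return file_value > magic_value
    if op == "&":
        # every bit set in magic_value is also set in file_value
        return (file_value & magic_value) == magic_value
    if op == "^":
        # at least one bit set in magic_value is clear in file_value
        return (file_value & magic_value) != magic_value
    if op == "!":
        return file_value != magic_value                   # assumed: negated equality
    raise ValueError("unsupported operator: " + op)

print(numeric_test("&", 0b1011, 0b0011))  # True
print(numeric_test("^", 0b1011, 0b0110))  # True
print(numeric_test("^", 0b1111, 0b0110))  # False
print(numeric_test("!", 5, 5))            # False
```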
269
270 Numeric values are specified in C form; e.g. 13
271 is decimal, 013 is octal, and 0x13 is hexadeci-
272 mal.
273
274 For string values, the string from the file must
275 match the specified string. The operators =, <
276 and > (but not &) can be applied to strings. The
277 length used for matching is that of the string
278 argument in the magic file. This means that a
279 line can match any non-empty string (usually used
280 to then print the string), with >\0 (because all
281 non-empty strings are greater than the empty
282 string).
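
The role of the magic string's length can be seen in the following Python sketch. `string_test` is a hypothetical helper that compares raw bytes only and ignores the /B, /b and /c flags.

```python
def string_test(op, file_bytes, magic):
    """Compare only as many bytes as the magic string contains."""
    window = file_bytes[: len(magic)]
    if op == "=":
        return window == magic
    if op == "<":
        return window < magic
    if op == ">":
        return window > magic
    raise ValueError("unsupported operator: " + op)

# Only the first two bytes are compared, so trailing data is ignored.
print(string_test("=", b"MZ\x90\x00", b"MZ"))   # True
# The >\0 idiom: any data whose first byte is not NUL compares greater.
print(string_test(">", b"hello", b"\0"))        # True
print(string_test(">", b"\0hello", b"\0"))      # False
```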
283
284 The special test x always evaluates to true.
285 message The message to be printed if the compari-
286 son succeeds. If the string contains a printf(3)
287 format specification, the value from the file
288 (with any specified masking performed) is printed
289 using the message as the format string. If the
290 string begins with ``\b'', the message printed is
291 the remainder of the string with no whitespace
292 added before it: multiple matches are normally
293 separated by a single space.
294
295 A 4+4 character APPLE creator and type can be spec-
296 ified as:
297
298 !:apple CREATYPE
299
300 A MIME type is given on a separate line, which must be the
301 next non-blank, non-comment line after the magic line that
302 identifies the file type, and has the following format:
303
304 !:mime MIMETYPE
305
306 i.e. the literal string ``!:mime'' followed by the MIME
307 type.
308
309 An optional strength can be supplied on a separate line
310 which refers to the current magic description using the
311 following format:
312
313 !:strength OP VALUE
314
315 The operator OP can be: +, -, *, or / and VALUE is a con-
316 stant between 0 and 255. This constant is applied using
317 the specified operator to the currently computed default
318 magic strength.
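
As a rough illustration, the adjustment can be written in Python as below. `apply_strength` is a hypothetical name, the starting strength of 70 is an arbitrary example value, and integer division for `/` is an assumption.

```python
def apply_strength(current, op, value):
    """Apply a `!:strength OP VALUE` adjustment to a computed strength."""
    if not 0 <= value <= 255:
        raise ValueError("VALUE must be between 0 and 255")
    if op == "+":
        return current + value
    if op == "-":
        return current - value
    if op == "*":
        return current * value
    if op == "/":
        return current // value   # assumed integer division; VALUE 0 would fail
    raise ValueError("OP must be one of + - * /")

print(apply_strength(70, "+", 10))  # 80
print(apply_strength(70, "*", 2))   # 140
print(apply_strength(70, "/", 2))   # 35
```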
319
320 Some file formats contain additional information which is
321 to be printed along with the file type or need additional
322 tests to determine the true file type. These additional
323 tests are introduced by one or more > characters preceding
324 the offset. The number of > on the line indicates the
325 level of the test; a line with no > at the beginning is
326 considered to be at level 0. Tests are arranged in a
327 tree-like hierarchy: If the test on a line at level n
328 succeeds, all following tests at level n+1 are performed,
329 and the messages printed if the tests succeed, until a
330 line with level n (or less) appears. For more complex
331 files, one can use empty messages to get just the
332 "if/then" effect, in the following way:
333
334 0 string MZ
335 >0x18 leshort <0x40 MS-DOS executable
336 >0x18 leshort >0x3f extended PC executable (e.g., MS Windows)
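
The level rule can be sketched in Python for a single top-level entry. This is a simplified model, not the file(1) algorithm: each entry is a hypothetical `(level, test, message)` tuple, the tests are plain Python callables, and choosing between several top-level entries is not modelled.

```python
import struct

def evaluate(entries, data):
    """Walk (level, test, message) entries and collect the printed messages."""
    messages = []
    enabled = 0                        # deepest level currently allowed to run
    for level, test, message in entries:
        if level > enabled:
            continue                   # a shallower test failed; skip this line
        if test(data):
            if message:
                messages.append(message)
            enabled = level + 1        # children of this line may now run
        else:
            enabled = level            # children of this line are skipped
    return " ".join(messages)

def leshort(data, offset):
    return struct.unpack_from("<H", data, offset)[0]

# The three-line MZ example above, expressed as Python callables.
entries = [
    (0, lambda d: d[0:2] == b"MZ", ""),
    (1, lambda d: leshort(d, 0x18) < 0x40, "MS-DOS executable"),
    (1, lambda d: leshort(d, 0x18) > 0x3F,
     "extended PC executable (e.g., MS Windows)"),
]

data = b"MZ" + bytes(0x16) + struct.pack("<H", 0x40)
print(evaluate(entries, data))  # extended PC executable (e.g., MS Windows)
```

The empty message on the level-0 line gives the "if/then" effect: the line gates its children without printing anything itself.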
337
338 Offsets do not need to be constant, but can also be read
339 from the file being examined. If the first character fol-
340 lowing the last > is a ( then the string after the paren-
341 thesis is interpreted as an indirect offset. That means
342 that the number after the parenthesis is used as an offset
343 in the file. The value at that offset is read, and is
344 used again as an offset in the file. Indirect offsets are
345 of the form: ( x [.[bislBISLm]][+-][ y ]). The value of x
346 is used as an offset in the file. A byte, id3 length,
347 short or long is read at that offset depending on the
348 [bislBISLm] type specifier. The capitalized types inter-
349 pret the number as a big endian value, whereas the small
350 letter versions interpret the number as a little endian
351 value; the m type interprets the number as a middle endian
352 (PDP-11) value. To that number the value of y is added
353 and the result is used as an offset in the file. The
354 default type if one is not specified is long.
355
356 That way variable length structures can be examined:
357
358 # MS Windows executables are also valid MS-DOS executables
359 0 string MZ
360 >0x18 leshort <0x40 MZ executable (MS-DOS)
361 # skip the whole block below if it is not an extended executable
362 >0x18 leshort >0x3f
363 >>(0x3c.l) string PE\0\0 PE executable (MS-Windows)
364 >>(0x3c.l) string LX\0\0 LX executable (OS/2)
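
How `(0x3c.l)` resolves can be sketched in Python. `indirect_offset` and the synthetic file below are hypothetical; the `i`, `I` and `m` specifiers are omitted, and the pointer is read as unsigned, which is an assumption.

```python
import struct

# Lower-case specifiers are little endian, upper-case ones big endian.
POINTER_FORMATS = {
    "b": "<B", "s": "<H", "l": "<I",
    "B": ">B", "S": ">H", "L": ">I",
}

def indirect_offset(data, x, type_char="l", y=0):
    """Read a pointer of the given type at offset x and add y to it."""
    (pointer,) = struct.unpack_from(POINTER_FORMATS[type_char], data, x)
    return pointer + y

# Build a tiny fake executable: MZ header, pointer at 0x3c, PE signature at 0x80.
data = bytearray(0x100)
data[0:2] = b"MZ"
struct.pack_into("<H", data, 0x18, 0x40)
struct.pack_into("<I", data, 0x3C, 0x80)
data[0x80:0x84] = b"PE\0\0"

target = indirect_offset(data, 0x3C, "l")
print(hex(target))                  # 0x80
print(bytes(data[target:target + 4]))  # b'PE\x00\x00'
```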
365
366 This strategy of examining has a drawback: You must make
367 sure that you eventually print something, or users may get
368 empty output (for instance, when there is neither PE\0\0 nor
369 LX\0\0 in the above example).
370
371 If this indirect offset cannot be used directly, simple
372 calculations are possible: appending [+-*/%&|^]number
373 inside parentheses allows one to modify the value read
374 from the file before it is used as an offset:
375
376 # MS Windows executables are also valid MS-DOS executables
377 0 string MZ
378 # sometimes, the value at 0x18 is less than 0x40 but there's still an
379 # extended executable, simply appended to the file
380 >0x18 leshort <0x40
381 >>(4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP)
382 >>(4.s*512) leshort !0x014c MZ executable (MS-DOS)
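
The arithmetic inside the parentheses can be modelled with a small Python sketch. `adjusted_offset` is a hypothetical helper; it reads a little-endian short only, and integer division for `/` is an assumption.

```python
import operator
import struct

# The operators allowed inside the parentheses, per the text above.
OFFSET_OPS = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "/": operator.floordiv, "%": operator.mod,
    "&": operator.and_, "|": operator.or_, "^": operator.xor,
}

def adjusted_offset(data, x, op, number):
    """Model `(x.s OP number)`: read a little-endian short at x, then apply OP."""
    (value,) = struct.unpack_from("<H", data, x)
    return OFFSET_OPS[op](value, number)

# (4.s*512): the short at offset 4 holds 3, so the computed offset is 1536.
header = bytearray(0x20)
header[0:2] = b"MZ"
struct.pack_into("<H", header, 4, 3)
print(adjusted_offset(header, 4, "*", 512))  # 1536
```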
383
384 Sometimes you do not know the exact offset as this depends
385 on the length or position (when indirection was used
386 before) of preceding fields. You can specify an offset
387 relative to the end of the last up-level field using `&'
388 as a prefix to the offset:
389
390 0 string MZ
391 >0x18 leshort >0x3f
392 >>(0x3c.l) string PE\0\0 PE executable (MS-Windows)
393 # immediately following the PE signature is the CPU type
394 >>>&0 leshort 0x14c for Intel 80386
395 >>>&0 leshort 0x184 for DEC Alpha
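
A relative offset is just the end of the parent match plus a delta, as this Python sketch shows. The helper name `relative_offset` and the synthetic data are hypothetical.

```python
import struct

def relative_offset(match_start, match_length, delta):
    """Resolve `&delta` against the end of the up-level match."""
    return match_start + match_length + delta

# Fake data: a PE signature at 0x80 followed directly by the CPU type.
data = bytearray(0x100)
pe_start = 0x80
data[pe_start:pe_start + 4] = b"PE\0\0"
struct.pack_into("<H", data, pe_start + 4, 0x14C)

# `&0` after the 4-byte string match points at 0x80 + 4.
cpu_offset = relative_offset(pe_start, 4, 0)
(cpu,) = struct.unpack_from("<H", data, cpu_offset)
print(hex(cpu_offset), hex(cpu))  # 0x84 0x14c
```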
396
397 Indirect and relative offsets can be combined:
398
399 0 string MZ
400 >0x18 leshort <0x40
401 >>(4.s*512) leshort !0x014c MZ executable (MS-DOS)
402 # if it's not COFF, go back 512 bytes and add the offset taken
403 # from byte 2/3, which is yet another way of finding the start
404 # of the extended executable
405 >>>&(2.s-514) string LE LE executable (MS Windows VxD driver)
406
407 Or the other way around:
408
409 0 string MZ
410 >0x18 leshort >0x3f
411 >>(0x3c.l) string LE\0\0 LE executable (MS-Windows)
412 # at offset 0x80 (-4, since relative offsets start at the end
413 # of the up-level match) inside the LE header, we find the absolute
414 # offset to the code area, where we look for a specific signature
415 >>>(&0x7c.l+0x26) string UPX \b, UPX compressed
416
417 Or even both!
418
419 0 string MZ
420 >0x18 leshort >0x3f
421 >>(0x3c.l) string LE\0\0 LE executable (MS-Windows)
422 # at offset 0x58 inside the LE header, we find the relative offset
423 # to a data area where we look for a specific signature
424 >>>&(&0x54.l-3) string UNACE \b, ACE self-extracting archive
425
426 Finally, if you have to deal with offset/length pairs in
427 your file, even the second value in a parenthesized
428 expression can be taken from the file itself, using
429 another set of parentheses. Note that this additional
430 indirect offset is always relative to the start of the
431 main indirect offset.
432
433 0 string MZ
434 >0x18 leshort >0x3f
435 >>(0x3c.l) string PE\0\0 PE executable (MS-Windows)
436 # search for the PE section called ".idata"...
437 >>>&0xf4 search/0x140 .idata
438 # ...and go to the end of it, calculated from start+length;
439 # these are located 14 and 10 bytes after the section name
440 >>>>(&0xe.l+(-4)) string PK\3\4 \b, ZIP self-extracting archive
441
442SEE ALSO
443 file(1) - the command that reads this file.
444
445BUGS
446 The formats long, belong, lelong, melong, short, beshort,
447 leshort, date, bedate, medate, ledate, beldate, leldate,
448 and meldate are system-dependent; perhaps they should be
449 specified as a number of bytes (2B, 4B, etc), since the
450 files being recognized typically come from a system on
451 which the lengths are invariant.
452
453BSD August 30, 2008 BSD