Guide to Unix/Commands/File Analysing

file

file displays the type of a file. To get the MIME type instead, use the -i option (on macOS, the equivalent option is -I).


$ file Unix.txt
Unix.txt: ASCII text
$ file -i Unix.txt
Unix.txt: text/plain; charset=us-ascii


wc

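wc tells you the number of lines, words and bytes in a file. The -l, -w and -c options print only the line, word or byte count. Note that -c counts bytes; to count characters, which can differ from bytes in multibyte encodings, use -m.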


$ wc hello.txt
2       6      29 hello.txt
$ wc -l hello.txt
2 hello.txt
$ wc -w hello.txt
6 hello.txt
$ wc -c hello.txt
29 hello.txt


cksum

Outputs a 32-bit cyclic redundancy check (CRC) checksum of a file, several files or standard input, together with the size of each input in bytes. Recent GNU Coreutils and some other implementations can output other checksums via the -a option. The CRC variant used by cksum differs from the CRC-32 used by zip, PNG and zlib; for one thing, cksum calculates the CRC not over the octet stream alone but over the stream with its length appended.

The CRC output by cksum can be used to detect accidental modifications to files: if the checksum has not changed, the file is very likely undamaged. The default CRC checksum is not cryptographic: it guards only against accidental changes, not against malicious (intentional) ones.
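
As an illustration of this use, the following sketch records the checksum of a file, changes one character, and compares the checksum again. The file name data.txt and its content are made up for the example, and mktemp -d is assumed to be available.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT    # remove the sample file when the shell exits

printf 'important data\n' > "$dir/data.txt"

# Record checksum and size; reading from standard input omits the file name.
before=$(cksum < "$dir/data.txt")

# Simulate accidental damage: the last letter is replaced by a digit.
printf 'important dat4\n' > "$dir/data.txt"

after=$(cksum < "$dir/data.txt")

if [ "$before" = "$after" ]; then
    echo "unchanged"
else
    echo "modified"
fi
```

The script prints "modified", because a change confined to a single byte always alters a 32-bit CRC.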

Recent versions of GNU Coreutils cksum allow a choice between several kinds of checksums, including cryptographic ones, via the -a option. These include sysv, bsd, crc, md5, sha1, sha224, sha256, sha384, sha512, blake2b, and sm3. None of the algorithms listed here is the CRC-32 of zip, PNG and zlib. OpenBSD cksum provides an -a option as well, although its list of algorithms differs slightly. FreeBSD cksum offers three further checksum algorithms in addition to the default one via the -o1, -o2 and -o3 options; -o3 is the CRC-32 of zip, PNG and zlib. The same applies to macOS.


$ cksum /etc/passwd
3052342160 2119 /etc/passwd

Some "cksum" implementations provide other algorithms, such as "md5" and "sha1":

$ cksum -a sha1 /etc/passwd
SHA1 (/etc/passwd) = 816d937ca4cdb4dee92d5002610fae63b639d224

You can test "cksum" by feeding it a string via standard input:

$ printf 'Guide to UNIX' | cksum
2195826759 13


sum

A legacy tool that outputs a checksum of a file, several files or standard input, together with their sizes counted in blocks. It is not covered by POSIX; POSIX instead codified cksum as a replacement tool, using a kind of checksum different from those used by legacy sum. Different variants of legacy sum used different algorithms. These legacy algorithms are provided by FreeBSD cksum via the -o1 and -o2 options, and by recent GNU Coreutils cksum via the -a option.

GNU Coreutils sum selects the legacy algorithm via the -r (BSD, the default) and -s (System V) options.

The two commonly used legacy algorithms are as follows.

The BSD sum, -r in GNU sum:

  • Initialize checksum to 0
  • For each byte of the input stream
    • Perform 16-bit bitwise right rotation by 1 bit on the checksum
    • Add the byte to the checksum, and apply modulo 2 ^ 16 to the result, thereby keeping it within 16 bits
  • The result is a 16-bit checksum
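
The steps above can be sketched in POSIX shell as follows. This is only an illustration of the algorithm as listed, not the code of any real sum implementation; the function name bsd_sum is made up for this example, and od is used merely to turn the input bytes into decimal numbers.

```shell
# bsd_sum (hypothetical name): print the 16-bit BSD checksum of standard input.
bsd_sum() {
    checksum=0
    # od -An -v -tu1 prints every input byte as an unsigned decimal number.
    for byte in $(od -An -v -tu1); do
        # Rotate the 16-bit checksum right by one bit.
        checksum=$(( (checksum >> 1) + ((checksum & 1) << 15) ))
        # Add the byte and keep the result within 16 bits.
        checksum=$(( (checksum + byte) % 65536 ))
    done
    echo "$checksum"
}

printf 'ab' | bsd_sum    # prints 32914
```

With GNU Coreutils, the first field printed by printf 'ab' | sum -r should show the same value. The shell loop is slow on large inputs, so it is suitable only for experimenting with the algorithm.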

The System V sum, -s in GNU sum:

  • checksum0 = sum of all bytes of the input stream, modulo 2 ^ 32
  • checksum1 = (checksum0 modulo 2 ^ 16) + (checksum0 / 2 ^ 16), using integer division
  • checksum = (checksum1 modulo 2 ^ 16) + (checksum1 / 2 ^ 16), using integer division
  • The result is a 16-bit checksum calculated from the initial 32-bit plain byte sum
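
A minimal shell sketch of the System V steps above follows. The function name sysv_sum is made up for this example, and the sketch assumes that shell arithmetic is wider than 32 bits, which is common but not guaranteed by POSIX.

```shell
# sysv_sum (hypothetical name): print the 16-bit System V checksum of standard input.
# Assumes shell arithmetic wider than 32 bits.
sysv_sum() {
    total=0
    # od -An -v -tu1 prints every input byte as an unsigned decimal number.
    for byte in $(od -An -v -tu1); do
        # checksum0: plain sum of all bytes, kept within 32 bits.
        total=$(( (total + byte) % 4294967296 ))
    done
    # checksum1: fold the upper 16 bits into the lower 16 bits.
    folded=$(( total % 65536 + total / 65536 ))
    # Fold once more so that a carry out of the first fold is absorbed.
    echo $(( folded % 65536 + folded / 65536 ))
}

printf 'ab' | sysv_sum    # prints 195
```

For short inputs the result is simply the sum of the byte values (here 97 + 98); the folding steps only matter once the plain sum exceeds 16 bits.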


stat

Outputs the status of a file or file system, including size, access rights, timestamps such as creation and modification times, and more. The command appears to be absent from POSIX, which specifies only the stat() system call.
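
Because the command is not standardized, its options differ between implementations. The sketch below prints the full report and then extracts only the size, assuming either the GNU Coreutils syntax (-c with %s) or the BSD and macOS syntax (-f with %z); the file name sample.txt is made up for the example.

```shell
# Create a six-byte sample file in a temporary directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT    # remove the sample file when the shell exits
printf 'hello\n' > "$dir/sample.txt"

# Full human-readable report; the layout differs between implementations.
stat "$dir/sample.txt"

# Size in bytes only: try the GNU syntax first, then fall back to the BSD syntax.
size=$(stat -c '%s' "$dir/sample.txt" 2>/dev/null || stat -f '%z' "$dir/sample.txt")
echo "$size"    # prints 6
```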


  • stat in GNU Coreutils manual

grep

Outputs the lines that match a regular expression. Depending on the options, it can instead output the lines that do not match (-v), only a count of matching lines (-c), and so on. See the Grep Wikibook.
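
A few common invocations are sketched below on a made-up three-line file, fruits.txt. The options shown (-v, -c, -n) are specified by POSIX.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT    # remove the sample file when the shell exits
printf 'apple\nbanana\ncherry\n' > "$dir/fruits.txt"

grep 'an' "$dir/fruits.txt"       # lines containing "an": prints banana
grep -v 'an' "$dir/fruits.txt"    # lines not containing "an": prints apple and cherry
grep -c 'e' "$dir/fruits.txt"     # number of lines containing "e": prints 2
grep -n '^c' "$dir/fruits.txt"    # lines starting with "c", with line numbers: prints 3:cherry
```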


diff

Compares the content of two files line by line and outputs the differences. See also diff3.
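
The following sketch compares two made-up files, old.txt and new.txt, that differ in their second line. It shows the default output format and the unified format selected with -u.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT    # remove the sample files when the shell exits

printf 'one\ntwo\nthree\n' > "$dir/old.txt"
printf 'one\n2\nthree\n'   > "$dir/new.txt"

# Default format. diff exits with a non-zero status when the files differ,
# so the "||" branch runs here.
diff "$dir/old.txt" "$dir/new.txt" || echo "the files differ"

# Unified format (-u), the style commonly used for patches.
diff -u "$dir/old.txt" "$dir/new.txt" || true
```

The first command prints "2c2", then "< two", "---" and "> 2", followed by the message from echo. Here "2c2" means that line 2 of the first file was changed into line 2 of the second file.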


diff3

Compares the content of three files line by line and outputs the differences. See also diff.
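
A small sketch of a three-way comparison follows, assuming the diff3 from GNU Diffutils or a compatible implementation. The file names mine.txt, older.txt and yours.txt are made up: older.txt plays the role of a common ancestor from which the two other files each changed one line.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT    # remove the sample files when the shell exits

printf '1\n2\n3\n4\n5\n'    > "$dir/older.txt"   # common ancestor
printf 'one\n2\n3\n4\n5\n'  > "$dir/mine.txt"    # changes line 1
printf '1\n2\n3\n4\nfive\n' > "$dir/yours.txt"   # changes line 5

# Each hunk of the output starts with a "====" line; a digit after it
# names the one file that differs from the other two.
diff3 "$dir/mine.txt" "$dir/older.txt" "$dir/yours.txt"
```

For these files the output should contain two hunks, one marked "====1" for the change in the first file and one marked "====3" for the change in the third.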


cmp

Compares two files byte by byte, outputting the byte number and the line number of the first difference, if any. Outputs nothing if the files are byte-for-byte identical. Differences beyond the first one are not reported unless the -l option is used.
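
The behaviour described above can be tried with the following sketch. The files a.txt and b.txt are made up and differ in two adjacent bytes; the -s option, which suppresses all output, is specified by POSIX.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT    # remove the sample files when the shell exits

printf 'hello\nworld\n' > "$dir/a.txt"
printf 'hello\nwORld\n' > "$dir/b.txt"

# Reports only the first difference: byte 8, which is on line 2.
# cmp exits with a non-zero status when the files differ.
cmp "$dir/a.txt" "$dir/b.txt" || echo "the files differ"

# -l lists every differing byte: its position in decimal and both byte values in octal.
cmp -l "$dir/a.txt" "$dir/b.txt" || true

# -s prints nothing; only the exit status tells whether the files are identical.
if cmp -s "$dir/a.txt" "$dir/a.txt"; then
    echo "identical"
fi
```

The exact wording of the first message varies between implementations (for example "byte 8" or "char 8"), so scripts are better off relying on the exit status.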


strings

Outputs the sequences of printable characters found in files, which is useful for inspecting binary files.
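
As a small illustration, the sketch below builds a made-up binary file, sample.bin, in which text fragments are separated by NUL and control bytes, and then extracts the printable sequences. It relies on the default minimum length of 4 characters and on the -n option, both of which are specified by POSIX.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT    # remove the sample file when the shell exits

# Text fragments separated by NUL (\000) and control bytes (\001, \002).
printf 'ab\000hello world\000\001\002xyz\000' > "$dir/sample.bin"

# By default only runs of at least 4 printable characters are reported.
strings "$dir/sample.bin"         # prints: hello world

# -n lowers the minimum length, so the shorter fragments appear too.
strings -n 2 "$dir/sample.bin"    # prints ab, hello world and xyz, one per line
```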