Guide to Unix/Commands/File Compression

gzip

gzip compresses files. Each input file is compressed into its own output file. The compressed file consists of a GNU zip header followed by deflated data.

If given a file as an argument, gzip compresses the file, adds a ".gz" suffix, and deletes the original file. With no arguments, gzip compresses the standard input and writes the compressed file to standard output.

Some useful options are:

-c  Write compressed file to stdout. Do not delete original file.
-d  Act like gunzip.
-1  Performance: Use fast compression (somewhat bigger result)
-9  Performance: Use best compression (somewhat slower)

Examples:

Compress the file named README. Creates README.gz and deletes README.

$ gzip README

Compress the file called README. The standard output (which is the compressed file) is redirected by the shell to gzips/README.gz. Keeps README.

$ gzip -c README > gzips/README.gz

Use gzip without arguments to compress README, reading it from standard input and writing the compressed data to standard output. Keeps README.

$ < README gzip > gzips/README.gz
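The effect of the -1, -9, -c and -d options listed above can be checked with a short experiment. The following is only a sketch: the scratch directory and the file names sample.txt, fast.gz and best.gz are made up for illustration, and the exact sizes depend on the data.

```shell
# Work in a scratch directory so no real files are touched.
work=$(mktemp -d)
cd "$work" || exit 1

# Build a repetitive, highly compressible sample file (hypothetical name).
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > sample.txt

# -c writes to stdout, so sample.txt is kept and can be compressed twice.
gzip -1 -c sample.txt > fast.gz
gzip -9 -c sample.txt > best.gz

# Compare the sizes in bytes.
wc -c sample.txt fast.gz best.gz

# -d decompresses; with -c the result goes to stdout for comparison.
gzip -d -c best.gz | cmp - sample.txt && echo "round trip OK"
```

On data like this, best.gz is usually a little smaller than fast.gz, and both are far smaller than sample.txt.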

gunzip

gunzip uncompresses a file that was compressed with "gzip" or "compress". It handles both the GNU zip format of gzip and the older Unix compress format. It recognizes candidate files by their extension (".gz", ".Z", or several others) and removes that extension to name the uncompressed file.

Some useful options are:

-c  Write uncompressed data to stdout. Do not delete original file.

Undo the effect of gzip README by replacing the compressed version of the file with the original, uncompressed version. Creates README and deletes README.gz.

$ gunzip README.gz

Write the uncompressed contents of README.gz to standard output. Pipe it into a pager for easy reading of a compressed file.

$ gunzip -c README.gz | more

Another way to do that is:

$ gunzip < README.gz | more

Some people name files package.tgz as short for package.tar.gz.

zcat

zcat is the same thing as uncompress -c, though on many systems it is actually the same as "gzcat" and gunzip -c.

gzcat

gzcat is the same as gunzip -c, which is the same as gzip -dc.
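The equivalence of these spellings can be checked directly. This sketch uses gunzip -c and gzip -dc rather than zcat or gzcat, because which of those two names is installed, and which format it expects, varies between systems. The file name note.txt is hypothetical.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

printf 'hello, compressed world\n' > note.txt
gzip note.txt                      # creates note.txt.gz, removes note.txt

# Three spellings of "decompress to stdout, keep the .gz file".
a=$(gunzip -c note.txt.gz)
b=$(gzip -dc note.txt.gz)
c=$(gzip -dc < note.txt.gz)        # reading standard input also works

[ "$a" = "$b" ] && [ "$b" = "$c" ] && echo "all three agree: $a"

# The compressed file is still there; the uncompressed one was not recreated.
ls
```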

tar

tar creates archives without compression. It is not covered by modern POSIX, which covers pax instead; yet tar continues to be widely used. An archive contains one or more files or directories.

Options to tar are confusing. Specify a mode every time.

Modes:

  • -c create an archive (files to archive, archive from files)
  • -x extract an archive (archive to files, files from archive)
  • -t list an archive (lists the files in the archive)

Options:

  • -f FILE name of archive - must specify unless using tape drive for archive
  • -v be verbose, list all files being archived/extracted
  • -p preserve permissions and (if possible) user/group when extracting.
  • -z create/extract archive with gzip/gunzip
  • -j create/extract archive with bzip2/bunzip2
  • -J create/extract archive with XZ

Examples:

Compress (gzip) and package (tar) the directory myfiles to create myfiles.tar.gz:

$ tar -czvf myfiles.tar.gz myfiles

Uncompress (gunzip) and unpack the compressed package, extracting the contents of myfiles.tar.gz:

$ tar -xzvf myfiles.tar.gz
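The three modes can be tried together on a throwaway directory. This is a minimal sketch: the directory contents are invented, and the -C option (change to a directory before extracting) is not covered in the option list above, although both GNU tar and BSD tar accept it.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# A small hypothetical directory tree to archive.
mkdir -p myfiles/sub
echo "alpha" > myfiles/a.txt
echo "beta"  > myfiles/sub/b.txt

# Create mode (-c) with gzip compression (-z), writing to a file (-f).
tar -czf myfiles.tar.gz myfiles

# List mode (-t) shows the members without extracting anything.
listing=$(tar -tzf myfiles.tar.gz)
echo "$listing"

# Extract mode (-x) into a separate directory; -C changes directory first.
mkdir restored
tar -xzf myfiles.tar.gz -C restored

# The restored tree should match the original.
diff -r myfiles restored/myfiles && echo "trees are identical"
```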

There are two different conventions concerning gzipped tarballs. One often encounters .tar.gz. The other popular choice is .tgz. Slackware packages use the latter convention.

If you have access to a tape device or other backup medium, then you can use it instead of an archive file. If the material to be archived exceeds the capacity of the backup medium, the program will prompt the user to insert a new tape or diskette.

Use the following command to back up the myfiles directory to floppies:

$ tar -cvf /dev/fd0 myfiles

Restore that backup with:

$ tar -xvf /dev/fd0

You can also specify standard input or output with -f - instead of an archive file or device. It is possible to copy between directories by piping two "tar" commands together. For example, suppose we have two directories, from-stuff and to-stuff:

$ ls -F
from-stuff/
to-stuff/

As described in Running Linux, one can mirror everything from from-stuff to to-stuff by running the pipeline from inside from-stuff:

$ cd from-stuff
$ tar cf - . | (cd ../to-stuff && tar xvf -)

Reference: Welsh, Matt, Matthias Kalle Dalheimer and Lar Kaufman (1999), Running Linux. Third edition, O'Reilly and Associates.
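The same technique can be tried safely in a scratch directory. In this sketch the file names are invented, and both sides of the pipe change directory inside a subshell, so the calling shell stays where it is.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

mkdir -p from-stuff/docs to-stuff
echo "one" > from-stuff/file1
echo "two" > from-stuff/docs/file2

# The first tar writes an archive of from-stuff to standard output (-f -);
# the second tar, running inside to-stuff, reads it from standard input.
(cd from-stuff && tar cf - .) | (cd to-stuff && tar xf -)

diff -r from-stuff to-stuff && echo "to-stuff now mirrors from-stuff"
```

Using && after cd means tar is not run at all if the directory change fails, which avoids unpacking into the wrong place.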

cpio

cpio is used for creating archives. When creating an archive, a list of files is fed to its standard input (rather than specifying the files on the command line). This file list is typically created by ls, find or locate and then piped directly to cpio, but it can also first be filtered or edited with commands like grep, sed, sort and others. A pre-edited list stored as a file can also be used, either by using cat to feed the pipeline or simply by redirecting the shell's standard input (<).

cpio works in one of three modes:

  • cpio -o - Copy-Out mode: Files are copied out from the filesystem to create an archive. Usually the archive is created by simply using the shell to redirect cpio's output to a file (with >).
  • cpio -i - Copy-In mode: Files from an existing archive are restored/extracted, and copied back in to the filesystem.
  • cpio -p - Pass-Through mode: cpio is used to copy files from one location in the directory-tree to another, without an actual archiving being made.

In addition, there is a listing option:

  • cpio -t - List archive: The content of an archive is listed without extracting it.
  • cpio -tv - Here the verbose-option (-v) will cause a "long listing", with permissions, size and ownership.

Adding the verbose-option (-v) in Copy-In, Copy-Out or Pass-Through mode causes cpio to list the files as they are extracted, archived or copied.

Using ls to create an archive (verbosely) with all doc-files in the current directory:

$ ls *.doc | cpio -ov > word-docs.cpio

Using find to create an archive with all txt-files in and below the current directory:

$ find . -name "*.txt" | cpio -ov > text-files.cpio

Using find and fgrep to create an archive of just the txt-files containing the word wiki (any case):

$ find . -name "*.txt" -exec fgrep -l -i "wiki" {} \; | cpio -ov > wiki.cpio

For fgrep the option -i means "ignore case", and the option -l causes it to list just the names of the files matching the pattern.

Using an existing list of files:

$ cpio -ov < file-list.txt > archive.cpio

Using several lists of files, after first sorting them and removing duplicates:

$ cat files1 files2 files3 | sort | uniq | cpio -ov > myfiles.cpio

To add more files, use the append-option (-A). Specify the file with the file-option (-F):

$ cat files4 | cpio -ovA -F myfiles.cpio

To extract files (being verbose):

$ cpio -iv < myfiles.cpio

cpio doesn't create directories by default, so use the option -d to make it create them.

To extract files, while creating directories as needed:

$ cpio -ivd < myfiles.cpio

To list the content of an archive, short listing:

$ cpio -t < myfiles.cpio

To list the content of an archive, long listing:

$ cpio -tv < myfiles.cpio
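The modes described above, including Pass-Through mode, can be exercised end to end in a scratch directory. This is a sketch with invented file names. It exits early if cpio is not installed, and it silences standard error because cpio reports a block count there.

```shell
command -v cpio >/dev/null 2>&1 || { echo "cpio not installed" >&2; exit 0; }

work=$(mktemp -d)
cd "$work" || exit 1

mkdir -p src/notes
echo "first"  > src/a.txt
echo "second" > src/notes/b.txt
echo "skip"   > src/c.log

# Copy-Out: archive only the .txt files found below src.
(cd src && find . -name "*.txt" | cpio -o > ../text-files.cpio) 2>/dev/null

# List the archive without extracting it.
listing=$(cpio -t < text-files.cpio 2>/dev/null)
echo "$listing"

# Copy-In: extract into a new directory, creating subdirectories (-d).
mkdir extracted
(cd extracted && cpio -id < ../text-files.cpio) 2>/dev/null

# Pass-Through: copy the same files to another directory, with no archive.
mkdir copied
(cd src && find . -name "*.txt" | cpio -pd ../copied) 2>/dev/null

ls -R extracted copied
```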

pax

pax provides archiving services like tar but with a different command-line syntax, and it supports more archive formats than tar. Because pax does not assume a tape device, some prefer it to tar.

Archive formats to be supported at minimum per POSIX are cpio, pax, and ustar. The FreeBSD pax tool does not support the pax archive format; the pax format is supported by AIX and Solaris.

Although covered by POSIX, pax is usually not installed by default in Linux distributions; tar sees continued use instead. Even when installed as an additional package, pax for Linux does not support the POSIX-required pax archive format.
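To illustrate how the pax command line differs from tar, here is a minimal sketch of writing, listing and reading an archive. It uses the ustar format, since that is one of the formats POSIX requires, and it exits early if pax is not installed. The directory and file names are invented.

```shell
command -v pax >/dev/null 2>&1 || { echo "pax not installed" >&2; exit 0; }

work=$(mktemp -d)
cd "$work" || exit 1

mkdir -p myfiles/sub
echo "alpha" > myfiles/a.txt
echo "beta"  > myfiles/sub/b.txt

# Write mode (-w): create an archive in ustar format (-x ustar).
pax -w -x ustar -f myfiles.tar myfiles

# With neither -r nor -w, pax lists the archive contents.
listing=$(pax -f myfiles.tar)
echo "$listing"

# Read mode (-r): extract, here from inside a separate directory.
mkdir restored
(cd restored && pax -r -f ../myfiles.tar)

diff -r myfiles restored/myfiles && echo "trees are identical"
```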

bzip2

bzip2 and bunzip2 are similar to "gzip"/"gunzip" but use a different compression method. Compression is generally better but slower than with "gzip". Decompression is faster than compression, though usually still slower than "gunzip".

An option of -1 through -9 can be used to specify how well bzip2 should compress. The number gives the size of the "chunks", in steps of 100kB, that are compressed at a time, so bzip2 -5 foo.bar will compress foo.bar in chunks of 500kB each. Generally, larger chunks mean better compression (but probably slower). Only undamaged "chunks" can be recovered with bzip2recover from a damaged bzip2 file, so if you compressed with 900kB chunks, you'll lose 900kB of your file if one chunk is damaged, but only 100kB if you used 100kB chunks (bzip2 -1). By default bzip2 uses 900kB chunks for the best possible compression.

bzcat is the same as bunzip2 -c, which is the same as bzip2 -dc.
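The chunk-size options can be compared on a file larger than 100kB, so that bzip2 -1 has to split it into several chunks. This is only a sketch: the file names are invented, the resulting sizes depend on the data, and the script exits early if bzip2 is not installed.

```shell
command -v bzip2 >/dev/null 2>&1 || { echo "bzip2 not installed" >&2; exit 0; }

work=$(mktemp -d)
cd "$work" || exit 1

# Build a sample file of roughly 300kB (hypothetical name and content).
i=0
while [ "$i" -lt 6000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > sample.txt

# -c writes to stdout and keeps sample.txt, as with gzip.
bzip2 -1 -c sample.txt > small-chunks.bz2   # 100kB chunks
bzip2 -9 -c sample.txt > large-chunks.bz2   # 900kB chunks (the default)

# Compare the sizes in bytes.
wc -c sample.txt small-chunks.bz2 large-chunks.bz2

# bzip2 -dc decompresses to stdout, like bzcat.
bzip2 -dc large-chunks.bz2 | cmp - sample.txt && echo "round trip OK"
```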

zip

zip adds files to a compressed zip archive. You can extract files from a zip archive using unzip. The zip format is a common archiving file format used on Microsoft Windows PCs. A zip archive has members compressed individually; imagine gzip of every file before tar-ing them, but with a different format.

As with gzip, the quality of the compression can be specified by giving a number between 1 and 9 as an option (e.g. zip -5). 1 is quickest, but gives low-quality compression. 9 gives the highest quality of compression, but is slow. In addition, 0 can be used (i.e. zip -0) to specify that the files should just be "stored" and not compressed (a compression of 0%), thus making it possible to use zip to make uncompressed archives.

Note that a zip archive contains individually compressed files collected into a single file. This is the opposite of how it's done for most other compressed Unix archives (e.g. tar.gz and tar.bz2), where the files and directories are first collected into a single archive file (e.g. with cpio or tar), and then this single file is compressed (e.g. using gzip or bzip2).

Examples:

  • zip archive.zip file.txt
    • Adds the file to the archive. If the archive does not exist, creates it.
  • zip archive file.txt
    • As above; adds the .zip extension automatically, creating archive.zip.
  • cat filelist.txt | zip archive.zip -@
    • Adds the files listed in filelist.txt to the archive.
  • zip -0 archive.zip file.txt
    • Adds the file to the archive without compressing it, merely storing the file.
  • zip archive.zip file1.txt file2.txt file3.txt
    • Adds multiple files to the archive.
  • zip -r archive.zip .
    • Adds all files in the current directory and the sub-directories into the archive except for the archive itself, preserving the directory nesting information.
  • zip -r -j archive.zip .
    • As above, but without the directory nesting information. Thus, each file is tracked under its file name only in the archive.
  • zip -h2
    • Outputs extended help, longer than the -h one.
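The difference between zip -r and zip -r -j shows up clearly when the two archives are listed side by side. This is a sketch with invented file names. It writes the archives outside the directory being archived, adds -q to silence the per-file progress lines, and exits early if zip or unzip is missing.

```shell
command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1 ||
    { echo "zip or unzip not installed" >&2; exit 0; }

work=$(mktemp -d)
cd "$work" || exit 1

mkdir -p project/docs
echo "readme" > project/README.txt
echo "guide"  > project/docs/guide.txt

# -r recurses into sub-directories and keeps the directory nesting.
(cd project && zip -q -r ../nested.zip .)

# -j "junks" the paths, storing each file under its file name only.
(cd project && zip -q -r -j ../flat.zip .)

nested=$(unzip -l nested.zip)
flat=$(unzip -l flat.zip)
echo "$nested"
echo "$flat"
```

The first listing should contain docs/guide.txt, while the second lists guide.txt with no directory part.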

unzip

Extracts files from zip archives. See also zip. You can get a Windows version of Info-ZIP unzip from GnuWin32. FreeBSD appears to be using a custom version of unzip, distinct from Info-ZIP yet largely compatible with it.

Examples:

  • unzip archive.zip
    • Extracts all files from the archive.
  • unzip archive.zip file.txt
    • Extracts a particular file from the archive.
  • unzip -l archive.zip
    • Lists files contained in the archive without extracting them.
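As a worked example of extracting a particular file, the following sketch builds a two-member archive and pulls out only one member. The file names are invented. The -d option, which names the directory to extract into, is an Info-ZIP option that is not listed above. The script exits early if zip or unzip is missing.

```shell
command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1 ||
    { echo "zip or unzip not installed" >&2; exit 0; }

work=$(mktemp -d)
cd "$work" || exit 1

echo "first"  > file.txt
echo "second" > other.txt
zip -q archive.zip file.txt other.txt

# Extract only file.txt, placing it in the directory named after -d.
mkdir out
unzip -q archive.zip file.txt -d out

# Only the requested member should appear.
ls out
```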

compress

compress is a compression utility, with its own compressed file format, that is popular on UNIX systems. Files compressed with compress have a ".Z" extension appended to their names.

uncompress

Restores files that were compressed with compress to their original, uncompressed form.

unar

Extracts files from a variety of compression formats, including 7z (7-zip) and RAR. License: LGPL. A companion utility to show archive file listing is lsar.

shar

A legacy tool and a file format to create self-extracting file archives, not covered by POSIX. The self-extracting archives are shell scripts.
