Why should I compress my data?
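The tar commands on this page do not remove the original files or directories. If you are trying to save space, you will need to delete the originals once you are confident the archive contains what you need. Note that gzip and bzip2 behave differently when run on an individual file: by default they replace the file with its compressed version.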
Creating a tarball (compressing a directory)
A common way to compress large amounts of data in Linux is to create a tarball. A tarball is a set of files and directories that has been bundled into a single archive with tar and then, usually, compressed. Tar is a command originally written to write files to tape for backups (T[ape] AR[chiver]). You have probably seen tarballs when downloading software for Linux.
The simplest way to create a tarball with gzip compression is to do it at the directory level, like the following:
tar zcvf directory_name.tar.gz directory_name/
To do the same, but compress with bzip2, do the following:
tar jcvf directory_name.tar.bz2 directory_name/
Note: these commands do not remove the original directories.
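Since the originals are not removed, it can help to check the tarball before deleting anything by hand. The following is a minimal sketch, not a required procedure: the directory name project_data and the scratch directory are hypothetical, and the compare step (the d option) assumes GNU tar.

```shell
set -eu

# Hypothetical scratch area and sample data, for illustration only.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p project_data/sub
printf 'hello\n' > project_data/a.txt
printf 'world\n' > project_data/sub/b.txt

# Create the gzip-compressed tarball; project_data/ is left in place.
tar zcf project_data.tar.gz project_data/

# Check that the compressed stream is intact and that tar can read every entry.
gzip -t project_data.tar.gz
tar ztf project_data.tar.gz > /dev/null

# Compare the archive against the files on disk (GNU tar).
# tar reports any differences and exits non-zero if it finds one.
tar zdf project_data.tar.gz

# Only after the checks pass, remove the original directory.
rm -r project_data
```

Because of set -e, the script stops before the rm step if any check fails.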
Compress and decompress individual files
If you want to compress an individual file:
Using gzip:
gzip file_to_compress
Using bzip2:
bzip2 file_to_compress
Decompress using gzip:
gunzip file_to_decompress.gz
Decompress a file using bzip2:
bunzip2 file_to_decompress.bz2
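If you would rather keep the uncompressed file until you have checked the result, one possible approach is sketched below. The file name notes.txt is hypothetical; the -c (write to standard output), -k (keep the input file) and -t (test integrity) options are standard in gzip and bzip2, although older gzip versions may lack -k, which is why gzip -c is used here.

```shell
set -eu

# Hypothetical scratch area and sample file, for illustration only.
workdir=$(mktemp -d)
cd "$workdir"
printf 'some sample text\n' > notes.txt

# gzip -c writes the compressed data to stdout, so notes.txt is left untouched.
gzip -c notes.txt > notes.txt.gz

# bzip2 -k creates notes.txt.bz2 and keeps notes.txt.
bzip2 -k notes.txt

# Test the integrity of each compressed file without extracting it.
gzip -t notes.txt.gz
bzip2 -t notes.txt.bz2

# Decompress to stdout and compare byte for byte with the original.
gunzip -c notes.txt.gz | cmp - notes.txt
bunzip2 -c notes.txt.bz2 | cmp - notes.txt
```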
Working with tarballs
To get the list of files/directories in a gzip-compressed tarball:
tar ztvf filename.tar.gz
To get the list of files/directories in a bzip2-compressed tarball:
tar jtvf filename.tar.bz2
Extracting a tarball:
tar zxvf filename.tar.gz
or
tar jxvf filename.tar.bz2
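By default a tarball is extracted into the current directory. As an illustration, the sketch below lists a tarball first and then extracts it into a separate directory using the -C option, which is an addition not covered above. The names sample_dir and restored are hypothetical.

```shell
set -eu

# Hypothetical scratch area and sample tarball, for illustration only.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p sample_dir
printf 'data\n' > sample_dir/file.txt
tar zcf filename.tar.gz sample_dir/

# List the contents first to see which paths the tarball will create.
tar ztf filename.tar.gz

# Extract into a separate directory; the target of -C must already exist.
mkdir restored
tar zxf filename.tar.gz -C restored

# The extracted copy should match the original file.
cmp sample_dir/file.txt restored/sample_dir/file.txt
```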
Creating tarballs with parallel compression tools
Sometimes the compression can be the longest part of creating a tarball. In these cases you can use an external parallel compression tool to speed things up. To do this, you tell tar to call an external program to do the compression rather than using its built-in z or j options.
To create a tarball using pigz (parallel gzip), use the following:
tar -cvf directory_name.tar.gz -I pigz directory_name
To create a tarball using pbzip2 (parallel bzip2), use the following:
tar -cvf directory_name.tar.bz2 -I pbzip2 directory_name
Note: you can use the same method when decompressing tarballs; use -x in place of -c.
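A round trip with an external compressor might look like the sketch below. It assumes GNU tar, whose -I option runs the named program and passes it -d when extracting. The fallback to plain gzip when pigz is not installed is an addition so the example runs anywhere, and the extracted directory is hypothetical.

```shell
set -eu

# Hypothetical scratch area and sample data, for illustration only.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p directory_name
printf 'payload\n' > directory_name/data.txt

# Use pigz if it is installed; otherwise fall back to gzip (same file format).
if command -v pigz >/dev/null 2>&1; then
    compressor=pigz
else
    compressor=gzip
fi

# Create the tarball, letting the external program do the compression.
tar -cf directory_name.tar.gz -I "$compressor" directory_name

# Extract it the same way, into a separate directory.
mkdir extracted
tar -xf directory_name.tar.gz -I "$compressor" -C extracted
```

pigz writes standard gzip files, so a tarball created this way can also be read with the ordinary z option.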
Better understanding tar options
- x - extract an archive
- c - create an archive
- z - gzip or gunzip the archive
- j - bzip2 or bunzip2 the archive
- v - use verbose mode. This shows what tar is doing rather than working silently
- f - the archive file to read or create
- t - list the files in an archive
- J - use the less commonly used xz compression method
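To show how these letters combine, here is a short sketch with hypothetical file names. The long option names (--create, --gzip, --file) are the GNU tar spellings of c, z and f, and the xz step is skipped if xz is not installed.

```shell
set -eu

# Hypothetical scratch area and sample data, for illustration only.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p directory_name
printf 'example\n' > directory_name/readme.txt

# c + z + f: create a gzip-compressed archive (no v, so tar runs silently).
tar czf directory_name.tar.gz directory_name/

# The same operation written with long options.
tar --create --gzip --file=long_form.tar.gz directory_name/

# t + z + f: list the contents of the archive.
tar tzf directory_name.tar.gz

# c + J + f and t + J + f: the same idea with xz compression.
if command -v xz >/dev/null 2>&1; then
    tar cJf directory_name.tar.xz directory_name/
    tar tJf directory_name.tar.xz
fi
```

Whichever letters are combined, f comes last in the group because the archive file name must follow it directly.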