9
COMPRESSING AND ARCHIVING
Hackers often need to download and install new software, as well as send
and download multiple scripts and large files. These tasks are easier if
these files are compressed and combined into a single file. If you come
from the Windows world, you will probably recognize this concept from the
.zip format, which combines and compresses files to make them smaller for
transferring over the internet or removable media.
There are many ways to do this in Linux, and we look at a few of the most
common tools for doing so in this chapter. We also look at the dd command,
which allows you to copy entire drives, including deleted files on those
drives.
What Is Compression?
The interesting subject of compression could fill an entire book by itself,
but for this book we only need a rudimentary understanding of the process.
Compression, as the name implies, makes data smaller, thereby requiring less
storage capacity and making the data easier to transmit. For your purposes
as a beginning hacker, it will suffice to categorize compression as either
lossy or lossless.
Lossy compression is very effective in reducing the size of files, but some
of the original information is discarded. In other words, the file after
compression is not exactly the same as the original. This type of
compression works great for graphics, video, and audio files, where a small
difference in the file is hardly noticeable: .mp3, .mp4, and .jpg are all
formats that use lossy compression. If a pixel in a .jpg file or a single
note in an .mp3 file is changed, your eye or ear is unlikely to notice the
difference (though, of course, music aficionados will say that they can
definitely tell the difference between an .mp3 and a losslessly compressed
.flac file). The strengths of lossy compression are its efficiency and
effectiveness. The compression ratio can be very high, meaning that the
resulting file is significantly smaller than the original.
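To make the trade-off concrete, here is a minimal Python sketch of the idea behind lossy compression. It is a toy illustration only: real formats such as .mp3 and .jpg use far more sophisticated techniques. The signal, the lossy_encode and lossy_decode helpers, and the step size of 16 are all hypothetical choices made for this example.

```python
import math
import zlib

# Hypothetical 8-bit "audio" signal: a slow wave plus some fine detail.
samples = bytes(
    int(127 + 100 * math.sin(i / 50) + 10 * math.sin(i * 1.7))
    for i in range(10_000)
)


def lossy_encode(data: bytes, step: int = 16) -> bytes:
    """Throw away fine detail by rounding each sample down to a
    multiple of `step`, then compress what is left."""
    quantized = bytes((b // step) * step for b in data)
    return zlib.compress(quantized)


def lossy_decode(blob: bytes) -> bytes:
    """Decompress; the discarded detail cannot be recovered."""
    return zlib.decompress(blob)


lossless_blob = zlib.compress(samples)   # keeps every bit
lossy_blob = lossy_encode(samples)       # keeps only coarse values
restored = lossy_decode(lossy_blob)

# Largest difference between an original sample and its restored value.
max_error = max(abs(a - b) for a, b in zip(samples, restored))

print("original size:      ", len(samples))
print("lossless compressed:", len(lossless_blob))
print("lossy compressed:   ", len(lossy_blob))
print("identical to original?", restored == samples)
print("largest sample error: ", max_error)
```

The restored data has the same length as the original and stays close to it, but it is no longer identical. That is acceptable for a sound or a picture and unacceptable for a script.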
However, lossy compression is unacceptable when you're sending files or
software and data integrity is crucial. For example, if you are sending a
script or document, the integrity of the original file must be retained
when it is decompressed. This chapter focuses on the lossless type of
compression, which is available from a number of utilities and algorithms.
Unfortunately, lossless compression usually cannot shrink files as much as
lossy compression can, as you might imagine, but for the hacker, integrity
is often far more important than compression ratio.
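The defining property of lossless compression, that decompressing gives back exactly the bytes you started with, can be checked with a short Python sketch. It uses the standard zlib module and compares SHA-256 hashes before and after the round trip. The script contents here are a made-up example, not a file from this book.

```python
import hashlib
import zlib

# Hypothetical script contents, repeated so there is something to compress.
script = b"#!/bin/bash\necho \"hello\"\n" * 200

compressed = zlib.compress(script, level=9)   # lossless compression
restored = zlib.decompress(compressed)        # exact reconstruction

# Hash the data before and after; matching digests mean matching bytes.
digest_before = hashlib.sha256(script).hexdigest()
digest_after = hashlib.sha256(restored).hexdigest()

print("original size:  ", len(script))
print("compressed size:", len(compressed))
print("byte-for-byte identical?", restored == script)
print("hashes match?", digest_before == digest_after)
```

Comparing hashes is the same check you would typically make after transferring a compressed file: if the digests match, the file survived compression and transfer intact.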