One particular source coding algorithm is the Huffman encoding algorithm. It approaches, and sometimes achieves, Shannon's bound for source compression.
Huffman encoding algorithm
1. Sort the source outputs in decreasing order of their probabilities.
2. Merge the two least-probable outputs into a single output whose probability is the sum of the corresponding probabilities.
3. If more than 2 outputs remain, go to step 1.
4. Arbitrarily assign 0 and 1 as codewords for the two remaining outputs.
5. If an output is the result of merging two outputs in a preceding step, append a 0 and a 1 to its current codeword to obtain the codewords of those two preceding outputs, and repeat step 5. If no output is the result of a merger in a preceding step, stop.
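The steps above can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: instead of re-sorting the outputs on every pass, it keeps them in a min-heap (`heapq`), which returns the same two least-probable outputs, and it keeps merging until a single node is left, so the final merge plays the role of assigning 0 and 1 to the last two outputs. The function name `huffman_code` and the dict-based representation are our own choices. Ties between equal probabilities are broken by insertion order, so other, equally valid Huffman codes are possible.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Build a binary Huffman code from a dict {output: probability}."""
    if len(probs) == 1:
        # A lone output still needs a nonempty codeword.
        return {next(iter(probs)): "0"}
    tiebreak = count()  # unique counter so equal probabilities never compare dicts
    # Each heap entry is (probability, tiebreak, {output: codeword suffix so far}).
    heap = [(p, next(tiebreak), {x: ""}) for x, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Pop the two least-probable (possibly already merged) outputs.
        p0, _, group0 = heapq.heappop(heap)
        p1, _, group1 = heapq.heappop(heap)
        # Label the two branches 0 and 1. Prepending here is equivalent to
        # appending bits when the merges are unwound from the last one back.
        merged = {x: "0" + c for x, c in group0.items()}
        merged.update({x: "1" + c for x, c in group1.items()})
        # The merged output carries the sum of the two probabilities.
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


print(huffman_code({"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}))
# {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
```

For the four-output source of Example 1, this produces codeword lengths 1, 2, 3 and 3.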
Example 1
$X \in \{A, B, C, D\}$ with probabilities $\left\{\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{8}\right\}$
$$\text{Average length} = \frac{1}{2}\cdot 1 + \frac{1}{4}\cdot 2 + \frac{1}{8}\cdot 3 + \frac{1}{8}\cdot 3 = \frac{14}{8}$$
As you may recall, the entropy of the source is also $H(X) = \frac{14}{8}$.
In this case, the Huffman code achieves the lower bound of $\frac{14}{8}$ bits/output.
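A quick numerical check of Example 1, assuming the codeword assignment A → 0, B → 10, C → 110, D → 111. This assignment is our own illustration: it is one valid outcome of the algorithm with the lengths 1, 2, 3, 3 used above, and swapping the 0/1 labels at any merge gives an equally valid code.

```python
import math

# Example 1 source and one valid Huffman code for it (0/1 labels are arbitrary).
probs = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

# Average code length: each codeword length weighted by its output's probability.
avg_len = sum(p * len(code[x]) for x, p in probs.items())

# Source entropy in bits per output.
entropy = -sum(p * math.log2(p) for p in probs.values())

print(avg_len, entropy)  # 1.75 1.75
```

Both equal $\frac{14}{8} = 1.75$ because every probability is a power of $\frac{1}{2}$, so each codeword length equals $-\log_2 p_X(x)$ exactly.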
In general, we can define the average code length as
$$\bar{\ell} = \sum_{x \in \mathcal{X}} p_X(x)\,\ell(x) \qquad (1)$$
where $\mathcal{X}$ is the set of possible values of $X$ and $\ell(x)$ is the length of the codeword assigned to $x$.
For the Huffman code, it is not very hard to show that
$$H(X) \le \bar{\ell} < H(X) + 1 \qquad (2)$$
When compressing a single source output at a time, Huffman codes provide nearly optimum code lengths.
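To see the bound (2) on a source where the Huffman code does not reach the entropy, the sketch below uses a hypothetical source with probabilities $\{0.4, 0.3, 0.2, 0.1\}$, which are not powers of $\frac{1}{2}$. It only tracks codeword lengths: each time a group of outputs is merged, every member's codeword grows by one bit. The helper name `huffman_lengths` is our own.

```python
import heapq
import math


def huffman_lengths(probs):
    """Return the Huffman codeword length for each probability in `probs`."""
    if len(probs) == 1:
        return [1]  # a lone output still needs one bit
    lengths = [0] * len(probs)
    # Heap entries: (probability, unique id, indices of outputs merged into this node).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    next_id = len(probs)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)
        p1, _, group1 = heapq.heappop(heap)
        # Merging adds one bit (the 0 or 1 branch label) to every member's codeword.
        for i in group0 + group1:
            lengths[i] += 1
        heapq.heappush(heap, (p0 + p1, next_id, group0 + group1))
        next_id += 1
    return lengths


probs = [0.4, 0.3, 0.2, 0.1]      # hypothetical, non-dyadic source
lengths = huffman_lengths(probs)  # [1, 2, 3, 3]
avg_len = sum(p * l for p, l in zip(probs, lengths))
entropy = -sum(p * math.log2(p) for p in probs)
print(f"H(X) = {entropy:.3f}, average length = {avg_len:.3f}")
# H(X) = 1.846, average length = 1.900
```

Here $1.846 \le 1.900 < 2.846$, so the average length lies between the entropy and the entropy plus one bit, as (2) states.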
The drawbacks of Huffman coding
- Codes are variable length.
- The algorithm requires knowledge of the probabilities $p_X(x)$ for all $x \in \mathcal{X}$.
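One consequence of variable-length codewords is that a decoder cannot cut the bit stream into fixed-size blocks. A Huffman code is prefix-free (no codeword is the beginning of another), so the decoder can read one bit at a time and emit an output as soon as the collected bits match a codeword. A minimal sketch, reusing the assumed labeling A → 0, B → 10, C → 110, D → 111 for Example 1; `decode` is a hypothetical helper name.

```python
def decode(bits, code):
    """Decode a string of '0'/'1' characters using a prefix-free code {output: codeword}."""
    inverse = {cw: x for x, cw in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        # Prefix-free: the first codeword matched is the only possible one.
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a codeword")
    return "".join(out)


code = {"A": "0", "B": "10", "C": "110", "D": "111"}
print(decode("010110111", code))  # ABCD
```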
Another powerful source coder that does not have the above shortcomings is the Lempel-Ziv algorithm.