You are here: Home » Content » Huffman Coding

Huffman Coding

Module by: Tuan Do-Hong

One particular source coding algorithm is the Huffman encoding algorithm. It is a source coding algorithm which approaches, and sometimes achieves, Shannon's bound for source compression.

Huffman encoding algorithm

  1. Sort source outputs in decreasing order of their probabilities
  2. Merge the two least-probable outputs into a single output whose probability is the sum of the corresponding probabilities.
  3. If the number of remaining outputs is more than 2, then go to step 1.
  4. Arbitrarily assign 0 and 1 as codewords for the two remaining outputs.
  5. If an output is the result of the merger of two outputs in a preceding step, append the current codeword with a 0 and a 1 to obtain the codeword the the preceding outputs and repeat step 5. If no output is preceded by another output in a preceding step, then stop.
Example 1 
XABCD X A B C D with probabilities { 12 1 2 , 14 1 4 , 18 1 8 , 18 1 8 }
Figure 1
Average length=121+142+183+183=148 Average length 1 2 1 1 4 2 1 8 3 1 8 3 14 8 . As you may recall, the entropy of the source was also HX=148 H X 14 8 . In this case, the Huffman code achieves the lower bound of 148bitsoutput 14 8 bits output .
In general, we can define average code length as
¯=x X ¯ pXxx x x X ¯ p X x x (1)
where X ¯ X ¯ is the set of possible values of xx.
It is not very hard to show that
HX¯>HX+1 H X H X 1 (2)
For compressing single source output at a time, Huffman codes provide nearly optimum code lengths.
The drawbacks of Huffman coding
  1. Codes are variable length.
  2. The algorithm requires the knowledge of the probabilities, pXx p X x for all x X ¯ x X ¯ .
Another powerful source coder that does not have the above shortcomings is Lempel and Ziv.

Comments, questions, feedback, criticisms?

Send feedback