Geometric Huffman Coding

last update 2013-02-06 — Georg Böcherer (geborg@gmail.com)

Dyadic PMFs can be generated by parsing a stream of independent equiprobable data bits by a full prefix-free code. Formally, p is dyadic if p_i=2^{-ell_i}, ell_iinmathbb{N} and sum_i 2^{-ell_i}=1. Such a PMF p is generated by a prefix-free code with codeword lengths ell_i. To match dyadic PMFs to communication channels, the Kullback-Leibler distance between the dyadic PMF and the capacity achieving PMF of the considered channel has to be minimized. See Matching Dyadic Distributions to Channels for details. On this web site I provide implementations of algorithms that find the optimal dyadic PMF.

References

Capacity Achieving Probabilistic Shaping for Noisy and Noiseless Channels

PhD thesis, 2012.

Writing on the Facade of RWTH ICT Cubes: Cost Constrained Geometric Huffman Coding

G. Böcherer, Fabian Altenbach, Martina Malsbender, and Rudolf Mathar, best paper award at ISWCS 2011, Aachen.  — pdf

Matching Dyadic Distributions to Channels

G. Böcherer and R. Mathar, presented at DCC 2011, Snowbird.  — pdf

Downloads

matching.tar — all implementations from this web site. They are tested in Matlab R2010b and Octave 3.4.0.

Geometric Huffman Coding (GHC)

Given is a vector x with non-negative entries. If additionally the entries sum up to one, than x is a PMF. The objective is to minimize

 mathrm{D}(pVert x)=sum_i p_i log frac{p_i}{x_i}

subject to p is a dyadic PMF. Geometric Huffman Coding (GHC) finds the dyadic PMF p that minimizes mathrm{D}(pVert x). GHC constructs a Huffman tree using the following updating rule. Denote by x_{m} and x_{m-1}geq x_m the two smallest entries of x. Then

 x'=left{begin{array}{ll} x_{m-1},&mathrm{if};x_{m-1}geq 4x_m 2sqrt{x_{m-1}x_{m}},&mathrm{if};x_{m-1}<4x_m end{array} right.

For comparison, conventional Huffman coding finds the dyadic PMF p that minimizes mathrm{D}(xVert p) (minimization is here over the second argument) by using the updating rule

 x' = x_{m-1}+x_m.

ghc.m in matching.tar is a Matlab implementation of GHC.

Normalized Geometric Huffman Coding (nGHC)

Given is a target vector x with non-negative entries and a weight vector w with strictly positive entries. The objective is to minimize the Kullback-Leibler distance mathrm{D}(pVert x) normalized by the average weight w^Tp=sum_i p_iw_i, i.e., to minimize the fraction

 frac{mathrm{D}(pVert x)}{w^Tp}

subject to p is a dyadic PMF. Normalized Geometric Huffman Coding (nGHC) iteratively finds the solution. nGHC supersedes the LEC Algorithm stated in Matching Dyadic Distributions to Channels.

nghc.m in matching.tar is a Matlab implementation of nGHC.

Cost Constrained Geometric Huffman Coding (ccGHC)

Given is a target vector x with non-negative entries and a weight vector w with strictly positive entries. The objective is to minimize the Kullback-Leibler distance mathrm{D}(pVert x) subject to an average cost constraint w^Tpleq E over all dyadic PMFs p. Cost Constrained Geometric Huffman Coding (ccGHC) iteratively finds the solution.

ccghc.m in matching.tar is a Matlab implementation of ccGHC.

Miscellaneous m-files

ghc.m in matching.tar is a Matlab implementation of Huffman coding.

kldiv.m in matching.tar is a Matlab implementation of the Kullback-Leibler distance.

Examples

Example 1

example1.m in matching.tar. From Matching Dyadic Distributions to Channels

octave:11> addpath /home/georg/matching
octave:12> q = [0.328 0.32 0.22 0.11 0.022]';
octave:13> pghc = ghc(q)
pghc =

   0.50000
   0.25000
   0.12500
   0.12500
   0.00000

octave:14> phc = hc(q)
phc =

   0.25000
   0.25000
   0.25000
   0.12500
   0.12500

octave:15> kldiv(pghc,q)
ans =  0.13619
octave:16> kldiv(phc,q)
ans =  0.19548
octave:17> phc_ = [hc(q(1:4));0]
phc_ =

   0.25000
   0.25000
   0.25000
   0.25000
   0.00000

octave:18> kldiv(phc_,q)
ans =  0.15523
octave:19>

Example 2

example2.m in matching.tar. Comparison of normalized geometric Huffman coding and Huffman coding for the (d,k) = (10,19) constraint.

example2.m
w = (11:20)';
C = fsolve(@(s) sum(exp(-w*s))-1,1);
t = exp(-w*C);
d_nghc = nghc(t,w);
d_hc = hc(t);
R_nghc = -d_nghc'*log(d_nghc)/(w'*d_nghc);
R_hc = -d_hc'*log(d_hc)/(w'*d_hc);
disp(['codeword lengths of nghc: ' num2str(-log2(d_nghc'))]);
disp(['codeword lengths of hc  : ' num2str(-log2(d_hc'))]);
disp(['rate achieved by nghc:' num2str(R_nghc)]);
disp(['rate achieved by hc:  ' num2str(R_hc)]);
octave:7> addpath /home/georg/matching
octave:7> example2
codeword lengths of nghc: 3   3   3   3   3   3   4   4   4   4
codeword lengths of hc  : 2   3   3   3   3   4   4   4   5   5
rate achieved by nghc:0.15273
rate achieved by hc:  0.15265
octave:8>