Mutual Information

In general,

H(Y|X) ≠ H(X|Y)
but, because of the chain rule:
H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
and therefore
H(X) − H(X|Y) = H(Y) − H(Y|X)
Recall that H(Y|X) measures the uncertainty remaining in Y once X is known. Likewise, the reduction in the uncertainty in X given the knowledge of Y is called the mutual information, written I(X; Y). Since I(X; Y) = I(Y; X), mutual information is reciprocal.
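These identities can be checked numerically. The sketch below uses a small hypothetical joint distribution (the numbers are illustrative only, not from the text) and verifies that both orderings give the same mutual information:

```python
import math

# A hypothetical joint distribution p(x, y) over two binary variables,
# chosen only to illustrate the identities; rows index x, columns index y.
p_xy = [[0.3, 0.1],
        [0.2, 0.4]]

def entropy(dist):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

p_x = [sum(row) for row in p_xy]                       # marginal p(x)
p_y = [sum(row[j] for row in p_xy) for j in range(2)]  # marginal p(y)

H_X = entropy(p_x)
H_Y = entropy(p_y)
H_XY = entropy([p for row in p_xy for p in row])       # joint entropy H(X, Y)

H_Y_given_X = H_XY - H_X   # chain rule: H(X, Y) = H(X) + H(Y|X)
H_X_given_Y = H_XY - H_Y   # chain rule: H(X, Y) = H(Y) + H(X|Y)

# Both orderings give the same mutual information:
I_XY = H_X - H_X_given_Y
I_YX = H_Y - H_Y_given_X
print(f"I(X;Y) = {I_XY:.4f} bits, I(Y;X) = {I_YX:.4f} bits")
```

The two printed values agree, confirming that mutual information is reciprocal.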

What is the practical meaning of I(X; Y)?

Mutual information I(X; Y) measures how much knowing Y tells us about X, that is, by how much the uncertainty in X is reduced once Y is known.

To further elucidate this let us consider three cases:

1. If x = f(y), then knowing y determines x completely. In other words, given any y the corresponding x is known, so no uncertainty remains: H(X|Y) = 0. Therefore
I(X; Y) = H(X) − H(X|Y)
= H(X)
which is maximum mutual information.
2. If x ≠ f(y), so that X and Y are independent of each other, then knowing y tells us nothing about x, and the uncertainty is maximal: H(X|Y) = H(X). Therefore
I(X; Y) = H(X) − H(X|Y)
= 0
which is no (= minimum) mutual information.
3. In general, partial knowledge of Y reduces but need not eliminate the uncertainty in X; the uncertainty remaining in X satisfies H(X|Y) ≤ H(X).
Thus, possible range of mutual information is
0 ≤ I(X;Y) ≤ H(X)
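The three cases can be demonstrated numerically. The joint distributions below are hypothetical examples constructed for illustration: one deterministic, one independent, one in between:

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def mutual_info(p_xy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint distribution matrix."""
    p_x = [sum(row) for row in p_xy]
    p_y = [sum(row[j] for row in p_xy) for j in range(len(p_xy[0]))]
    h_xy = entropy([p for row in p_xy for p in row])
    return entropy(p_x) + entropy(p_y) - h_xy

# Case 1: x = f(y) (here x = y), so I(X;Y) = H(X) = 1 bit (maximum).
deterministic = [[0.5, 0.0],
                 [0.0, 0.5]]

# Case 2: X and Y independent, so I(X;Y) = 0 (minimum).
independent = [[0.25, 0.25],
               [0.25, 0.25]]

# Case 3: partial dependence, so 0 < I(X;Y) < H(X).
partial = [[0.4, 0.1],
           [0.1, 0.4]]

for name, joint in [("deterministic", deterministic),
                    ("independent", independent),
                    ("partial", partial)]:
    print(f"{name}: I(X;Y) = {mutual_info(joint):.4f} bits")
```

The printed values fall at the top, the bottom, and the interior of the range 0 ≤ I(X;Y) ≤ H(X), matching cases 1, 2, and 3 respectively.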

Notice that:

With minimum uncertainty, mutual information is maximum

and

With maximum uncertainty, mutual information is minimum

Application example of mutual information:

Consider a communication system. An information source X emits symbols x0, x1, x2, x3, … into an information channel, which in turn delivers symbols y0, y1, y2, y3, … to a sink Y. Therefore,
1. In an ideal communication system, every symbol yi delivered by the information channel uniquely identifies the symbol xi emitted by the information source. In other words,

Y should tell us about X

and

X should tell us about Y.

Therefore, given Y, the uncertainty remaining in X is minimum, H(X|Y) = 0. Hence
I(X; Y) = H(X) − H(X|Y)
= H(X)
which is maximum mutual information.
2. In a non-ideal communication system, a symbol yi delivered by the information channel may not uniquely identify the symbol xi emitted by the information source. Therefore, given Y, there exists some uncertainty remaining in X, H(X|Y) ≠ 0.
Since the range of equivocation is
0 ≤ H(X|Y) ≤ H(X)
but H(X|Y) ≠ 0 thus
0 < H(X|Y) ≤ H(X)
Therefore
I(X; Y) = H(X) − H(X|Y)
< H(X)
which means that information is lost in transmission.
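A standard concrete model of such a non-ideal channel is the binary symmetric channel, in which each transmitted bit is flipped with some probability. A minimal sketch (the flip probability 0.1 is an assumed example value):

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Binary symmetric channel: each transmitted bit is flipped with probability eps.
# The input X is uniform, so H(X) = 1 bit.
eps = 0.1
p_xy = [[0.5 * (1 - eps), 0.5 * eps],
        [0.5 * eps, 0.5 * (1 - eps)]]

p_x = [sum(row) for row in p_xy]
p_y = [sum(row[j] for row in p_xy) for j in range(2)]
H_X = entropy(p_x)
H_XY = entropy([p for row in p_xy for p in row])
H_X_given_Y = H_XY - entropy(p_y)   # equivocation: uncertainty left after receiving Y

I = H_X - H_X_given_Y
print(f"H(X) = {H_X:.4f}, H(X|Y) = {H_X_given_Y:.4f}, I(X;Y) = {I:.4f}")
```

Since the equivocation H(X|Y) is strictly positive, the printed I(X;Y) is strictly less than H(X): information is lost in transmission.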

Illustration of "loss in transmission"

Let us assume four binary digits x0, x1, x2 & x3 are fed into an encoder box that computes three binary digits c4, c5 & c6. The seven outputs x0, x1, x2, x3, c4, c5 & c6 together are called a code word, and when the encoding is done systematically, the result is a Hamming code.

For instance,

x0, x1, x2, x3, c4, c5 & c6: 7 outputs
x0, x1, x2 & x3: 4 inputs
Thus, Hamming code (7,4).

Observe that c4, c5 & c6 are redundant data added for error correction. Unlike an error in information, this redundancy in data is harmless, so we don't mind it. Also note that ⟨c4, c5, c6⟩ = f(x0, x1, x2, x3).
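The encoder box can be sketched in a few lines. The parity equations below are one common choice for a (7,4) Hamming code; the text does not specify which data bits each check digit covers, so treat them as an assumption:

```python
def hamming_encode(x):
    """Encode four data bits (x0, x1, x2, x3) into a (7,4) code word by
    appending three parity bits. These parity equations are one common
    convention (an assumption; textbooks differ on the exact checks)."""
    x0, x1, x2, x3 = x
    c4 = x0 ^ x1 ^ x2   # parity over x0, x1, x2
    c5 = x1 ^ x2 ^ x3   # parity over x1, x2, x3
    c6 = x0 ^ x1 ^ x3   # parity over x0, x1, x3
    return [x0, x1, x2, x3, c4, c5, c6]

codeword = hamming_encode([1, 0, 1, 1])
print(codeword)  # 7-bit code word: 4 data bits followed by 3 parity bits
```

Note that ⟨c4, c5, c6⟩ is entirely a function of ⟨x0, x1, x2, x3⟩: the parity bits add data but no new information.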

Other applications of mutual information (other than communication system):

• Instrumentation
• Numerical algorithms
• Signal processing (e.g., filters)
