In general,
Mutual information I(X; Y) measures how much we learn about X given knowledge of Y.
To further elucidate this, let us consider three cases:

If x = f(y), then knowing y tells us x exactly. In other words, given any y, the corresponding x is known, and hence there is no remaining uncertainty: H(X|Y) = 0. Therefore
I(X; Y) = H(X) − H(X|Y) = H(X),
which is the maximum mutual information.
If x is not a function of y, that is, X and Y are independent of each other, then knowing y tells us nothing about x, and hence the uncertainty is maximum: H(X|Y) = H(X). Therefore
I(X; Y) = H(X) − H(X|Y) = 0,
which is no (= minimum) mutual information.
If Y gives some, but not complete, knowledge of X, then some uncertainty remains in X, and the uncertainty remaining in X satisfies H(X|Y) ≤ H(X).
Notice that:
With minimum uncertainty, mutual information is maximum
and
With maximum uncertainty, mutual information is minimum
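The three cases above can be checked numerically. The sketch below (my own illustration, not from the text) computes I(X; Y) from a joint distribution via the identity I(X; Y) = H(X) + H(Y) − H(X, Y), which equals H(X) − H(X|Y); the three example joint distributions are assumed for illustration.

```python
import math

def mutual_information(joint):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), in bits, from a joint pmf p(x, y)."""
    px = [sum(row) for row in joint]        # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]  # marginal distribution of Y
    h = lambda ps: -sum(p * math.log2(p) for p in ps if p > 0)
    hxy = h([p for row in joint for p in row])  # joint entropy H(X, Y)
    return h(px) + h(py) - hxy

# Case 1: x = f(y) (here x == y): I(X; Y) = H(X) = 1 bit (maximum)
det = [[0.5, 0.0],
       [0.0, 0.5]]

# Case 2: X and Y independent: I(X; Y) = 0 (minimum)
ind = [[0.25, 0.25],
       [0.25, 0.25]]

# Case 3: partial knowledge of X given Y: 0 < I(X; Y) < H(X)
par = [[0.4, 0.1],
       [0.1, 0.4]]

print(mutual_information(det))  # 1.0
print(mutual_information(ind))  # 0.0
print(mutual_information(par))  # strictly between 0 and 1
```

Note how minimum uncertainty (case 1) gives maximum mutual information, and maximum uncertainty (case 2) gives zero.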
Consider a communication system. Here the information source X emits symbols x_{0}, x_{1}, x_{2}, x_{3}, … into the information channel, which in turn emits y_{0}, y_{1}, y_{2}, y_{3}, … into the sink Y.

In an ideal communication system, every symbol y_{i} emitted by the information channel uniquely identifies the symbol x_{i} emitted by the information source. In other words,
Y should tell us about X
and
X should tell us about Y.
Therefore, given Y, the uncertainty remaining in X is minimum, H(X|Y) = 0. Hence
I(X; Y) = H(X) − H(X|Y) = H(X),
which is the maximum mutual information.
In a nonideal communication system, a symbol y_{i} emitted by the information channel may not uniquely identify the symbol x_{i} emitted by the information source. Therefore, given Y, some uncertainty remains in X: H(X|Y) ≠ 0.
Since the range of the equivocation is 0 ≤ H(X|Y) ≤ H(X) but H(X|Y) ≠ 0, we have 0 < H(X|Y) ≤ H(X). Therefore
I(X; Y) = H(X) − H(X|Y) < H(X),
which means that information is lost in transmission.
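A concrete way to see both the ideal and nonideal cases is the binary symmetric channel, which I use here as an assumed illustration (it is not specified in the text). With a uniform binary input, the equivocation is H(X|Y) = h2(eps), where eps is the crossover probability, so I(X; Y) = H(X) − H(X|Y) = 1 − h2(eps).

```python
import math

def h2(p):
    """Binary entropy function, in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Assumed binary symmetric channel with uniform input, so H(X) = 1 bit.
for eps in (0.0, 0.1, 0.5):
    hx = 1.0             # H(X) for a uniform binary source
    h_x_given_y = h2(eps)  # equivocation: uncertainty about X after seeing Y
    i_xy = hx - h_x_given_y
    print(eps, i_xy)
```

For eps = 0 the channel is ideal and I(X; Y) = H(X) = 1 bit; for 0 < eps < 0.5 some information is lost in transmission (I(X; Y) < H(X)); for eps = 0.5 the output tells us nothing and I(X; Y) = 0.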
Let us assume four binary digits x_{0}, x_{1}, x_{2} and x_{3} are fed into an encoder box that produces three binary digits c_{4}, c_{5} and c_{6}.
The outputs x_{0}, x_{1}, x_{2}, x_{3}, c_{4}, c_{5} and c_{6} together are called a code word; if the encoding is done systematically, the result is a Hamming code.
Observe that c_{4}, c_{5} and c_{6} are redundant data used for error correction. Unlike an information error, we do not mind an error in this redundant data. Also note that ⟨c_{4}, c_{5}, c_{6}⟩ = f(x_{0}, x_{1}, x_{2}, x_{3}).
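As a sketch of such an encoder, the function below computes c_{4}, c_{5} and c_{6} from x_{0}…x_{3} using one common Hamming(7,4) parity assignment; the text does not give the exact parity equations, so the particular XOR combinations here are an assumption.

```python
def hamming74_encode(x0, x1, x2, x3):
    """Systematic Hamming(7,4) code word: 4 information bits followed by
    3 redundant parity bits, each a function of the information bits.
    The specific parity equations below are one common choice (assumed)."""
    c4 = x0 ^ x1 ^ x2  # parity over x0, x1, x2
    c5 = x1 ^ x2 ^ x3  # parity over x1, x2, x3
    c6 = x0 ^ x1 ^ x3  # parity over x0, x1, x3
    return (x0, x1, x2, x3, c4, c5, c6)

print(hamming74_encode(1, 0, 1, 1))  # (1, 0, 1, 1, 0, 0, 0)
```

Because the code is systematic, the information bits appear unchanged in the code word and ⟨c_{4}, c_{5}, c_{6}⟩ is purely a function of ⟨x_{0}, x_{1}, x_{2}, x_{3}⟩, as noted above.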
Other applications of mutual information (besides communication systems):
• Instrumentation
• Numerical algorithms
• Signal processing (e.g., filters)