Mathematics of compound symbols

Probability

Since a compound symbol cij is defined as

cij ≜ ⟨xi, yj⟩
the probability of the compound symbol, p(cij) = p(⟨xi, yj⟩) = p(xi, yj), is a joint probability.

Hence,

p(cij) = p(xi|yj) p(yj) = p(yj|xi) p(xi)
Similarly, for a word wi given by wi ≜ ⟨xi,0, xi,1, xi,2, …, xi,n−1⟩, the joint probability of the word is
p(wi) = p(⟨xi,0, xi,1, xi,2, …, xi,n−1⟩) = p(xi,0, xi,1, xi,2, …, xi,n−1)
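
As a concrete illustration, the sketch below uses a small, made-up joint distribution (the table, the symbol names x0, x1, y0, y1, and all numbers are assumptions, not from the text) to show that p(cij) read from the joint table equals p(xi|yj) p(yj) and also p(yj|xi) p(xi):

```python
# Minimal sketch: a hypothetical joint distribution p(x, y) over
# X = {x0, x1} and Y = {y0, y1}. All numbers are made up for illustration.
p_xy = {
    ("x0", "y0"): 0.40, ("x0", "y1"): 0.10,
    ("x1", "y0"): 0.20, ("x1", "y1"): 0.30,
}

# Marginals p(x) and p(y), obtained by summing the joint table.
p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# p(cij) = p(xi, yj), read directly from the joint table ...
p_c = p_xy[("x0", "y1")]

# ... equals p(xi|yj) p(yj) ...
p_x_given_y = p_xy[("x0", "y1")] / p_y["y1"]
print(p_c, p_x_given_y * p_y["y1"])   # both 0.10

# ... and equals p(yj|xi) p(xi).
p_y_given_x = p_xy[("x0", "y1")] / p_x["x0"]
print(p_c, p_y_given_x * p_x["x0"])   # both 0.10
```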

Joint Entropy

Since the set of all compound symbols is

C = { cij | ∀ i, j }
and the probability of a compound symbol is the joint probability p(cij) = p(xi, yj), the entropy of the set of all compound symbols, H(C), is given by
H(C) = Σ_{∀ cij} p(cij) log2(1/p(cij))
Thus, H(C) ≡ H(X,Y).

In other words, H(C) and H(X,Y) are one and the same (≡), not merely equal in value (=); the identity holds for all values of C, X and Y.

What does it mean, practically, to say H(C) ≡ H(X,Y)?

H(X,Y) = Σ_{∀ xi, ∀ yj} p(xi, yj) log2(1/p(xi, yj)) = Σ_{∀ cij} p(cij) log2(1/p(cij)) = H(C)
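
Numerically, the two expressions are the same sum over the same probabilities, only indexed differently. Continuing the hypothetical joint table from the earlier sketch (all numbers assumed for illustration):

```python
from math import log2

# Same hypothetical joint distribution as in the earlier sketch.
p_xy = {
    ("x0", "y0"): 0.40, ("x0", "y1"): 0.10,
    ("x1", "y0"): 0.20, ("x1", "y1"): 0.30,
}

# H(X,Y): sum over all pairs (xi, yj).
H_XY = sum(p * log2(1 / p) for p in p_xy.values())

# H(C): relabel each pair as one compound symbol cij and sum again.
p_c = {x + y: p for (x, y), p in p_xy.items()}
H_C = sum(p * log2(1 / p) for p in p_c.values())

print(H_XY, H_C)   # identical, ~1.846 bits
```

Relabelling the pairs changes nothing about the probabilities being summed, which is why H(C) ≡ H(X,Y) holds for every distribution rather than being an equality for one particular case.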

Equivocation

Since,

p(x, y) = p(x|y) p(y)
    ⇒ log p(x, y) = log p(x|y) + log p(y)
alternatively
p(x, y) = p(y|x) p(x)
    ⇒ log p(x, y) = log p(y|x) + log p(x)
Thus,
H(X,Y) = Σ_{∀ xi, ∀ yj} p(yj|xi) p(xi) log2(1/p(xi)) + Σ_{∀ xi, ∀ yj} p(xi, yj) log2(1/p(yj|xi))
       = H(X) + H(Y|X)
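
A quick numerical check of this decomposition, again with the assumed joint table from the earlier sketches (not data from the text):

```python
from math import log2

# Hypothetical joint distribution p(x, y); numbers are illustrative only.
p_xy = {
    ("x0", "y0"): 0.40, ("x0", "y1"): 0.10,
    ("x1", "y0"): 0.20, ("x1", "y1"): 0.30,
}

# Marginal p(x).
p_x = {}
for (x, _), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

# H(X,Y): joint entropy.
H_XY = sum(p * log2(1 / p) for p in p_xy.values())

# H(X): entropy of the marginal distribution of X.
H_X = sum(p * log2(1 / p) for p in p_x.values())

# H(Y|X): equivocation, using p(yj|xi) = p(xi, yj) / p(xi).
H_Y_given_X = sum(p * log2(p_x[x] / p) for (x, _), p in p_xy.items())

print(H_XY, H_X + H_Y_given_X)   # both ~1.846 bits
```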

What is the practical meaning of H(Y|X)?


Equivocation, or conditional entropy, H(Y|X) measures the uncertainty remaining in Y given knowledge of X. For instance, if Y = f(X), then knowing X tells us everything about Y.

What does uncertainty remaining in Y mean, i.e., what does H(Y|X) mean?


To answer, let us consider three cases:

  1. If knowing X tells us nothing about Y (X and Y are independent), then the uncertainty remaining in Y is H(Y|X) = H(Y).
  2. If knowing X tells us everything about Y (e.g., Y = f(X)), then the uncertainty remaining in Y is H(Y|X) = 0.
  3. If knowing X tells us only something about Y, so that some uncertainty in Y remains, then H(Y|X) ≤ H(Y).
Thus, the possible range of the uncertainty remaining in Y is
0 ≤ H(Y|X) ≤ H(Y)
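
The three cases can be checked numerically. The sketch below builds three assumed joint distributions (an independent pair, a deterministic pair with Y = f(X), and a noisy pair) and computes H(Y|X) and H(Y) for each; the distributions and the helper function are illustrative assumptions, not from the text:

```python
from math import log2

def entropies(p_xy):
    """Return (H(Y|X), H(Y)) for a joint distribution given as {(x, y): p}."""
    p_x, p_y = {}, {}
    for (x, y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    H_Y = sum(p * log2(1 / p) for p in p_y.values())
    # H(Y|X) = sum over (x, y) of p(x, y) log2(p(x) / p(x, y))
    H_Y_given_X = sum(p * log2(p_x[x] / p) for (x, _), p in p_xy.items())
    return H_Y_given_X, H_Y

# Case 1: X and Y independent -> H(Y|X) = H(Y)
independent = {(x, y): 0.25 for x in "ab" for y in "cd"}

# Case 2: Y = f(X), deterministic -> H(Y|X) = 0
deterministic = {("a", "c"): 0.5, ("b", "d"): 0.5}

# Case 3: knowing X tells us only something about Y -> 0 < H(Y|X) < H(Y)
noisy = {("a", "c"): 0.4, ("a", "d"): 0.1, ("b", "c"): 0.1, ("b", "d"): 0.4}

for name, dist in [("independent", independent),
                   ("deterministic", deterministic),
                   ("noisy", noisy)]:
    h_cond, h_y = entropies(dist)
    print(f"{name:13s}  H(Y|X) = {h_cond:.3f}  H(Y) = {h_y:.3f}")
```

In all three runs H(Y|X) stays within 0 ≤ H(Y|X) ≤ H(Y): it equals H(Y) for the independent pair, 0 for the deterministic pair, and falls strictly in between for the noisy pair.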

Next:

Chain Rule of Entropy ➽