Mathematics of compound symbols

Probability

Since a compound symbol cij is defined as

cij ≜ ⟨xi, yj⟩
the probability of the compound symbol, p(cij) = p(⟨xi, yj⟩) = p(xi, yj), is a joint probability.

Hence,

p(cij) = p(xi|yj) p(yj) = p(yj|xi) p(xi)
Similarly, for a word wi given by wi ≜ ⟨xi,0, xi,1, xi,2, …, xi,n−1⟩, the joint probability of the word is
p(wi) = p(⟨xi,0, xi,1, xi,2, …, xi,n−1⟩) = p(xi,0, xi,1, xi,2, …, xi,n−1)
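
As a concrete illustration, the sketch below uses a small, made-up joint distribution (the table, the symbol names x0, x1, y0, y1, and all numbers are assumptions, not from the text) to show that p(cij) read from the joint table equals p(xi|yj) p(yj) and also p(yj|xi) p(xi):

```python
# Minimal sketch: a hypothetical joint distribution p(x, y) over
# X = {x0, x1} and Y = {y0, y1}. All numbers are made up for illustration.
p_xy = {
    ("x0", "y0"): 0.40, ("x0", "y1"): 0.10,
    ("x1", "y0"): 0.20, ("x1", "y1"): 0.30,
}

# Marginals p(x) and p(y), obtained by summing the joint table.
p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# p(cij) = p(xi, yj), read directly from the joint table ...
p_c = p_xy[("x0", "y1")]

# ... equals p(xi|yj) p(yj) ...
p_x_given_y = p_xy[("x0", "y1")] / p_y["y1"]
print(p_c, p_x_given_y * p_y["y1"])   # both 0.10

# ... and equals p(yj|xi) p(xi).
p_y_given_x = p_xy[("x0", "y1")] / p_x["x0"]
print(p_c, p_y_given_x * p_x["x0"])   # both 0.10
```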

Joint Entropy

Since the set of all compound symbols is

C = { cij | ∀ i, j }
and the probability of a compound symbol is the joint probability p(cij) = p(xi, yj), the entropy of the set of all compound symbols, H(C), is given by
H(C) = Σ_{∀ cij} p(cij) log2(1/p(cij))
Thus, H(C) ≡ H(X,Y).

In other words, H(C) and H(X,Y) are one and the same (≡), not merely equal in value (=); the identity holds for all values of C, X and Y.

What does it mean, practically, to say H(C) ≡ H(X,Y)?

H(X,Y) = Σ_{∀ xi, ∀ yj} p(xi, yj) log2(1/p(xi, yj)) = Σ_{∀ cij} p(cij) log2(1/p(cij)) = H(C)
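
Numerically, the two expressions are the same sum over the same probabilities, only indexed differently. Continuing the hypothetical joint table from the earlier sketch (all numbers assumed for illustration):

```python
from math import log2

# Same hypothetical joint distribution as in the earlier sketch.
p_xy = {
    ("x0", "y0"): 0.40, ("x0", "y1"): 0.10,
    ("x1", "y0"): 0.20, ("x1", "y1"): 0.30,
}

# H(X,Y): sum over all pairs (xi, yj).
H_XY = sum(p * log2(1 / p) for p in p_xy.values())

# H(C): relabel each pair as one compound symbol cij and sum again.
p_c = {x + y: p for (x, y), p in p_xy.items()}
H_C = sum(p * log2(1 / p) for p in p_c.values())

print(H_XY, H_C)   # identical, ~1.846 bits
```

Relabelling the pairs changes nothing about the probabilities being summed, which is why H(C) ≡ H(X,Y) holds for every distribution rather than being an equality for one particular case.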

Equivocation

Since,

p(x, y) = p(x|y) p(y)
    ⇒ log p(x, y) = log p(x|y) + log p(y)
alternatively
p(x, y) = p(y|x) p(x)
    ⇒ log p(x, y) = log p(y|x) + log p(x)
Thus,
H(X,Y) = Σ_{∀ xi, ∀ yj} p(yj|xi) p(xi) log2(1/p(xi)) + Σ_{∀ xi, ∀ yj} p(xi, yj) log2(1/p(yj|xi))
       = H(X) + H(Y|X)
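
A quick numerical check of this decomposition, again with the assumed joint table from the earlier sketches (not data from the text):

```python
from math import log2

# Hypothetical joint distribution p(x, y); numbers are illustrative only.
p_xy = {
    ("x0", "y0"): 0.40, ("x0", "y1"): 0.10,
    ("x1", "y0"): 0.20, ("x1", "y1"): 0.30,
}

# Marginal p(x).
p_x = {}
for (x, _), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

# H(X,Y): joint entropy.
H_XY = sum(p * log2(1 / p) for p in p_xy.values())

# H(X): entropy of the marginal distribution of X.
H_X = sum(p * log2(1 / p) for p in p_x.values())

# H(Y|X): equivocation, using p(yj|xi) = p(xi, yj) / p(xi).
H_Y_given_X = sum(p * log2(p_x[x] / p) for (x, _), p in p_xy.items())

print(H_XY, H_X + H_Y_given_X)   # both ~1.846 bits
```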

What is the practical meaning of H(Y|X)?


Equivocation, or conditional entropy, H(Y|X) measures the uncertainty remaining in Y given knowledge of X. For instance, if Y = f(X), then knowing X tells us everything about Y.

What does uncertainty remaining in Y mean, i.e., what does H(Y|X) mean?


To answer, let us consider three cases:

  1. If knowing X tells us nothing about Y (X and Y are independent), then the uncertainty remaining in Y is H(Y|X) = H(Y).
  2. If knowing X tells us everything about Y (e.g., Y = f(X)), then the uncertainty remaining in Y is H(Y|X) = 0.
  3. If knowing X tells us only something about Y, so that some uncertainty in Y remains, then H(Y|X) ≤ H(Y).
Thus, the possible range of the uncertainty remaining in Y is
0 ≤ H(Y|X) ≤ H(Y)
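
The three cases can be checked numerically. The sketch below builds three assumed joint distributions (an independent pair, a deterministic pair with Y = f(X), and a noisy pair) and computes H(Y|X) and H(Y) for each; the distributions and the helper function are illustrative assumptions, not from the text:

```python
from math import log2

def entropies(p_xy):
    """Return (H(Y|X), H(Y)) for a joint distribution given as {(x, y): p}."""
    p_x, p_y = {}, {}
    for (x, y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    H_Y = sum(p * log2(1 / p) for p in p_y.values())
    # H(Y|X) = sum over (x, y) of p(x, y) log2(p(x) / p(x, y))
    H_Y_given_X = sum(p * log2(p_x[x] / p) for (x, _), p in p_xy.items())
    return H_Y_given_X, H_Y

# Case 1: X and Y independent -> H(Y|X) = H(Y)
independent = {(x, y): 0.25 for x in "ab" for y in "cd"}

# Case 2: Y = f(X), deterministic -> H(Y|X) = 0
deterministic = {("a", "c"): 0.5, ("b", "d"): 0.5}

# Case 3: knowing X tells us only something about Y -> 0 < H(Y|X) < H(Y)
noisy = {("a", "c"): 0.4, ("a", "d"): 0.1, ("b", "c"): 0.1, ("b", "d"): 0.4}

for name, dist in [("independent", independent),
                   ("deterministic", deterministic),
                   ("noisy", noisy)]:
    h_cond, h_y = entropies(dist)
    print(f"{name:13s}  H(Y|X) = {h_cond:.3f}  H(Y) = {h_y:.3f}")
```

In all three runs H(Y|X) stays within 0 ≤ H(Y|X) ≤ H(Y): it equals H(Y) for the independent pair, 0 for the deterministic pair, and falls strictly in between for the noisy pair.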

Next:

Chain Rule of Entropy ➽