Breaking down the information channel (contd.)
Why is I(X; Y3) > I(X; Y2)?

Assume a DMS X emitting symbols x0, x1. Treating the pair as a single symbol ⟨x0, x1⟩, we feed it into an encoder E that emits the encoded sequence ⟨x0, x1, ∏⟩, where the parity ∏ ≜ x0 ⊕ x1 is the modulo-2 sum ("addition modulo-2", or "exclusive-or") of the two bits.
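The encoder E above can be sketched in a few lines; this is only an illustration, assuming bits are represented as Python ints in {0, 1}:

```python
def encode(x0: int, x1: int) -> tuple:
    """Append the modulo-2 (XOR) parity to the two source bits."""
    parity = x0 ^ x1          # addition modulo 2 / exclusive-or
    return (x0, x1, parity)

# e.g. encode(0, 1) gives (0, 1, 1) and encode(1, 1) gives (1, 1, 0)
```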

Hence the total information for the DMS is

H(x0, x1, ∏) = H(x0) + H(x1|x0) + H(∏|x0, x1)
= H(X) + H(X) + 0
= 2H(X)
since the DMS symbols are independent (so H(x1|x0) = H(x1) = H(X)) and ∏ is a deterministic function of x0 and x1 (so H(∏|x0, x1) = 0).
This total information divided by the total number of channel uses, 2H(X) × 1⁄3, is called the coded information rate.
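The coded information rate can be computed directly; the sketch below assumes, for illustration, the binary source probabilities introduced in the numerical example that follows (p(0) = 0.4, p(1) = 0.6):

```python
import math

def entropy(p: float) -> float:
    """Binary entropy H(X) in bits for P(x = 0) = p."""
    if p in (0.0, 1.0):
        return 0.0
    q = 1.0 - p
    return -p * math.log2(p) - q * math.log2(q)

p0 = 0.4                 # illustrative source probability (assumption)
H = entropy(p0)          # H(X), bits per source symbol
rate = 2 * H / 3         # 2H(X) bits carried over 3 channel uses
print(H, rate)           # ≈ 0.971 and ≈ 0.647 bits per channel use
```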

Let us now consider the above case with some numbers. Assume the compound symbol C = ⟨x0, x1, ∏⟩ with x0, x1, ∏ ∈ {0, 1}, and p(0) = 0.4, p(1) = 0.6 for each source bit. Since x0 and x1 are independent, p(C) = p(x0) p(x1). Then consider the four decoding rules (four cases):

In each respective case only four possibilities are noted. This is because a source 0 is delivered as 0, and similarly a source 1 as 1; when an erasure e occurs, a 0 is never delivered as 1, nor a 1 as 0. In other words, when e occurs, all other outputs of Y3 are declared "error detected, re-transmit". Such a scheme is therefore also called an Automatic Repeat reQuest (ARQ) system.
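The ARQ decoding rule described above can be sketched as follows; this is a minimal illustration, assuming the erasure channel delivers each bit either unchanged or as the erasure symbol 'e' (never 0 as 1 or 1 as 0):

```python
def decode(received: tuple):
    """Return (x0, x1) if the block is clean, else request retransmission."""
    if 'e' in received:                 # any erased position spoils the block
        return "error detected and re-transmit"
    x0, x1, parity = received
    if x0 ^ x1 != parity:               # cannot occur on a pure erasure channel
        return "error detected and re-transmit"
    return (x0, x1)

# e.g. decode((0, 1, 1)) recovers (0, 1); decode((0, 'e', 1)) requests a repeat
```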

Because of the e output, a transmitted 0 can never be confused with a 1, nor a 1 with a 0. We can therefore transmit without any information loss through the erasure channel, at the cost of sending more channel symbols.
