*H*(*Y*) < *H*(*C*)

Since *p*(*x _{i}*, *η _{j}*) = *p*(*x _{i}*) *p*(*η _{j}*) ≠ 0,

*H*(*C*) = *H*(*X*) + *H*(*N*|*X*) = *H*(*X*) + *H*(*N*) = 1 + 0.921928 = 1.921928

*H*(*Y*) < *H*(*C*).
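The arithmetic above can be checked with a short sketch, using only the probabilities stated in the example, *p*(*X*) = {0.5, 0.5} and *p*(*N*) = {0.1, 0.8, 0.1}:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits, ignoring zero-probability symbols."""
    return -sum(p * log2(p) for p in probs if p > 0)

p_X = [0.5, 0.5]         # source symbols x0, x1
p_N = [0.1, 0.8, 0.1]    # noise symbols -1, 0, +1

H_X = entropy(p_X)       # 1.0 bit
H_N = entropy(p_N)       # ~0.921928 bits

# X and N are independent, so for the compound symbol C = (X, N):
# H(C) = H(X) + H(N|X) = H(X) + H(N)
H_C = H_X + H_N          # ~1.921928 bits
```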

## Most functions are information lossy

For the system with minimal specifications:

This can be shown as a directed graph.

But we know that *p*(*Y*) = transition probability × *p*(*X*).

Thus,

*p*(*y _{0}*) = *p*(*y _{0}*|*x _{0}*) *p*(*x _{0}*) ⇒ *p*(*y _{0}*|*x _{0}*) = *p*(*y _{0}*) ÷ *p*(*x _{0}*) = 0.05 ÷ 0.5 = 0.1

*p*(*y _{1}*) = *p*(*y _{1}*|*x _{0}*) *p*(*x _{0}*) ⇒ *p*(*y _{1}*|*x _{0}*) = *p*(*y _{1}*) ÷ *p*(*x _{0}*) = 0.4 ÷ 0.5 = 0.8

*p*(*y _{3}*) = *p*(*y _{3}*|*x _{1}*) *p*(*x _{1}*) ⇒ *p*(*y _{3}*|*x _{1}*) = *p*(*y _{3}*) ÷ *p*(*x _{1}*) = 0.4 ÷ 0.5 = 0.8

*p*(*y _{4}*) = *p*(*y _{4}*|*x _{1}*) *p*(*x _{1}*) ⇒ *p*(*y _{4}*|*x _{1}*) = *p*(*y _{4}*) ÷ *p*(*x _{1}*) = 0.05 ÷ 0.5 = 0.1
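The output distribution behind these conditionals can be sketched numerically. The noise values *N* = {−1, 0, +1} and all probabilities are from the example; the source values X = {1, 3} are an assumption, chosen so the output alphabet has the five symbols *y _{0}* … *y _{4}* (the text fixes only *p*(*x _{0}*) = *p*(*x _{1}*) = 0.5):

```python
from collections import defaultdict
from fractions import Fraction

# Assumed illustration: x0 = 1, x1 = 3; probabilities as in the example.
X = {1: Fraction(1, 2), 3: Fraction(1, 2)}
N = {-1: Fraction(1, 10), 0: Fraction(4, 5), 1: Fraction(1, 10)}

# y = x + eta: sum the probability of every (x, eta) pair landing on y
p_Y = defaultdict(Fraction)
for x, px in X.items():
    for eta, pe in N.items():
        p_Y[x + eta] += px * pe
# p(Y) = {0.05, 0.4, 0.1, 0.4, 0.05} over y0..y4

# Conditional probability recovered as in the text: p(y0|x0) = p(y0) / p(x0)
print(p_Y[0] / X[1])    # 1/10, i.e. 0.1
```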

Symbol *x _{i}* combines with symbol *η _{j}* to give symbol *y _{k}*, i.e., *x _{i}* + *η _{j}* = *y _{k}*. Therefore,

given symbol *x _{i}* we know something about *y _{k}*;

also, given symbol *y _{k}* we know something about *η _{j}*.

Thus *p*(*y _{k}*|*x _{i}*) = *p*(*η _{j}*|*x _{i}* + *η _{j}* = *y _{k}*).
Hence we compute the mutual information between *X* and *Y* following the addition of *N* as follows.

Given *X* there is uncertainty in *Y*, i.e., information is lost in transmission. This can further be explained as follows. For this case,

*H*(*X*) = 1 and *I*(*X*; *Y*) = *H*(*X*) − *H*(*X*|*Y*) ⇒ *H*(*X*|*Y*) = 1 − 0.90 = 0.1
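A sketch of this mutual-information computation, using *p*(*Y*) = {0.05, 0.4, 0.1, 0.4, 0.05} from the derivation above (*p*(*y _{2}*) = 0.1 is inferred from the probabilities summing to 1):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

H_X = entropy([0.5, 0.5])                     # 1.0 bit
H_N = entropy([0.1, 0.8, 0.1])                # ~0.921928 bits
H_Y = entropy([0.05, 0.4, 0.1, 0.4, 0.05])    # ~1.821928 bits

# Given x, the only remaining uncertainty in y is the noise, so
# H(Y|X) = H(N) and I(X;Y) = H(Y) - H(Y|X)
I_XY = H_Y - H_N                              # 0.9 bits
H_X_given_Y = H_X - I_XY                      # 0.1 bits of X lost
```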

Given *Y* there is some uncertainty remaining in the knowledge of *X*. Hence,

**"The entropy of the outcome of a function is not necessarily the same as the entropy of the compound symbol."**

This explains why in the above example system

*H*(*Y*) < *H*(*C*)

or

*H*(*Y*) < *H*(*X*, *N*).

In general,

*H*(*Y*) ≤ *H*(*C*),

but never

*H*(*Y*) > *H*(*C*).

Since *x _{i}* + *η _{j}* = *y _{k}* = *f*(*x _{i}*, *η _{j}*), we state:

**"Most functions are information lossy."**

## Identifying confounding causes for information loss and solution to improve SNR (signal-to-noise ratio)

In our example the statement **"Most functions are information lossy"** applies because *y _{2}* receives inputs from both *x _{0}* and *x _{1}*, and **confounds the input**.
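The confounding can be made explicit with a small sketch. The source values X = {1, 3} are an illustrative assumption, since the section fixes only the probabilities:

```python
from collections import defaultdict

# Assumed illustration: X = {1, 3}, N = {-1, 0, +1} as in the example.
sources = defaultdict(set)
for x in (1, 3):
    for eta in (-1, 0, +1):
        sources[x + eta].add(x)

# Any y reachable from more than one x confounds the input.
confounded = {y for y, xs in sources.items() if len(xs) > 1}
print(confounded)    # {2}: y2 = 1 + 1 = 3 - 1 is reachable from both symbols
```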

*How can this confounding of input be minimized?*

One solution is **normalization to improve SNR (signal-to-noise ratio)**.

In our example

*N* = {−1, 0, +1}

*p*(*N*) = {0.1, 0.8, 0.1}

*N* = {−1, 0, +1} is normalized, with

*p*(*N*) = {0.1, 0.8, 0.1} unchanged,

whose directed graph is

Therefore we have

Notice that *H*(*X*, *N*) = *H*(*Y*). This is because

*H*(*X*, *N*) = *H*(*C*) = *H*(*X*) + *H*(*N*) = 1.921928

and

*I*(*X*; *Y*) = 1.

Therefore the information lost from the original symbol *X* at the output *Y* is given by

*H*(*X*|*Y*) = *H*(*X*) − *I*(*X*; *Y*) = 1 − 1 = 0
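Under one illustrative assumption for the normalization — scaling the noise to N = {−0.5, 0, +0.5} so that the outputs fanning out from *x _{0}* and *x _{1}* no longer overlap — the numbers above can be reproduced:

```python
from math import log2
from collections import defaultdict

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Assumed illustration: X = {1, 3} as before; noise normalized to half-steps
# so that no output y is shared between the two source symbols.
X = {1: 0.5, 3: 0.5}
N = {-0.5: 0.1, 0.0: 0.8, 0.5: 0.1}

p_Y = defaultdict(float)
for x, px in X.items():
    for eta, pe in N.items():
        p_Y[x + eta] += px * pe       # six distinct outputs, none confounded

H_Y = entropy(p_Y.values())           # equals H(X, N) = 1.921928
I_XY = H_Y - entropy(N.values())      # 1.0 bit: X is recovered exactly
loss = entropy(X.values()) - I_XY     # H(X|Y) = 0
```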


*Next:* Information in terms of usability (useful/useless)