Information scoring – a first look at information theory

The relative information scoring system used in the Credence Game might seem a bit strange the first time you see it, despite being told that it satisfies some nice properties. This post, intended for mathematically curious non-experts, is an introduction to relative information in the form of an example to help build up its intuitive meaning. You can also read the Wikipedia articles on information theory and logarithmic scoring for a more technical discussion, and playing the game itself will give you a first-person experience of what information theory feels like!

Imagine we’re at a party. The host draws a name from a hat with the following 12 names in it:

Andy (Canadian)      Bob (Canadian)       Chuck (Canadian)
David (American)     Eddy (American)      Frank (American)
Alice (Canadian)     Betty (Canadian)     Caitlin (Canadian)
Dorothy (American)   Ethyl (American)     Flora (American)

Suppose we all want to answer this yes-or-no question:

Question: Is the winner a Canadian man?

In other words, is the winner one of the three people marked with an asterisk below?

Andy* (Canadian)     Bob* (Canadian)      Chuck* (Canadian)
David (American)     Eddy (American)      Frank (American)
Alice (Canadian)     Betty (Canadian)     Caitlin (Canadian)
Dorothy (American)   Ethyl (American)     Flora (American)

Let’s suppose also that the answer is “yes”, so we all start at a disadvantage: before any hints, each of us gives the wrong answer “no” a credence of 9/12=75%. What matters for our information score is the credence we give to the correct answer, which in this case is 3/12=25%.

Scenario 1:

To tease me, the host whispers in my ear that the winner is a man, narrowing my gaze to the six remaining possible winners (marked with an asterisk):

Andy* (Canadian)     Bob* (Canadian)      Chuck* (Canadian)
David* (American)    Eddy* (American)     Frank* (American)
Alice (Canadian)     Betty (Canadian)     Caitlin (Canadian)
Dorothy (American)   Ethyl (American)     Flora (American)

My credence in the correct answer now jumps to 3/6=50%, twice what it was before. Because my credence in the truth has doubled once, we say I’ve received 1 bit of information about the answer from the host, namely, the gender of the winner. A formula to compute this is

    log2((probability I now assign to the correct answer)/(probability I used to assign to the correct answer)) =
    log2(50%/25%) = log2(2) = 1
If my credence had quadrupled — that is, doubled twice — we’d say I got 2 bits, and if it octupled we’d say I got 3 bits. Of course, quadrupling is the best we can do here without going above 100%!
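
The formula is easy to check numerically. Here is a minimal sketch in Python; the function name `information_gain` is just a label I made up for illustration and is not part of the Credence Game.

```python
import math

def information_gain(prior: float, posterior: float) -> float:
    """Bits gained when credence in the correct answer moves from prior to posterior."""
    return math.log2(posterior / prior)

# Scenario 1: my credence in "yes" goes from 3/12 to 3/6 (one doubling).
print(information_gain(3 / 12, 3 / 6))  # 1.0

# Quadrupling from 25% to 100% is two doublings.
print(information_gain(0.25, 1.0))  # 2.0
```

Each doubling of the credence in the correct answer adds exactly one bit, which is why the logarithm is taken in base 2.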

Scenario 2:

Suppose the host tells you that the winner is Canadian and is a man, leaving only the three names marked with an asterisk:

Andy* (Canadian)     Bob* (Canadian)      Chuck* (Canadian)
David (American)     Eddy (American)      Frank (American)
Alice (Canadian)     Betty (Canadian)     Caitlin (Canadian)
Dorothy (American)   Ethyl (American)     Flora (American)

Now you assign a probability of 1=100% to the truth, compared to the 25% you started with, so the amount of information you’ve gained about the answer is

    log2(100%/25%) = log2(4) = 2

This 2 corresponds to the fact that you’ve received 2 bits of information about the answer, namely, the gender and nationality of the winner.
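
The credences in Scenarios 1 and 2 come from simply counting names, and that counting can be sketched in code. This is only an illustration: the gender column is my assumption, read off the names and the counts used in the text, and the helper names `is_canadian_man` and `credence_in_yes` are hypothetical.

```python
import math
from fractions import Fraction

# The 12 names in the hat as (name, nationality, gender).
# The gender column is an assumption inferred from the example's counts.
HAT = [
    ("Andy", "Canadian", "man"), ("Bob", "Canadian", "man"),
    ("Chuck", "Canadian", "man"), ("David", "American", "man"),
    ("Eddy", "American", "man"), ("Frank", "American", "man"),
    ("Alice", "Canadian", "woman"), ("Betty", "Canadian", "woman"),
    ("Caitlin", "Canadian", "woman"), ("Dorothy", "American", "woman"),
    ("Ethyl", "American", "woman"), ("Flora", "American", "woman"),
]

def is_canadian_man(person):
    _, nationality, gender = person
    return nationality == "Canadian" and gender == "man"

def credence_in_yes(still_possible):
    """Share of the still-possible names that are Canadian men."""
    remaining = [p for p in HAT if still_possible(p)]
    hits = sum(is_canadian_man(p) for p in remaining)
    return Fraction(hits, len(remaining))

prior = credence_in_yes(lambda p: True)                 # 3/12, no hint yet
scenario_1 = credence_in_yes(lambda p: p[2] == "man")   # 3/6, winner is a man
scenario_2 = credence_in_yes(is_canadian_man)           # 3/3, winner is a Canadian man

print(math.log2(float(scenario_1 / prior)))  # 1.0
print(math.log2(float(scenario_2 / prior)))  # 2.0
```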

We can also say that if you convince me (from Scenario 1) to adopt your opinion — to shift my credence in the truth from 50% to 100% — then you will have provided me with 1 bit of information about the answer:

    log2(100%/50%) = log2(2) = 1

This is your relative information score (relative to me): If I adopt your opinion, then upon finding out the true answer, I can credit you with having moved me “1 bit closer to the truth”. Alternatively, it’s how much closer you are to the truth, “measured in bits”.

In the Credence Game, your information score is always relative to someone with no idea what the answer is, i.e. someone with a credence of 50% in each of the two answers.
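
As a rough sketch, a relative information score is the same logarithm with someone else's credence in the denominator. The function name `relative_information` is my own, and the default reference of 0.5 simply encodes the 50% baseline described above.

```python
import math

def relative_information(your_credence: float, reference_credence: float = 0.5) -> float:
    """Bits by which your credence in the true answer beats the reference credence."""
    return math.log2(your_credence / reference_credence)

# You (100% on the truth) relative to me after Scenario 1 (50% on the truth).
print(relative_information(1.0, 0.5))  # 1.0

# Someone with no idea (50%) scores zero relative to the 50% baseline.
print(relative_information(0.5))  # 0.0
```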

Scenario 3:

Now consider a slightly different set of hints. Suppose again that I’m told the winner is a man, so the six names marked with an asterisk remain and I am currently 50% sure of the correct answer,

Andy* (Canadian)     Bob* (Canadian)      Chuck* (Canadian)
David* (American)    Eddy* (American)     Frank* (American)
Alice (Canadian)     Betty (Canadian)     Caitlin (Canadian)
Dorothy (American)   Ethyl (American)     Flora (American)

so I have gained

    log2(50%/25%) = log2(2) = 1

bits of information from the host’s hint. Then you are told that the winner is a man and isn’t Frank, leaving the five names marked with an asterisk:

Andy* (Canadian)     Bob* (Canadian)      Chuck* (Canadian)
David* (American)    Eddy* (American)     Frank (American)
Alice (Canadian)     Betty (Canadian)     Caitlin (Canadian)
Dorothy (American)   Ethyl (American)     Flora (American)

You now assign a probability of 3/5=60% to the truth, so using the formula we’ve been using all along, you have gained

    log2(60%/25%) ≈ 1.26

bits of information. This is not an integer, but it makes sense: you have gained more information than I have, because you’ve ruled out Frank and I haven’t. The difference 1.26-1=0.26 is how much more information you have than me. Because log(a/c) - log(b/c) = log(a/b), this difference equals your relative information score:

    log2(60%/50%) ≈ 0.26

In the Credence Game, recall that you get 26=100·0.26 points for a correct answer with 60% credence. This is where the number 26 comes from: it’s how much information you have relative to someone with a 50% credence in each answer, measured in centibits.
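
The arithmetic in Scenario 3 can be checked with a short sketch. The function name `credence_points` is hypothetical, and rounding to a whole number of points is my assumption about how the score is displayed; the text only states that 60% credence earns 26 points.

```python
import math

def credence_points(credence_in_truth: float) -> int:
    """Centibits relative to a 50% baseline, rounded to whole points (rounding assumed)."""
    return round(100 * math.log2(credence_in_truth / 0.5))

gain_you = math.log2(0.60 / 0.25)  # about 1.26 bits
gain_me = math.log2(0.50 / 0.25)   # exactly 1 bit
relative = math.log2(0.60 / 0.50)  # about 0.26 bits

# The difference of the two gains equals the relative score.
print(math.isclose(gain_you - gain_me, relative))  # True

print(credence_points(0.60))  # 26
```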






Further remarks

There are a few other things worth noting about information scoring that I avoided discussing in the flow of the above discussion.

Dependence on the truth: When my credence in the true answer “yes” goes from 25% to 50%, my credence in the wrong answer “no” goes from 75% to 50%. Only someone who knows the correct answer knows that my information gain is

    log2(50%/25%) = 1

and not

    log2(50%/75%) = -0.58
Without knowing the truth, one can compute expected information gain, a quantity which itself has excellent mathematical properties. This is essentially why information theory exists as a field of mathematics. For example, the KL-divergence of your posterior credences from your prior credences is your posterior expectation of the information you gained in the belief update.
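
To make the last sentence concrete, here is a sketch that weights the two possible information gains from Scenario 1 by my posterior credences. The function name `kl_divergence` is my own, and the numbers are the ones from the update above.

```python
import math

def kl_divergence(posterior, prior):
    """Posterior-weighted average of log2(posterior/prior), in bits."""
    return sum(q * math.log2(q / p) for q, p in zip(posterior, prior) if q > 0)

prior = [0.25, 0.75]      # my credences in "yes" and "no" before the hint
posterior = [0.50, 0.50]  # my credences after learning the winner is a man

# 0.5 * (+1 bit if "yes") + 0.5 * (about -0.58 bits if "no")
print(round(kl_divergence(posterior, prior), 3))  # 0.208
```

The expected gain is positive even though one of the two possible outcomes would score me a loss.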

Information vs information about: (Scenario 4) Suppose you are simply told that the winner is Andy. You have actually gained more information than just the answer to the question above; namely, you know the exact identity of the winner. But not all of that information is needed to answer the question. This can be quantified: the amount of information you’ve gained about the identity of the winner is

    log2(1/(1/12)) = 3.58,

whereas the amount of information you’ve gained about the answer to the question is

    log2(1/(1/4)) = 2.

So 1.58 bits of the information about the winner’s identity was redundant to answering the question. Indeed,

    log2(1/(1/3)) = 1.58,
the amount of information you would gain in updating from only knowing the answer to the question to knowing the winner’s full identity.
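
A quick numerical check of this bookkeeping, with variable names of my own choosing:

```python
import math

identity_bits = math.log2(1 / (1 / 12))  # learning the winner is Andy, about 3.58
answer_bits = math.log2(1 / (1 / 4))     # learning the answer is "yes", 2 bits
extra_bits = math.log2(1 / (1 / 3))      # from "a Canadian man" to "Andy", about 1.58

# Information about the identity splits into the part about the answer
# plus the part that goes beyond the answer.
print(math.isclose(identity_bits, answer_bits + extra_bits))  # True
```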