This script implements a strategy for the game of Guess Who? using the concept of conditional mutual information.

We first run through a demo of how the strategy works. The Python functions that are used in the demo will be shown at the end.

We will use a simplified Guess Who set-up with only 6 names and 5 features or characteristics, as shown in the following dataframe:

*(1 represents 'Yes', 0 represents 'No')*

In [10]:

```
df
```

Out[10]:

Intuitively, we should first query the features that divide the dataset in half. These are `Female` and `Short Hair`.

Our intuition agrees with the conditional mutual information (CMI) between each feature and the set of all names: `Female` and `Short Hair` have the highest CMI, assuming we don't know the values of any of the features:

In [11]:

```
known_features = {}
features_by_cmi(df,known_features)
```

Suppose we have some prior information that the person in question has brown eyes. This information changes the mutual information provided by the remaining features. In fact, `Female` now becomes the least informative feature, since 3 of the 4 people with brown eyes are female.
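The count behind this claim can be verified with a quick standalone check (the data here simply mirror the dataframe built in the setup cells at the end of the notebook):

```python
import pandas as pd

# Mirror of the relevant columns of the dataframe defined below.
df_check = pd.DataFrame({
    'Name':       ['Alice', 'Bob', 'Charles', 'Diane', 'Emma', 'Fred'],
    'Female':     [1, 0, 0, 1, 1, 0],
    'Brown Eyes': [1, 1, 0, 1, 1, 0],
})

brown = df_check[df_check['Brown Eyes'] == 1]
print(len(brown), brown['Female'].sum())  # 4 people with brown eyes, 3 of them female
```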

In [12]:

```
known_features = {'Brown Eyes' : 1}
features_by_cmi(df,known_features)
```

Since `Headgear` has one of the highest CMIs, let's query it next. Suppose we get the answer 0. We then see that asking about `Glasses` would offer us no information at all, since both remaining candidates wear glasses.
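We can confirm the "no information" claim directly: with the same data mirrored inline from the setup cells below, the two candidates consistent with `Brown Eyes = 1` and `Headgear = 0` both have `Glasses = 1`, so the answer to a glasses question is already determined:

```python
import pandas as pd

# Mirror of the relevant columns of the dataframe defined below.
df_check = pd.DataFrame({
    'Name':       ['Alice', 'Bob', 'Charles', 'Diane', 'Emma', 'Fred'],
    'Glasses':    [0, 1, 0, 1, 0, 0],
    'Brown Eyes': [1, 1, 0, 1, 1, 0],
    'Headgear':   [1, 0, 0, 0, 1, 0],
})

remaining = df_check[(df_check['Brown Eyes'] == 1) & (df_check['Headgear'] == 0)]
print(remaining['Name'].tolist(), remaining['Glasses'].tolist())  # both remaining candidates wear glasses
```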

In [13]:

```
known_features = {'Brown Eyes' : 1, 'Headgear' : 0}
features_by_cmi(df,known_features)
```

Finally, we ask about `Short Hair` and get the answer 0. Since only one row remains, we have found the answer!
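To see which single row survives, we can replay the three answers as a chain of boolean filters (rebuilding the dataset inline; within the notebook itself, `filter_by_features(df, known_features)` does the same job):

```python
import numpy as np
import pandas as pd

# Same data as the dataframe built in the setup cells below.
names = ['Alice', 'Bob', 'Charles', 'Diane', 'Emma', 'Fred']
features = ['Female', 'Glasses', 'Short Hair', 'Brown Eyes', 'Headgear']
data_array = np.array([[1, 0, 0, 1, 1],
                       [0, 1, 1, 1, 0],
                       [0, 0, 1, 0, 0],
                       [1, 1, 0, 1, 0],
                       [1, 0, 1, 1, 1],
                       [0, 0, 0, 0, 0]])
df = pd.DataFrame(data_array, columns=features)
df.insert(0, 'Name', names)

# Apply each answer in turn: Brown Eyes = 1, Headgear = 0, Short Hair = 0.
mask = (df['Brown Eyes'] == 1) & (df['Headgear'] == 0) & (df['Short Hair'] == 0)
print(df.loc[mask, 'Name'].tolist())  # only one name matches all three answers
```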

In [14]:

```
known_features = {'Brown Eyes' : 1, 'Headgear' : 0, 'Short Hair' : 0}
features_by_cmi(df,known_features)
```

We now show the code that was used to create the demo above. If you're running this as an IPython notebook, you must run all of the following code cells before running the demo cells above!

First we define the dataframe that we will be using:

In [1]:

```
import numpy as np
import pandas as pd
from functools import reduce

# Initial data
names = ['Alice', 'Bob', 'Charles', 'Diane', 'Emma', 'Fred']
features = ['Female', 'Glasses', 'Short Hair', 'Brown Eyes', 'Headgear']
data_array = np.array([[1, 0, 0, 1, 1],
                       [0, 1, 1, 1, 0],
                       [0, 0, 1, 0, 0],
                       [1, 1, 0, 1, 0],
                       [1, 0, 1, 1, 1],
                       [0, 0, 0, 0, 0]])

# Create DataFrame from the above data, placing names in the first column.
df = pd.DataFrame(data_array, columns=features)
df.insert(0, 'Name', names)
df
```

Out[1]:

We then define a function that lets us filter this dataframe by specifying the values of certain features, and another function that returns the probability of observing a particular combination of known features and names.

In [2]:

```
def filter_by_features(df, known_features):
    """Return the rows of df that match every feature-value pair in known_features."""
    if len(known_features) == 0:
        return df
    return df[reduce(lambda x, y: x & y,
                     [df[feature] == val for feature, val in known_features.items()])]

def prob(df, known_features, name=None):
    """Probability of observing the given known features (and, optionally, a name)."""
    filtered_df = filter_by_features(df, known_features)
    if name is None:
        return len(filtered_df) / len(df)
    return len(filtered_df[filtered_df['Name'] == name]) / len(df)
```

If we filter by the features `{'Female' : 1, 'Short Hair' : 0}`, we obtain the following two rows:

In [3]:

```
known_features = {'Female' : 1, 'Short Hair' : 0}
filter_by_features(df, known_features)
```

Out[3]:

Since the filtered dataset has only 2 rows out of the original 6, the probability of observing this combination of known features is $1/3$:

In [4]:

```
prob(df,known_features)
```

Out[4]:

We may also calculate the probability of someone being called `'Alice'`:

In [5]:

```
prob(df, {}, 'Alice')
```

Out[5]:

Finally, we define a function that computes the mutual information between a single feature and the set of all names, conditioned on knowing the values of some other features:

$$ I(X,Y|z) = \sum_{x \in X} \sum_{y \in Y} p(x,y|z) \log_2 \frac{p(x,y|z)}{p(x|z)\, p(y|z)} $$

Here $Y$ is the set of all names, and each $y$ is one particular name. $X = \{0,1\}$ is the set of values for one particular feature, so each $x$ is either $0$ or $1$. The symbol $z$ stands for a particular combination of values for some of the other features that are not $X$.

The value $I(X,Y|z)$ is the number of bits of information that the value of $X$ provides about the value of $Y$, given that we already know $z$.
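As a sanity check of the formula, we can compute the unconditioned case by hand for a feature like `Female` that splits the six names evenly: each name $y$ has $p(y) = 1/6$, each feature value $x$ has $p(x) = 1/2$, and each joint outcome has $p(x,y) = 1/6$ when the name's value matches and $0$ otherwise. A minimal standalone sketch of that sum:

```python
import numpy as np

# 'Female' column in the order Alice, Bob, Charles, Diane, Emma, Fred
# (mirroring the dataframe built in the setup cells below).
female = np.array([1, 0, 0, 1, 1, 0])

cmi = 0.0
for x in (0, 1):
    p_x = np.mean(female == x)  # p(x) = 1/2 for a half-split feature
    for y in range(6):          # one term per name
        p_y = 1 / 6             # names are equally likely a priori
        p_xy = 1 / 6 if female[y] == x else 0.0  # joint probability p(x, y)
        if p_xy > 0:            # skip zero terms (0 log 0 = 0)
            cmi += p_xy * np.log2(p_xy / (p_x * p_y))

print(round(cmi, 3))  # 1.0 bit
```

Each of the six nonzero terms contributes $\frac{1}{6}\log_2 2 = \frac{1}{6}$ bit, so the total is 1 bit, exactly what a perfect half-split question should yield.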

In [6]:

```
def conditional_mutual_info(df, feature, known_features):
    """Mutual information between a feature and the names, given known_features."""
    filtered_df = filter_by_features(df, known_features)
    cmi_sum = 0
    for val in [0, 1]:
        for name in set(df['Name']):
            p_valname = prob(filtered_df, {feature: val}, name)
            if p_valname > 0:  # skip zero-probability terms (0 log 0 = 0)
                p_val = prob(filtered_df, {feature: val})
                p_name = prob(filtered_df, {}, name)
                cmi_sum += p_valname * np.log2(p_valname / (p_val * p_name))
    return cmi_sum
```

We also write a script that prints out a sorted list of values of $I(X,Y|z)$ for each feature $X$ that is not specified by $z$, so that we can compare which feature has the highest mutual information.

In [7]:

```
def features_by_cmi(df, known_features):
    """Print the remaining features sorted by conditional mutual information."""
    remaining_features = [feature for feature in features if feature not in known_features]
    cmi = [conditional_mutual_info(df, feature, known_features) for feature in remaining_features]
    order = np.argsort(cmi)[::-1]  # indices of features, from most to least informative
    print('Known features: {}\n'.format(known_features))
    print('Rows matching known features:')
    display(filter_by_features(df, known_features))
    print('Conditional mutual information of remaining features:\n')
    for i in order:
        print("{:12}: {:.3f}".format(remaining_features[i], cmi[i]))
```

Trying this on the above set of known features, we see that asking about `Headgear` or `Glasses` next would give us 1 bit of information about the name, whereas asking about `Brown Eyes` tells us nothing:

In [9]:

```
features_by_cmi(df,known_features)
```