Guess Who Solver

This script implements a strategy for the game of Guess Who? using the concept of conditional mutual information.

We first run through a demo of how the strategy works. The Python functions that are used in the demo will be shown at the end.

Demo

We will use a simplified Guess Who set-up with only 6 names and 5 features or characteristics, as shown in the following dataframe:

(1 represents 'Yes', 0 represents 'No')

In [10]:
df
Out[10]:
      Name  Female  Glasses  Short Hair  Brown Eyes  Headgear
0    Alice       1        0           0           1         1
1      Bob       0        1           1           1         0
2  Charles       0        0           1           0         0
3    Diane       1        1           0           1         0
4     Emma       1        0           1           1         1
5     Fred       0        0           0           0         0

Intuitively, we should first query the features that divide the dataset in half. These are Female and Short Hair.

Our intuition agrees with the conditional mutual information (CMI) between each feature and the set of all names. Female and Short Hair have the highest CMI when we don't yet know the values of any of the features:

In [11]:
known_features = {}
features_by_cmi(df, known_features)
Known features: {}

Rows matching known features:
      Name  Female  Glasses  Short Hair  Brown Eyes  Headgear
0    Alice       1        0           0           1         1
1      Bob       0        1           1           1         0
2  Charles       0        0           1           0         0
3    Diane       1        1           0           1         0
4     Emma       1        0           1           1         1
5     Fred       0        0           0           0         0
Conditional mutual information of remaining features:

Short Hair  : 1.000
Female      : 1.000
Headgear    : 0.918
Glasses     : 0.918
Brown Eyes  : 0.918
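
These values are easy to verify by hand. When no features are known, each name determines the feature's value, so the CMI of a feature is just the entropy of its yes/no split. Female, which splits the six names 3/3, therefore gives

$$ -\frac{1}{2} \log_2 \frac{1}{2} - \frac{1}{2} \log_2 \frac{1}{2} = 1 \text{ bit} $$

while Headgear, which splits them 2/4, gives $-\frac{1}{3} \log_2 \frac{1}{3} - \frac{2}{3} \log_2 \frac{2}{3} \approx 0.918$ bits.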

Suppose we have some prior information that the person in question has brown eyes. This information changes the mutual information provided by the remaining features. In fact, Female now becomes the least informative, since 3 of the 4 people with brown eyes are female.

In [12]:
known_features = {'Brown Eyes' : 1}
features_by_cmi(df, known_features)
Known features: {'Brown Eyes': 1}

Rows matching known features:
      Name  Female  Glasses  Short Hair  Brown Eyes  Headgear
0    Alice       1        0           0           1         1
1      Bob       0        1           1           1         0
3    Diane       1        1           0           1         0
4     Emma       1        0           1           1         1
Conditional mutual information of remaining features:

Headgear    : 1.000
Short Hair  : 1.000
Glasses     : 1.000
Female      : 0.811
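
The drop for Female can again be checked by hand: it splits the four brown-eyed candidates 3/1, so

$$ I(\text{Female}, Y \,|\, \text{Brown Eyes} = 1) = -\frac{3}{4} \log_2 \frac{3}{4} - \frac{1}{4} \log_2 \frac{1}{4} \approx 0.811 \text{ bits} $$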

Since Headgear has one of the highest CMIs, let's query that next. Suppose we get the answer 0. We then see that asking about Glasses will offer us no information at all, since both remaining candidates wear glasses.

In [13]:
known_features = {'Brown Eyes' : 1, 'Headgear' : 0}
features_by_cmi(df, known_features)
Known features: {'Brown Eyes': 1, 'Headgear': 0}

Rows matching known features:
      Name  Female  Glasses  Short Hair  Brown Eyes  Headgear
1      Bob       0        1           1           1         0
3    Diane       1        1           0           1         0
Conditional mutual information of remaining features:

Short Hair  : 1.000
Female      : 1.000
Glasses     : 0.000
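
The zero for Glasses drops straight out of the formula defined below: both remaining candidates have Glasses $= 1$, so $p(x{=}1|z) = 1$ and $p(x{=}1, y|z) = p(y|z)$, making the logarithm in every nonzero term $\log_2 1 = 0$.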

Finally, we ask about Short Hair and get the answer 0. Only one row remains, so we have found the answer: Diane!

In [14]:
known_features = {'Brown Eyes' : 1, 'Headgear' : 0, 'Short Hair' : 0}
features_by_cmi(df, known_features)
Known features: {'Brown Eyes': 1, 'Headgear': 0, 'Short Hair': 0}

Rows matching known features:
      Name  Female  Glasses  Short Hair  Brown Eyes  Headgear
3    Diane       1        1           0           1         0
Conditional mutual information of remaining features:

Glasses     : 0.000
Female      : 0.000

Python script

We now show the code that was used to create the above demo. If you're running this as an IPython notebook, you have to run all of the following code cells before running the demo cells above!

First we define the dataframe that we will be using:

In [1]:
import numpy as np
import pandas as pd
from functools import reduce

# Initial data
names = ['Alice', 'Bob', 'Charles', 'Diane', 'Emma', 'Fred']
features = ['Female', 'Glasses', 'Short Hair', 'Brown Eyes', 'Headgear']
data_array = np.array([[1, 0, 0, 1, 1],
                       [0, 1, 1, 1, 0],
                       [0, 0, 1, 0, 0],
                       [1, 1, 0, 1, 0],
                       [1, 0, 1, 1, 1],
                       [0, 0, 0, 0, 0]
                      ])

# Create DataFrame from above data, placing names in the first column.
df = pd.DataFrame(data_array, columns=features)
df.insert(0, 'Name', names)
df
Out[1]:
      Name  Female  Glasses  Short Hair  Brown Eyes  Headgear
0    Alice       1        0           0           1         1
1      Bob       0        1           1           1         0
2  Charles       0        0           1           0         0
3    Diane       1        1           0           1         0
4     Emma       1        0           1           1         1
5     Fred       0        0           0           0         0

We then define a function that lets us filter this dataframe by specifying the values of certain features, and another function that returns the probability of observing a particular combination of known features and names.

In [2]:
def filter_by_features(df, known_features):
    # Keep only the rows whose values match every known (feature, value) pair
    if len(known_features) == 0:
        return df
    else:
        return df[reduce(lambda x, y: x & y,
                         [df[feature] == val for feature, val in known_features.items()])]

def prob(df, known_features, name=None):
    # Probability of observing this combination of known features
    # (and, optionally, a particular name) in df
    filtered_df = filter_by_features(df, known_features)

    if name is None:
        return len(filtered_df) / len(df)
    else:
        return len(filtered_df[filtered_df['Name'] == name]) / len(df)

If we filter by the features {'Female' : 1, 'Short Hair' : 0}, we obtain the following two rows:

In [3]:
known_features = {'Female' : 1, 'Short Hair' : 0}
filter_by_features(df, known_features)
Out[3]:
      Name  Female  Glasses  Short Hair  Brown Eyes  Headgear
0    Alice       1        0           0           1         1
3    Diane       1        1           0           1         0

Since the filtered dataset has only 2 rows out of the original 6, the probability of observing this combination of known features is $1/3$:

In [4]:
prob(df,known_features)
Out[4]:
0.3333333333333333

We may also calculate the probability of someone being called 'Alice':

In [5]:
prob(df, {}, 'Alice')
Out[5]:
0.16666666666666666

Finally, we define a function that computes the mutual information between a single feature and the set of all names, conditioned on knowing the values of some other features:

$$ I(X,Y|z) = \sum_{x \in X} \sum_{y \in Y} p(x,y|z) \log_2 \frac{p(x,y|z)}{p(x|z)\,p(y|z)} $$

Here $Y$ is the set of all names, and each $y$ is one particular name. $X = \{0,1\}$ is the set of values for one particular feature, so each $x$ is either $0$ or $1$. The symbol $z$ stands for a particular combination of values for some of the other features that are not $X$.

The value $I(X,Y|z)$ is the number of bits of information that the value of $X$ gives us about the value of $Y$, given that we already know $z$.
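
As a concrete check, take $z = \{\text{Female} : 1, \text{Short Hair} : 0\}$, which leaves only Alice (Glasses $= 0$) and Diane (Glasses $= 1$). For $X = $ Glasses, the only nonzero terms are $(x{=}0, \text{Alice})$ and $(x{=}1, \text{Diane})$, each with $p(x,y|z) = 1/2$ and $p(x|z) = p(y|z) = 1/2$:

$$ I(X,Y|z) = 2 \times \frac{1}{2} \log_2 \frac{1/2}{(1/2)(1/2)} = \log_2 2 = 1 \text{ bit} $$

This agrees with the value for Glasses that we will compute at the end of this section.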

In [6]:
def conditional_mutual_info(df, feature, known_features):
    # Condition on z by restricting to the rows matching the known features
    filtered_df = filter_by_features(df, known_features)

    cmi_sum = 0
    for val in [0, 1]:
        for name in set(df['Name']):
            p_valname = prob(filtered_df, {feature: val}, name)
            # Terms with p(x,y|z) = 0 contribute nothing (and would break the log)
            if p_valname > 0:
                p_val  = prob(filtered_df, {feature: val})
                p_name = prob(filtered_df, {}, name)

                cmi_sum += p_valname * np.log2(p_valname / (p_val * p_name))

    return cmi_sum
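
As a quick sanity check, one might verify that this function reproduces the values from the demo: with nothing known, Female (a 3/3 split) should give exactly 1 bit and Headgear (a 2/4 split) roughly 0.918 bits. For instance:

# Sanity check against the values shown in the demo above
print(conditional_mutual_info(df, 'Female', {}))    # -> 1.0
print(conditional_mutual_info(df, 'Headgear', {}))  # -> ~0.918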

We also write a function that prints out a sorted list of the values of $I(X,Y|z)$ for each feature $X$ that is not specified by $z$, so that we can see which feature has the highest mutual information.

In [7]:
def features_by_cmi(df, known_features):
    # Only the features whose values are not yet known are worth querying
    remaining_features = [feature for feature in features if feature not in known_features]
    cmi = [conditional_mutual_info(df, feature, known_features) for feature in remaining_features]

    # Indices of the remaining features, sorted by decreasing CMI
    order = np.argsort(cmi)[::-1]

    print('Known features: {}\n'.format(known_features))
    print('Rows matching known features:')
    display(filter_by_features(df, known_features))
    print('Conditional mutual information of remaining features:\n')
    for i in order:
        print("{:12}: {:.3f}".format(remaining_features[i], cmi[i]))

Trying this on the set of known features from In [3] above, we see that asking about Headgear or Glasses next would give us 1 bit of information about the name, whereas asking about Brown Eyes tells us nothing, since both Alice and Diane have brown eyes:

In [9]:
features_by_cmi(df, known_features)
Known features: {'Female': 1, 'Short Hair': 0}

Rows matching known features:
      Name  Female  Glasses  Short Hair  Brown Eyes  Headgear
0    Alice       1        0           0           1         1
3    Diane       1        1           0           1         0
Conditional mutual information of remaining features:

Headgear    : 1.000
Glasses     : 1.000
Brown Eyes  : 0.000
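
Putting the pieces together, the whole strategy can be run as a loop: keep querying the feature with the highest CMI and recording the answer until only one candidate matches. The sketch below is a minimal illustration of that greedy loop, not part of the original cells above; the ask callback is a hypothetical stand-in for the opponent answering yes/no questions.

def solve(df, ask):
    # Greedy loop: query the feature with the highest CMI, record the
    # answer, and stop once a single candidate remains.
    known_features = {}
    while len(filter_by_features(df, known_features)) > 1:
        remaining = [f for f in df.columns
                     if f != 'Name' and f not in known_features]
        best = max(remaining,
                   key=lambda f: conditional_mutual_info(df, f, known_features))
        known_features[best] = ask(best)  # ask(feature) returns 0 or 1
    return filter_by_features(df, known_features)['Name'].iloc[0]

# Simulate an opponent who picked Diane:
# solve(df, lambda f: int(df.loc[df['Name'] == 'Diane', f].iloc[0]))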