Precision and Recall

Recall = P(predict A | truth is A): of all the examples that truly belong to class A, the fraction the classifier labels as A.

Precision = P(truth is A | predict A): of all the examples the classifier labels as A, the fraction that truly belong to class A.
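
Equivalently, both metrics can be written in terms of confusion-matrix counts. A minimal sketch (the tp, fp, fn arguments here are just illustrative counts, not variables from the project code):

def precision(tp, fp):
    # of everything predicted positive, the fraction that really is positive
    return float(tp) / (tp + fp)

def recall(tp, fn):
    # of everything that really is positive, the fraction we predicted positive
    return float(tp) / (tp + fn)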

Applying Metrics to Your POI Identifier

In [1]:
import sys
sys.path.append("C:/Users/Jeff/udacity/Intro_to_Machine_Learning/ud120-projects/tools/")
sys.path.append('C:/Users/Jeff/udacity/Intro_to_Machine_Learning/ud120-projects/choose_your_own')
sys.path.append('C:/Users/Jeff/udacity/Intro_to_Machine_Learning/ud120-projects/datasets_questions')

import os
os.chdir('C:/Users/Jeff/udacity/Intro_to_Machine_Learning/ud120-projects/evaluation')
In [2]:
import pickle
from feature_format import featureFormat, targetFeatureSplit

data_dict = pickle.load(open("../final_project/final_project_dataset.pkl", "rb"))

### add more features to features_list!
features_list = ["poi", "salary"]

data = featureFormat(data_dict, features_list)
labels, features = targetFeatureSplit(data)

# decision tree code from the previous lesson
from sklearn.tree import DecisionTreeClassifier
from sklearn import cross_validation
from sklearn.metrics import confusion_matrix, precision_score, recall_score, classification_report

features_train, features_test, labels_train, labels_test = cross_validation.train_test_split(features, labels, test_size=0.3, random_state=42)

clf = DecisionTreeClassifier()
clf.fit(features_train, labels_train)

pred = clf.predict(features_test)

print confusion_matrix(labels_test, pred)
[[21  4]
 [ 4  0]]
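
In scikit-learn's confusion matrix the rows are the true labels and the columns are the predictions, so with the non-POI class (0) listed first this reads: 21 true negatives, 4 false positives, 4 false negatives, and 0 true positives. The tree never correctly flags a POI, so both precision and recall for the POI class have to come out as 0. A quick sanity check, reusing labels_test and pred from the cell above:

cm = confusion_matrix(labels_test, pred)
tp, fp, fn = cm[1][1], cm[0][1], cm[1][0]
print float(tp) / (tp + fp)   # POI precision: 0 / (0 + 4) = 0.0
print float(tp) / (tp + fn)   # POI recall:    0 / (0 + 4) = 0.0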

Unpacking Into Precision and Recall

In [3]:
print precision_score(labels_test, pred)
0.0
In [4]:
print classification_report(labels_test, pred)
             precision    recall  f1-score   support

        0.0       0.84      0.84      0.84        25
        1.0       0.00      0.00      0.00         4

avg / total       0.72      0.72      0.72        29
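
These per-class rows follow directly from the confusion matrix above: the non-POI class (0.0) gets precision and recall of 21/25 = 0.84, the POI class (1.0) gets 0.0 across the board, and the avg / total line is a support-weighted average dominated by the 25 non-POI examples. Checking the arithmetic by hand (just the numbers from the report, not project code):

print 21 / 25.0                      # precision and recall for the 0.0 class
print (25 * 0.84 + 4 * 0.0) / 29.0   # support-weighted average, about 0.72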

Recall of Your POI Identifier

In [5]:
print recall_score(labels_test, pred)
0.0

How Many True Positives?

In [6]:
predictions = [0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1] 
true_labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0]

cm = confusion_matrix(true_labels, predictions)

print cm, '\n'
print '{0} True positives'.format(cm[1][1])
print '{0} True negatives'.format(cm[0][0])
print '{0} False positives'.format(cm[0][1])
print '{0} False negatives'.format(cm[1][0])
[[9 3]
 [2 6]] 

6 True positives
9 True negatives
3 False positives
2 False negatives
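
Before reaching for the sklearn helpers, it is worth computing the two metrics by hand from these counts; the next two cells should agree. A quick sketch using the numbers above:

tp, fp, fn = 6, 3, 2
print float(tp) / (tp + fp)   # precision = 6/9, about 0.667
print float(tp) / (tp + fn)   # recall = 6/8 = 0.75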

Precision

In [7]:
print precision_score(true_labels, predictions)
0.666666666667

Recall

In [8]:
print recall_score(true_labels, predictions)
0.75
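
Precision and recall can also be combined into the F1 score that appeared in the classification_report earlier: the harmonic mean 2 * P * R / (P + R). A quick check with scikit-learn, reusing true_labels and predictions from above:

from sklearn.metrics import f1_score
print f1_score(true_labels, predictions)   # 2 * (2/3) * (3/4) / (2/3 + 3/4) = 12/17, about 0.706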