Explained Variance of Each PC

In [1]:
import sys
# make the course helper modules importable
sys.path.append("C:/Users/Jeff/udacity/Intro_to_Machine_Learning/ud120-projects/tools/")
sys.path.append('C:/Users/Jeff/udacity/Intro_to_Machine_Learning/ud120-projects/choose_your_own')
sys.path.append('C:/Users/Jeff/udacity/Intro_to_Machine_Learning/ud120-projects/datasets_questions')

import os
# work from the pca mini-project directory
os.chdir('C:/Users/Jeff/udacity/Intro_to_Machine_Learning/ud120-projects/pca')
In [2]:
%matplotlib inline

# code provided with the course
%run eigenfaces.py
===================================================
Faces recognition example using eigenfaces and SVMs
===================================================

The dataset used in this example is a preprocessed excerpt of the
"Labeled Faces in the Wild", aka LFW_:

  http://vis-www.cs.umass.edu/lfw/lfw-funneled.tgz (233MB)

  .. _LFW: http://vis-www.cs.umass.edu/lfw/

  original source: http://scikit-learn.org/stable/auto_examples/applications/face_recognition.html


Total dataset size:
n_samples: 1288
n_features: 1850
n_classes: 7
Extracting the top 150 eigenfaces from 966 faces
done in 0.276s
Projecting the input data on the eigenfaces orthonormal basis
done in 0.035s
Fitting the classifier to the training set
done in 19.882s
Best estimator found by grid search:
SVC(C=1000.0, cache_size=200, class_weight='balanced', coef0=0.0,
  decision_function_shape=None, degree=3, gamma=0.001, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
Predicting the people names on the testing set
done in 0.058s
                   precision    recall  f1-score   support

     Ariel Sharon       0.50      0.62      0.55        13
     Colin Powell       0.76      0.88      0.82        60
  Donald Rumsfeld       0.73      0.70      0.72        27
    George W Bush       0.92      0.87      0.89       146
Gerhard Schroeder       0.77      0.80      0.78        25
      Hugo Chavez       0.75      0.60      0.67        15
       Tony Blair       0.88      0.83      0.86        36

      avg / total       0.83      0.83      0.83       322

[[  8   0   3   2   0   0   0]
 [  2  53   1   3   0   1   0]
 [  4   1  19   2   0   0   1]
 [  1  11   2 127   3   1   1]
 [  0   2   0   1  20   1   1]
 [  0   2   0   1   2   9   1]
 [  1   1   1   2   1   0  30]]
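The confusion matrix above is printed without labels: rows are the true classes and columns the predicted ones, both in the same order as the classification report. A minimal sketch for attaching the names (assuming y_test, y_pred, and target_names from eigenfaces.py are still in scope after %run):

from sklearn.metrics import confusion_matrix

# rebuild the matrix and prefix each row with its true-class name
cm = confusion_matrix(y_test, y_pred)
for name, row in zip(target_names, cm):
    print '{0:>17s} {1}'.format(name, row)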
In [3]:
print 'Variance explained by the first principal component:  {0}'.format(pca.explained_variance_ratio_[0])
print 'Variance explained by the second principal component: {0}'.format(pca.explained_variance_ratio_[1])
Variance explained by the first principal component:  0.193464736102
Variance explained by the second principal component: 0.151169305389
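Together, the first two components account for about 34% of the variance. A quick way to see how that total accumulates over all 150 components (a sketch, reusing the pca object fitted by eigenfaces.py):

import numpy as np

# running total of the variance captured by the first k components
cum_var = np.cumsum(pca.explained_variance_ratio_)
for k in [1, 2, 10, 50, 150]:
    print '{0:3d} components explain {1:.1%} of the variance'.format(k, cum_var[k - 1])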

F1 Score vs. No. of PCs Used

In [4]:
# re-fit PCA and re-train the classifier for a range of component counts
for n_components in [10, 15, 25, 50, 100, 250]:
  pca = RandomizedPCA(n_components=n_components, whiten=True).fit(X_train)

  # reshape the components into eigenface images (kept from the course code; unused below)
  eigenfaces = pca.components_.reshape((n_components, h, w))

  X_train_pca = pca.transform(X_train)
  X_test_pca = pca.transform(X_test)


  # Train an SVM classification model, grid-searching C and the RBF kernel width
  param_grid = {
      'C': [1e3, 5e3, 1e4, 5e4, 1e5],
      'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1],
  }
  clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'), param_grid)
  clf = clf.fit(X_train_pca, y_train)


  # Quantitative evaluation of the model quality on the test set
  y_pred = clf.predict(X_test_pca)

  # line 0 of the report is the column header; line 2 is the first class row
  if n_components == 10:
    print 'n_components' + classification_report(y_test, y_pred, target_names=target_names).split('\n')[0]

  # print only the Ariel Sharon row for each n_components setting
  print '{0:12d}'.format(n_components) + classification_report(y_test, y_pred, target_names=target_names).split('\n')[2]
n_components                   precision    recall  f1-score   support
          10     Ariel Sharon       0.14      0.23      0.17        13
          15     Ariel Sharon       0.45      0.38      0.42        13
          25     Ariel Sharon       0.53      0.69      0.60        13
          50     Ariel Sharon       0.62      0.77      0.69        13
         100     Ariel Sharon       0.67      0.62      0.64        13
         250     Ariel Sharon       0.60      0.69      0.64        13
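The rows above track only the first class, Ariel Sharon. To compare runs on the whole test set, one could also print a weighted-average F1 score inside the loop (a sketch; average='weighted' is the same averaging the report's avg / total row uses):

from sklearn.metrics import f1_score

# overall F1, weighting each class by its support in the test set
print '{0:12d}  weighted avg F1: {1:.2f}'.format(n_components, f1_score(y_test, y_pred, average='weighted'))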