Validation
Different problems require different types of Error Functions
Supervised Learning
Classification: some form of counting misclassified data points
Regression: average distance between predicted and target output
Unsupervised Learning
Within-class and between-class distances
Errors for Supervised Learning - Classification
The primary source for performance estimation is the confusion matrix: true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN)
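As an illustration of how the four confusion matrix entries arise from counting, here is a minimal sketch for a binary problem. The function name `confusion_counts`, the `positive` label and the example labels are assumptions made for this sketch, not part of the original notes.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for target, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if target == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if target == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


# Made-up labels, purely for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)

# The misclassified data points are exactly the FP and FN entries.
error_rate = (fp + fn) / len(y_true)
print(tp, tn, fp, fn, error_rate)
```

The error rate here is simply the fraction of misclassified data points, which is one way of "counting" errors for classification.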
Errors for Supervised Learning - Regression
Assess the difference between predicted output and target output
Sum of squared errors (SSE), which we want to minimise
Probability of the predicted outputs given the target outputs, which we want to maximise
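The two regression measures can be sketched side by side. The probabilistic measure below assumes independent Gaussian noise with a fixed standard deviation `sigma`; that noise model, the function names and the example values are assumptions for illustration, since the notes do not specify them. Under this assumption a lower SSE always gives a higher log-likelihood.

```python
import math


def sum_squared_errors(targets, predictions):
    """SSE: sum of squared differences between target and predicted outputs."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions))


def gaussian_log_likelihood(targets, predictions, sigma=1.0):
    """Log-likelihood assuming independent Gaussian noise (an assumption)."""
    n = len(targets)
    sse = sum_squared_errors(targets, predictions)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sse / (2 * sigma ** 2)


# Made-up values, purely for illustration.
targets = [3.0, 5.0, 7.0]
good = [2.5, 5.0, 7.5]   # predictions close to the targets
poor = [1.0, 6.0, 10.0]  # predictions far from the targets

print(sum_squared_errors(targets, good), sum_squared_errors(targets, poor))
print(gaussian_log_likelihood(targets, good), gaussian_log_likelihood(targets, poor))
```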
Errors for Unsupervised Learning - Clustering
Internal measures - Cohesion vs Separation
Cohesion: how closely related are samples in a cluster
Within sum squared errors (WSS)
Separation: how well separated is one cluster from other clusters
Between cluster sum of squares (BSS)
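Cohesion and separation can be made concrete with a small sketch. It assumes the common definitions: WSS sums squared distances from each sample to its own cluster centroid, and BSS sums squared distances from each cluster centroid to the overall centroid, weighted by cluster size. The helper names and the toy clusters are assumptions for this example.

```python
def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def wss(clusters):
    """Cohesion: squared distances of samples to their own cluster centroid."""
    return sum(sq_dist(p, centroid(c)) for c in clusters for p in c)


def bss(clusters):
    """Separation: size-weighted squared distances of centroids to the overall centroid."""
    all_points = [p for c in clusters for p in c]
    overall = centroid(all_points)
    return sum(len(c) * sq_dist(centroid(c), overall) for c in clusters)


# Two made-up 2-D clusters, purely for illustration.
clusters = [
    [(1.0, 1.0), (1.0, 3.0)],
    [(5.0, 1.0), (5.0, 3.0)],
]

print(wss(clusters), bss(clusters))
```

A good clustering has low WSS (tight clusters) and high BSS (well-separated clusters). With these definitions, WSS + BSS equals the total sum of squares around the overall centroid, so for a fixed dataset lowering one raises the other.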