Everything should be made as simple as possible, but not simpler
Albert Einstein
The world is complex and multidimensional
Univariate analysis focuses on only one dimension
Sometimes, world issues are best understood as multivariate
Define a given number of categories based on
many characteristics (multi-dimensional)
Find the category where each observation fits best.
Reduce complexity, keep all the relevant information
Produce easier-to-understand outputs
Split a dataset into groups of observations that are similar within the group
and dissimilar between groups based on a series of attributes
The computer learns some of the dataset’s properties without the human specifying them.
There is no apriori structure imposed on the classification \(\rightarrow\) before the analysis, no observations are in a category.
partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid)
Wikipedia
distance-based
always standardise
Split a dataset into groups of observations that are similar within the group
and dissimilar between groups based on a series of attributes
with the additional constraint that observations need to be spatial neighbours
Aggregating basic spatial units (areas) into larger units (regions)
Duque et al. (2007)
See Duque et al. (2007) for an excellent, though advanced, overview.
import sklearn