Is there space in machine learning?

Subsets of data science

Supervised

Classification problems

Regression problems

Unsupervised

Clustering

Dimensionality reduction

Models

Linear regression

Logistic regression

Decision trees

Random forest

Gradient-boosted trees

Neural networks

Workflow

Split into training and test sets

Fit the model

Evaluate

(data standardisation)
Hyper-parameter tuning
Data augmentation
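
The workflow steps above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not a recommended setup: the synthetic dataset, the choice of logistic regression, and the grid of `C` values are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic data stands in for a real dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# 1. Split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 2. Standardise inside a pipeline so the scaler is fitted on training data only.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 3. Tune a hyper-parameter with cross-validation on the training set.
#    ASSUMPTION: the grid of C values is arbitrary.
search = GridSearchCV(
    pipeline, {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}, cv=5
)

# 4. Fit the model (GridSearchCV refits the best candidate on all training data).
search.fit(X_train, y_train)

# 5. Evaluate on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print("test accuracy:", round(test_accuracy, 3))
```

Keeping the scaler inside the pipeline means the test set never influences the standardisation, which is one small guard against leakage.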

A visual explanation

By R2D3 (Stephanie Yee and Tony Chu)

Evaluation methods

Classification

Confusion matrix

      Cat  Dog  Ant  Fly
Cat    45    5    2    0
Dog     3   40    5    2
Ant     1    2   38    4
Fly     0    1    3   46

Accuracy

Cohen’s kappa score

Precision

Recall
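
As a sketch of how these scores follow from a confusion matrix, the snippet below derives them from the Cat/Dog/Ant/Fly counts shown earlier. The slides do not say which axis holds the actual classes, so the code assumes rows are actual and columns are predicted; with the opposite convention, precision and recall swap.

```python
import numpy as np

labels = ["Cat", "Dog", "Ant", "Fly"]

# ASSUMPTION: rows are actual classes, columns are predicted classes.
cm = np.array([
    [45, 5, 2, 0],
    [3, 40, 5, 2],
    [1, 2, 38, 4],
    [0, 1, 3, 46],
])

total = cm.sum()
correct = np.diag(cm)

# Accuracy: share of all observations on the diagonal.
accuracy = correct.sum() / total

# Precision: correct predictions of a class / all predictions of that class.
precision = correct / cm.sum(axis=0)

# Recall: correct predictions of a class / all actual members of that class.
recall = correct / cm.sum(axis=1)

# Cohen's kappa: accuracy corrected for the agreement expected by chance.
expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
kappa = (accuracy - expected) / (1 - expected)

print(f"accuracy={accuracy:.3f} kappa={kappa:.3f}")
for name, p, r in zip(labels, precision, recall):
    print(f"{name}: precision={p:.3f} recall={r:.3f}")
```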

Evaluation methods

Regression

Residuals

\(R^2\)

Mean absolute error

Mean squared error
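
The regression scores listed above can be computed by hand in a few lines. The observed and predicted values below are made up purely for illustration.

```python
import numpy as np

# ASSUMPTION: hypothetical observed and predicted values.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

# Residuals: observed minus predicted.
residuals = y_true - y_pred

# Mean absolute error and mean squared error.
mae = np.abs(residuals).mean()
mse = (residuals**2).mean()

# R^2: one minus the ratio of residual to total sum of squares.
ss_res = (residuals**2).sum()
ss_tot = ((y_true - y_true.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot

print(f"MAE={mae:.3f} MSE={mse:.3f} R2={r2:.3f}")
```

scikit-learn offers the same scores as `mean_absolute_error`, `mean_squared_error` and `r2_score` in `sklearn.metrics`.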

Spatial?

Explicitly spatial ML is rare.

The spatial dimension is squeezed into
non-spatial models and methods.

Data leakage

Spatial cross-validation
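
One common way to approximate spatial cross-validation with scikit-learn is to assign observations to spatial blocks and pass the block IDs as groups, so that nearby points never end up on both sides of a split. The sketch below assumes synthetic coordinates, a 25-unit square grid, and a random forest; none of these come from the slides.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)
n = 400

# ASSUMPTION: synthetic points in a 100 x 100 area with a spatially smooth target.
coords = rng.uniform(0, 100, size=(n, 2))
y = np.sin(coords[:, 0] / 15) + np.cos(coords[:, 1] / 15) + rng.normal(0, 0.1, n)
X = np.column_stack([coords, rng.normal(size=n)])

# Assign every point to a 25 x 25 block; the block ID is the CV group.
block_size = 25.0
blocks = (coords // block_size).astype(int)
groups = blocks[:, 0] * 4 + blocks[:, 1]

model = RandomForestRegressor(n_estimators=50, random_state=0)

# Random folds: neighbouring points can sit in both train and test (leakage).
random_scores = cross_val_score(
    model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0)
)

# Spatial folds: whole blocks are held out together.
spatial_cv = GroupKFold(n_splits=5)
spatial_scores = cross_val_score(model, X, y, groups=groups, cv=spatial_cv)

# Keep the splits to check that no block appears on both sides.
splits = list(spatial_cv.split(X, y, groups=groups))

print("random CV R2: ", random_scores.mean().round(3))
print("spatial CV R2:", spatial_scores.mean().round(3))
```

Comparing the two scores gives a rough sense of how much a random split may flatter the model when the data are spatially autocorrelated.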

Spatial feature engineering

Map synthesis

Proximity

Map matching
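
As an illustration of the proximity idea, the sketch below turns locations into two model features: the distance to the nearest point of interest and the number of points of interest within a radius. The coordinates, the 5-unit radius, and the use of planar Euclidean distance are assumptions for the example; real data would normally need a projected coordinate system first.

```python
import numpy as np

# ASSUMPTION: hypothetical observation locations and points of interest (POIs).
observations = np.array([[0.0, 0.0], [3.0, 4.0], [10.0, 0.0]])
pois = np.array([[0.0, 1.0], [10.0, 10.0]])

# Pairwise Euclidean distances, shape (n_observations, n_pois).
dist = np.linalg.norm(observations[:, None, :] - pois[None, :, :], axis=2)

# Proximity feature 1: distance to the nearest POI.
nearest_dist = dist.min(axis=1)

# Proximity feature 2: number of POIs within a fixed radius.
radius = 5.0
n_within = (dist <= radius).sum(axis=1)

# Both columns can be appended to a non-spatial feature matrix.
features = np.column_stack([nearest_dist, n_within])
print(features)
```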

Spatial effects

Dependence

Heterogeneity

Spatial evaluation

Spatial patterns of errors
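
One way to check whether errors show a spatial pattern is to compute Moran's I on the model residuals: values well above zero suggest that similar errors cluster in space. The function below is a minimal sketch with binary distance-band weights; the helper name `morans_i`, the toy coordinates, and the residual values are hypothetical.

```python
import numpy as np


def morans_i(values, coords, max_dist):
    """Moran's I with binary weights: 1 for pairs within max_dist, else 0."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)

    # Pairwise distances; a point is never its own neighbour.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = ((dist <= max_dist) & (dist > 0)).astype(float)

    # Deviations from the mean.
    z = values - values.mean()

    # I = (n / sum of weights) * (z' W z) / (z' z)
    return len(values) / w.sum() * (z @ w @ z) / (z @ z)


# ASSUMPTION: six locations on a line, one unit apart.
coords = np.column_stack([np.arange(6), np.zeros(6)])

# Residuals grouped in space versus residuals alternating in sign.
clustered = morans_i([1, 1, 1, -1, -1, -1], coords, max_dist=1.5)
alternating = morans_i([1, -1, 1, -1, 1, -1], coords, max_dist=1.5)

print("clustered residuals:  ", round(clustered, 3))
print("alternating residuals:", round(alternating, 3))
```

This sketch gives no significance test; dedicated spatial statistics libraries typically add one through permutation.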

import sklearn