Predict future

In this part, you will not start anything new but will continue working with the data from Prague from the previous section and get a bit deeper into the problem. Not everything has been covered in class, so consult the documentation when unsure.

Continue with classficiation

Let’s try to explore the classification problem a bit further.

  1. Try different combinations of independent variables.
    • Does it make sense to combine proximity variables with spatial heterogeneity? Test that.
    • Contrary to what you may expect, removing some variables with low importance helps the performance. Is this the case in our situation?
    • Find the best combination of variables. How far can you push accuracy?
  2. Test other ML models.
    • Check what happens when you use different models than random forest. Compare the same input using different models, like HistGradientBoostingClassifier, DecisionTreeClassifier, or AdaBoostClassifier. Which one is the best when using the default hyperparameters?
  3. Pick your favourite model and find high and low prediction certainty clusters.
  4. Fine tune the models using grid search.