Predict future
In this part, you will not start anything new but will continue working with the data from Prague from the previous section and get a bit deeper into the problem. Not everything has been covered in class, so consult the documentation when unsure.
Continue with classification
Let’s try to explore the classification problem a bit further.
- Try different combinations of independent variables.
- Does it make sense to combine proximity variables with spatial heterogeneity? Test that.
- Contrary to what you may expect, removing some variables with low importance can improve performance. Is this the case in our situation?
- Find the best combination of variables. How far can you push accuracy?
- Test other ML models.
- Check what happens when you use models other than random forest. Compare the same input using different models, like HistGradientBoostingClassifier, DecisionTreeClassifier, or AdaBoostClassifier. Which one is the best when using the default hyperparameters?
- Pick your favourite model and find high and low prediction certainty clusters.
- Fine-tune the models using grid search.