Predict the future
In this part, you will not start anything new but will continue working with the data from Prague from the previous section and dig a bit deeper into the problem. Not everything has been covered in class, so consult the documentation, Google, or your favourite LLM.
Continue with classification
Let’s try to explore the classification problem a bit further.
- Try different combinations of independent variables.
  - Does it make sense to combine proximity variables with spatial heterogeneity? Test that.
  - Contrary to what you may expect, removing some variables with low importance can help the performance. Is this the case in our situation? (See the permutation importance sketch after this list.)
  - Find the best combination of variables. How far can you push the accuracy?
- Test other ML models.
  - Check what happens when you use models other than random forest. Compare the same input using different models, like `HistGradientBoostingClassifier`, `DecisionTreeClassifier`, or `AdaBoostClassifier`. Which one is the best when using the default hyperparameters? (A comparison sketch follows this list.)
- Pick your favourite model and find clusters of high and low prediction certainty (see the certainty sketch after this list).
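If you want a starting point for the variable-importance bullet above, the sketch below ranks variables with permutation importance and refits without the weakest ones. It is a minimal sketch, assuming your predictors sit in a DataFrame named `independent` and your class labels in `target`; both names, and the dropping threshold, are placeholders for whatever you built in the previous section.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# `independent` (DataFrame of predictors) and `target` (labels) are
# placeholder names for the objects from the previous section
X_train, X_test, y_train, y_test = train_test_split(
    independent, target, test_size=0.25, random_state=0
)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# permutation importance measured on held-out data tends to be more
# reliable than the impurity-based feature_importances_ attribute
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
importances = pd.Series(result.importances_mean, index=X_test.columns).sort_values()
print(importances)

# drop the weakest variables and see whether accuracy changes
keep = importances[importances > 0.001].index  # threshold chosen arbitrarily
reduced = RandomForestClassifier(random_state=0).fit(X_train[keep], y_train)
print("all variables:", model.score(X_test, y_test))
print("reduced set:  ", reduced.score(X_test[keep], y_test))
```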
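For the model comparison, one way to keep things fair is to score every model on the same input with cross-validation. A minimal sketch, again assuming the placeholder `independent` and `target` objects:

```python
from sklearn.ensemble import (
    AdaBoostClassifier,
    HistGradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

models = {
    "random forest": RandomForestClassifier(random_state=0),
    "hist gradient boosting": HistGradientBoostingClassifier(random_state=0),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
}

# same input, default hyperparameters, 5-fold cross-validated accuracy
for name, model in models.items():
    scores = cross_val_score(model, independent, target, cv=5)
    print(f"{name}: {scores.mean():.3f} (std {scores.std():.3f})")
```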
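For the prediction certainty, the class probabilities from `predict_proba` are one possible measure: the probability of the most likely class is high where the model is confident and low where it hesitates. A sketch assuming a placeholder GeoDataFrame `gdf` holding the Prague geometries:

```python
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_predict

model = HistGradientBoostingClassifier(random_state=0)  # swap in your favourite

# out-of-sample class probabilities for every observation
proba = cross_val_predict(model, independent, target, cv=5, method="predict_proba")

# certainty of the most likely class; join back to the geometries to map it
# (`gdf` is a placeholder for the GeoDataFrame with the Prague data)
gdf["certainty"] = proba.max(axis=1)
gdf.plot("certainty", legend=True)
```

Mapping the certainty column should reveal where the model is confident and where the classes blur into each other.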
Explore regression
Try applying the spatial augmentation techniques to the regression model predicting price outlined at the end of the last section. Combine all you think may help. What is the best \(R^2\) you can get? You can even optionally compare the result with the performance of Geographically Weighted Regression on the same problem (but do not try to fit it on the full dataset :)). A sketch of both options follows.
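One possible starting point, not the only one: add coordinates and spatially lagged predictors to the feature set and score the augmented model with cross-validated \(R^2\). The sketch assumes placeholder objects `gdf` (a GeoDataFrame with point geometries and a `price` column) and `independent` (the aspatial predictors).

```python
from libpysal import weights
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# `gdf` and `independent` are placeholder names for the objects
# carried over from the previous section
w = weights.KNN.from_dataframe(gdf, k=10)
w.transform = "r"  # row-standardise so the lag is a neighbourhood mean

augmented = independent.copy()
# coordinates let a tree-based model carve out spatial heterogeneity
augmented["x"] = gdf.geometry.x
augmented["y"] = gdf.geometry.y
# spatial lags describe each observation's neighbourhood context
for col in independent.columns:
    augmented[f"{col}_lag"] = weights.lag_spatial(w, independent[col].values)

scores = cross_val_score(
    RandomForestRegressor(random_state=0), augmented, gdf["price"], cv=5, scoring="r2"
)
print(f"mean R2: {scores.mean():.3f}")
```

For the optional comparison, the `mgwr` package can fit a GWR on a subsample (assuming it is installed; the sample size of 1,000 is an arbitrary choice to keep the fit tractable):

```python
import numpy as np
from mgwr.gwr import GWR
from mgwr.sel_bw import Sel_BW

sample = gdf.sample(1000, random_state=0)  # GWR does not scale well
coords = np.column_stack([sample.geometry.x, sample.geometry.y])
y = sample[["price"]].values
X = independent.loc[sample.index].values

bw = Sel_BW(coords, y, X).search()  # find an optimal adaptive bandwidth
results = GWR(coords, y, X, bw).fit()
print(results.R2)
```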
Effect of hyperparameters (advanced)
You’ve been using the default model parameters so far. Can you figure out a way of fine-tuning them? Have a look at this resource and Google, and try to fine-tune one of the models you’ve worked with; a grid search sketch follows below. Does changing parameters such as the number of estimators affect the result in any way? How important do you think it is?
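A common approach is an exhaustive grid search with cross-validation, as sketched below for a random forest. The grid itself is an arbitrary example, and `independent` and `target` are again placeholder names.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# an example grid; widen or narrow it depending on how long you can wait
param_grid = {
    "n_estimators": [50, 100, 250, 500],
    "max_depth": [None, 5, 10, 20],
    "min_samples_leaf": [1, 5, 10],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    n_jobs=-1,  # 48 candidates x 5 folds, so parallelise
)
search.fit(independent, target)

print(search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
```

`RandomizedSearchCV` is a cheaper alternative when the grid grows too large to search exhaustively.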