url = "http://data.insideairbnb.com/czech-republic/prague/prague/2023-06-24/data/listings.csv.gz"
Is there a pattern?
It’s time to explore some point patterns.
Airbnb listings in Prague
Your task is to explore the locations of Airbnb listings in Prague, downloaded from the Inside Airbnb portal. The dataset is available in CSV from this URL.
It needs some pre-processing, but it is up to you to figure it out this time. Below are some tips if you get stuck.
- Read the data and create a GeoDataFrame projected to S-JTSK / Krovak East North (EPSG:5514).
You can read the url directly with pd.read_csv.
You will need to create geometry from the "longitude" and "latitude" columns. gpd.points_from_xy may help.
Don’t forget to assign a CRS when creating a GeoDataFrame. When dealing with "longitude" and "latitude", you always want EPSG:4326.
This is how the pre-processing should look.
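import pandas as pd
import geopandas as gpd

# Read the gzipped CSV directly from the URL
airbnb = pd.read_csv(url)

# Build point geometry from longitude/latitude (EPSG:4326)
airbnb = gpd.GeoDataFrame(
    airbnb,
    geometry=gpd.points_from_xy(
        airbnb["longitude"], airbnb["latitude"], crs="EPSG:4326"
    ),
)

# Reproject to S-JTSK / Krovak East North so distances are in metres
airbnb = airbnb.to_crs("EPSG:5514")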
With the data ready:
Visualisation
- Create a hexbin visualisation of the listings.
- Create a kernel density estimate of the distribution of Airbnb listings.
- What can you read from the hexbin that you cannot from the KDE, and vice versa?
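If you get stuck on the hexbin, the following is a minimal sketch, assuming matplotlib is installed. To stay self-contained, it uses synthetic points (made-up projected coordinates in metres) rather than the downloaded listings. With the real data, x and y would come from airbnb.geometry.x and airbnb.geometry.y. The gridsize value is an arbitrary choice to experiment with.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Synthetic stand-in for projected listing coordinates (assumption, not real data)
rng = np.random.default_rng(0)
x = rng.normal(-743_000, 3_000, 500)
y = rng.normal(-1_043_000, 3_000, 500)

fig, ax = plt.subplots(figsize=(6, 6))

# Count points per hexagon; mincnt=1 hides hexagons without any points
hb = ax.hexbin(x, y, gridsize=30, mincnt=1, cmap="viridis")
fig.colorbar(hb, ax=ax, label="points per hexagon")
ax.set_aspect("equal")  # keep metres on both axes at the same scale

# The per-hexagon counts behind the colours
counts = hb.get_array()
print("non-empty hexagons:", len(counts), "largest count:", int(counts.max()))
```

Unlike a KDE, the hexbin keeps actual counts per cell, so the colour bar can be read as a number of listings.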
Centrography
- Measure the mean centre, the median centre, and the mean centre weighted by a column of your choice.
- Can you plot them on a map?
- Are they the same? Can you tell why?
Centrography and Ripley’s alphabet measure the distance between the points. It is not wise to measure distances based on coordinates in latitude and longitude, so don’t forget to extract projected coordinates from your geometry.
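As a hint for the centrography tasks, here is a minimal NumPy sketch of the three centres. It runs on synthetic coordinates and a hypothetical "price" weight, not on the real listings, and it uses the coordinate-wise (Manhattan) median, which is only one of several possible definitions of a median centre. A dedicated library such as pointpats offers ready-made versions of these measures.

```python
import numpy as np

# Synthetic projected coordinates in metres (assumption, not real data).
# With the real GeoDataFrame, one option would be:
# coords = np.column_stack([airbnb.geometry.x, airbnb.geometry.y])
rng = np.random.default_rng(42)
coords = rng.normal(loc=[-743_000, -1_043_000], scale=2_000, size=(200, 2))

# Hypothetical weights, standing in for a column of your choice
price = rng.uniform(500, 5_000, size=200)

# Mean centre: average of x and average of y
mean_center = coords.mean(axis=0)

# Manhattan median: median of x and median of y, taken independently
manhattan_median = np.median(coords, axis=0)

# Weighted mean centre: points with larger weights pull the centre towards them
weighted_mean_center = np.average(coords, axis=0, weights=price)

print("mean centre:         ", mean_center)
print("Manhattan median:    ", manhattan_median)
print("weighted mean centre:", weighted_mean_center)
```

If the three points differ noticeably on your map, that difference says something about outliers or about where the heavily weighted listings sit.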
Randomness
- Measure the quadrat statistic. How does it change when you change the grid size?
- Measure Ripley’s \(G\) and \(F\).
- Is the pattern clustered?
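To build intuition for what these statistics measure, the sketch below computes a quadrat chi-square statistic and an empirical \(G\) function by hand with NumPy, on two synthetic patterns (one uniform, one clustered) in a hypothetical 1 km square window. It is an illustration under those assumptions only: it applies no edge correction and runs no significance test, both of which a library such as pointpats handles for you.

```python
import numpy as np

rng = np.random.default_rng(0)
SIDE = 1_000.0  # hypothetical square study window, in metres

# Two synthetic patterns with 300 points each
random_pts = rng.uniform(0, SIDE, size=(300, 2))
centres = np.array([[200.0, 200.0], [700.0, 300.0], [500.0, 800.0]])
clustered_pts = np.repeat(centres, 100, axis=0) + rng.normal(0, 20, size=(300, 2))


def quadrat_chi2(points, nx=5, ny=5):
    """Chi-square statistic of quadrat counts against a uniform expectation."""
    counts, _, _ = np.histogram2d(
        points[:, 0], points[:, 1], bins=[nx, ny], range=[[0, SIDE], [0, SIDE]]
    )
    expected = counts.sum() / (nx * ny)
    return ((counts - expected) ** 2 / expected).sum()


def g_function(points, support):
    """Share of points whose nearest neighbour lies within each distance."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore each point's distance to itself
    nearest = dist.min(axis=1)
    return (nearest[:, None] <= support[None, :]).mean(axis=0)


support = np.linspace(0, 60, 13)  # distances 0, 5, ..., 60 m
for name, pts in [("random", random_pts), ("clustered", clustered_pts)]:
    print(
        name,
        "chi2 (5x5):", round(quadrat_chi2(pts), 1),
        "chi2 (10x10):", round(quadrat_chi2(pts, 10, 10), 1),
        "G(10 m):", round(g_function(pts, support)[2], 2),
    )
```

In a clustered pattern most points have a very close neighbour, so \(G\) rises quickly at short distances, and the quadrat counts are far from even, so the chi-square statistic is large.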