Is there a pattern?

It’s time to explore some point patterns.

Airbnb listings in Prague

Your task is to explore the locations of Airbnb listings in Prague, downloaded from the Inside Airbnb portal. The dataset is available as a gzipped CSV file from the following URL:

url = "http://data.insideairbnb.com/czech-republic/prague/prague/2024-12-22/data/listings.csv.gz"

Backup data

If the link does not work, please report it and use the backup.

The data needs some pre-processing, and this time it is up to you to figure it out. If you get stuck, the tips below will help.

Read the data and create a GeoDataFrame projected to S-JTSK / Krovak East North (EPSG:5514).

A few hints

You can pass the URL directly to pd.read_csv. pandas infers the gzip compression from the .gz extension.

More hints

You will need to create geometry from "longitude" and "latitude" columns. gpd.points_from_xy may help.

Even more hints

Don’t forget to assign a CRS when creating a GeoDataFrame. Coordinates stored as "longitude" and "latitude" are almost always in WGS84 (EPSG:4326), and this dataset is no exception.

Okay, here’s the code

This is how the pre-processing should look.

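import pandas as pd
import geopandas as gpd

# pandas decompresses the .gz file on the fly
airbnb = pd.read_csv(url)
# build point geometries from longitude (x) and latitude (y), declared as WGS84
airbnb = gpd.GeoDataFrame(
    airbnb,
    geometry=gpd.points_from_xy(
        airbnb["longitude"], airbnb["latitude"], crs="EPSG:4326"
    ),
)
# reproject to S-JTSK / Krovak East North, whose units are metres
airbnb = airbnb.to_crs("EPSG:5514")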

With the data ready:

Visualisation

Create a hexbin visualisation of the listings.
Create a kernel density estimate (KDE) of the distribution of Airbnb listings.
What can you read from the hexbin that you cannot from the KDE, and vice versa?
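Below is a minimal sketch of both views side by side. It uses matplotlib's `Axes.hexbin` and SciPy's `gaussian_kde`. Synthetic clustered points stand in for the projected listing coordinates. In the exercise, x and y would come from `airbnb.geometry.x` and `airbnb.geometry.y`. The function name `hexbin_and_kde` is hypothetical, and the grid size and KDE resolution are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde


def hexbin_and_kde(x, y, gridsize=30, resolution=100):
    """Plot a hexbin of point counts next to a Gaussian KDE surface."""
    fig, (ax_hex, ax_kde) = plt.subplots(1, 2, figsize=(12, 6))

    # Hexbin: number of points falling into each hexagon (empty cells hidden)
    hb = ax_hex.hexbin(x, y, gridsize=gridsize, cmap="viridis", mincnt=1)
    fig.colorbar(hb, ax=ax_hex, label="points per hexagon")

    # KDE: smooth density surface; bandwidth follows Scott's rule by default
    kde = gaussian_kde(np.vstack([x, y]))
    xx, yy = np.meshgrid(
        np.linspace(x.min(), x.max(), resolution),
        np.linspace(y.min(), y.max(), resolution),
    )
    density = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)
    ax_kde.contourf(xx, yy, density, levels=10, cmap="viridis")

    for ax in (ax_hex, ax_kde):
        ax.set_aspect("equal")
    return fig, hb, density


# Synthetic stand-in for projected coordinates in metres (values are arbitrary)
rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(-743_000, 800, 400), rng.normal(-740_000, 1500, 200)])
y = np.concatenate([rng.normal(-1_043_000, 800, 400), rng.normal(-1_045_000, 1500, 200)])
fig, hb, density = hexbin_and_kde(x, y)
```

Both pictures depend on a tuning parameter. For the hexbin it is the hexagon size (`gridsize`), and for the KDE it is the bandwidth (`bw_method` in `gaussian_kde`). Try a few values before comparing the two.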

Centrography

Measure the mean centre, the median centre, and the mean centre weighted by a column of your choice.
Can you plot them on a map?
Are they the same? If not, can you tell why?
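Below is a minimal NumPy sketch of the three centres. It assumes the projected coordinates have been collected into an (n, 2) array, for example with `np.column_stack([airbnb.geometry.x, airbnb.geometry.y])`. The weights are assumed to come from a numeric column of your choice, which may need converting to numbers first. The function name `centres` is hypothetical. The median here is the coordinate-wise one. The pointpats library provides ready-made centrography functions, including a Euclidean median.

```python
import numpy as np


def centres(coords, weights=None):
    """Return mean, coordinate-wise median and (optionally) weighted mean centres.

    coords  -- (n, 2) array of projected x/y coordinates (e.g. metres in EPSG:5514)
    weights -- optional length-n sequence, e.g. a numeric column of the listings
    """
    coords = np.asarray(coords, dtype=float)
    result = {
        "mean": coords.mean(axis=0),
        # Median of x and median of y taken separately (the "Manhattan" median);
        # the Euclidean median, which minimises total straight-line distance, can differ.
        "median": np.median(coords, axis=0),
    }
    if weights is not None:
        result["weighted_mean"] = np.average(coords, axis=0, weights=weights)
    return result


# Toy example: three nearby points and one far-away outlier with zero weight
toy = np.array([[0, 0], [2, 0], [0, 2], [10, 10]])
for name, centre in centres(toy, weights=[1, 1, 1, 0]).items():
    print(name, centre)
```

On the toy array, the outlier at (10, 10) pulls the mean centre to (3, 3). The median centre stays at (1, 1). The weighted mean gives the outlier zero weight and lands at about (0.67, 0.67). To put the centres on a map, each one can be drawn as a single point on top of the listings, e.g. with `ax.scatter`.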

Use projected coordinates

Centrography and Ripley’s alphabet of functions work with distances between points. Distances computed directly from latitude and longitude are distorted, because a degree does not correspond to the same ground distance in every direction. Don’t forget to extract projected coordinates from your geometry.
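A quick back-of-the-envelope check shows the distortion. On a spherical Earth, one degree of latitude is always about 111 km, but one degree of longitude shrinks with the cosine of the latitude. The sketch below uses a spherical approximation and assumes a latitude of 50.08° as a rough value for Prague. The function name `metres_per_degree` is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def metres_per_degree(latitude_deg):
    """Approximate ground length of one degree of longitude and of latitude."""
    one_degree_m = 2 * math.pi * EARTH_RADIUS_M / 360  # about 111.2 km along a meridian
    lon_m = one_degree_m * math.cos(math.radians(latitude_deg))  # shrinks towards the poles
    return lon_m, one_degree_m


lon_m, lat_m = metres_per_degree(50.08)  # roughly the latitude of Prague
print(f"1 degree of longitude ~ {lon_m / 1000:.1f} km, 1 degree of latitude ~ {lat_m / 1000:.1f} km")
```

At Prague's latitude, a Euclidean distance on raw degrees treats about 71 km east-west and about 111 km north-south as the same distance. After reprojecting to EPSG:5514, whose units are metres, take the coordinates from the point geometries with `airbnb.geometry.x` and `airbnb.geometry.y`.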

Randomness

Measure the quadrat statistic. How does it change when you change the grid size?
Measure Ripley’s \(G\) and \(F\) functions.
Is the pattern clustered?
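The sketch below implements the three randomness checks in plain NumPy/SciPy rather than a dedicated point-pattern library. It runs on synthetic clustered points standing in for the projected listings. The quadrat test is a chi-squared test of cell counts on a regular grid. \(G\) is the distribution of event-to-nearest-event distances. \(F\) is the distribution of distances from random locations to the nearest event. Both are compared with their expectation under complete spatial randomness (CSR), \(1 - e^{-\lambda \pi d^2}\). All function names are hypothetical. The bounding box is assumed as the study area, and edge effects are ignored.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import chisquare


def quadrat_test(coords, nx=5, ny=5):
    """Chi-squared quadrat test on an nx-by-ny grid spanning the bounding box."""
    counts, _, _ = np.histogram2d(coords[:, 0], coords[:, 1], bins=[nx, ny])
    # Without expected frequencies, chisquare compares against equal counts per
    # cell, i.e. n / (nx * ny), which is what CSR predicts.
    result = chisquare(counts.ravel())
    return result.statistic, result.pvalue


def ecdf(values, distances):
    """Share of values <= each distance (empirical cumulative distribution)."""
    return np.searchsorted(np.sort(values), distances, side="right") / len(values)


def g_function(coords, distances):
    """Ripley's G: distribution of event-to-nearest-event distances."""
    nn_dist, _ = cKDTree(coords).query(coords, k=2)
    return ecdf(nn_dist[:, 1], distances)  # column 0 is each point's distance to itself


def f_function(coords, distances, n_empty=1000, seed=0):
    """Ripley's F: distribution of random-location-to-nearest-event distances."""
    rng = np.random.default_rng(seed)
    empty = rng.uniform(coords.min(axis=0), coords.max(axis=0), size=(n_empty, 2))
    dist, _ = cKDTree(coords).query(empty, k=1)
    return ecdf(dist, distances)


def csr_expectation(coords, distances):
    """G and F under CSR: 1 - exp(-lambda * pi * d^2)."""
    area = np.prod(coords.max(axis=0) - coords.min(axis=0))  # bounding-box area
    intensity = len(coords) / area
    return 1 - np.exp(-intensity * np.pi * np.asarray(distances) ** 2)


# Synthetic stand-in for the projected listings: three tight clusters
rng = np.random.default_rng(0)
cluster_centres = np.array([[0.2, 0.2], [0.8, 0.3], [0.5, 0.8]])
clustered = np.vstack([c + rng.normal(0, 0.01, size=(100, 2)) for c in cluster_centres])

for n in (3, 5, 10):  # how does the statistic react to the grid size?
    stat, p = quadrat_test(clustered, n, n)
    print(f"{n}x{n} grid: chi2 = {stat:.1f}, p = {p:.3g}")

d = np.linspace(0, 0.1, 11)
print("G  ", np.round(g_function(clustered, d), 2))
print("F  ", np.round(f_function(clustered, d), 2))
print("CSR", np.round(csr_expectation(clustered, d), 2))
```

For a clustered pattern, \(G\) rises above the CSR curve because many points have very close neighbours. \(F\) stays below the CSR curve because large areas are empty. With fine grids the expected count per cell becomes small, and the chi-squared approximation behind the quadrat p-value gets unreliable.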

Optional extension

Can you subset the data based on its variables and check the point pattern properties of the different subsets? Think about splitting by the number of rooms, host characteristics, property type, etc.
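Comparing raw statistics across subsets is tricky because the subsets differ in size. One simple option, swapped in here as an illustration and not named in the exercise, is the Clark-Evans ratio. It divides the observed mean nearest-neighbour distance by its CSR expectation, \(1 / (2\sqrt{\lambda})\). The sketch assumes a table with projected `x`/`y` columns, e.g. `airbnb["x"] = airbnb.geometry.x`, and a grouping column such as `room_type`; check that the column exists in your download. The function name and the toy data are hypothetical, and each subset's bounding box is assumed as its study area.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree


def clark_evans_by_group(df, by, x="x", y="y", min_points=10):
    """Clark-Evans ratio R per subset: R < 1 clustered, R ~ 1 random, R > 1 dispersed.

    The area is each subset's bounding box; edge effects are ignored.
    """
    rows = []
    for value, group in df.groupby(by):
        if len(group) < min_points:
            continue  # too few points for a meaningful statistic
        coords = group[[x, y]].to_numpy(dtype=float)
        nn_dist, _ = cKDTree(coords).query(coords, k=2)
        observed = nn_dist[:, 1].mean()  # column 0 is each point's distance to itself
        area = np.prod(coords.max(axis=0) - coords.min(axis=0))
        expected = 0.5 / np.sqrt(len(coords) / area)  # mean NN distance under CSR
        rows.append({by: value, "n": len(coords), "clark_evans_r": observed / expected})
    return pd.DataFrame(rows)


# Toy data: subset "a" forms three tight clusters, subset "b" is spread uniformly
rng = np.random.default_rng(1)
a = np.vstack([c + rng.normal(0, 0.005, size=(50, 2))
               for c in ([0.2, 0.2], [0.8, 0.3], [0.5, 0.8])])
b = rng.uniform(0, 1, size=(150, 2))
toy = pd.DataFrame(np.vstack([a, b]), columns=["x", "y"])
toy["room_type"] = ["a"] * 150 + ["b"] * 150
summary = clark_evans_by_group(toy, by="room_type")
print(summary)
```

Because the ratio is normalised by each subset's own intensity, a small subset and a large one can be compared directly. The bounding-box area is a crude choice, though, and a different study area (for example, the city boundary) would change the values.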