Is there a pattern?

It’s time to explore some point patterns.

Airbnb listings in Prague

Your task is to explore the locations of Airbnb listings in Prague, downloaded from the Inside Airbnb portal. The dataset is available in CSV from this URL.

url = "http://data.insideairbnb.com/czech-republic/prague/prague/2023-06-24/data/listings.csv.gz"

If the link does not work, please report it and use the backup.

It needs some pre-processing, but it is up to you to figure it out this time. Below are some tips if you get stuck.

  1. Read the data and create a GeoDataFrame projected to S-JTSK / Krovak East North (EPSG:5514).

You can read the url directly with pd.read_csv.

You will need to create geometry from "longitude" and "latitude" columns. gpd.points_from_xy may help.

Don’t forget to assign a CRS when creating a GeoDataFrame. When dealing with "longitude" and "latitude", you always want EPSG:4326.

This is how the pre-processing should look.

import pandas as pd
import geopandas as gpd

airbnb = pd.read_csv(url)
airbnb = gpd.GeoDataFrame(
    airbnb,
    geometry=gpd.points_from_xy(
        airbnb["longitude"], airbnb["latitude"], crs="EPSG:4326"
    ),
)
airbnb = airbnb.to_crs("EPSG:5514")

With the data ready:

Visualisation

  • Create a hexbin visualisation of the listings
  • Create a kernel density estimate of the distribution of Airbnb’s
  • What can you read from the hexbin you cannot from the KDE and vice versa?

Centrography

  • Measure mean, median, and mean weighted by a column of your choice.
  • Can you plot them on a map?
  • Are they the same? Can you tell why?

Centrography and Ripley’s alphabet measure the distance between the points. It is not wise to measure distances based on coordinates in latitude and longitude, so don’t forget to extract projected coordinates from your geometry.

Randomness

  • Measure quadrat statistic. How does it change when you change the grid size?
  • Measure Ripley’s \(G\) and \(F\)
  • Is the pattern clustered?
Optional extension

Can you subset the data based on variables and check the point pattern properties of different subsets? Think about splitting based on the number of rooms, host characterisation, property type, etc.