A course on Spatial Data Science

The course Spatial Data Science in Python introduces data science and computational analysis using open source tools written in the Python programming language.

The course is provided by Charles University as a standalone micro-credentials certification and is taught online every August. The course is open to anyone.

Aims

The course equips students who have little prior knowledge with core competencies in Spatial Data Science (SDS). It does so by:

  • Advancing their statistical and numerical literacy.
  • Introducing basic principles of programming for data science and state-of-the-art computational tools for SDS.
  • Presenting a comprehensive overview of the main methodologies available to the Spatial Data Scientist, along with the intuition of how and when they can be applied.
  • Focusing on real-world applications of these techniques in a geographical and applied context.

What is the scope?

The course revolves around data typically used in human geography, but its applicability is not limited to human geography. In practice, you will work more with vector data than rasters (although we cover those a bit as well) and often with data capturing various aspects of human life. The spatial data science concepts, however, are universal.
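To make the distinction between vector and raster data concrete, here is a minimal sketch in plain Python. It is only an illustration, not course material: the feature names, coordinates, population figures, and elevation values are all hypothetical, and in practice such data would be handled with dedicated spatial libraries rather than lists and dictionaries.

```python
# Vector data: discrete features, each made of a geometry plus attributes.
# Hypothetical example: two cities stored as points (longitude, latitude)
# with rough, illustrative population figures.
cities = [
    {"name": "Prague", "geometry": (14.42, 50.09), "population": 1_300_000},
    {"name": "Brno", "geometry": (16.61, 49.20), "population": 400_000},
]

# Raster data: a regular grid of cells, each holding a single value.
# Hypothetical example: a 3x3 grid of elevations in metres.
elevation = [
    [210, 215, 230],
    [205, 220, 240],
    [200, 225, 250],
]

# A typical vector question filters features by their attributes...
large_cities = [c["name"] for c in cities if c["population"] > 1_000_000]

# ...while a typical raster question summarises the values of the cells.
cells = [value for row in elevation for value in row]
mean_elevation = sum(cells) / len(cells)

print(large_cities)
print(mean_elevation)
```

The vector side carries explicit geometries and named attributes per feature, while the raster side stores values only and leaves location implicit in the position of each cell within the grid.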

Learning outcomes

After finishing the course, students will be able to:

  • Demonstrate understanding of advanced concepts of spatial data science and use the open tools to load and analyze spatial data.
  • Understand the motivation and inner logic of the main methodological approaches of open SDS.
  • Critically evaluate the suitability of a specific technique, what it can offer, and how it can help answer questions of interest.
  • Apply several spatial analysis techniques and explain how to interpret the results in the process of turning data into information.
  • Work independently using SDS tools to extract valuable insight when faced with a new dataset.

Prerequisites

This course assumes an understanding of geography and its key concepts (e.g. coordinate reference systems or the Modifiable Areal Unit Problem (Openshaw 1983)), at least basic familiarity with GIS (such as file formats and basic spatial data manipulation), and a basic understanding of statistics (e.g. the concept of regression), ideally with some spatial component (e.g. geographically weighted regression). While the course will briefly explain these topics, it will not cover the underlying theory and statistics in detail.

A basic understanding of Python is required. Understanding of data analysis using pandas is not expected but certainly helps.

Course structure

The material is planned for 5 days with two 3-hour sessions per day. Each session is divided into three parts: Concepts, Hands-on, and Exercise, following the model proposed by Arribas-Bel (2019). Concepts takes the form of a lecture covering conceptual aspects of the day’s topic, providing the necessary theoretical background before digging into code. This part can be nicknamed “I do”. Hands-on contains documented code in a Jupyter notebook, which the lecturer executes while providing additional explanation and which students run in parallel. This is the “We do” part. Exercise is a set of tasks performed by students individually, with occasional guidance from the lecturer. So we finish the session with “You do”.

For enrolled students, the course will finish with a written assignment in the form of a computational essay. See the Assignment section for details.

Literature

The course loosely follows the contents of Geographic Data Science with Python by Rey, Arribas-Bel, and Wolf (2023). The online version of the book is available under open access from geographicdata.science/book. Using the online version over the printed one is recommended, although this is entirely up to you.

Spatial or geographic data science?

Spatial data science and geographic data science are often treated as synonyms. In some interpretations, spatial is broader than geographic. In this case, we do spatial […] for […] geography, which is, in principle, geographic data science. We will treat both terms as equal within the context of this course.

Acknowledgements

The course material is partially derived from A Course on Geographic Data Science by Arribas-Bel (2019) and follows its structure, main learning logic, and some hands-on materials. Thanks, Dani! A few sections are derived from other sources acknowledged at the bottom of respective pages. Thank you all!

References

Arribas-Bel, Dani. 2019. “A Course on Geographic Data Science.” The Journal of Open Source Education 2 (14). https://doi.org/10.21105/jose.00042.
Openshaw, S. 1983. The Modifiable Areal Unit Problem. Concepts and Techniques in Modern Geography. Geo Books.
Rey, Sergio, Dani Arribas-Bel, and Levi John Wolf. 2023. Geographic Data Science with Python. Chapman & Hall/CRC Texts in Statistical Science. London, England: Taylor & Francis.