3 - 10 = 65529. What? | Martin Fleischmann

Yes, the formula above is correct. Well, it depends on what we mean by correct.

NDVI does not make sense

Imagine the following situation. We have fetched a cloud-free mosaic of Sentinel 2 satellite data and want to measure NDVI (Normalised difference vegetation index), which uses red and near-infrared bands within this simple formula.

NDVI = (NIR - Red) / (NIR + Red)

The results are normalised, which in this case means that they lie between -1 and 1. Always.

We open out raster data using xarray and have all 4 bands in a single xarray.DataSet. The code to measure NDVI is then simple.

>>> red = data.sel(band=1)  # select red band
>>> nir = data.sel(band=4)  # select near-infrared band

>>> ndvi = (nir - red) / (nir + red)  # compute NDVI

And, a surprise! Our results are between 0 and 170. That is certainly not correct. What has happened?

16-bit unsigned integer

The data coming from Sentinel 2 are stored as 16-bit unsigned integer (uint16). That means that the value the array can hold can be anything between 0 and 216 - 1 (65,535). Remember that NDVI is between -1 and 1. Does it mean that uint16 cannot represent NDVI values? Yes, precisely.

Look at this toy example to understand what is happening during the computation of NDVI in uint16. Let’s have an array with four numbers and subtract 10 from each of them.

>>> array = numpy.array([1, 3, 6, 9], dtype='uint16')
>>> array - 10

array([65527, 65529, 65532, 65535], dtype=uint16)

Yes, as weird as it is, it is correct. The result should be negative, but we can’t have negative values in uint16. So what happens is that the counter rolls over and subtracts the remaining value from the maximum it can represent (65,535 - x).

It is exactly like a rollover of the odometer. We ran out of values, so they started over. The only difference is that we have 16 binary values encoding each number, not decimal.

Odometer rollover. By Hellbus - Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=3089111

The fix is easy. We have to use data type which does not limit you like this, like a 64-bit integer.

>>> array.astype('int64') - 10

array([-9, -7, -4, -1])

Compared to a 64-bit integer, a 16-bit integer is efficient since the resulting file will be much smaller (that is why it is used in the first place) but it can be limiting.

Be aware of your data types, so you don’t make the same mistake we did ;).

NDVI does not make sense#

16-bit unsigned integer#

NDVI does not make sense

16-bit unsigned integer