Yes, the formula above is correct. Well, it depends on what we mean by correct.
NDVI does not make sense
Imagine the following situation. We have fetched a cloud-free mosaic of Sentinel-2 satellite data and want to measure NDVI (Normalised Difference Vegetation Index), which combines the red and near-infrared bands in this simple formula.
NDVI = (NIR - Red) / (NIR + Red)
The results are normalised, which in this case means that they lie between -1 and 1. Always.
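To see the bounds in action, here is a minimal sketch that evaluates the formula on floating-point values. The reflectances are made up for illustration and are not real Sentinel-2 data. For non-negative inputs, the difference can never be larger in magnitude than the sum, which is why the ratio stays within -1 and 1.

```python
import numpy as np

# Hypothetical reflectance values stored as floats (not real satellite data).
red = np.array([0.05, 0.20, 0.30, 0.40])
nir = np.array([0.45, 0.20, 0.10, 0.00])

# Same formula as above: (NIR - Red) / (NIR + Red)
ndvi = (nir - red) / (nir + red)

print(ndvi)
print(ndvi.min(), ndvi.max())
```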
We open our raster data using xarray and have all 4 bands in a single xarray.Dataset. The code to measure NDVI is then simple.
>>> red = data.sel(band=1) # select red band
>>> nir = data.sel(band=4) # select near-infrared band
>>> ndvi = (nir - red) / (nir + red) # compute NDVI
And, a surprise! Our results are between 0 and 170. That is certainly not correct. What has happened?
16-bit unsigned integer
The data coming from Sentinel-2 are stored as 16-bit unsigned integers (uint16). That means that each value in the array can be anything between 0 and 2^16 - 1 (65,535). Remember that NDVI is between -1 and 1. Does it mean that uint16 cannot represent NDVI values? Yes, precisely: it has neither negative numbers nor fractions.
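If you do not want to remember the limits, NumPy can report them. A small sketch using numpy.iinfo, which describes the range of an integer type:

```python
import numpy as np

# Ask NumPy for the smallest and largest value uint16 can hold.
info = np.iinfo(np.uint16)

print(info.min, info.max)  # 0 65535
```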
Look at this toy example to understand what is happening during the computation of NDVI in uint16. Let’s take an array with four numbers and subtract 10 from each of them.
>>> import numpy
>>> array = numpy.array([1, 3, 6, 9], dtype='uint16')
>>> array - 10
array([65527, 65529, 65532, 65535], dtype=uint16)
Yes, as weird as it is, it is correct. The result should be negative, but we can’t have negative values in uint16. So what happens is that the counter rolls over: it counts down to 0 and then carries on from the maximum it can represent, so 1 - 10 becomes 65,536 - 9 = 65,527.
It is exactly like a rollover of the odometer. We ran out of values, so they started over. The only difference is that we have 16 binary digits encoding each number, not decimal ones.
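Put in arithmetic terms, the rollover is the same as taking the true result modulo 2^16. The sketch below checks this on the toy array: it does the subtraction once in uint16 and once in a wider signed type followed by a modulo, and compares the two. The variable names are ours, chosen for illustration.

```python
import numpy as np

array = np.array([1, 3, 6, 9], dtype="uint16")

# Subtraction carried out in uint16, where negative results wrap around.
wrapped = array - 10

# The same subtraction in a wider signed type, then reduced modulo 2**16.
expected = (array.astype("int64") - 10) % 2**16

print(wrapped)
print(expected)
```

Both lines print the same four numbers, which is the odometer effect written down as a formula.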
The fix is easy. We have to use a data type that does not limit us like this, such as a signed 64-bit integer.
>>> array.astype('int64') - 10
array([-9, -7, -4, -1])
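The same cast fixes NDVI. Here is a minimal sketch in plain NumPy rather than xarray, with made-up uint16 values standing in for the red and near-infrared bands. It computes the index once directly on the uint16 arrays and once after casting to int64.

```python
import numpy as np

# Made-up uint16 digital numbers standing in for the red and NIR bands.
red = np.array([1200, 3000, 800], dtype="uint16")
nir = np.array([3600, 1000, 800], dtype="uint16")

# Computed directly in uint16, nir - red wraps around wherever red > nir.
ndvi_wrong = (nir - red) / (nir + red)

# Casting to a signed type first keeps the negative differences.
red64 = red.astype("int64")
nir64 = nir.astype("int64")
ndvi = (nir64 - red64) / (nir64 + red64)

print(ndvi_wrong)  # the middle value ends up far above 1
print(ndvi)        # all values stay within -1 and 1
```

Casting to a floating-point type before the computation would work as well. The point is only that the subtraction must happen in a type that can hold negative numbers.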
Compared to a 64-bit integer, a 16-bit integer is efficient, since the resulting file will be much smaller (that is why it is used in the first place), but it can be limiting.
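The size difference is easy to check in memory. The sketch below uses an arbitrary 1000 x 1000 array of zeros; note that it measures the in-memory footprint, and the size of a file on disk also depends on the format and compression.

```python
import numpy as np

# An arbitrary 1000 x 1000 raster of zeros, used only to compare sizes.
values = np.zeros((1000, 1000), dtype="uint16")

print(values.nbytes)                  # 2000000, i.e. 2 bytes per value
print(values.astype("int64").nbytes)  # 8000000, i.e. 8 bytes per value
```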
Be aware of your data types, so you don’t make the same mistake we did ;).