Mapping MODIS Data

From EdwardsLab

The question

Did the 2007 California wild fires cause planktonic blooms in the ocean?

We're going to use MODIS data to figure this out, and along the way sort out their data and build some databases.

About MODIS

NASA has two satellites that observe the Earth, Terra and Aqua. Each satellite carries a Moderate Resolution Imaging Spectroradiometer, abbreviated MODIS, that scans the Earth twice per day in relatively broad swaths. The data are freely available at the MODIS website.

For this study, we are going to use data from Aqua MODIS (it is the newer satellite, and so the data is more up to date). We're going to have to download and parse the data files, extract the pieces of data that we need, and build our database from those data. Then we can start to address the question posed above.

Although the satellites traverse the Earth twice per day, they only measure a small part of the globe on each pass. Therefore, composite data are used to generate larger images. The composite images are also available at two scales, 4 km and 9 km resolution. For this study we will use the 4 km data that has been composited over 8 days. You can view all the satellite data online at the ocean color website and play with some of the settings.

Note that we will be using chlorophyll A measurements, as these are a proxy for phytoplankton growth.

About the fires

The 2007 San Diego wild fires burned from October 21-30, 2007 and, as shown in the satellite images on that page, released significant smoke across the county and out over the ocean. There are further satellite images on the first two floors of GMCS.

Downloading the data

All of the data can be downloaded from the Ocean Color FTP site. This link is to the 8 day composite that has 4 km resolution, measures chlorophyll A, and was generated in 2007.

The file names are typically something like this:

   A20072812007288.L3m_8D_CHLO_4.bz2

The extension is bz2, which is a compression format, and on most *NIX computers you can use bunzip2 to decompress the file. Once decompressed, the file is in HDF format. More below.
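If you would rather decompress the file from a script than from the command line, something like the following should work. It is only a sketch using Python's standard bz2 module; the function name decompress_bz2 is made up for this page, and it assumes the file name ends in .bz2.

```python
import bz2
import shutil
from pathlib import Path


def decompress_bz2(src, dest=None, chunk_size=1024 * 1024):
    """Decompress a .bz2 file and return the path of the output file.

    decompress_bz2 is a hypothetical helper, not part of any MODIS tool.
    """
    src = Path(src)
    # Default output name: drop the trailing ".bz2" from the input name.
    dest = Path(dest) if dest is not None else src.with_suffix("")
    with bz2.open(src, "rb") as fin, open(dest, "wb") as fout:
        # Copy in chunks so a large file is never held in memory at once.
        shutil.copyfileobj(fin, fout, chunk_size)
    return dest
```

Calling decompress_bz2("A20072812007288.L3m_8D_CHLO_4.bz2") writes A20072812007288.L3m_8D_CHLO_4 next to the compressed file and leaves the original in place.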

The file name is in the format:

    A     -- Aqua MODIS
    2007  -- start year
    281   -- start day within the year
    2007  -- end year
    288   -- end day within the year
    L3m   -- level 3 mapped data
    8D    -- eight day composite
    CHLO  -- chlorophyll A
    4     -- 4 km resolution
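Since we will be handling a lot of these files, it may help to pull the fields out of the name automatically. Here is a rough sketch; the regular expression is inferred from the single example name above, so treat it as an assumption, and parse_name and day_of_year_to_date are names invented for this page.

```python
import re
from datetime import date, timedelta

# Pattern for names like A20072812007288.L3m_8D_CHLO_4 (optionally with .bz2).
# Inferred from one example file name, so other products may not match.
NAME_RE = re.compile(
    r"^(?P<sensor>[A-Z])"
    r"(?P<y1>\d{4})(?P<d1>\d{3})(?P<y2>\d{4})(?P<d2>\d{3})"
    r"\.(?P<level>[^_]+)_(?P<period>[^_]+)_(?P<product>[^_]+)_(?P<res>[^_.]+)"
    r"(?:\.bz2)?$"
)


def day_of_year_to_date(year, doy):
    """Convert a year and a 1-based day within the year to a calendar date."""
    return date(year, 1, 1) + timedelta(days=doy - 1)


def parse_name(name):
    """Split a composite file name into its fields (hypothetical helper)."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised file name: {name!r}")
    return {
        "sensor": m["sensor"],
        "start": day_of_year_to_date(int(m["y1"]), int(m["d1"])),
        "end": day_of_year_to_date(int(m["y2"]), int(m["d2"])),
        "level": m["level"],
        "period": m["period"],
        "product": m["product"],
        "resolution": m["res"],
    }
```

For the example file, days 281 and 288 of 2007 come out as October 8 and October 15, which is the composite just before the fires started.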

HDF Format

The HDF format is a binary data format for sharing scientific data. You can read more about HDF at the HDF group homepage.

There are several applications that can view HDF data. In particular, I like the Java program HDFView. However, this data is pretty meaningless until you mine the file further.

The HDF Group also has a set of programs for extracting data from HDF files; the latest release is version 1.8.3. Note: get the HDF4 version.

We will need to convert this to ASCII to get the data out and put it in our database.

To start, you will need to get their ncdump program (part of the HDF software).

Uncompress the HDF file, and then run the following command to extract the data into ASCII format:

   ./ncdump A20072812007288.L3m_8D_CHLO_4 > A20072812007288.L3m_8D_CHLO_4.txt
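Once you have the text file, the numbers still need to be read back in. The sketch below assumes the ncdump output has a "data:" section in which the values appear as a comma-separated list after "l3m_data =" and end with a semicolon. Both the layout and the variable name l3m_data (borrowed from the conversion formula in the header) are assumptions, so check them against your own output. The function name read_cdl_values is made up for this page.

```python
import re

INT_RE = re.compile(r"-?\d+")


def read_cdl_values(lines, variable="l3m_data"):
    """Yield the integer values of one variable from ncdump text output.

    Assumes the layout "data:" ... "<variable> =" ... "v1, v2, ... ;".
    """
    in_data = False      # True once the "data:" section has started
    in_variable = False  # True while inside "<variable> = ... ;"
    for line in lines:
        text = line.strip()
        if not in_data:
            in_data = text == "data:"
            continue
        if not in_variable:
            if not text.startswith(variable + " ="):
                continue
            in_variable = True
            # Keep only what follows the "=" on the opening line.
            text = text.split("=", 1)[1]
        for token in INT_RE.findall(text):
            yield int(token)
        if ";" in text:
            return  # the semicolon closes the variable's value list
```

Because it is a generator, you can pass it an open file handle and it will work through the 37 million values one line at a time rather than loading the whole file.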

The header of the file has some information, shown here: A20072812007288.L3m_8D_CHLO_4 header. In particular, you need to know a few things:

  1. The projection is an equidistant cylindrical projection
  2. The Northernmost/Southernmost Latitudes are 90/-90
  3. The Westernmost/Easternmost Longitudes are -180/180
  4. The step between latitudes is 0.041667
  5. The step between longitudes is 0.041667
  6. There are 4,320 lines
  7. There are 8,640 columns
  8. This gives a total of 37,324,800 data points (4,320 * 8,640)
  9. If you calculate (2*180)/0.041667 = 8,639.9, which rounds to 8,640 -- the number of columns is the same as the number of increments of longitude
  10. If you calculate (2*90)/0.041667 = 4,319.97, which rounds to 4,320 -- the number of lines is the same as the number of increments of latitude
  11. The data is converted as Parameter value = Base**((Slope*l3m_data) + Intercept), where Base = 10, Slope = 0.000058, and Intercept = -2. This is how you convert from the stored data to a real value.
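Putting the header numbers together, here is a rough sketch of how a line/column position could be turned into a latitude/longitude and how a stored value could be turned into a parameter value. It assumes that line 0 is the northernmost line, that column 0 is the westernmost column, and that each position refers to the centre of its cell; none of that is stated in the header, so verify it. The function names are made up for this page.

```python
NORTH, WEST = 90.0, -180.0   # northernmost latitude, westernmost longitude
N_ROWS, N_COLS = 4320, 8640  # lines and columns in the grid
LAT_STEP = 180.0 / N_ROWS    # about 0.041667 degrees
LON_STEP = 360.0 / N_COLS    # about 0.041667 degrees

BASE, SLOPE, INTERCEPT = 10.0, 0.000058, -2.0  # values from the header


def cell_centre(row, col):
    """Return (lat, lon) of the centre of a grid cell.

    Assumes row 0 is the northernmost line and col 0 the westernmost column.
    """
    lat = NORTH - (row + 0.5) * LAT_STEP
    lon = WEST + (col + 0.5) * LON_STEP
    return lat, lon


def cell_index(lat, lon):
    """Return (row, col) of the grid cell that contains a lat/lon point."""
    # min() keeps the exact southern and eastern edges inside the grid.
    row = min(int((NORTH - lat) / LAT_STEP), N_ROWS - 1)
    col = min(int((lon - WEST) / LON_STEP), N_COLS - 1)
    return row, col


def to_value(raw):
    """Apply Base**((Slope * l3m_data) + Intercept) to one stored value."""
    return BASE ** (SLOPE * raw + INTERCEPT)
```

With these numbers a stored value of 0 maps to 10**-2 = 0.01, and larger stored values map to larger parameter values on a logarithmic scale. Remember that west longitudes are negative here, so 117 W is -117.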

Goals

  • We're going to build a database of the chlorophyll concentrations off the coast of California from August until December 2007.
  • San Diego is at 32° 44' N; 117° 10' W. Therefore, to start we'll define our target box as 30 N -> 35 N and 117 W -> 130 W. We may adjust this if it turns out there is too much data!!
  • Start by downloading and extracting an HDF file using ncdump and examining the data in there. Then see if you can figure out how to get the data in this lat/lon range. A simple way is to just write code to iterate over the whole dataset, printing out tuples of:
  [lat, lon, value]
  • Then we need to figure out the conversion of the ASCII ints in this file to real values, which I think is based on the equation above.
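As a starting point for the goals above, here is one way the iteration could look. It is a sketch only: it assumes the values arrive as a flat sequence in row-major order starting at the northwest corner, and that each value belongs to the centre of its cell. The function name box_tuples and the fill parameter (a placeholder for whatever marks missing data, which the header summary above does not tell us) are made up for this page.

```python
def box_tuples(values, south=30.0, north=35.0, west=-130.0, east=-117.0,
               n_cols=8640, step=180.0 / 4320,
               base=10.0, slope=0.000058, intercept=-2.0, fill=None):
    """Yield [lat, lon, value] for every grid cell whose centre is in the box.

    values: flat iterable of stored integers, row-major, northwest corner
    first (an assumption). fill: hypothetical stored value to skip.
    West longitudes are negative, so 130 W -> 117 W is -130 -> -117.
    """
    for i, raw in enumerate(values):
        row, col = divmod(i, n_cols)
        lat = 90.0 - (row + 0.5) * step
        if lat > north:
            continue  # still north of the box
        if lat < south:
            break     # rows run north to south, so nothing further can match
        lon = -180.0 + (col + 0.5) * step
        if lon < west or lon > east or raw == fill:
            continue
        # Convert the stored integer with Base**((Slope*raw) + Intercept).
        yield [lat, lon, base ** (slope * raw + intercept)]
```

With the default box this should yield about 37,440 tuples per file (120 lines by 312 columns), which is small enough to load into a database for every 8 day composite from August to December.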