Personal tools
You are here: Home Project Documentation Getting Started Guide Under The Hood - Direct Searches

Under The Hood - Direct Searches

Introduction to searching the database directly using command-line tools

Overview

This page provides a basic introduction to searching the postgres meta-data database directly. This kind of access will be useful to anyone needing to search in ways that the web interface does not allow.

Prerequisites

In order to succeed with direct search, you'll need to have;

  • configured your iVec account appropriately. Steps involved are covered in Preparing Your Account. Please check that you've covered all of these steps.
  • a basic understanding of unix command line tools
  • a basic understanding of SQL. If you've not used SQL before, then you'll just need to be prepared to learn.

The Postgres Client

The postgres database is provided with a command-line tool - "psql" - for running queries and making updates. Providing you've configured your account appropriate, you should be able to invoke psql as follows without getting prompted for a password;
  • From wastac-4240: psql -d wastac
  • From cognac or pbstore: psql -d wastac -h wastac-4240.ivec.org

Wastac Database Basics - Simplified Views

The wastac database used by the portal includes many tables, most of which are not of little interest to most users. Information on swaths, satellites, science programs (EOS etc), sensors (ie MODIS) are spread across multiple tables in accordance with a practice called normalisation. To run a search on normalised data, you usually need to join multiple tables together to get to the information that you need.
 
Rather than burden everyone with details of the specific table layout we have decided to use and forcing them to become proficient at joining tables, we've provided simplified "views" into the database which hide the normalised table structure. Think of a view as a virtual table made up of data from other underlying tables.
 
Our recommendation is that anyone wanting to run direct searches should try to make use of these views. In order to not trip-up people with database structure changes, if decide we need to adjust the table structure, we'll do our best to preserve these simplified views.
 
The views available are;
  • wastac.v_swath - metadata about swaths in the archive
  • wastac.v_archivefile - metadata about archive files in the archive

wastac.v_swath

 
This view contains information about "swaths". A swath is a chunk of data - a granule or a full scene depending on whatever was uploaded to the archive. 
 
When a file arrives in the archive incoming directories, the filename is mapped to a corresponding swath name and an entry in the database is created for that swath if one does not already exist. The initial entry contains only the most basic metadata - in fact only metadata that can be determined from the file-name. Any other metadata fields are initialised to "NULL" - which is a special database value indicating no-known-value.
 
When a swath is processed, metadata about it is accumulated into the database. This information then becomes available to use in searches.
 
The view wastac.v_swath currently exposes the following database fields;
 
 geometry
Field Type Description
swath_name string The swath name. The convention used in the database is that the swath name is filename  off the "primary" datafile (ie L1B or L0 as it may be) without any type extensions. Examples are n16.0912111028_47532, MOD01.A2005059.1545.005.2008196220203 and mut01_20091211_1305_MOD.
swath_timestamp_start timestamp The UTC time of the first observation in the swath
swath_timestamp_end timestamp The UTC time of the last observation in the swath
swath_dayscene boolean True if the swath is a day-time scene
swath_scanlines integer The reported number of scan-lines in the swath; this number is only comparable between like swaths and is intended to give you an idea of the swath size.
swath_bounding geometry This is a geometry object mapping the bounds of the swath. It is a true perimeter of the scene in lon, lat coordinates (as opposed to rectangular bounding box). 
pc_day real Percentage of day-time observations.
For MODIS this information is collected from MOD35 statistics.
pc_land real Percentage of land observations. 
For MODIS this information is collected from MOD35 statistics.
pc_water real Percentage of water observations.
For MODIS this information is collected from MOD35 statistics.
pc_glint real Percentage of sun-glint observations.
For MODIS this information is collected from MOD35 statistics.
pc_cloud real Percentage of cloud observations.
For MODIS this information is collected from MOD35 statistics.
quicklook real True if a quicklook has been created.
sensor_name string The sensor name (ie. modis, avhrr)
satellite_name string The satellite name (ie aqua, terra, n18)
acquirer_name string The data acquirer name (ie wastac, nasa)
     
     
 

wastac.v_archivefile

TBC

Simple Searches

Below are some examples of searches that may help to get you started running direct queries.

How many swaths are in the database?

wastac=> select count(*) from wastac.v_swath;
 count
--------
 130250
(1 row)

How many swaths from each acquirer?

wastac=> select acquirer_name, count(swath_name) from wastac.v_swath group by acquirer_name;
 acquirer_name | count
---------------+--------
 wastac        |  26500
 nasa          | 103750

What MODIS DACC (NASA) granules exist between two dates?

wastac=> select swath_name
from wastac.v_swath
where
     swath_timestamp_start > timestamp '2009-08-01'
and  swath_timestamp_end   < timestamp '2009-08-31'
and acquirer_name = 'nasa'
order by swath_timestamp_start;

Note to save the results of the above query to a file, you could use the output re-direct function of psql as follows;

wastac=> \o ~/results.txt
wastac=> select swath_name
         from wastac.v_swath
         where
              swath_timestamp_start > timestamp '2009-08-01'
         and  swath_timestamp_end   < timestamp '2009-08-31'
         and  acquirer_name = 'nasa'
         order by swath_timestamp_start;
wastac=> \q
hlynch@wastac-4240% less ~/results.txt

How many swaths per acquirer per month over each year?

wastac=> select
       date_part('year', swath_timestamp_start) as year, 
       date_part('month', swath_timestamp_start) as month, 
       acquirer_name, 
       count(swath_name) 
from wastac.v_swath 
group by year, month, acquirer_name 
order by year, month, acquirer_name;

Which swaths fall in october of any year?

wastac=> select swath_name 
  from wastac.v_swath 
where 
     acquirer_name = 'nasa' 
and  date_part('month', swath_timestamp_start) = '10' 
order by swath_timestamp_start;

Which swaths cover a specific area?

Note this is the first spatial query shown so far. It defines a polygon shape by lon/lat coordinates and then searches for swath_bounding boxes that intersect with the polygon. Any overlap is considered a match. Spatial extensions are implemented using PostGIS, which is an extension to Postgres. 
wastac=> SELECT swath_name, swath_timestamp_start AT TIME ZONE 'Australia/Perth' AS localtime 
    FROM wastac.t_swath_metadata 
    WHERE swath_bounding IS NOT NULL 
      AND swath_bounding && GeomFromText('POLYGON((121.75 -34.0, 122.0 -34.0, 122.0 -33.75, 121.75 -33.75, 121.75 -34.0))',-1) 
    ORDER BY swath_timestamp_start;

How can I display times as my local time rather than UTC?

wastac=> SELECT swath_name, swath_timestamp_start AT TIME ZONE 'Australia/Perth' AS localtime;

PSQL Help Commands

The psql tool has built-in help, viewable by using the command "\?" - this will reveal many no-SQL commands built into the tool for managing input and output and examining the database schema.  As noted above the "\o filename" command sends query output to a file rather than the console. The describe command is useful to showing the table structure (ie \d wastac.v_swath).

Useful References

The SQL language is complex and takes some time to get used to, but once you become accustomed to the SQL way of thinking it is not all that difficult. The Postgres web-site contains extensive reference documentation for the psql tool and the PostgreSQL language. It's not all that useful as a learning-SQL tutorial - but there are many such tutorials available such as W3Schools. Please note that although there are SQL standards, you're better off thinking of SQL as a standard idea rather than a portable language like C - all implementations of SQL include many proprietary extensions which are generally needed in order to produce a typical non-trivial database application.
 
Document Actions
Log in


Forgot your password?