California Department of Water Resources Historical Data

ulmo.cdec.historical.get_stations()

Fetches information on all CDEC sites.

Returns:

df : pandas DataFrame

a pandas DataFrame (indexed on site id) with station information.

ulmo.cdec.historical.get_sensors(sensor_id=None)

Gets a list of sensor ids as a DataFrame indexed on sensor number. Can be limited by a list of numbers.

Usage example:

from ulmo import cdec
# to get all available sensor info
sensors = cdec.historical.get_sensors()
# or to get just one sensor
sensor = cdec.historical.get_sensors([1])

Parameters:

sensor_id : iterable of integers or None

If specified, results are limited to these sensor numbers.

Returns:

df : pandas DataFrame

a pandas DataFrame (indexed on sensor number) with sensor information.

ulmo.cdec.historical.get_station_sensors(station_ids=None, sensor_ids=None, resolutions=None)

Gets available sensors for the given stations, sensor ids and time resolutions. If no station ids are provided, all available stations will be used (this is not recommended, and will probably take a really long time).

The list can be limited by a list of sensor numbers, or time resolutions if you already know what you want. If none of the provided sensors or resolutions are available, an empty DataFrame will be returned for that station.

Usage example:

from ulmo import cdec
# to get all available sensors
available_sensors = cdec.historical.get_station_sensors(['NEW'])

Parameters:

station_ids : iterable of strings or None

sensor_ids : iterable of integers or None

Use the get_sensors() function to see a list of available sensor numbers.

resolutions : iterable of strings or None

Possible values are ‘event’, ‘hourly’, ‘daily’, and ‘monthly’ but not all of these time resolutions are available at every station.

Returns:

dict : a python dict

a python dict with site codes as keys with values containing pandas DataFrames of available sensor numbers and metadata.

ulmo.cdec.historical.get_data(station_ids=None, sensor_ids=None, resolutions=None, start=None, end=None)

Downloads data for a set of CDEC station and sensor ids. If either is not provided, all available data will be downloaded. Be really careful with choosing hourly resolution as the data sets are big, and CDEC’s servers are slow as molasses in winter.

Usage example:

from ulmo import cdec
dat = cdec.historical.get_data(['PRA'], resolutions=['daily'])

Parameters:

station_ids : iterable of strings or None

sensor_ids : iterable of integers or None

Use the get_sensors() function to see a list of available sensor numbers.

resolutions : iterable of strings or None

Possible values are ‘event’, ‘hourly’, ‘daily’, and ‘monthly’ but not all of these time resolutions are available at every station.

Returns:

dict : a python dict

a python dict with site codes as keys. Values will be nested dicts containing all of the sensor/resolution combinations.
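
Because not every time resolution is available at every station and CDEC requests can be slow, it may help to check the requested resolutions before downloading anything. The following is a minimal sketch that assumes only the four resolution names listed above; the helpers check_resolutions and fetch_daily are hypothetical and are not part of ulmo.

```python
VALID_RESOLUTIONS = ('event', 'hourly', 'daily', 'monthly')


def check_resolutions(resolutions):
    """Return the resolutions as a list, rejecting names not documented above."""
    if resolutions is None:
        return None
    unknown = [r for r in resolutions if r not in VALID_RESOLUTIONS]
    if unknown:
        raise ValueError('unknown resolution(s): %s' % ', '.join(unknown))
    return list(resolutions)


def fetch_daily(station_ids, sensor_ids=None):
    """Hypothetical wrapper: download only daily data for the given stations."""
    # imported lazily so check_resolutions works without ulmo or network access
    from ulmo import cdec
    return cdec.historical.get_data(
        station_ids,
        sensor_ids=sensor_ids,
        resolutions=check_resolutions(['daily']),
    )
```

Calling fetch_daily(['PRA']) would then issue the same request as the usage example above.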

Climate Prediction Center Weekly Drought

Climate Prediction Center Weekly Drought Index dataset

ulmo.cpc.drought.get_data(state=None, climate_division=None, start=None, end=None, as_dataframe=False)

Retrieves data.

Parameters:

state : None or str

If specified, results will be limited to the state corresponding to the given 2-character state code.

climate_division : None or int

If specified, results will be limited to the climate division.

start : None or date (see note on dates and times)

Results will be limited to those after the given date. Default is the start of the current calendar year.

end : None or date (see note on dates and times)

If specified, results will be limited to data before this date.

as_dataframe: bool

If False (default), a dict with a nested set of dicts will be returned with data indexed by state, then climate division. If True then a pandas.DataFrame object will be returned. The pandas dataframe is used internally, so setting this to True is a little bit faster as it skips a serialization step.

Returns:

data : dict or pandas.DataFrame

A dict or pandas.DataFrame representing the data. See the as_dataframe parameter for more.
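
As an illustration of how the parameters combine, the sketch below builds the keyword arguments for one state and one calendar year. It assumes that datetime.date objects are acceptable date values (see the note on dates and times); drought_query and fetch_drought are hypothetical helper names, not part of ulmo.

```python
import datetime


def drought_query(state, year):
    """Build keyword arguments bounding one calendar year for one state."""
    if len(state) != 2 or not state.isalpha():
        raise ValueError('state must be a 2-character code, got %r' % (state,))
    return {
        'state': state,
        'start': datetime.date(year, 1, 1),
        'end': datetime.date(year, 12, 31),
        'as_dataframe': True,  # skips the dict serialization step
    }


def fetch_drought(state, year):
    """Hypothetical wrapper around ulmo.cpc.drought.get_data."""
    # imported lazily; requires ulmo and network access
    from ulmo import cpc
    return cpc.drought.get_data(**drought_query(state, year))
```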

CUAHSI WaterOneFlow

ulmo.cuahsi.his_central

CUAHSI HIS Central web services

ulmo.cuahsi.his_central.get_services(bbox=None)

Retrieves a list of services.

Parameters:

bbox : None or 4-tuple

Optional argument for a bounding box that covers the area you want to look for services in. This should be a tuple containing (min_longitude, min_latitude, max_longitude, and max_latitude) with these values in decimal degrees. If not provided then the full set of services will be queried from HIS Central.

Returns:

services_dicts : list

A list of dicts that each contain information on an individual service.
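
The order of values in bbox is easy to get wrong, so a small helper can normalize two corner points into the documented (min_longitude, min_latitude, max_longitude, max_latitude) form. This is only a sketch; make_bbox and services_near are hypothetical names, not part of ulmo.

```python
def make_bbox(lon_a, lat_a, lon_b, lat_b):
    """Return (min_longitude, min_latitude, max_longitude, max_latitude)."""
    for lon in (lon_a, lon_b):
        if not -180.0 <= lon <= 180.0:
            raise ValueError('longitude out of range: %r' % (lon,))
    for lat in (lat_a, lat_b):
        if not -90.0 <= lat <= 90.0:
            raise ValueError('latitude out of range: %r' % (lat,))
    return (min(lon_a, lon_b), min(lat_a, lat_b),
            max(lon_a, lon_b), max(lat_a, lat_b))


def services_near(lon_a, lat_a, lon_b, lat_b):
    """Hypothetical wrapper: list HIS Central services inside a bounding box."""
    # imported lazily; requires ulmo and network access
    from ulmo import cuahsi
    return cuahsi.his_central.get_services(
        bbox=make_bbox(lon_a, lat_a, lon_b, lat_b))
```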

ulmo.cuahsi.wof

CUAHSI WaterOneFlow web services

ulmo.cuahsi.wof.get_sites(wsdl_url, suds_cache=('default', ))

Retrieves information on the sites that are available from a WaterOneFlow service using a GetSites request. For more detailed information including which variables and time periods are available for a given site, use get_site_info().

Parameters:

wsdl_url : str

URL of a service’s web service definition language (WSDL) description. All WaterOneFlow services publish a WSDL description and this url is the entry point to the service.

suds_cache : None or tuple

SOAP local cache duration for WSDL description and client object. Pass a cache duration tuple like (‘days’, 3) to set a custom duration. Duration may be in months, weeks, days, hours, or seconds. If unspecified, the default duration (1 day) will be used. Use None to turn off caching.

Returns:

sites_dict : dict

a python dict with site codes mapped to site information

ulmo.cuahsi.wof.get_site_info(wsdl_url, site_code, suds_cache=('default', ))

Retrieves detailed site information from a WaterOneFlow service using a GetSiteInfo request.

Parameters:

wsdl_url : str

URL of a service’s web service definition language (WSDL) description. All WaterOneFlow services publish a WSDL description and this url is the entry point to the service.

site_code : str

Site code of the site you’d like to get more information for. Site codes MUST contain the network and be of the form <network>:<site_code>, as is required by WaterOneFlow.

suds_cache : None or tuple

SOAP local cache duration for WSDL description and client object. Pass a cache duration tuple like (‘days’, 3) to set a custom duration. Duration may be in months, weeks, days, hours, or seconds. If unspecified, the default duration (1 day) will be used. Use None to turn off caching.

Returns:

site_info : dict

a python dict containing site information

ulmo.cuahsi.wof.get_values(wsdl_url, site_code, variable_code, start=None, end=None, suds_cache=('default', ))

Retrieves site values from a WaterOneFlow service using a GetValues request.

Parameters:

wsdl_url : str

URL of a service’s web service definition language (WSDL) description. All WaterOneFlow services publish a WSDL description and this url is the entry point to the service.

site_code : str

Site code of the site you’d like to get values for. Site codes MUST contain the network and be of the form <network>:<site_code>, as is required by WaterOneFlow.

variable_code : str

Variable code of the variable you’d like to get values for. Variable codes MUST contain the vocabulary and be of the form <vocabulary>:<variable_code>, as is required by WaterOneFlow.

start : None or datetime (see note on dates and times)

Start of a date range for a query. If both start and end parameters are omitted, the entire time series available will be returned.

end : None or datetime (see note on dates and times)

End of a date range for a query. If both start and end parameters are omitted, the entire time series available will be returned.

suds_cache : None or tuple

SOAP local cache duration for WSDL description and client object. Pass a cache duration tuple like (‘days’, 3) to set a custom duration. Duration may be in months, weeks, days, hours, or seconds. If unspecified, the default duration (1 day) will be used. Use None to turn off caching.

Returns:

site_values : dict

a python dict containing values
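
Both site_code and variable_code must carry a prefix separated by a colon. The sketch below shows one way to build such codes and pass a custom cache duration; qualified_code and fetch_values are hypothetical helpers, and any network, vocabulary, site or variable names passed to them are placeholders that depend on the service being queried.

```python
def qualified_code(prefix, code):
    """Join a network or vocabulary prefix and a code as '<prefix>:<code>'."""
    if ':' in prefix or ':' in code:
        raise ValueError('prefix and code must not already contain ":"')
    return '%s:%s' % (prefix, code)


def fetch_values(wsdl_url, network, site, vocabulary, variable,
                 start=None, end=None):
    """Hypothetical wrapper around ulmo.cuahsi.wof.get_values."""
    # imported lazily; requires ulmo and network access
    from ulmo import cuahsi
    return cuahsi.wof.get_values(
        wsdl_url,
        qualified_code(network, site),
        qualified_code(vocabulary, variable),
        start=start,
        end=end,
        suds_cache=('days', 3),  # cache the WSDL description for three days
    )
```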

ulmo.cuahsi.wof.get_variable_info(wsdl_url, variable_code=None, suds_cache=('default', ))

Retrieves variable information from a WaterOneFlow service using a GetVariableInfo request.

Parameters:

wsdl_url : str

URL of a service’s web service definition language (WSDL) description. All WaterOneFlow services publish a WSDL description and this url is the entry point to the service.

variable_code : None or str

If None (default) then information on all variables will be returned; otherwise, this should be set to the variable code of the variable you’d like to get more information on. Variable codes MUST contain the vocabulary and be of the form <vocabulary>:<variable_code>, as is required by WaterOneFlow.

suds_cache : None or tuple

SOAP local cache duration for WSDL description and client object. Pass a cache duration tuple like (‘days’, 3) to set a custom duration. Duration may be in months, weeks, days, hours, or seconds. If unspecified, the default duration (1 day) will be used. Use None to turn off caching.

Returns:

variable_info : dict

a python dict containing variable information. If variable_code is None (default) then this will be a nested set of dicts keyed by <vocabulary>:<variable_code>.

Lower Colorado River Authority (LCRA) Hydromet Data

ulmo.lcra.hydromet.get_sites_by_type(site_type)

Gets a list of the hydromet site codes and descriptions for sites of the given type.

Parameters:

site_type : str

In all but lake sites, this is the parameter code collected at the site. For lake sites, it is ‘lake’. See site_types and PARAMETERS.

Returns:

sites_dict : dict

A python dict with four-character site codes mapped to site information.

ulmo.lcra.hydromet.get_site_data(site_code, parameter_code, as_dataframe=True, start_date=None, end_date=None, dam_site_location='head')

Fetches a site’s parameter data.

Parameters:

site_code : str

The LCRA site code (four characters long) of the site you want to query data for.

parameter_code : str

LCRA parameter code. See PARAMETERS.

start_date : None or datetime

Start of a date range for a query.

end_date : None or datetime

End of a date range for a query.

as_dataframe : True (default) or False

This determines what format values are returned as. If True (default) then the values will be a pandas.DataFrame object with the value timestamps as the index. If False, the format will be a Python dictionary.

dam_site_location : ‘head’ (default) or ‘tail’

The site location relative to the dam.

Returns:

df : pandas.DataFrame or values_dict : dict
ulmo.lcra.hydromet.get_all_sites()

Returns a list of all LCRA hydromet sites as a GeoJSON FeatureCollection.

ulmo.lcra.hydromet.get_current_data(service, as_geojson=False)

Fetches the current (near real-time) river stage and flow values from the LCRA web service.

Parameters:

service : str

The web service providing data. See current_data_services. Currently we have GetUpperBasin and GetLowerBasin.

as_geojson : True or False (default)

If True the data is returned as a GeoJSON FeatureCollection, and if False the data is returned as a list of dicts.

Returns:

current_values_dicts : a list of dicts or current_values_geojson : a GeoJSON FeatureCollection

Lower Colorado River Authority (LCRA) Water Quality Data

ulmo.lcra.waterquality.get_sites(source_agency=None)

Fetches a list of sites with location and available metadata.

Parameters:

source_agency : None or str

LCRA code of the agency that collects the data. There are sites whose sources are not listed, so this filter may not return all sites of a certain source. See source_map.

Returns:

sites_geojson : GeoJSON FeatureCollection

ulmo.lcra.waterquality.get_historical_data(site_code, start=None, end=None, as_dataframe=False)

Fetches data for a site at a given date.

Parameters:

site_code : str

The site code to fetch data for. A list of sites can be retrieved with get_sites().

date : None or date (see note on dates and times)

The date of the data to be queried. If date is None (default), then all data will be returned.

as_dataframe : bool

This determines what format values are returned as. If False (default), the values dict will be a dict with timestamps as keys mapped to a dict of gauge variables and values. If True then the values dict will be a pandas.DataFrame object containing the equivalent information.

Returns:

data_dict : dict

A dict containing site information and values.

ulmo.lcra.waterquality.get_recent_data(site_code, as_dataframe=False)

Fetches near real-time instantaneous water quality data for the LCRA bay sites.

Parameters:

site_code : str

The bay site to fetch data for. See real_time_sites.

as_dataframe : bool

This determines what format values are returned as. If False (default), the values will be a list of value dicts. If True then values are returned as a pandas.DataFrame.

Returns:

A list of values or a pandas.DataFrame.

NASA ORNL Daymet weather data services

NASA EARTHDATA ORNL DAAC Daymet web services

ulmo.nasa.daymet.get_variables()

Retrieves a list of available variables.

Parameters:

None

Returns:

A dictionary of variables with variable abbreviations as keys and descriptions as values.

ulmo.nasa.daymet.get_daymet_singlepixel(latitude, longitude, variables=['tmax', 'tmin', 'prcp'], years=None, as_dataframe=True)

Fetches a time series of climate variables from the DAYMET single pixel extraction service.

Parameters:

latitude: float

The latitude (WGS84), value between 52.0 and 14.5.

longitude: float

The longitude (WGS84), value between -131.0 and -53.0.

variables : List of str

Daymet parameters to fetch. Available options:

  • tmax - maximum temperature
  • tmin - minimum temperature
  • srad - shortwave radiation
  • vp - vapor pressure
  • swe - snow-water equivalent
  • prcp - precipitation
  • dayl - daylength

default = [‘tmax’, ‘tmin’, ‘prcp’]

years: list of int

List of years to return. Daymet version 2 is available from 1980 to the latest full calendar year. If None (default), all years will be returned.

as_dataframe : True (default) or False

If True, a pandas DataFrame is returned. If False, an open file with contents in csv format is returned.

Returns:

single_pixel_timeseries : pandas dataframe or csv filename
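
The documented coordinate limits and variable names can be checked locally before a request is sent. The sketch below does that and then forwards the call; check_daymet_request and fetch_pixel are hypothetical helpers, not part of ulmo, and the limits are simply the ones listed above.

```python
DAYMET_VARIABLES = ('tmax', 'tmin', 'srad', 'vp', 'swe', 'prcp', 'dayl')


def check_daymet_request(latitude, longitude,
                         variables=('tmax', 'tmin', 'prcp')):
    """Validate a single-pixel request against the documented limits."""
    if not 14.5 <= latitude <= 52.0:
        raise ValueError('latitude must be between 14.5 and 52.0')
    if not -131.0 <= longitude <= -53.0:
        raise ValueError('longitude must be between -131.0 and -53.0')
    unknown = sorted(set(variables) - set(DAYMET_VARIABLES))
    if unknown:
        raise ValueError('unknown Daymet variable(s): %s' % ', '.join(unknown))
    return latitude, longitude, list(variables)


def fetch_pixel(latitude, longitude, variables=('tmax', 'tmin', 'prcp'),
                years=None):
    """Hypothetical wrapper around ulmo.nasa.daymet.get_daymet_singlepixel."""
    # imported lazily; requires ulmo and network access
    from ulmo import nasa
    lat, lon, names = check_daymet_request(latitude, longitude, variables)
    return nasa.daymet.get_daymet_singlepixel(
        lat, lon, variables=names, years=years)
```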

National Climatic Data Center Climate Index Reference Sequential (CIRS)

National Climatic Data Center Climate Index Reference Sequential (CIRS) drought dataset

ulmo.ncdc.cirs.get_data(elements=None, by_state=False, location_names='abbr', as_dataframe=False, use_file=None)

Retrieves data.

Parameters:

elements : None, str or list

The element(s) to get data for. If None (default), then all elements are used. An individual element is a string, but a list or tuple of them can be used to specify a set of elements. Elements are:

  • ‘cddc’: Cooling Degree Days
  • ‘hddc’: Heating Degree Days
  • ‘pcpn’: Precipitation
  • ‘pdsi’: Palmer Drought Severity Index
  • ‘phdi’: Palmer Hydrological Drought Index
  • ‘pmdi’: Modified Palmer Drought Severity Index
  • ‘sp01’: 1-month Standardized Precipitation Index
  • ‘sp02’: 2-month Standardized Precipitation Index
  • ‘sp03’: 3-month Standardized Precipitation Index
  • ‘sp06’: 6-month Standardized Precipitation Index
  • ‘sp09’: 9-month Standardized Precipitation Index
  • ‘sp12’: 12-month Standardized Precipitation Index
  • ‘sp24’: 24-month Standardized Precipitation Index
  • ‘tmpc’: Temperature
  • ‘zndx’: Palmer Z-Index

by_state : bool

If False (default), divisional data will be retrieved. If True, then regional data will be retrieved.

location_names : str or None

This parameter defines what (if any) type of names will be added to the values. If set to ‘abbr’ (default), then abbreviated location names will be used. If ‘full’, then full location names will be used. If set to None, then no location name will be added and the only identifier will be the location_codes (this is the most memory-conservative option).

as_dataframe : bool

If False (default), a list of values dicts is returned. If True, a dict with element codes mapped to equivalent pandas.DataFrame objects will be returned. The pandas dataframe is used internally, so setting this to True is faster as it skips a somewhat expensive serialization step.

use_file : None, file-like object or str

If None (default), then data will be automatically retrieved from the web. If a file-like object or a file path string, then the file will be used to read data from. This is intended to be used for reading in previously-downloaded versions of the dataset.

Returns:

data : list or pandas.DataFrame

A list of value dicts or a pandas.DataFrame containing data. See the as_dataframe parameter for more.
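
The Standardized Precipitation Index elements follow a regular naming pattern, so their codes can be generated from the window length in months. This is a small sketch based only on the element list above; spi_elements and fetch_spi are hypothetical helpers, not part of ulmo.

```python
SPI_WINDOWS = (1, 2, 3, 6, 9, 12, 24)


def spi_elements(months):
    """Map SPI window lengths in months to element codes such as 'sp03'."""
    codes = []
    for window in months:
        if window not in SPI_WINDOWS:
            raise ValueError('no SPI element for a %r-month window' % (window,))
        codes.append('sp%02d' % window)
    return codes


def fetch_spi(months, by_state=False):
    """Hypothetical wrapper around ulmo.ncdc.cirs.get_data."""
    # imported lazily; requires ulmo and network access
    from ulmo import ncdc
    return ncdc.cirs.get_data(
        elements=spi_elements(months),
        by_state=by_state,
        as_dataframe=True,  # skips the serialization step
    )
```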

National Climatic Data Center Global Historical Climate Network Daily

National Climatic Data Center Global Historical Climate Network - Daily dataset

ulmo.ncdc.ghcn_daily.get_data(station_id, elements=None, update=True, as_dataframe=False)

Retrieves data for a given station.

Parameters:

station_id : str

Station ID to retrieve data for.

elements : None, str, or list of str

If specified, limits the query to given element code(s).

update : bool

If True (default), new data files will be downloaded if they are newer than any previously cached files. If False, then previously downloaded files will be used and new files will only be downloaded if there is not a previously downloaded file for a given station.

as_dataframe : bool

If False (default), a dict with element codes mapped to value dicts is returned. If True, a dict with element codes mapped to equivalent pandas.DataFrame objects will be returned. The pandas dataframe is used internally, so setting this to True is a little bit faster as it skips a serialization step.

Returns:

site_dict : dict

A dict with element codes as keys, mapped to collections of values. See the as_dataframe parameter for more.

ulmo.ncdc.ghcn_daily.get_stations(country=None, state=None, elements=None, start_year=None, end_year=None, update=True, as_dataframe=False)

Retrieves station information, optionally limited to specific parameters.

Parameters:

country : str

The country code to use to limit station results. If set to None (default), then stations from all countries are returned.

state : str

The state code to use to limit station results. If set to None (default), then stations from all states are returned.

elements : None, str, or list of str

If specified, station results will be limited to the given element codes and only stations that have data for any of these elements will be returned.

start_year : int

If specified, station results will be limited to contain only stations that have data after this year. Can be combined with the end_year argument to get stations with data within a range of years.

end_year : int

If specified, station results will be limited to contain only stations that have data before this year. Can be combined with the start_year argument to get stations with data within a range of years.

update : bool

If True (default), new data files will be downloaded if they are newer than any previously cached files. If False, then previously downloaded files will be used and new files will only be downloaded if there is not a previously downloaded file for a given station.

as_dataframe : bool

If False (default), a dict with station IDs keyed to station dicts is returned. If True, a single pandas.DataFrame object will be returned. The pandas dataframe is used internally, so setting this to True is a little bit faster as it skips a serialization step.

Returns:

stations_dict : dict or pandas.DataFrame

A dict or pandas.DataFrame representing station information for stations matching the arguments. See the as_dataframe parameter for more.

National Climatic Data Center Global Summary of the Day

National Climatic Data Center Global Summary of the Day dataset

ulmo.ncdc.gsod.get_data(station_codes, start=None, end=None, parameters=None)

Retrieves data for a set of stations.

Parameters:

station_codes : str or list

Single station code or iterable of station codes to retrieve data for.

start : None or date (see note on dates and times)

If specified, data are limited to values after this date.

end : None or date (see note on dates and times)

If specified, data are limited to values before this date.

parameters : None, str or list

If specified, data are limited to this set of parameter codes.

Returns:

data_dict : dict

Dict with station codes keyed to lists of value dicts.

ulmo.ncdc.gsod.get_stations(country=None, state=None, start=None, end=None, update=True)

Retrieve information on the set of available stations.

Parameters:

country : {None, str, or iterable}

If specified, results will be limited to stations with matching country codes.

state : {None, str, or iterable}

If specified, results will be limited to stations with matching state codes.

start : None or date (see note on dates and times)

If specified, results will be limited to stations which have data after this start date.

end : None or date (see note on dates and times)

If specified, results will be limited to stations which have data before this end date.

update : bool

If True (default), check for a newer copy of the stations file and download it if it is newer than the previously downloaded copy. If False, then a new stations file will only be downloaded if a previously downloaded file cannot be found.

Returns:

stations_dict : dict

A dict with USAF-WBAN codes keyed to station information dicts.

Texas Weather Connection Daily Keetch-Byram Drought Index (KBDI)

ulmo.twc.kbdi.core

This module provides direct access to Texas Weather Connection Daily Keetch-Byram Drought Index (KBDI) dataset.

ulmo.twc.kbdi.get_data(county=None, start=None, end=None, as_dataframe=False, data_dir=None)

Retrieves data.

Parameters:

county : None or str

If specified, results will be limited to the county corresponding to the given 5-character Texas county FIPS code, i.e. 48???.

end : None or date (see note on dates and times)

Results will be limited to data on or before this date. Default is the current date.

start : None or date (see note on dates and times)

Results will be limited to data on or after this date. Default is the start of the calendar year for the end date.

as_dataframe: bool

If False (default), a dict with a nested set of dicts will be returned with data indexed by 5-character Texas county FIPS code. If True then a pandas.DataFrame object will be returned. The pandas dataframe is used internally, so setting this to True is a little bit faster as it skips a serialization step.

data_dir : None or directory path

Directory for holding downloaded data files. If no path is provided (default), then a user-specific directory for holding application data will be used (the directory will depend on the platform/operating system).

Returns:

data : dict or pandas.DataFrame

A dict or pandas.DataFrame representing the data. See the as_dataframe parameter for more.
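
Since county must be a 5-character Texas FIPS code beginning with 48, a quick format check can catch mistakes before any files are downloaded. The sketch below assumes the three remaining characters are digits; check_county and fetch_kbdi are hypothetical helpers, not part of ulmo.

```python
import re


def check_county(county):
    """Return the county as a 5-character Texas FIPS string, or None."""
    if county is None:
        return None
    county = str(county)
    # assumption: a Texas county FIPS code is '48' followed by three digits
    if not re.fullmatch(r'48\d{3}', county):
        raise ValueError('not a Texas county FIPS code: %r' % (county,))
    return county


def fetch_kbdi(county=None, start=None, end=None, data_dir=None):
    """Hypothetical wrapper around ulmo.twc.kbdi.get_data."""
    # imported lazily; requires ulmo and network access
    from ulmo import twc
    return twc.kbdi.get_data(
        county=check_county(county),
        start=start,
        end=end,
        as_dataframe=True,
        data_dir=data_dir,
    )
```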

US Army Corps of Engineers - Tulsa District Water Control

United States Army Corps of Engineers Tulsa District Water Control

ulmo.usace.swtwc.get_stations()

Fetches a list of station codes and descriptions.

Returns:

stations_dict : dict

a python dict with station codes mapped to station information

ulmo.usace.swtwc.get_station_data(station_code, date=None, as_dataframe=False)

Fetches data for a station at a given date.

Parameters:

station_code: str

The station code to fetch data for. A list of stations can be retrieved with get_stations()

date : None or date (see note on dates and times)

The date of the data to be queried. If date is None (default), then data for the current day is retrieved.

as_dataframe : bool

This determines what format values are returned as. If False (default), the values dict will be a dict with timestamps as keys mapped to a dict of gauge variables and values. If True then the values dict will be a pandas.DataFrame object containing the equivalent information.

Returns:

data_dict : dict

A dict containing station information and values.

USGS National Water Information System

USGS National Water Information System web services

ulmo.usgs.nwis.get_sites(sites=None, state_code=None, huc=None, bounding_box=None, county_code=None, parameter_code=None, site_type=None, service=None, input_file=None, **kwargs)

Fetches site information from USGS services. See the USGS Site Service documentation for a detailed description of options. For convenience, major options have been included with pythonic names. Options that are not listed below may be provided as extra kwargs (i.e. keyword=’argument’) and will be passed along with the web services request. These extra keywords must match the USGS names exactly. The USGS Site Service website describes available keyword names and argument formats.

Note

Only the options listed below have been tested and you may have mixed results retrieving data with extra options specified. Currently ulmo requests and parses data in the waterml format. Some options are not available in this format.

Parameters:

service : {None, ‘instantaneous’, ‘iv’, ‘daily’, ‘dv’}

The service to use, either “instantaneous”, “daily”, or None (default). If set to None, then both services are used. The abbreviations “iv” and “dv” can be used for “instantaneous” and “daily”, respectively.

input_file : None, file path or file object

If None (default), then the NWIS web services will be queried, but if a file is passed then this file will be used instead of requesting data from the NWIS web services.

Returns:

sites_dict : dict

a python dict with site codes mapped to site information

ulmo.usgs.nwis.get_site_data(site_code, service=None, parameter_code=None, statistic_code=None, start=None, end=None, period=None, modified_since=None, input_file=None, methods=None, **kwargs)

Fetches site data.

Parameters:

site_code : str

The site code of the site you want to query data for.

service : {None, ‘instantaneous’, ‘iv’, ‘daily’, ‘dv’}

The service to use, either “instantaneous”, “daily”, or None (default). If set to None, then both services are used. The abbreviations “iv” and “dv” can be used for “instantaneous” and “daily”, respectively.

parameter_code : str

Parameter code(s) that will be passed as the parameterCd parameter.

statistic_code: str

Statistic code(s) that will be passed as the statCd parameter.

start : None or datetime (see note on dates and times)

Start of a date range for a query. This parameter is mutually exclusive with period (you cannot use both).

end : None or datetime (see note on dates and times)

End of a date range for a query. This parameter is mutually exclusive with period (you cannot use both).

period : {None, str, datetime.timedelta}

Period of time to use for requesting data. This will be passed along as the period parameter. This can either be ‘all’ to signal that you’d like the entire period of record, or string in ISO 8601 period format (e.g. ‘P1Y2M21D’ for a period of one year, two months and 21 days) or it can be a datetime.timedelta object representing the period of time. This parameter is mutually exclusive with start/end dates.

modified_since : None or datetime.timedelta

Passed along as the modifiedSince parameter.

input_file : None, file path or file object

If None (default), then the NWIS web services will be queried, but if a file is passed then this file will be used instead of requesting data from the NWIS web services.

methods : None, str or Python dict

If None (default), it’s assumed that there is a single method for each parameter, and an error is raised if more than one method id is encountered. If str, this is the method id for the requested parameter(s); use “all” if method ids are not known beforehand. If dict, provide the parameter_code to method id mapping. A parameter’s method id is specific to a site.

Returns:

data_dict : dict

a python dict with parameter codes mapped to value dicts
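
Because period is mutually exclusive with start and end, it can be convenient to check the combination before building a request. The sketch below does so and then asks for daily streamflow (parameter code 00060 with statistic code 00003); time_arguments and fetch_daily_flow are hypothetical helpers, not part of ulmo.

```python
def time_arguments(start=None, end=None, period=None):
    """Return time keyword arguments, rejecting period combined with dates."""
    if period is not None and (start is not None or end is not None):
        raise ValueError('period cannot be combined with start or end')
    args = {}
    if period is not None:
        args['period'] = period
    if start is not None:
        args['start'] = start
    if end is not None:
        args['end'] = end
    return args


def fetch_daily_flow(site_code, start=None, end=None, period=None):
    """Hypothetical wrapper around ulmo.usgs.nwis.get_site_data."""
    # imported lazily; requires ulmo and network access
    from ulmo import usgs
    return usgs.nwis.get_site_data(
        site_code,
        service='daily',
        parameter_code='00060',
        statistic_code='00003',
        **time_arguments(start, end, period)
    )
```

For example, fetch_daily_flow(site_code, period='P1Y2M21D') would request the period of one year, two months and 21 days described above.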

ulmo.usgs.nwis.hdf5.get_site(site_code, path=None, complevel=None, complib=None)

Fetches previously-cached site information from an hdf5 file.

Parameters:

site_code : str

The site code of the site you want to get information for.

path : None or file path

Path to the hdf5 file to be queried, if None then the default path will be used. If a file path is a directory, then multiple hdf5 files will be kept so that file sizes remain small for faster repacking.

complevel : None or int {0-9}

Open hdf5 file with this level of compression. If None (default), then a maximum compression level will be used if a compression library can be found. If set to 0 then no compression will be used regardless of what complib is.

complib : None or str {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’}

Open hdf5 file with this type of compression. If None (default) then the best available compression library available on your system will be selected. If complevel argument is set to 0 then no compression will be used.

Returns:

site_dict : dict

a python dict containing site information

ulmo.usgs.nwis.hdf5.get_site_data(site_code, agency_code=None, parameter_code=None, path=None, complevel=None, complib=None, start=None)

Fetches previously-cached site data from an hdf5 file.

Parameters:

site_code : str

The site code of the site you want to get data for.

agency_code : None or str

The agency code to get data for. This will need to be set if a site code is in use by multiple agencies (this is rare).

parameter_code : None, str, or list

List of parameters to read. If None (default) read all parameters. Otherwise only read specified parameters. Parameters should be specified with statistic code, i.e. daily streamflow is ‘00060:00003’

path : None or file path

Path to the hdf5 file to be queried, if None then the default path will be used. If a file path is a directory, then multiple hdf5 files will be kept so that file sizes remain small for faster repacking.

complevel : None or int {0-9}

Open hdf5 file with this level of compression. If None (default), then a maximum compression level will be used if a compression library can be found. If set to 0 then no compression will be used regardless of what complib is.

complib : None or str {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’}

Open hdf5 file with this type of compression. If None (default) then the best available compression library available on your system will be selected. If complevel argument is set to 0 then no compression will be used.

start : None or string formatted date like 2014-01-01

Filter the dataset to return only data later than the start date.

Returns:

data_dict : dict

a python dict with parameter codes mapped to value dicts
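
Two of the arguments above use specific string formats: parameter_code combines a parameter code and a statistic code, and start is a date string like 2014-01-01. The sketch below builds both; parameter_key, start_string and cached_daily_flow are hypothetical helpers, not part of ulmo.

```python
import datetime


def parameter_key(parameter_code, statistic_code):
    """Combine codes as '<parameter>:<statistic>', e.g. '00060:00003'."""
    return '%s:%s' % (parameter_code, statistic_code)


def start_string(date):
    """Format a date as the 'YYYY-MM-DD' string shown for the start argument."""
    return date.strftime('%Y-%m-%d')


def cached_daily_flow(site_code, since, path=None):
    """Hypothetical wrapper: read cached daily streamflow later than a date."""
    # imported lazily; requires ulmo with hdf5 support installed
    from ulmo.usgs.nwis import hdf5
    return hdf5.get_site_data(
        site_code,
        parameter_code=parameter_key('00060', '00003'),
        path=path,
        start=start_string(since),
    )
```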

ulmo.usgs.nwis.hdf5.get_sites(path=None, complevel=None, complib=None)

Fetches previously-cached site information from an hdf5 file.

Parameters:

path : None or file path

Path to the hdf5 file to be queried, if None then the default path will be used. If a file path is a directory, then multiple hdf5 files will be kept so that file sizes remain small for faster repacking.

complevel : None or int {0-9}

Open hdf5 file with this level of compression. If None (default), then a maximum compression level will be used if a compression library can be found. If set to 0 then no compression will be used regardless of what complib is.

complib : None or str {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’}

Open hdf5 file with this type of compression. If None (default) then the best available compression library available on your system will be selected. If complevel argument is set to 0 then no compression will be used.

Returns:

sites_dict : dict

a python dict with site codes mapped to site information

ulmo.usgs.nwis.hdf5.remove_values(site_code, datetime_dicts, path=None, complevel=None, complib=None, autorepack=True)

Remove values from hdf5 file.

Parameters:

site_code : str

The site code of the site to remove records from.

datetime_dicts : dict

A python dict with a list of datetimes for a given variable (key) to set as NaNs.

path : file path

Path to the hdf5 file.

Returns:

None : None
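The following is a minimal sketch of how the datetime_dicts argument could be assembled. The variable key and the site code are hypothetical placeholders chosen for illustration; use the keys that actually appear in your cached data. The call itself is left commented out because it modifies a cache file.

```python
from datetime import datetime

# Hypothetical variable key; substitute a key present in your cached site data.
variable_key = "00060:00003"

# Map each variable (key) to the list of datetimes whose values should become NaN.
datetime_dicts = {
    variable_key: [
        datetime(2014, 1, 1),
        datetime(2014, 1, 2),
    ],
}

# With ulmo installed and a populated cache, the dict would be passed like this
# (the site code is a placeholder):
# from ulmo.usgs.nwis import hdf5
# hdf5.remove_values("01234567", datetime_dicts)
```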

ulmo.usgs.nwis.hdf5.repack(path, complevel=None, complib=None)

Repack the hdf5 file at path. This is the same as running the pytables ptrepack command on the file.

Parameters:

path : file path

Path to the hdf5 file.

complevel : None or int {0-9}

Open hdf5 file with this level of compression. If None (default), then a maximum compression level will be used if a compression library can be found. If set to 0 then no compression will be used regardless of what complib is.

complib : None or str {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’}

Open hdf5 file with this type of compression. If None (default) then the best compression library available on your system will be selected. If the complevel argument is set to 0 then no compression will be used.

Returns:

None : None

ulmo.usgs.nwis.hdf5.update_site_data(site_code, start=None, end=None, period=None, path=None, methods=None, input_file=None, complevel=None, complib=None, autorepack=True)

Update cached site data.

Parameters:

site_code : str

The site code of the site you want to query data for.

start : None or datetime (see note on dates and times)

Start of a date range for a query. This parameter is mutually exclusive with period (you cannot use both).

end : None or datetime (see note on dates and times)

End of a date range for a query. This parameter is mutually exclusive with period (you cannot use both).

period : {None, str, datetime.timedelta}

Period of time to use for requesting data. This will be passed along as the period parameter. This can either be ‘all’ to signal that you’d like the entire period of record, or string in ISO 8601 period format (e.g. ‘P1Y2M21D’ for a period of one year, two months and 21 days) or it can be a datetime.timedelta object representing the period of time. This parameter is mutually exclusive with start/end dates.

path : None or file path

Path to the hdf5 file to be queried, if None then the default path will be used. If a file path is a directory, then multiple hdf5 files will be kept so that file sizes remain small for faster repacking.

methods: ``None``, str or Python dict

If None (default), it’s assumed that there is a single method for each parameter. This raises an error if more than one method id is encountered. If str, this is the method id for the requested parameter(s); use “all” if method ids are not known beforehand. If dict, provide the parameter_code to method id mapping. A parameter’s method id is specific to the site.

input_file: ``None``, file path or file object

If None (default), then the NWIS web services will be queried, but if a file is passed then this file will be used instead of requesting data from the NWIS web services.

autorepack : bool

Whether or not to automatically repack the h5 file(s) after updating. There is a tradeoff between performance and disk space here: large files take a longer time to repack but also tend to grow larger faster. The default of True conserves disk space because untamed file growth can become quite destructive. If you set this to False, you can manually repack files with repack().

Returns:

None : None
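Because start/end and period are mutually exclusive, it can help to check the combination before making a request. The sketch below uses a hypothetical helper, build_query_kwargs, which is not part of ulmo; it only collects the keyword arguments and rejects an invalid mix. The site code in the commented call is a placeholder.

```python
from datetime import timedelta


def build_query_kwargs(start=None, end=None, period=None):
    """Hypothetical helper (not part of ulmo): collect date-range keyword
    arguments while enforcing that period excludes start/end."""
    if period is not None and (start is not None or end is not None):
        raise ValueError("period cannot be combined with start or end")
    kwargs = {}
    if period is not None:
        kwargs["period"] = period
    else:
        if start is not None:
            kwargs["start"] = start
        if end is not None:
            kwargs["end"] = end
    return kwargs


# An ISO 8601 period string: one year, two months and 21 days.
by_period = build_query_kwargs(period="P1Y2M21D")

# A timedelta is the other documented way to express a period.
by_timedelta = build_query_kwargs(period=timedelta(days=30))

# An explicit date range instead of a period.
by_range = build_query_kwargs(start="2014-01-01", end="2014-06-30")

# With ulmo installed, the result could be unpacked into the call:
# from ulmo.usgs.nwis import hdf5
# hdf5.update_site_data("01234567", **by_period)
```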

ulmo.usgs.nwis.hdf5.update_site_list(sites=None, state_code=None, huc=None, bounding_box=None, county_code=None, parameter_code=None, site_type=None, service=None, input_file=None, complevel=None, complib=None, autorepack=True, path=None, **kwargs)

Update cached site information.

See ulmo.usgs.nwis.core.get_sites() for description of regular parameters, only extra parameters used for caching are listed below.

Parameters:

path : None or file path

Path to the hdf5 file to be queried, if None then the default path will be used. If a file path is a directory, then multiple hdf5 files will be kept so that file sizes remain small for faster repacking.

input_file: ``None``, file path or file object

If None (default), then the NWIS web services will be queried, but if a file is passed then this file will be used instead of requesting data from the NWIS web services.

complevel : None or int {0-9}

Open hdf5 file with this level of compression. If None (default), then a maximum compression level will be used if a compression library can be found. If set to 0 then no compression will be used regardless of what complib is.

complib : None or str {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’}

Open hdf5 file with this type of compression. If None (default) then the best compression library available on your system will be selected. If the complevel argument is set to 0 then no compression will be used.

autorepack : bool

Whether or not to automatically repack the h5 file after updating. There is a tradeoff between performance and disk space here: large files take a longer time to repack but also tend to grow larger faster. The default of True conserves disk space because untamed file growth can become quite destructive. If you set this to False, you can manually repack files with repack().

Returns:

None : None

USGS Emergency Data Distribution Network services

ulmo.usgs.eddn.get_data(dcp_address, start=None, end=None, networklist='', channel='', spacecraft='Any', baud='Any', electronic_mail='', dcp_bul='', glob_bul='', timing='', retransmitted='Y', daps_status='N', use_cache=False, cache_path=None, as_dataframe=True)

Fetches GOES Satellite DCP messages from USGS Emergency Data Distribution Network.

Parameters:

dcp_address : str, iterable of strings

DCP address or list of DCP addresses to be fetched; lists will be joined by a ‘,’.

start : {None, str, datetime, datetime.timedelta}

If None (default) then the start time is 2 days prior (or the date of the last data if the cache is used). If a datetime or datetime-like string is specified it will be used as the start date. If a timedelta or a string in ISO 8601 period format (e.g. ‘P2D’ for a period of 2 days) is specified then ‘now’ minus the timedelta will be used as the start. NOTE: The EDDN service does not specify how far back data is available. The service also imposes a maximum data limit of 25000 characters. If this limit is reached, multiple requests will be made until all available data is downloaded.

end : {None, str, datetime, datetime.timedelta}

If None (default) then the end time is ‘now’. If a datetime or datetime-like string is specified it will be used as the end date. If a timedelta or a string in ISO 8601 period format (e.g. ‘P2D’ for a period of 2 days) is specified then ‘now’ minus the timedelta will be used as the end. NOTE: The EDDN service does not specify how far back data is available. The service also imposes a maximum data limit of 25000 characters.

networklist : str,

‘’ (default). Filter by network.

channel : str,

‘’ (default). Filter by channel.

spacecraft : str,

‘East’, ‘West’ or ‘Any’ (default). Filter by GOES East/West satellite.

baud : str,

‘Any’ (default). Filter by baud rate. See http://eddn.usgs.gov/msgaccess.html for options

electronic_mail : str,

‘’ (default) or ‘Y’

dcp_bul : str,

‘’ (default) or ‘Y’

glob_bul : str,

‘’ (default) or ‘Y’

timing : str,

‘’ (default) or ‘Y’

retransmitted : str,

‘Y’ (default) or ‘N’

daps_status : str,

‘N’ (default) or ‘Y’

use_cache : bool,

If True, use an hdf file to cache data and retrieve new data on subsequent requests. Note that the signature above lists False as the default.

cache_path : {None, str},

If None, use the default ulmo location for cached files; otherwise use the specified path. Files are named using dcp_address.

as_dataframe : bool

If True (default) return data in a pandas dataframe otherwise return a dict.

Returns:

message_data : {pandas.DataFrame, dict}

Either a pandas dataframe or a dict indexed by dcp message times
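The rules for the start argument can be illustrated with a small sketch. It is not ulmo's implementation: resolve_start and parse_simple_period are hypothetical names, only the day/hour/minute subset of ISO 8601 periods and the ‘YYYY-mm-dd’ date form are handled, and the cache case (date of last data) is ignored.

```python
import re
from datetime import datetime, timedelta

# Only a small subset of ISO 8601 periods is handled: days, hours, minutes.
_PERIOD_RE = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?)?$")


def parse_simple_period(text):
    """Hypothetical helper: turn e.g. 'P2D' or 'PT6H' into a timedelta."""
    match = _PERIOD_RE.match(text)
    if match is None or not any(match.groups()):
        raise ValueError("unsupported period string: %r" % text)
    days, hours, minutes = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes)


def resolve_start(start, now):
    """Hypothetical helper: mirror the documented rules for `start`."""
    if start is None:
        return now - timedelta(days=2)  # documented default: 2 days prior
    if isinstance(start, timedelta):
        return now - start
    if isinstance(start, datetime):
        return start
    if start.startswith("P"):
        return now - parse_simple_period(start)
    # Assumes a plain 'YYYY-mm-dd' date string for this sketch.
    return datetime.strptime(start, "%Y-%m-%d")


now = datetime(2014, 1, 10, 12, 0)
default_start = resolve_start(None, now)
period_start = resolve_start("P2D", now)
```

Both default_start and period_start resolve to the same moment here, since the documented default is itself a two-day look-back.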

ulmo.usgs.eddn.decode(dataframe, parser, **kwargs)

Decodes dcp message data in the pandas dataframe returned by ulmo.usgs.eddn.get_data().

Parameters:

dataframe : pandas.DataFrame

pandas.DataFrame returned by ulmo.usgs.eddn.get_data()

parser : {function, str}

function that acts on the dcp_message of each row of the dataframe and returns a new dataframe containing several rows of decoded data. This returned dataframe may have different (but derived) timestamps than the original row. If a string is passed then a matching parser function is looked up from ulmo.usgs.eddn.parsers

Returns:

decoded_data : pandas.DataFrame

pandas dataframe, the format and parameters in the returned dataframe depend wholly on the parser used
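The idea that one message row expands into several decoded rows with derived timestamps can be sketched in plain Python. The message format below (whitespace-separated readings, newest first, 15 minutes apart) is made up for illustration, and expand_message is a hypothetical name. A real parser passed to decode() would need to return a pandas dataframe rather than a list of tuples.

```python
from datetime import datetime, timedelta


def expand_message(message_time, dcp_message, interval=timedelta(minutes=15)):
    """Hypothetical decoder for a made-up message format: whitespace-separated
    readings, newest first, spaced `interval` apart."""
    rows = []
    for i, token in enumerate(dcp_message.split()):
        # Each reading gets a timestamp derived from the message time.
        rows.append((message_time - i * interval, float(token)))
    # Return the rows in chronological order.
    return sorted(rows)


rows = expand_message(datetime(2014, 1, 1, 12, 0), "1.5 1.4 1.2")
```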

USGS Earth Resources Observation Systems (EROS) services

Earth Resources Observation and Science (EROS) Center application services (Raster)

ulmo.usgs.eros.get_available_datasets(bbox, attrs=None, as_dataframe=True)

retrieve available datasets for a given bounding box.

Parameters:

bbox : (sequence of float|str)

bounding box, in geographic coordinates, of the area to download tiles for, in the format (min longitude, min latitude, max longitude, max latitude)

attrs: comma separated list of str

metadata attributes to retrieve, None (default) retrieves all

as_dataframe : True (default) or False

if True return pandas dataframe

Returns:

datasets : dict or pandas DataFrame

returns available datasets
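Since the bounding box order (min longitude, min latitude, max longitude, max latitude) is easy to get wrong, a small check before calling the service can help. make_bbox below is a hypothetical helper, not part of ulmo, and the coordinates are arbitrary example values.

```python
def make_bbox(min_lon, min_lat, max_lon, max_lat):
    """Hypothetical helper (not part of ulmo): build a bbox tuple in the
    documented order and reject swapped or out-of-range coordinates."""
    # Values may be given as floats or strings, as the bbox parameter allows.
    bbox = tuple(float(v) for v in (min_lon, min_lat, max_lon, max_lat))
    if not (-180 <= bbox[0] <= bbox[2] <= 180):
        raise ValueError("longitudes must satisfy -180 <= min <= max <= 180")
    if not (-90 <= bbox[1] <= bbox[3] <= 90):
        raise ValueError("latitudes must satisfy -90 <= min <= max <= 90")
    return bbox


bbox = make_bbox(-98.0, 30.0, "-97.5", "30.5")

# With ulmo installed, the tuple could then be passed on:
# from ulmo.usgs import eros
# datasets = eros.get_available_datasets(bbox)
```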

ulmo.usgs.eros.get_themes(as_dataframe=True)

retrieve list of data themes available

Parameters:

as_dataframe : True (default) or False

if True return pandas dataframe

Returns:

available data themes

ulmo.usgs.eros.get_attribute_list(as_dataframe=True)

retrieve list of metadata attributes for dataset

Parameters:

as_dataframe : True (default) or False

if True return pandas dataframe

Returns:

available metadata attributes

ulmo.usgs.eros.get_available_formats(product_key, as_dataframe=True)

retrieve list of data formats available for dataset

Parameters:

product_key : str

dataset name. (see get_available_datasets for list)

as_dataframe : True (default) or False

if True return pandas dataframe

Returns:

available data formats

ulmo.usgs.eros.get_raster(product_key, bbox, fmt=None, path=None, check_modified=False, mosaic=False)

downloads National Elevation Dataset raster tiles that cover the given bounding box for the specified data layer.

Parameters:

product_key : str

dataset name. (see get_available_datasets for list)

bbox : (sequence of float|str)

bounding box, in geographic coordinates, of the area to download tiles for, in the format (min longitude, min latitude, max longitude, max latitude)

fmt : None or str

available formats vary in different datasets. If None, preference will be given to geotiff and then img, followed by whatever fmt is available

path : None or path

if None default path will be used

update_cache: ``True`` or ``False`` (default)

if False then tiles will not be re-downloaded if they exist in the path

check_modified: ``True`` or ``False`` (default)

if tile exists in path, check if newer file exists online and download if available.

mosaic: ``True`` or ``False`` (default)

if True, mosaic and clip downloaded tiles to the extents of the bbox provided. Requires rasterio package and GDAL.

Returns:

raster_tiles : geojson FeatureCollection

metadata as a FeatureCollection. local url of downloaded data is in feature[‘properties’][‘file’]
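A short sketch of how the local file locations could be pulled out of the returned FeatureCollection. The sample dictionary is hand-written for illustration, with placeholder paths, and only mimics the documented feature[‘properties’][‘file’] layout; downloaded_files is a hypothetical helper name.

```python
def downloaded_files(feature_collection):
    """Hypothetical helper: list the local file of every feature."""
    return [
        feature["properties"]["file"]
        for feature in feature_collection["features"]
    ]


# Hand-written stand-in for a returned FeatureCollection (placeholder paths).
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"file": "/tmp/tile_a.tif"}},
        {"type": "Feature", "geometry": None,
         "properties": {"file": "/tmp/tile_b.tif"}},
    ],
}

files = downloaded_files(sample)
```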

ulmo.usgs.eros.get_raster_availability(product_key, bbox, fmt=None)

retrieve metadata for raster tiles that cover the given bounding box for the specified data layer.

Parameters:

product_key : str

dataset layer name. (see get_available_layers for list)

bbox : (sequence of float|str)

bounding box, in geographic coordinates, of the area to download tiles for, in the format (min longitude, min latitude, max longitude, max latitude)

fmt : str

desired data format. if None, geotiff followed by img will be given preference

Returns:

metadata : geojson FeatureCollection

returns metadata including download urls as a FeatureCollection

USGS National Elevation Dataset (NED) services

National Elevation Dataset services (Raster)

ulmo.usgs.ned.get_available_layers()

return list of available data layers

ulmo.usgs.ned.get_raster(layer, bbox, path=None, update_cache=False, check_modified=False, mosaic=False)

downloads National Elevation Dataset raster tiles that cover the given bounding box for the specified data layer.

Parameters:

layer : str

dataset layer name. (see get_available_layers for list)

bbox : (sequence of float|str)

bounding box, in geographic coordinates, of the area to download tiles for, in the format (min longitude, min latitude, max longitude, max latitude)

path : None or path

if None default path will be used

update_cache: ``True`` or ``False`` (default)

if False and output file already exists use it.

check_modified: ``True`` or ``False`` (default)

if tile exists in path, check if newer file exists online and download if available.

mosaic: ``True`` or ``False`` (default)

if True, mosaic and clip downloaded tiles to the extents of the bbox provided. Requires rasterio package and GDAL.

Returns:

raster_tiles : geojson FeatureCollection

metadata as a FeatureCollection. local url of downloaded data is in feature[‘properties’][‘file’]

ulmo.usgs.ned.get_raster_availability(layer, bbox=None)

retrieve metadata for raster tiles that cover the given bounding box for the specified data layer.

Parameters:

layer : str

dataset layer name. (see get_available_layers for list)

bbox : (sequence of float|str)

bounding box, in geographic coordinates, of the area to download tiles for, in the format (min longitude, min latitude, max longitude, max latitude)

Returns:

metadata : geojson FeatureCollection

returns metadata including download urls as a FeatureCollection

note on dates and times

Dates and times can be provided in a few different ways, depending on what is convenient. They can be given either as a string representation or as instances of date and datetime objects from python’s datetime standard library module. For strings, the ISO 8601 format (‘YYYY-mm-dd HH:MM:SS’ or some abbreviated version) is accepted, as well as dates in ‘mm/dd/YYYY’ format.
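The accepted inputs can be illustrated with a small normalizing function. This is a sketch using only the standard library, not ulmo's own parsing code: parse_date is a hypothetical name, and only three string layouts are tried.

```python
from datetime import date, datetime

# String layouts tried in order; a subset chosen for this sketch.
_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%m/%d/%Y")


def parse_date(value):
    """Hypothetical helper: normalize a date-like input to a datetime."""
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        return value
    if isinstance(value, date):
        return datetime(value.year, value.month, value.day)
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized date string: %r" % value)


iso_full = parse_date("2014-01-01 06:30:00")
iso_short = parse_date("2014-01-01")
us_style = parse_date("01/31/2014")
```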