ulmo Readers¶
ulmo readers and APIs.
note on dates and times¶
Dates and times can be provided in a few different ways, depending on what is convenient. They can be given either as strings or as instances of the date and datetime objects from Python’s datetime standard library module. For strings, the ISO 8601 format (‘YYYY-mm-dd HH:MM:SS’ or some abbreviated version) is accepted, as well as dates in ‘mm/dd/YYYY’ format.
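As a rough illustration of the accepted inputs, the sketch below normalizes strings, date objects and datetime objects to a datetime. The helper name `to_datetime` and the exact list of formats are assumptions made for this example; this is not ulmo's own parsing code, and ulmo may accept more abbreviations than are listed here.

```python
import datetime

# Formats tried in order. An illustrative subset, not ulmo's actual list.
_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%Y-%m-%d", "%m/%d/%Y")


def to_datetime(value):
    """Normalize a string, date or datetime to a datetime (hypothetical helper)."""
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime.datetime):
        return value
    if isinstance(value, datetime.date):
        return datetime.datetime(value.year, value.month, value.day)
    for fmt in _FORMATS:
        try:
            return datetime.datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized date string: %r" % (value,))
```

Any of the three input kinds can then be passed wherever a parameter refers to this note.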
Readers for Global to USA-national data¶
Climate Prediction Center (CPC) Weekly Drought¶
Climate Prediction Center Weekly Drought Index dataset
-
ulmo.cpc.drought.
get_data
(state=None, climate_division=None, start=None, end=None, as_dataframe=False)¶ Retrieves data.
Parameters: - state (None or str) – If specified, results will be limited to the state corresponding to the given 2-character state code.
- climate_division (None or int) – If specified, results will be limited to the climate division.
- start (None or date (see note on dates and times)) – Results will be limited to those after the given date. Default is the start of the current calendar year.
- end (None or date (see note on dates and times)) – If specified, results will be limited to data before this date.
- as_dataframe (bool) – If False (default), a dict with a nested set of dicts will be returned with data indexed by state, then climate division. If True, then a pandas.DataFrame object will be returned. The pandas dataframe is used internally, so setting this to True is a little bit faster as it skips a serialization step.
Returns: data – A dict or pandas.DataFrame representing the data. See the as_dataframe parameter for more.
Return type: dict or pandas.DataFrame
CUAHSI Hydrologic Information System (HIS)¶
CUAHSI HIS Central¶
CUAHSI HIS Central catalog web services
-
ulmo.cuahsi.his_central.
get_services
(bbox=None, user_cache=False)¶ Retrieves a list of services.
Parameters: - bbox (None or 4-tuple) – Optional argument for a bounding box that covers the area you want to look for services in. This should be a tuple containing (min_longitude, min_latitude, max_longitude, max_latitude) with these values in decimal degrees. If not provided then the full set of services will be queried from HIS Central.
- user_cache (bool) – If False (default), use the system temp location to store cache WSDL and other files. Use the default user ulmo directory if True.
Returns: services_dicts – A list of dicts that each contain information on an individual service.
Return type: list
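The order of the bbox values is easy to get wrong, so a small sketch may help. `make_bbox` and `services_in_box` are hypothetical helpers written for this example; only the `get_services(bbox=...)` call comes from the signature documented above, and the range checks are an assumption about what counts as a sensible box.

```python
def make_bbox(min_longitude, min_latitude, max_longitude, max_latitude):
    """Return a 4-tuple in the order documented for the bbox argument."""
    if min_longitude > max_longitude or min_latitude > max_latitude:
        raise ValueError("minimum values must not exceed maximum values")
    if min_longitude < -180 or max_longitude > 180:
        raise ValueError("longitudes must be in decimal degrees, -180 to 180")
    if min_latitude < -90 or max_latitude > 90:
        raise ValueError("latitudes must be in decimal degrees, -90 to 90")
    return (min_longitude, min_latitude, max_longitude, max_latitude)


def services_in_box(bbox):
    """Query HIS Central for services inside bbox (needs ulmo and network access)."""
    # Imported lazily so make_bbox() can be used without ulmo installed.
    from ulmo.cuahsi import his_central
    return his_central.get_services(bbox=bbox)
```

A call such as `services_in_box(make_bbox(-100.0, 28.0, -96.0, 31.0))` would then return the list of service dicts for that area; the coordinates are arbitrary example values.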
CUAHSI WaterOneFlow (WOF)¶
CUAHSI WaterOneFlow (WOF) web data access services. These services provide
access to a wide variety of data sources that use the standardized WOF service protocol.
Most such services are registered with the CUAHSI HIS Central catalog and can
be identified via queries using the ulmo.cuahsi.his_central.get_services catalog
web service. Each WOF service may have some unique characteristics, such as specific
regional and temporal domains, set of variables, or additional constraints.
The notes below provide additional usage details for some data sources.
- NRCS SNOTEL: USDA Natural Resources Conservation Service (NRCS) Snow Telemetry network of remote, high-elevation mountain sites in the western U.S., used to monitor snowpack, precipitation, temperature and other climatic conditions. Timestamps in the request and data response are in PST (UTC-8).
-
ulmo.cuahsi.wof.
get_sites
(wsdl_url, suds_cache=('default', ), timeout=None, user_cache=False)¶ Retrieves information on the sites that are available from a WaterOneFlow service using a GetSites request. For more detailed information including which variables and time periods are available for a given site, use get_site_info().
Parameters: - wsdl_url (str) – URL of a service’s web service definition language (WSDL) description. All WaterOneFlow services publish a WSDL description and this url is the entry point to the service.
- suds_cache (None or tuple) – SOAP local cache duration for WSDL description and client object. Pass a cache duration tuple like (‘days’, 3) to set a custom duration. Duration may be in months, weeks, days, hours, or seconds. If unspecified, the default duration (1 day) will be used. Use None to turn off caching.
- timeout (int or float) – suds SOAP URL open timeout (seconds). If unspecified, the suds default (90 seconds) will be used.
- user_cache (bool) – If False (default), use the system temp location to store cache WSDL and other files. Use the default user ulmo directory if True.
Returns: sites_dict – a python dict with site codes mapped to site information
Return type: dict
-
ulmo.cuahsi.wof.
get_site_info
(wsdl_url, site_code, suds_cache=('default', ), timeout=None, user_cache=False)¶ Retrieves detailed site information from a WaterOneFlow service using a GetSiteInfo request.
Parameters: - wsdl_url (str) – URL of a service’s web service definition language (WSDL) description. All WaterOneFlow services publish a WSDL description and this url is the entry point to the service.
- site_code (str) – Site code of the site you’d like to get more information for. Site codes MUST contain the network and be of the form <network>:<site_code>, as is required by WaterOneFlow.
- suds_cache (None or tuple) – SOAP local cache duration for WSDL description and client object. Pass a cache duration tuple like (‘days’, 3) to set a custom duration. Duration may be in months, weeks, days, hours, or seconds. If unspecified, the default duration (1 day) will be used. Use None to turn off caching.
- timeout (int or float) – suds SOAP URL open timeout (seconds). If unspecified, the suds default (90 seconds) will be used.
- user_cache (bool) – If False (default), use the system temp location to store cache WSDL and other files. Use the default user ulmo directory if True.
Returns: site_info – a python dict containing site information
Return type: dict
-
ulmo.cuahsi.wof.
get_values
(wsdl_url, site_code, variable_code, start=None, end=None, suds_cache=('default', ), timeout=None, user_cache=False)¶ Retrieves site values from a WaterOneFlow service using a GetValues request.
Parameters: - wsdl_url (str) – URL of a service’s web service definition language (WSDL) description. All WaterOneFlow services publish a WSDL description and this url is the entry point to the service.
- site_code (str) – Site code of the site you’d like to get values for. Site codes MUST contain the network and be of the form <network>:<site_code>, as is required by WaterOneFlow.
- variable_code (str) – Variable code of the variable you’d like to get values for. Variable codes MUST contain the network and be of the form <vocabulary>:<variable_code>, as is required by WaterOneFlow.
- start (None or datetime (see note on dates and times)) – Start of the query datetime range. If omitted, data from the start of the time series to the end timestamp will be returned (but see caveat, in note below).
- end (None or datetime (see note on dates and times)) – End of the query datetime range. If omitted, data from the start timestamp to end of the time series will be returned (but see caveat, in note below).
- suds_cache (None or tuple) – SOAP local cache duration for WSDL description and client object. Pass a cache duration tuple like (‘days’, 3) to set a custom duration. Duration may be in months, weeks, days, hours, or seconds. If unspecified, the default duration (1 day) will be used. Use None to turn off caching.
- timeout (int or float) – suds SOAP URL open timeout (seconds). If unspecified, the suds default (90 seconds) will be used.
- user_cache (bool) – If False (default), use the system temp location to store cache WSDL and other files. Use the default user ulmo directory if True.
Returns: site_values – a python dict containing values
Return type: dict
Notes
If both start and end parameters are omitted, the entire time series available will typically be returned. However, some service providers will return an error if either start or end is omitted; this is especially true for services hosted or redirected by CUAHSI via the CUAHSI HydroPortal, which have a ‘WSDL’ url using the domain https://hydroportal.cuahsi.org. For HydroPortal, a start datetime of ‘1753-01-01’ has been known to return valid results while catching the oldest start times, though the response may be broken up into chunks (‘paged’).
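One way to work around services that reject an open-ended range is to always send explicit start and end values. The sketch below does that, using the ‘1753-01-01’ start mentioned in the note. The helper names `explicit_range`, `qualified` and `fetch_values` are assumptions made for this example, as is the choice of today's date for a missing end; only the `get_values` call itself follows the signature documented above.

```python
import datetime

# Start value that the note above reports as working for HydroPortal.
HYDROPORTAL_EARLIEST = "1753-01-01"


def explicit_range(start=None, end=None, today=None):
    """Fill in a missing start or end so that neither is omitted."""
    if today is None:
        today = datetime.date.today()
    if start is None:
        start = HYDROPORTAL_EARLIEST
    if end is None:
        end = today.isoformat()
    return start, end


def qualified(prefix, code):
    """Build a '<network>:<site_code>' or '<vocabulary>:<variable_code>' string."""
    return "%s:%s" % (prefix, code)


def fetch_values(wsdl_url, site_code, variable_code, start=None, end=None):
    """Call get_values with an explicit range (needs ulmo and network access)."""
    # Imported lazily so the helpers above work without ulmo installed.
    from ulmo.cuahsi import wof
    start, end = explicit_range(start, end)
    return wof.get_values(wsdl_url, site_code, variable_code, start=start, end=end)
```

Since the response may be paged, a caller that needs the full record may still have to issue several requests over narrower ranges.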
-
ulmo.cuahsi.wof.
get_variable_info
(wsdl_url, variable_code=None, suds_cache=('default', ), timeout=None, user_cache=False)¶ Retrieves variable information from a WaterOneFlow service using a GetVariableInfo request.
Parameters: - wsdl_url (str) – URL of a service’s web service definition language (WSDL) description. All WaterOneFlow services publish a WSDL description and this url is the entry point to the service.
- variable_code (None or str) – If None (default) then information on all variables will be returned, otherwise, this should be set to the variable code of the variable you’d like to get more information on. Variable codes MUST contain the network and be of the form <vocabulary>:<variable_code>, as is required by WaterOneFlow.
- suds_cache (None or tuple) – SOAP local cache duration for WSDL description and client object. Pass a cache duration tuple like (‘days’, 3) to set a custom duration. Duration may be in months, weeks, days, hours, or seconds. If unspecified, the default duration (1 day) will be used. Use None to turn off caching.
- timeout (int or float) – suds SOAP URL open timeout (seconds). If unspecified, the suds default (90 seconds) will be used.
- user_cache (bool) – If False (default), use the system temp location to store cache WSDL and other files. Use the default user ulmo directory if True.
Returns: variable_info – a python dict containing variable information. If variable_code is None (default) then this will be a nested set of dicts keyed by <vocabulary>:<variable_code>
Return type: dict
NASA ORNL Daymet weather data services¶
NASA EARTHDATA ORNL DAAC Daymet web services
-
ulmo.nasa.daymet.
get_variables
()¶ Retrieves a list of the available variables.
Parameters: None
Returns: dictionary of variables with variable abbreviations as keys and descriptions as values
-
ulmo.nasa.daymet.
get_daymet_singlepixel
(latitude, longitude, variables=['tmax', 'tmin', 'prcp'], years=None, as_dataframe=True)¶ Fetches a time series of climate variables from the DAYMET single pixel extraction
Parameters: latitude (float) – The latitude (WGS84), value between 14.5 and 52.0.
longitude (float) – The longitude (WGS84), value between -131.0 and -53.0.
variables (list of str) – Daymet parameters to fetch. default = [‘tmax’, ‘tmin’, ‘prcp’]. Available options:
- ‘tmax’: maximum temperature
- ‘tmin’: minimum temperature
- ‘srad’: shortwave radiation
- ‘vp’: vapor pressure
- ‘swe’: snow-water equivalent
- ‘prcp’: precipitation
- ‘dayl’: daylength
years (list of int) – List of years to return. Daymet version 2 available 1980 to the latest full calendar year. If None (default), all years will be returned.
as_dataframe (True (default) or False) – If True, return a pandas dataframe; if False, return an open file with contents in csv format.
Returns: single_pixel_timeseries
Return type: pandas dataframe or csv filename
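The coordinate bounds and variable names listed above can be checked before a request is sent. This is a minimal sketch: `check_daymet_args` and `fetch_pixel` are hypothetical helpers, and whether ulmo performs similar checks itself is not stated here. The `get_daymet_singlepixel` call follows the documented signature.

```python
# Variable abbreviations listed in the parameter description above.
DAYMET_VARIABLES = ("tmax", "tmin", "srad", "vp", "swe", "prcp", "dayl")


def check_daymet_args(latitude, longitude, variables=("tmax", "tmin", "prcp")):
    """Validate arguments against the documented ranges and variable names."""
    if not 14.5 <= latitude <= 52.0:
        raise ValueError("latitude must be between 14.5 and 52.0")
    if not -131.0 <= longitude <= -53.0:
        raise ValueError("longitude must be between -131.0 and -53.0")
    unknown = [name for name in variables if name not in DAYMET_VARIABLES]
    if unknown:
        raise ValueError("unknown Daymet variables: %s" % ", ".join(unknown))
    return latitude, longitude, list(variables)


def fetch_pixel(latitude, longitude, variables=("tmax", "tmin", "prcp"), years=None):
    """Fetch a single-pixel time series (needs ulmo and network access)."""
    # Imported lazily so check_daymet_args() works without ulmo installed.
    from ulmo.nasa import daymet
    latitude, longitude, variables = check_daymet_args(latitude, longitude, variables)
    return daymet.get_daymet_singlepixel(
        latitude, longitude, variables=variables, years=years
    )
```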
National Climatic Data Center (NCDC)¶
NCDC Climate Index Reference Sequential (CIRS)¶
National Climatic Data Center Climate Index Reference Sequential (CIRS) drought dataset
-
ulmo.ncdc.cirs.
get_data
(elements=None, by_state=False, location_names='abbr', as_dataframe=False, use_file=None)¶ Retrieves data.
Parameters: elements (None, str or list) – The element(s) to get data for. If None (default), then all elements are used. An individual element is a string, but a list or tuple of them can be used to specify a set of elements. Elements are:
- ‘cddc’: Cooling Degree Days
- ‘hddc’: Heating Degree Days
- ‘pcpn’: Precipitation
- ‘pdsi’: Palmer Drought Severity Index
- ‘phdi’: Palmer Hydrological Drought Index
- ‘pmdi’: Modified Palmer Drought Severity Index
- ‘sp01’: 1-month Standardized Precipitation Index
- ‘sp02’: 2-month Standardized Precipitation Index
- ‘sp03’: 3-month Standardized Precipitation Index
- ‘sp06’: 6-month Standardized Precipitation Index
- ‘sp09’: 9-month Standardized Precipitation Index
- ‘sp12’: 12-month Standardized Precipitation Index
- ‘sp24’: 24-month Standardized Precipitation Index
- ‘tmpc’: Temperature
- ‘zndx’: ZNDX
by_state (bool) – If False (default), divisional data will be retrieved. If True, then regional data will be retrieved.
location_names (str or None) – This parameter defines what (if any) type of names will be added to the values. If set to ‘abbr’ (default), then abbreviated location names will be used. If ‘full’, then full location names will be used. If set to None, then no location name will be added and the only identifier will be the location_codes (this is the most memory-conservative option).
as_dataframe (bool) – If False (default), a list of values dicts is returned. If True, a dict with element codes mapped to equivalent pandas.DataFrame objects will be returned. The pandas dataframe is used internally, so setting this to True is faster as it skips a somewhat expensive serialization step.
use_file (None, file-like object or str) – If None (default), then data will be automatically retrieved from the web. If a file-like object or a file path string, then the file will be used to read data from. This is intended to be used for reading in previously-downloaded versions of the dataset.
Returns: data – A list of value dicts or a pandas.DataFrame containing data. See the as_dataframe parameter for more.
Return type: list or pandas.DataFrame
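Because elements may be None, a single string, or a list or tuple, it can help to see the three cases normalized to one form. The helper `normalize_elements` below is a hypothetical illustration and not how ulmo handles the argument internally; the element codes are the ones listed above.

```python
# Element codes listed in the parameter description above.
CIRS_ELEMENTS = (
    "cddc", "hddc", "pcpn", "pdsi", "phdi", "pmdi", "sp01", "sp02",
    "sp03", "sp06", "sp09", "sp12", "sp24", "tmpc", "zndx",
)


def normalize_elements(elements=None):
    """Turn None, a single code, or a list/tuple of codes into a list of codes."""
    if elements is None:
        # None means "all elements".
        return list(CIRS_ELEMENTS)
    if isinstance(elements, str):
        elements = [elements]
    elements = list(elements)
    unknown = [code for code in elements if code not in CIRS_ELEMENTS]
    if unknown:
        raise ValueError("unknown CIRS elements: %s" % ", ".join(unknown))
    return elements
```

A validated list can then be passed on, for example as `ulmo.ncdc.cirs.get_data(elements=normalize_elements("pdsi"), by_state=True)`.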
NCDC Global Historical Climate Network (GHCN) Daily¶
National Climatic Data Center Global Historical Climate Network - Daily dataset
-
ulmo.ncdc.ghcn_daily.
get_data
(station_id, elements=None, update=True, as_dataframe=False)¶ Retrieves data for a given station.
Parameters: - station_id (str) – Station ID to retrieve data for.
- elements (None, str, or list of str) – If specified, limits the query to given element code(s).
- update (bool) – If True (default), new data files will be downloaded if they are newer than any previously cached files. If False, then previously downloaded files will be used and new files will only be downloaded if there is not a previously downloaded file for a given station.
- as_dataframe (bool) – If False (default), a dict with element codes mapped to value dicts is returned. If True, a dict with element codes mapped to equivalent pandas.DataFrame objects will be returned. The pandas dataframe is used internally, so setting this to True is a little bit faster as it skips a serialization step.
Returns: site_dict – A dict with element codes as keys, mapped to collections of values. See the as_dataframe parameter for more.
Return type: dict
-
ulmo.ncdc.ghcn_daily.
get_stations
(country=None, state=None, elements=None, start_year=None, end_year=None, update=True, as_dataframe=False)¶ Retrieves station information, optionally limited to specific parameters.
Parameters: - country (str) – The country code to use to limit station results. If set to None (default), then stations from all countries are returned.
- state (str) – The state code to use to limit station results. If set to None (default), then stations from all states are returned.
- elements (None, str, or list of str) – If specified, station results will be limited to the given element codes and only stations that have data for any of these elements will be returned.
- start_year (int) – If specified, station results will be limited to contain only stations that have data after this year. Can be combined with the end_year argument to get stations with data within a range of years.
- end_year (int) – If specified, station results will be limited to contain only stations that have data before this year. Can be combined with the start_year argument to get stations with data within a range of years.
- update (bool) – If True (default), new data files will be downloaded if they are newer than any previously cached files. If False, then previously downloaded files will be used and new files will only be downloaded if there is not a previously downloaded file for a given station.
- as_dataframe (bool) – If False (default), a dict with station IDs keyed to station dicts is returned. If True, a single pandas.DataFrame object will be returned. The pandas dataframe is used internally, so setting this to True is a little bit faster as it skips a serialization step.
Returns: stations_dict – A dict or pandas.DataFrame representing station information for stations matching the arguments. See the as_dataframe parameter for more.
Return type: dict or pandas.DataFrame
NCDC Global Summary of the Day (GSoD)¶
National Climatic Data Center Global Summary of the Day dataset
-
ulmo.ncdc.gsod.
get_data
(station_codes, start=None, end=None, parameters=None)¶ Retrieves data for a set of stations.
Parameters: - station_codes (str or list) – Single station code or iterable of station codes to retrieve data for.
- start (None or date (see note on dates and times)) – If specified, data are limited to values after this date.
- end (None or date (see note on dates and times)) – If specified, data are limited to values before this date.
- parameters (None, str or list) – If specified, data are limited to this set of parameter codes.
Returns: data_dict – Dict with station codes keyed to lists of value dicts.
Return type: dict
-
ulmo.ncdc.gsod.
get_stations
(country=None, state=None, start=None, end=None, update=True)¶ Retrieve information on the set of available stations.
Parameters: - country ({None, str, or iterable}) – If specified, results will be limited to stations with matching country codes.
- state ({None, str, or iterable}) – If specified, results will be limited to stations with matching state codes.
- start (None or date (see note on dates and times)) – If specified, results will be limited to stations which have data after this start date.
- end (None or date (see note on dates and times)) – If specified, results will be limited to stations which have data before this end date.
- update (bool) – If True (default), check for a newer copy of the stations file and download it if it is newer than the previously downloaded copy. If False, then a new stations file will only be downloaded if a previously downloaded file cannot be found.
Returns: stations_dict – A dict with USAF-WBAN codes keyed to station information dicts.
Return type: dict
NOAA GOES Data Collection System (DCS) services¶
NOAA GOES Data Collection System Access to data stream transmitted via GOES satellite.
-
ulmo.noaa.goes.
get_data
(dcp_address, hours, use_cache=False, cache_path=None, as_dataframe=True)¶ Fetches GOES Satellite DCP messages from NOAA Data Collection System (DCS) field test.
Parameters: - dcp_address (str, iterable of strings) – DCP address or list of DCP addresses to be fetched; lists will be joined by a ‘,’.
- use_cache (bool) – If True, use an hdf file to cache data and retrieve only new data on subsequent requests. The signature above gives False as the default.
- cache_path ({None, str}) – If None, use the default ulmo location for cached files; otherwise use the specified path. Files are named using dcp_address.
- as_dataframe (bool) – If True (default) return data in a pandas dataframe otherwise return a dict.
Returns: message_data – Either a pandas dataframe or a dict indexed by dcp message times
Return type: {pandas.DataFrame, dict}
-
ulmo.noaa.goes.
decode
(dataframe, parser, **kwargs)¶ Decodes GOES message data in the pandas dataframe returned by ulmo.noaa.goes.get_data().
Parameters: - dataframe (pandas.DataFrame) – pandas.DataFrame returned by ulmo.noaa.goes.get_data()
- parser ({function, str}) – function that acts on the dcp_message of each row of the dataframe and returns a new dataframe containing several rows of decoded data. This returned dataframe may have different (but derived) timestamps than the original row. If a string is passed then a matching parser function is looked up from ulmo.noaa.goes.parsers
Returns: decoded_data – pandas dataframe, the format and parameters in the returned dataframe depend wholly on the parser used
Return type: pandas.DataFrame
USGS National Water Information System (NWIS)¶
USGS National Water Information System web services
-
ulmo.usgs.nwis.
get_sites
(service=None, input_file=None, sites=None, state_code=None, huc=None, bounding_box=None, county_code=None, parameter_code=None, site_type=None, **kwargs)¶ Fetches site information from USGS services. See the USGS Site Service documentation for a detailed description of options. For convenience, major options have been included with pythonic names. At least one major filter must be specified. Options that are not listed below may be provided as extra kwargs (i.e. keyword=’argument’) and will be passed along with the web services request. These extra keywords must match the USGS names exactly. The USGS Site Service website describes available keyword names and argument formats.
Note
Only the options listed below have been tested and you may have mixed results retrieving data with extra options specified. Currently ulmo requests and parses data in the WaterML 1.x format. Some options are not available in this format.
Parameters: - service ({None, ‘instantaneous’, ‘iv’, ‘daily’, ‘dv’}) – The service to use, either “instantaneous”, “daily”, or None (default). If set to None, then both services are used. The abbreviations “iv” and “dv” can be used for “instantaneous” and “daily”, respectively.
- input_file (None, file path or file object) – If None (default), then the NWIS web services will be queried, but if a file is passed then this file will be used instead of requesting data from the NWIS web services.
- sites (str, iterable of strings or None) – A major filter. The site(s) to use; lists will be joined by a ‘,’. At least one major filter must be specified.
- state_code (str or None) – A major filter. Two-letter state code used in the stateCd parameter. At least one major filter must be specified.
- county_code (str, iterable of strings or None) – A major filter. The 5 digit FIPS county code(s) used in the countyCd parameter; lists will be joined by a ‘,’. At least one major filter must be specified.
- huc (str, iterable of strings or None) – A major filter. The hydrologic unit code(s) to use; lists will be joined by a ‘,’. At least one major filter must be specified.
- bounding_box (str, iterable of strings or None) – A major filter. The bounding box used in the bBox parameter. The format is westernmost longitude, southernmost latitude, easternmost longitude, northernmost latitude; lists will be joined by a ‘,’. At least one major filter must be specified.
- parameter_code (str, iterable of strings or None) – Optional filter. Parameter code(s) that will be passed as the parameterCd parameter; lists will be joined by a ‘,’. This parameter represents the following USGS website input: Sites serving parameter codes
- site_type (str, iterable of strings or None) – Optional filter. The type(s) of site used in the siteType parameter; lists will be joined by a ‘,’.
Returns: return_sites – a python dict with site codes mapped to site information
Return type: dict
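The rules above (at least one major filter, lists joined by a ‘,’, extra keywords passed through under their USGS names) can be sketched as a small query builder. `site_query` is a hypothetical helper written for this example and is not ulmo's internal code. The USGS names stateCd, countyCd, bBox, parameterCd and siteType come from the parameter descriptions; mapping `sites` and `huc` to identically named USGS parameters is an assumption.

```python
# Pythonic option name -> USGS parameter name.
# The "sites" and "huc" targets are assumed; the others are named above.
MAJOR_FILTERS = {
    "sites": "sites",
    "state_code": "stateCd",
    "county_code": "countyCd",
    "huc": "huc",
    "bounding_box": "bBox",
}
OPTIONAL_FILTERS = {"parameter_code": "parameterCd", "site_type": "siteType"}


def _join(value):
    """Leave strings alone and join other iterables with ','."""
    if isinstance(value, str):
        return value
    return ",".join(str(item) for item in value)


def site_query(**options):
    """Build a dict of request parameters from pythonic option names."""
    if not any(options.get(name) is not None for name in MAJOR_FILTERS):
        raise ValueError("at least one major filter must be specified")
    names = dict(MAJOR_FILTERS)
    names.update(OPTIONAL_FILTERS)
    query = {}
    for name, value in options.items():
        if value is None:
            continue
        # Unknown names are passed through unchanged, like extra kwargs.
        query[names.get(name, name)] = _join(value)
    return query
```

For example, `site_query(state_code="TX", parameter_code=["00060", "00065"])` yields the stateCd and parameterCd entries for a request; the state and parameter codes here are only example values.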
-
ulmo.usgs.nwis.
get_site_data
(site_code, service=None, parameter_code=None, statistic_code=None, start=None, end=None, period=None, modified_since=None, input_file=None, methods=None, **kwargs)¶ Fetches site data.
Parameters: - site_code (str) – The site code of the site you want to query data for.
- service ({None, ‘instantaneous’, ‘iv’, ‘daily’, ‘dv’}) – The service to use, either “instantaneous”, “daily”, or None (default). If set to None, then both services are used. The abbreviations “iv” and “dv” can be used for “instantaneous” and “daily”, respectively.
- parameter_code (str) – Parameter code(s) that will be passed as the parameterCd parameter.
- statistic_code (str) – Statistic code(s) that will be passed as the statCd parameter.
- start (None or datetime (see note on dates and times)) – Start of a date range for a query. This parameter is mutually exclusive with period (you cannot use both). It should not be older than 1910-1-1 for ‘iv’ and 1851-1-1 for ‘dv’ services.
- end (None or datetime (see note on dates and times)) – End of a date range for a query. This parameter is mutually exclusive with period (you cannot use both).
- period ({None, str, datetime.timedelta}) – Period of time to use for requesting data. This will be passed along as the period parameter. This can be ‘all’ to signal that you’d like the entire period of record (down to 1910-1-1 for ‘iv’, 1851-1-1 for ‘dv’), a string in ISO 8601 period format (e.g. ‘P1Y2M21D’ for a period of one year, two months and 21 days), or a datetime.timedelta object representing the period of time. This parameter is mutually exclusive with start/end dates.
- modified_since (None or datetime.timedelta) – Passed along as the modifiedSince parameter.
- input_file (None, file path or file object) – If None (default), then the NWIS web services will be queried, but if a file is passed then this file will be used instead of requesting data from the NWIS web services.
- methods (None, str or Python dict) – If None (default), it’s assumed that there is a single method for each parameter. This raises an error if more than one method id is encountered. If str, this is the method id for the requested parameter(s) and can be “all” if method ids are not known beforehand. If dict, provide the parameter_code to method id mapping. A parameter’s method id is specific to a site.
Returns: data_dict – a python dict with parameter codes mapped to value dicts
Return type: dict
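The interplay between start/end and period can be illustrated with a short sketch. Both helpers below are hypothetical: `iso_period` shows one way to express a datetime.timedelta in ISO 8601 period format, and `time_window` enforces the mutual exclusivity described above. ulmo accepts a timedelta directly, so the conversion is shown only to make the format concrete and may differ from the string ulmo itself would send.

```python
import datetime


def iso_period(delta):
    """Express a non-negative timedelta as an ISO 8601 period string."""
    if delta < datetime.timedelta(0):
        raise ValueError("period must not be negative")
    # Microseconds are ignored in this sketch.
    hours, remainder = divmod(delta.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    date_part = "%dD" % delta.days if delta.days else ""
    time_part = "".join(
        "%d%s" % (amount, unit)
        for amount, unit in ((hours, "H"), (minutes, "M"), (seconds, "S"))
        if amount
    )
    if time_part:
        return "P%sT%s" % (date_part, time_part)
    return "P%s" % (date_part or "0D")


def time_window(start=None, end=None, period=None):
    """Return keyword arguments for a query, rejecting period with start/end."""
    if period is not None and (start is not None or end is not None):
        raise ValueError("period cannot be combined with start or end")
    if isinstance(period, datetime.timedelta):
        period = iso_period(period)
    window = {"start": start, "end": end, "period": period}
    return {key: value for key, value in window.items() if value is not None}
```

The resulting dict can be unpacked into a call, for instance `ulmo.usgs.nwis.get_site_data(site_code, service="dv", **time_window(period=datetime.timedelta(days=7)))`.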
-
ulmo.usgs.nwis.hdf5.
get_site
(site_code, path=None, complevel=None, complib=None)¶ Fetches previously-cached site information from an hdf5 file.
Parameters: - site_code (str) – The site code of the site you want to get information for.
- path (None or file path) – Path to the hdf5 file to be queried; if None, then the default path will be used. If a file path is a directory, then multiple hdf5 files will be kept so that file sizes remain small for faster repacking.
- complevel (None or int {0-9}) – Open hdf5 file with this level of compression. If None (default), then a maximum compression level will be used if a compression library can be found. If set to 0 then no compression will be used regardless of what complib is.
- complib (None or str {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’}) – Open hdf5 file with this type of compression. If None (default) then the best compression library available on your system will be selected. If complevel argument is set to 0 then no compression will be used.
Returns: site_dict – a python dict containing site information
Return type: dict
-
ulmo.usgs.nwis.hdf5.
get_site_data
(site_code, agency_code=None, parameter_code=None, path=None, complevel=None, complib=None, start=None)¶ Fetches previously-cached site data from an hdf5 file.
Parameters: - site_code (str) – The site code of the site you want to get data for.
- agency_code (None or str) – The agency code to get data for. This will need to be set if a site code is in use by multiple agencies (this is rare).
- parameter_code (None, str, or list) – List of parameters to read. If None (default) read all parameters. Otherwise only read specified parameters. Parameters should be specified with statistic code, i.e. daily streamflow is ‘00060:00003’
- path (None or file path) – Path to the hdf5 file to be queried; if None, then the default path will be used. If a file path is a directory, then multiple hdf5 files will be kept so that file sizes remain small for faster repacking.
- complevel (None or int {0-9}) – Open hdf5 file with this level of compression. If None (default), then a maximum compression level will be used if a compression library can be found. If set to 0 then no compression will be used regardless of what complib is.
- complib (None or str {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’}) – Open hdf5 file with this type of compression. If None (default) then the best compression library available on your system will be selected. If complevel argument is set to 0 then no compression will be used.
- start (None or string formatted date like 2014-01-01) – Filter the dataset to return only data later than the start date.
Returns: data_dict – a python dict with parameter codes mapped to value dicts
Return type: dict
-
ulmo.usgs.nwis.hdf5.
get_sites
(path=None, complevel=None, complib=None)¶ Fetches previously-cached site information from an hdf5 file.
Parameters: - path (None or file path) – Path to the hdf5 file to be queried; if None, then the default path will be used. If a file path is a directory, then multiple hdf5 files will be kept so that file sizes remain small for faster repacking.
- complevel (None or int {0-9}) – Open hdf5 file with this level of compression. If None (default), then a maximum compression level will be used if a compression library can be found. If set to 0 then no compression will be used regardless of what complib is.
- complib (None or str {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’}) – Open hdf5 file with this type of compression. If None (default) then the best compression library available on your system will be selected. If complevel argument is set to 0 then no compression will be used.
Returns: sites_dict – a python dict with site codes mapped to site information
Return type: dict
-
ulmo.usgs.nwis.hdf5.
remove_values
(site_code, datetime_dicts, path=None, complevel=None, complib=None, autorepack=True)¶ Remove values from hdf5 file.
Parameters: - site_code (str) – The site code of the site to remove records from.
- datetime_dicts (dict) – A python dict with a list of datetimes for a given variable (key) to set as NaNs.
- path (file path) – Path to the hdf5 file.
Returns: None
Return type: None
-
ulmo.usgs.nwis.hdf5.
repack
(path, complevel=None, complib=None)¶ Repack the hdf5 file at path. This is the same as running the pytables ptrepack command on the file.
Parameters: - path (file path) – Path to the hdf5 file.
- complevel (None or int {0-9}) – Open hdf5 file with this level of compression. If None (default), then a maximum compression level will be used if a compression library can be found. If set to 0 then no compression will be used regardless of what complib is.
- complib (None or str {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’}) – Open hdf5 file with this type of compression. If None (default) then the best compression library available on your system will be selected. If complevel argument is set to 0 then no compression will be used.
Returns: None
Return type: None
-
ulmo.usgs.nwis.hdf5.
update_site_data
(site_code, start=None, end=None, period=None, path=None, methods=None, input_file=None, complevel=None, complib=None, autorepack=True)¶ Update cached site data.
Parameters: - site_code (str) – The site code of the site you want to query data for.
- start (None or datetime (see note on dates and times)) – Start of a date range for a query. This parameter is mutually exclusive with period (you cannot use both).
- end (None or datetime (see note on dates and times)) – End of a date range for a query. This parameter is mutually exclusive with period (you cannot use both).
- period ({None, str, datetime.timedelta}) – Period of time to use for requesting data. This will be passed along as the period parameter. This can be ‘all’ to signal that you’d like the entire period of record, a string in ISO 8601 period format (e.g. ‘P1Y2M21D’ for a period of one year, two months and 21 days), or a datetime.timedelta object representing the period of time. This parameter is mutually exclusive with start/end dates.
- path (None or file path) – Path to the hdf5 file to be queried; if None, then the default path will be used. If a file path is a directory, then multiple hdf5 files will be kept so that file sizes remain small for faster repacking.
- methods (None, str or Python dict) – If None (default), it’s assumed that there is a single method for each parameter. This raises an error if more than one method id is encountered. If str, this is the method id for the requested parameter(s) and can be “all” if method ids are not known beforehand. If dict, provide the parameter_code to method id mapping. A parameter’s method id is specific to a site.
- input_file (None, file path or file object) – If None (default), then the NWIS web services will be queried, but if a file is passed then this file will be used instead of requesting data from the NWIS web services.
- autorepack (bool) – Whether or not to automatically repack the h5 file(s) after updating. There is a tradeoff between performance and disk space here: large files take longer to repack but also tend to grow faster. The default of True conserves disk space, because unchecked file growth can become a serious problem. If you set this to False, you can manually repack files with repack().
Returns: None
Return type: None
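The rule that period cannot be combined with start or end can be checked before any request is made. The following is a rough sketch under that assumption; site_data_kwargs and refresh_site are hypothetical helper names, not ulmo functions, and the ulmo import is deferred so the check runs without the library.

```python
def site_data_kwargs(start=None, end=None, period=None):
    """Build keyword arguments, enforcing the documented exclusivity rule."""
    if period is not None and (start is not None or end is not None):
        raise ValueError("period cannot be combined with start or end")
    kwargs = {}
    for name, value in (("start", start), ("end", end), ("period", period)):
        if value is not None:
            kwargs[name] = value
    return kwargs


def refresh_site(site_code, **date_args):
    """Update the cache for one site (needs ulmo, pytables and network access)."""
    from ulmo.usgs.nwis import hdf5  # deferred import
    hdf5.update_site_data(site_code, **site_data_kwargs(**date_args))
```

For example, refresh_site(site_code, period='P7D') would request the last seven days, while refresh_site(site_code, start='2013-01-01', end='2013-02-01') would request a fixed range.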
-
ulmo.usgs.nwis.hdf5.
update_site_list
(sites=None, state_code=None, huc=None, bounding_box=None, county_code=None, parameter_code=None, site_type=None, service=None, input_file=None, complevel=None, complib=None, autorepack=True, path=None, **kwargs)¶ Update cached site information.
See ulmo.usgs.nwis.core.get_sites() for a description of the regular parameters; only the extra parameters used for caching are listed below.
Parameters:
- path (None or file path) – Path to the hdf5 file to be queried. If None, then the default path will be used. If the file path is a directory, then multiple hdf5 files will be kept so that file sizes remain small for faster repacking.
- input_file (None, file path or file object) – If None (default), then the NWIS web services will be queried, but if a file is passed then this file will be used instead of requesting data from the NWIS web services.
- complevel (None or int {0-9}) – Open hdf5 file with this level of compression. If None (default), then the maximum compression level will be used if a compression library can be found. If set to 0, then no compression will be used regardless of what complib is.
- complib (None or str {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’}) – Open hdf5 file with this type of compression. If None (default), then the best compression library available on your system will be selected. If the complevel argument is set to 0, then no compression will be used.
- autorepack (bool) – Whether or not to automatically repack the h5 file after updating. There is a tradeoff between performance and disk space here: large files take longer to repack but also tend to grow faster. The default of True conserves disk space, because untamed file growth can become quite destructive. If you set this to False, you can manually repack files with repack().
Returns: None
Return type: None
USGS National Elevation Dataset (NED) raster services¶
National Elevation Dataset (NED) services (Raster)
-
ulmo.usgs.ned.
get_available_layers
()¶ return list of available data layers
-
ulmo.usgs.ned.
get_raster
(layer, bbox, path=None, update_cache=False, check_modified=False, mosaic=False)¶ downloads National Elevation Dataset raster tiles that cover the given bounding box for the specified data layer.
Parameters:
- layer (str) – dataset layer name (see get_available_layers for a list)
- bbox ((sequence of float|str)) – bounding box, in geographic coordinates, of the area to download tiles for, in the format (min longitude, min latitude, max longitude, max latitude)
- path (None or path) – if None, the default path will be used
- update_cache (True or False (default)) – if False and the output file already exists, use it
- check_modified (True or False (default)) – if the tile exists in path, check whether a newer file exists online and download it if available
- mosaic (True or False (default)) – if True, mosaic and clip downloaded tiles to the extents of the bbox provided. Requires the rasterio package and GDAL.
Returns: raster_tiles – metadata as a FeatureCollection. local url of downloaded data is in feature[‘properties’][‘file’]
Return type: geojson FeatureCollection
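A short sketch of working with get_raster() may help here: it normalizes a bounding box given as floats or strings and pulls the local file of each tile out of the returned FeatureCollection. The helpers normalize_bbox, downloaded_files and fetch_tiles are hypothetical names; the only assumptions about the result are the standard GeoJSON features list and the feature[‘properties’][‘file’] entry described above.

```python
def normalize_bbox(bbox):
    """Return bbox as four floats, checking the documented min/max ordering."""
    min_lon, min_lat, max_lon, max_lat = (float(value) for value in bbox)
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("bbox must be (min lon, min lat, max lon, max lat)")
    return (min_lon, min_lat, max_lon, max_lat)


def downloaded_files(raster_tiles):
    """List the local file of every feature in a get_raster() result."""
    return [feature["properties"]["file"] for feature in raster_tiles["features"]]


def fetch_tiles(layer, bbox, path=None):
    """Download tiles and list their files (needs ulmo and network access)."""
    from ulmo.usgs import ned  # deferred import
    tiles = ned.get_raster(layer, normalize_bbox(bbox), path=path)
    return downloaded_files(tiles)
```

The layer argument should be one of the names returned by get_available_layers().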
-
ulmo.usgs.ned.
get_raster_availability
(layer, bbox=None)¶ retrieve metadata for raster tiles that cover the given bounding box for the specified data layer.
Parameters:
- layer (str) – dataset layer name (see get_available_layers for a list)
- bbox ((sequence of float|str)) – bounding box, in geographic coordinates, of the area to look for tiles in, in the format (min longitude, min latitude, max longitude, max latitude)
Returns: metadata – returns metadata including download urls as a FeatureCollection
Return type: geojson FeatureCollection
Readers for USA regional (sub-national) data¶
California Department of Water Resources Historical Data¶
-
ulmo.cdec.historical.
get_stations
()¶ Fetches information on all CDEC sites.
Returns: df – a pandas DataFrame (indexed on site id) with station information.
Return type: pandas DataFrame
-
ulmo.cdec.historical.
get_sensors
(sensor_id=None)¶ Gets a list of sensor ids as a DataFrame indexed on sensor number. Can be limited by a list of numbers.
Usage example:
    from ulmo import cdec
    # to get all available sensor info
    sensors = cdec.historical.get_sensors()
    # or to get just one sensor
    sensor = cdec.historical.get_sensors([1])
Parameters: sensor_id (iterable of integers or None)
Returns: df – a pandas DataFrame of sensor information, indexed on sensor number
Return type: pandas DataFrame
-
ulmo.cdec.historical.
get_station_sensors
(station_ids=None, sensor_ids=None, resolutions=None)¶ Gets available sensors for the given stations, sensor ids and time resolutions. If no station ids are provided, all available stations will be used (this is not recommended, and will probably take a really long time).
The list can be limited by a list of sensor numbers, or time resolutions if you already know what you want. If none of the provided sensors or resolutions are available, an empty DataFrame will be returned for that station.
Usage example:
    from ulmo import cdec
    # to get all available sensors
    available_sensors = cdec.historical.get_station_sensors(['NEW'])
Parameters:
- station_ids (iterable of strings or None)
- sensor_ids (iterable of integers or None) – use the get_sensors() function to see a list of available sensor numbers
- resolutions (iterable of strings or None) – Possible values are ‘event’, ‘hourly’, ‘daily’, and ‘monthly’, but not all of these time resolutions are available at every station.
Returns: dict – a python dict with site codes as keys with values containing pandas DataFrames of available sensor numbers and metadata.
Return type: a python dict
-
ulmo.cdec.historical.
get_data
(station_ids=None, sensor_ids=None, resolutions=None, start=None, end=None)¶ Downloads data for a set of CDEC station and sensor ids. If either is not provided, all available data will be downloaded. Be really careful with choosing hourly resolution as the data sets are big, and CDEC’s servers are slow as molasses in winter.
Usage example:
    from ulmo import cdec
    dat = cdec.historical.get_data(['PRA'], resolutions=['daily'])
Parameters:
- station_ids (iterable of strings or None)
- sensor_ids (iterable of integers or None) – use the get_sensors() function to see a list of available sensor numbers
- resolutions (iterable of strings or None) – Possible values are ‘event’, ‘hourly’, ‘daily’, and ‘monthly’, but not all of these time resolutions are available at every station.
Returns: dict – a python dict with site codes as keys. Values will be nested dicts containing all of the sensor/resolution combinations.
Return type: a python dict
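Because resolutions must be an iterable of strings, passing a bare string such as 'daily' is an easy mistake to make. The sketch below guards against that and against unknown values before calling get_data(); check_resolutions and fetch_station_data are hypothetical helper names, and the default of daily resolution is only a cautious choice given the warning about hourly data.

```python
RESOLUTIONS = ("event", "hourly", "daily", "monthly")


def check_resolutions(resolutions):
    """Return resolutions as a list, rejecting bare strings and unknown values."""
    if resolutions is None:
        return None
    if isinstance(resolutions, str):
        raise TypeError("pass an iterable such as ['daily'], not a bare string")
    resolutions = list(resolutions)
    unknown = [r for r in resolutions if r not in RESOLUTIONS]
    if unknown:
        raise ValueError("unknown resolutions: %s" % unknown)
    return resolutions


def fetch_station_data(station_ids, sensor_ids=None, resolutions=("daily",)):
    """Download CDEC data (needs ulmo and network access; can be slow)."""
    from ulmo import cdec  # deferred import
    return cdec.historical.get_data(
        station_ids,
        sensor_ids=sensor_ids,
        resolutions=check_resolutions(resolutions),
    )
```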
Lower Colorado River Authority (LCRA)¶
LCRA Hydromet Data¶
Access to hydrologic and climate data in the Colorado River Basin (Texas) provided by the Hydromet web site and web service from the Lower Colorado River Authority.
-
ulmo.lcra.hydromet.
get_sites_by_type
(site_type)¶ Gets a list of hydromet site codes and descriptions for the given site type.
Parameters: site_type (str) – For all sites except lake sites, this is the parameter code collected at the site. For lake sites, it is ‘lake’. See site_types and PARAMETERS.
Returns: sites_dict – A python dict with four-character site codes mapped to site information.
Return type: dict
-
ulmo.lcra.hydromet.
get_site_data
(site_code, parameter_code, as_dataframe=True, start_date=None, end_date=None, dam_site_location='head')¶ Fetches site’s parameter data
Parameters:
- site_code (str) – The LCRA site code (four characters long) of the site you want to query data for.
- parameter_code (str) – LCRA parameter code. See PARAMETERS.
- start_date (None or datetime) – Start of a date range for a query.
- end_date (None or datetime) – End of a date range for a query.
- as_dataframe (True (default) or False) – This determines what format values are returned as. If True (default), then the values will be a pandas.DataFrame object with the values’ timestamp as the index. If False, the format will be a Python dictionary.
- dam_site_location (‘head’ (default) or ‘tail’) – The site location relative to the dam.
Returns: df (pandas.DataFrame) or values_dict (dict), depending on as_dataframe.
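The constraints described for get_site_data() (a four-character site code, a dam_site_location of ‘head’ or ‘tail’, and a sensible date range) can be checked up front. This is only an illustrative sketch: site_data_args and fetch_site_data are hypothetical helpers, and the date-order check is an assumption rather than documented behaviour.

```python
import datetime


def site_data_args(site_code, parameter_code, start_date=None, end_date=None,
                   dam_site_location="head"):
    """Validate arguments and return them as a keyword dict."""
    if len(site_code) != 4:
        raise ValueError("LCRA site codes are four characters long")
    if dam_site_location not in ("head", "tail"):
        raise ValueError("dam_site_location must be 'head' or 'tail'")
    # Assumption: a start date after the end date is never intended.
    if start_date is not None and end_date is not None and start_date > end_date:
        raise ValueError("start_date must not be after end_date")
    return {
        "site_code": site_code,
        "parameter_code": parameter_code,
        "start_date": start_date,
        "end_date": end_date,
        "dam_site_location": dam_site_location,
    }


def fetch_site_data(**kwargs):
    """Fetch values as a dict (needs ulmo and network access)."""
    from ulmo.lcra import hydromet  # deferred import
    return hydromet.get_site_data(as_dataframe=False, **site_data_args(**kwargs))
```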
-
ulmo.lcra.hydromet.
get_all_sites
()¶ Returns list of all LCRA hydromet sites as geojson featurecollection.
-
ulmo.lcra.hydromet.
get_current_data
(service, as_geojson=False)¶ fetches the current (near real-time) river stage and flow values from LCRA web service.
Parameters:
- service (str) – The web service providing data; see current_data_services. Currently available services are GetUpperBasin and GetLowerBasin.
- as_geojson (True or False (default)) – If True, the data is returned as a geojson featurecollection; if False, the data is returned as a list of dicts.
Returns: current_values_dicts (a list of dicts) or current_values_geojson (a geojson featurecollection), depending on as_geojson.
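Since the upper and lower basins are served separately, a caller who wants both has to make two requests and merge the lists. A possible sketch follows; collect_current_data is a hypothetical helper, the fetch function is passed in so the merging logic can be exercised without network access, and the added "service" key is an assumption made for illustration.

```python
SERVICES = ("GetUpperBasin", "GetLowerBasin")


def collect_current_data(fetch, services=SERVICES):
    """Call fetch once per service and merge the resulting lists of dicts."""
    records = []
    for service in services:
        for record in fetch(service, as_geojson=False):
            tagged = dict(record)          # copy so the original is untouched
            tagged["service"] = service    # hypothetical key noting the source
            records.append(tagged)
    return records
```

With ulmo installed, the real function can be supplied as the fetch argument, for example collect_current_data(ulmo.lcra.hydromet.get_current_data).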
LCRA Water Quality Data¶
Access to water quality data in the Colorado River Basin (Texas) provided by the Water Quality web site and web service from the Lower Colorado River Authority.
-
ulmo.lcra.waterquality.
get_sites
(source_agency=None)¶ Fetches a list of sites with location and available metadata.
Parameters: source_agency (str) – The code LCRA uses for the agency that collects the data. Some sites do not list a source, so this filter may not return every site belonging to a given source. See source_map.
Returns: sites_geojson
Return type: geojson FeatureCollection
-
ulmo.lcra.waterquality.
get_historical_data
(site_code, start=None, end=None, as_dataframe=False)¶ Fetches data for a site at a given date.
Parameters:
- site_code (str) – The site code to fetch data for. A list of sites can be retrieved with get_sites().
- date (None or date (see note on dates and times)) – The date of the data to be queried. If date is None (default), then all data will be returned.
- as_dataframe (bool) – This determines what format values are returned as. If False (default), the values dict will be a dict with timestamps as keys mapped to a dict of gauge variables and values. If True, then the values dict will be a pandas.DataFrame object containing the equivalent information.
Returns: data_dict – A dict containing site information and values.
Return type: dict
-
ulmo.lcra.waterquality.
get_recent_data
(site_code, as_dataframe=False)¶ fetches near real-time instantaneous water quality data for the LCRA bay sites.
Parameters:
- site_code (str) – The bay site to fetch data for. See real_time_sites.
- as_dataframe (bool) – This determines what format values are returned as. If False (default), the values will be a list of value dicts. If True, then values are returned as a pandas.DataFrame.
Returns: list of values or dataframe.
Return type:
Texas Weather Connection Daily Keetch-Byram Drought Index (KBDI)¶
ulmo.twc.kbdi.core¶
This module provides direct access to Texas Weather Connection - Daily Keetch-Byram Drought Index (KBDI) dataset.
-
ulmo.twc.kbdi.
get_data
(county=None, start=None, end=None, as_dataframe=False, data_dir=None)¶ Retrieves data.
Parameters:
- county (None or str) – If specified, results will be limited to the county corresponding to the given 5-character Texas county FIPS code, i.e. 48???.
- end (None or date (see note on dates and times)) – Results will be limited to data on or before this date. Default is the current date.
- start (None or date (see note on dates and times)) – Results will be limited to data on or after this date. Default is the start of the calendar year for the end date.
- as_dataframe (bool) – If False (default), a dict with a nested set of dicts will be returned with data indexed by 5-character Texas county FIPS code. If True, then a pandas.DataFrame object will be returned. The pandas dataframe is used internally, so setting this to True is a little bit faster as it skips a serialization step.
- data_dir (None or directory path) – Directory for holding downloaded data files. If no path is provided (default), then a user-specific directory for holding application data will be used (the directory will depend on the platform/operating system).
Returns: data – A dict or pandas.DataFrame representing the data. See the as_dataframe parameter for more.
Return type: dict or pandas.DataFrame
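The defaults for start and end depend on each other: end falls back to the current date, and start falls back to January 1 of the end date's year. The sketch below restates that rule and the 48??? county code format in plain Python; kbdi_date_range and check_county are hypothetical helpers written from the descriptions above, not ulmo's own implementation.

```python
import datetime


def kbdi_date_range(start=None, end=None, today=None):
    """Fill in missing dates the way the parameter descriptions state."""
    if end is None:
        end = today if today is not None else datetime.date.today()
    if start is None:
        # Default start is the beginning of the end date's calendar year.
        start = datetime.date(end.year, 1, 1)
    return start, end


def check_county(county):
    """Accept None or a 5-character Texas county FIPS code (48 + 3 digits)."""
    if county is None:
        return None
    if len(county) != 5 or not county.isdigit() or not county.startswith("48"):
        raise ValueError("expected a code of the form 48???, got %r" % (county,))
    return county
```

The today argument exists only to make the helper deterministic when testing.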
US Army Corps of Engineers (USACE) - Tulsa District Water Control¶
Access to data provided by the United States Army Corps of Engineers - Tulsa District Water Control web site.
-
ulmo.usace.swtwc.
get_stations
()¶ Fetches a list of station codes and descriptions.
Returns: stations_dict – a python dict with station codes mapped to station information
Return type: dict
-
ulmo.usace.swtwc.
get_station_data
(station_code, date=None, as_dataframe=False)¶ Fetches data for a station at a given date.
Parameters:
- station_code (str) – The station code to fetch data for. A list of stations can be retrieved with get_stations().
- date (None or date (see note on dates and times)) – The date of the data to be queried. If date is None (default), then data for the current day is retrieved.
- as_dataframe (bool) – This determines what format values are returned as. If False (default), the values dict will be a dict with timestamps as keys mapped to a dict of gauge variables and values. If True, then the values dict will be a pandas.DataFrame object containing the equivalent information.
Returns: data_dict – A dict containing station information and values.
Return type: dict
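The date argument accepts the forms described in the note on dates and times: date or datetime objects, ISO-style strings, or ‘mm/dd/YYYY’ strings. As a rough illustration of what that means for callers, the sketch below normalizes those inputs to a datetime.date. The to_date helper is hypothetical and handles only three string formats; it is not the parser ulmo itself uses.

```python
import datetime

# String formats tried in order; abbreviated ISO forms other than the
# date-only one are not covered by this sketch.
FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%m/%d/%Y")


def to_date(value):
    """Convert None, date, datetime or a supported string to a date."""
    if value is None:
        return None
    if isinstance(value, datetime.datetime):  # check before date: it is a subclass
        return value.date()
    if isinstance(value, datetime.date):
        return value
    for fmt in FORMATS:
        try:
            return datetime.datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    raise ValueError("unrecognized date: %r" % (value,))


def fetch_station_day(station_code, date=None):
    """Fetch one day of station data (needs ulmo and network access)."""
    from ulmo.usace import swtwc  # deferred import
    return swtwc.get_station_data(station_code, date=to_date(date))
```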