Tutorial

The Waterinfo module facilitates access to waterinfo.be, a website managed by the Flanders Environment Agency (VMM) and Flanders Hydraulics Research (HIC). The website provides access to real-time water- and weather-related environmental variables for Flanders (Belgium), such as rainfall, air pressure, discharge, and water level. The package provides functions to search for stations and variables, and to download time series.

The API is a product of Kisters and is called KIWIS. Hence, the code should work on other deployments of this API as well. As VMM and HIC each run a separate deployment of the API, the documentation may differ slightly between VMM and HIC.

Introduction

The waterinfo.be API uses a system of identifiers, called ts_id, to define individual time series. For example, the identifier ts_id = 78073042 corresponds to the time series of air pressure data for the measurement station in Liedekerke, with a 15 min time resolution. Hence, a ts_id identifier defines a variable of interest at a measurement station of interest with a specific frequency (e.g. 15 min, hourly,…). Knowing the proper identifier is essential to download the corresponding data.

In order to get started, make sure to define the source of the data: VMM or HIC:

from pywaterinfo import Waterinfo
vmm = Waterinfo("vmm") # look for data from VMM
hic = Waterinfo("hic") # look for data from HIC

One reason for this distinction is that VMM and HIC each issue their own tokens. If you have a token, pass it when creating the Waterinfo instance so that all session requests use it:

from pywaterinfo import Waterinfo
vmm_token = "DUMMY"
vmm = Waterinfo("vmm", token=vmm_token)
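
To keep the token out of the source code, one option is to read it from an environment variable. The following is a minimal sketch: the helper read_token and the variable name VMM_TOKEN are hypothetical and not part of pywaterinfo.

```python
import os


def read_token(env_var="VMM_TOKEN"):
    """Return the token stored in an environment variable, or None if unset or empty.

    Both this helper and the variable name VMM_TOKEN are hypothetical.
    """
    token = os.environ.get(env_var, "").strip()
    return token or None


vmm_token = read_token()
# If a token was found, it can be passed on as in the snippet above:
# vmm = Waterinfo("vmm", token=vmm_token)
```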

Download with known ts identifier

In case you already know the ts_id identifier that defines your time series, the class Waterinfo provides the method get_timeseries_values() to download a specific period of the time series. For example, to download the air pressure time series data of Liedekerke with a 15 min resolution (ts_id = 78073042) for the first of January 2016:

from pywaterinfo import Waterinfo
vmm = Waterinfo("vmm")
vmm.get_timeseries_values("78073042", start="2016-01-01", end="2016-01-02")

In most cases, you do not know these identifiers in advance. The following sections describe the methods provided to search for them.

The datetime inputs (start and end) are assumed to be ‘UTC’ by default. To request data in another (supported) time zone (e.g. CET, GMT, Etc/GMT+1,…), add the timezone parameter, e.g. timezone='CET'.

Warning

This behavior differs from the KIWIS API itself, which always interprets incoming dates as CET. Hence, requesting data directly from the REST API from ‘2019-05-01 14:00:00’ with timezone ‘UTC’ returns data starting from ‘2019-05-01 12:00:00+00’ (UTC). In the pywaterinfo package, the start and end parameters are assumed to be in the time zone given by the timezone parameter (unless start and end already contain time zone info).
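
The difference between the two interpretations can be made concrete with the standard library alone. The sketch below does not call pywaterinfo or the API; it only shows how the same wall-clock string maps to different instants depending on the assumed time zone. As an assumption made to keep the example self-contained, a fixed UTC+2 offset stands in for ‘CET’ on 1 May 2019, when summer time applies.

```python
from datetime import datetime, timedelta, timezone

naive = datetime.strptime("2019-05-01 14:00:00", "%Y-%m-%d %H:%M:%S")

# Assumption: fixed UTC+2 offset standing in for 'CET' on this date (summer time).
cet_summer = timezone(timedelta(hours=2))

# Interpretation as local (CET) time, then expressed in UTC.
as_cet = naive.replace(tzinfo=cet_summer).astimezone(timezone.utc)

# Interpretation as UTC, the pywaterinfo default.
as_utc = naive.replace(tzinfo=timezone.utc)

print(as_cet.isoformat(sep=" "))  # 2019-05-01 12:00:00+00:00
print(as_utc.isoformat(sep=" "))  # 2019-05-01 14:00:00+00:00

# A string that carries its own offset leaves no room for interpretation.
explicit_start = as_utc.isoformat()
```

The two readings of the same string are two hours apart, which is the shift described in the warning.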

Apart from the start and end parameters, the period parameter is a convenient way to request time series. See the documentation of get_timeseries_values() for more information and examples.

If you are interested in all available data of a time series (watch out for credit limits!) or want to use the start or end of the time series in a request, you can find both in the metadata of the time series, as illustrated in the following example:

from pywaterinfo import Waterinfo

hic = Waterinfo("hic")

# Request the metadata of the time series, which includes its start and end
station_metadata = hic.get_timeseries_list(ts_id=51814010)
start, end = station_metadata[["from", "to"]].values[0]

# Get the first two days of data, counted from the start of the time series
df = hic.get_timeseries_values(51814010, start=start, period="P2D")

Note

If you want ‘naive’ timestamps in the returned time series, use the tz_localize function of Pandas, e.g. df["Timestamp"] = df["Timestamp"].dt.tz_localize(None).
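
As a minimal sketch of this conversion, the example below uses a small hand-made DataFrame instead of a real download; the values and the column name Value are made up for illustration. It also shows converting to a local time zone first, in case the naive timestamps should reflect local wall-clock time rather than UTC.

```python
import pandas as pd

# Hand-made stand-in for a downloaded time series (values are illustrative only).
df = pd.DataFrame({
    "Timestamp": pd.to_datetime(
        ["2019-05-01 12:00:00", "2019-05-01 12:15:00"], utc=True
    ),
    "Value": [1013.2, 1013.4],
})

# Drop the time zone info, keeping the UTC wall-clock time.
naive_utc = df["Timestamp"].dt.tz_localize(None)

# Convert to another time zone first, then drop the time zone info.
naive_local = df["Timestamp"].dt.tz_convert("Europe/Brussels").dt.tz_localize(None)
```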

Time series groups

Many time series and stations are bundled in so-called time series groups, each identified by a timeseriesgroup_id. A group represents, for example, all available rainfall stations at a given frequency (e.g. 15 min). To get an overview of the available groups, use the method get_group_list(), e.g. for the HIC stations:

from pywaterinfo import Waterinfo
hic = Waterinfo("hic")
hic.get_group_list()

Note

A number of these group identifiers are described in the available documentation of VMM/HIC and are the preferred option to query for the provided variables. For an overview, see the Timeseriesgroup_ids page.

Time series group data

To get all the available time series identifiers (ts_id) within a given group, use the get_timeseries_value_layer() method. It provides the metadata of these stations and (by default) the latest measured value. The group identifier for conductivity measured by HIC is 156173:

from pywaterinfo import Waterinfo
hic = Waterinfo("hic")
hic.get_timeseries_value_layer(timeseriesgroup_id="156173")

Multiple identifiers can be combined in a single statement:

from pywaterinfo import Waterinfo
hic = Waterinfo("hic")
# combine oxygen and conductivity in a single call
hic.get_timeseries_value_layer(timeseriesgroup_id="156207,156173")
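
When the group identifiers are held in a Python list, they can be joined into the comma-separated string used above. This is a minimal sketch; the helper name join_ids is hypothetical and not part of pywaterinfo.

```python
def join_ids(ids):
    """Join identifiers (int or str) into a comma-separated string without spaces."""
    return ",".join(str(i).strip() for i in ids)


group_ids = [156207, "156173"]  # oxygen and conductivity, as in the example above
timeseriesgroup_id = join_ids(group_ids)
print(timeseriesgroup_id)  # 156207,156173
```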

Note

When requesting only a subset of the fields using returnfields, the resulting dataframe still contains many metadata fields that are added by default. To exclude these from the response, set the metadata parameter to "false". For example:

from pywaterinfo import Waterinfo
vmm = Waterinfo("vmm")
water_level = vmm.get_timeseries_value_layer("192780",
    returnfields="timestamp,ts_value",
    metadata="false")

Search identifier based on parameter or station name

If you are looking for the identifiers of all parameters measured at a station, or of all stations measuring a given parameter, use the get_timeseries_list() method. It supports wildcards and can search on station information, parameter information, or a combination of both:

from pywaterinfo import Waterinfo
vmm = Waterinfo("vmm")
# for given station ME09_012, which time series are available?
vmm.get_timeseries_list(station_no="ME09_012")
# for a given parameter PET, which time series are available?
vmm.get_timeseries_list(parametertype_name="PET")

An example use case is to get the available parameters (in waterinfo also called stationparameter) at a given station. As pywaterinfo returns a Pandas DataFrame, you can combine it with Pandas functionality (e.g. the unique method):

from pywaterinfo import Waterinfo
vmm = Waterinfo("vmm")
# for station L11_518, which station parameters are available?
station_l11_518 = vmm.get_timeseries_list(station_no="L11_518",
                                          returnfields="ts_id,station_name,stationparameter_longname")
station_l11_518["stationparameter_longname"].unique()
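
The Pandas step can be tried without a network connection. The sketch below uses a hand-made DataFrame with the same column names as the returnfields above; the rows are invented for illustration and do not reflect the real content of station L11_518.

```python
import pandas as pd

# Invented rows; only the column names follow the returnfields used above.
ts_list = pd.DataFrame({
    "ts_id": ["1001", "1002", "1003", "1004"],
    "station_name": ["Example station"] * 4,
    "stationparameter_longname": [
        "Water level", "Water level", "Discharge", "Rainfall"
    ],
})

# Distinct station parameters, in order of first appearance.
parameters = ts_list["stationparameter_longname"].unique()

# Number of time series available per station parameter.
counts = ts_list.groupby("stationparameter_longname")["ts_id"].count()
```

Grouping on the parameter name is one way to see how many time series (for example, different frequencies) exist for each parameter.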

Custom queries

The VMM and HIC APIs provide more API paths than those covered above. Where no specialized method is available, use the request_kiwis() method to make custom calls to the KIWIS API. For example, to use the getStationList query for stations whose number starts with a P:

from pywaterinfo import Waterinfo
vmm = Waterinfo("vmm")
vmm.request_kiwis({"request": "getStationList", "station_no": "P*"})