The new REST nodes: GET Request


REST Operations and GET Request
Until KNIME Analytics Platform 3.2, the only way for KNIME to connect to a REST service was to install the community KREST extension and use its REST access nodes. The nodes were great! However, they required an extra installation step and were not under the control of the KNIME development team.

Starting with KNIME Analytics Platform 3.2, a few native nodes to access external REST services have been added to KNIME Labs: the GET Request, POST Request, PUT Request, and DELETE Request nodes. Each of these nodes implements a specific type of request to a web service.

The most commonly used REST requests are search requests. Search requests are like SELECT queries in a database: they allow you to retrieve information but not to alter the REST service structure or the underlying dataset. A search request is implemented via a “GET Request” node.

The “GET Request” node issues one or more HTTP GET Requests to a REST service. GET Requests retrieve data from the REST service without sending any data other than (optional) request parameters.

As an example, let’s query the World Health Organization (WHO) for the most frequent causes of death in high-income countries. If, together with the causes of death, we would also like to retrieve a few additional parameters, like sex or age group, we can use the following GET Request URL:
http://apps.who.int/gho/athena/data/GHO/MORT_400.xml?profile=simple&filter=AGEGROUP:*;MGHEREG:WBDCP_HIINCOME;GHECAUSES:*;SEX:*

This GET Request URL was built by following the instructions on the WHO Data API description page: http://apps.who.int/gho/data/node.resources.api
Let’s now use a “GET Request” node to send this request to the WHO Data API and to import and parse the response data in a KNIME workflow.
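
To make the structure of this URL easier to see, here is a minimal Python sketch that assembles it from its parts. This is only an illustration outside of KNIME; the helper name `build_who_url` is made up for this example, and the dimension names and values are simply those visible in the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "http://apps.who.int/gho/athena/data/GHO/MORT_400.xml"

# Filter dimensions copied from the URL above; "*" stands for "all values".
filters = {
    "AGEGROUP": "*",
    "MGHEREG": "WBDCP_HIINCOME",
    "GHECAUSES": "*",
    "SEX": "*",
}

def build_who_url(base_url: str, filters: dict) -> str:
    """Hypothetical helper: join DIMENSION:VALUE pairs with ';' into the filter parameter."""
    filter_value = ";".join(f"{dim}:{val}" for dim, val in filters.items())
    # Keep ':', ';' and '*' unescaped so the result matches the URL in the text.
    query = urlencode({"profile": "simple", "filter": filter_value}, safe=":;*")
    return f"{base_url}?{query}"

url = build_who_url(BASE_URL, filters)
print(url)
```

The resulting string is what would go into the “URL” field of the node, or into a URL column if several such requests are generated.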

The configuration window of the “GET Request” node has 4 tabs: “Connection Settings”; “Authentication”; “Request Headers”; and “Response Headers”.

Connection Settings
The GET Requests to send to a REST service are specified either as a single fixed URL, entered manually, or as a list of dynamic URLs in an input data column. Two options in the “Connection Settings” tab enable these two modes: “URL” and “URL column”, respectively.

Options “Delay” and “Concurrency” both deal with multiple requests. Delay specifies a delay between two consecutive requests, e.g. in order to avoid overloading the web service. Concurrency allows for N parallel GET Requests to be sent, if the REST service allows it.
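
The interplay of a delay and a concurrency limit can be sketched in plain Python. This is not how the node is implemented internally; it is a rough illustration under the assumption that the delay is applied between consecutive submissions and that at most N requests are in flight. The function `send_all` and the stand-in `fake_fetch` are invented for this example, and no real network call is made.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

def send_all(urls: List[str], fetch: Callable[[str], str],
             delay_s: float = 0.0, concurrency: int = 1) -> List[str]:
    """Send one request per URL with at most `concurrency` in flight,
    pausing `delay_s` seconds between consecutive submissions."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = []
        for i, url in enumerate(urls):
            if i > 0 and delay_s > 0:
                time.sleep(delay_s)  # be gentle with the web service
            futures.append(pool.submit(fetch, url))
        # Collect results in input order: one result per URL.
        return [f.result() for f in futures]

def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP GET, so the sketch runs offline."""
    return f"response for {url}"

rows = send_all(
    ["http://example.org/a", "http://example.org/b", "http://example.org/c"],
    fake_fetch, delay_s=0.01, concurrency=2,
)
print(rows)
```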

The flags in the SSL section relax the checks on the SSL certificates of the REST host. With them enabled, the response is accepted even if the certificate is not fully valid.

When something fails in the REST service, the output data table will contain just one row with the status of the request. In some cases we might desire the workflow to stop and signal the REST error. In this case, the two options “Fail on Connection problems …” and “Fail on HTTP errors” should be enabled.
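
The difference between the two behaviors can be sketched as follows. This is an illustration only: the function `request_to_row`, the column names, and the choice to leave the body empty on errors are assumptions made for this example, not a description of the node’s internals.

```python
from typing import Callable, Dict, Optional, Tuple

def request_to_row(url: str,
                   send: Callable[[str], Tuple[int, Optional[bytes]]],
                   fail_on_http_errors: bool = False) -> Dict[str, object]:
    """Turn one request into one output row, or raise if asked to fail on errors."""
    status, body = send(url)
    if status >= 400 and fail_on_http_errors:
        # Strict mode: stop and signal the error.
        raise RuntimeError(f"HTTP error {status} for {url}")
    # Lenient mode: always return a row that carries the status (assumed layout).
    return {"Status": status, "body": body if status < 400 else None}

# Stand-in sender that always answers 404, so the sketch runs offline.
def not_found(url: str) -> Tuple[int, Optional[bytes]]:
    return 404, b"not found"

row = request_to_row("http://example.org/missing", not_found)
print(row)
```

With `fail_on_http_errors=True` the same call raises instead of returning a row, which is the behavior we want when the workflow should stop on a REST error.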

The “Follow Redirects” flag, if enabled, lets the “GET Request” node follow any redirect issued by the REST service; “Timeout” sets the number of seconds to wait before declaring that a connection has timed out; and “Body” sets the name of the response column in the output data table.

In our case, we just need to insert the “GET Request” URL from above in the “URL” field and leave everything as default, since we are not particularly strict about failing error messages and SSL certificates.

Authentication
The “Authentication” tab sets the authentication credentials, if required by the REST service. The WHO Data API, however, does not require any authentication, at least not for pulling data from it, and therefore we select the option “None”.

The node supports several authentication methods, e.g. BASIC and DIGEST. Username and password can be provided manually or via workflow credentials. As with the database nodes, workflow credentials are automatically encrypted, while manually typed usernames and passwords require a Master Key for the encryption process. As usual, the Master Key can be set in the Preferences page.
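
For readers curious about what BASIC authentication puts on the wire, here is a small Python sketch of the standard `Authorization` header: the string `username:password`, Base64 encoded. The helper name `basic_auth_header` is made up for this example, and the credentials are the well-known sample pair from the HTTP authentication RFCs. DIGEST is a challenge-response scheme and is not shown here.

```python
import base64

def basic_auth_header(username: str, password: str) -> dict:
    """Build the HTTP Basic 'Authorization' header for the given credentials."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}

header = basic_auth_header("Aladdin", "open sesame")
print(header)
```

Since Base64 is an encoding and not encryption, BASIC authentication only makes sense over an encrypted connection.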

Request Headers
Every request shipped off to the REST service may contain a header. By default, the request generated by the “GET Request” node contains none.
Custom request headers, though, can be defined in the “Request Headers” tab. A request header consists of many parameters, and every parameter consists of 3 fields: key, value, and kind. Three request headers are available as templates: none, generic REST headers, and web page related headers.

Selecting one of these templates loads the corresponding parameter list. Any header can then be modified by deleting, inserting, and editing parameters, and by merging the selected header template into the current one or replacing the current one with it. Parameters are edited, deleted, and inserted with the buttons at the bottom of the tab. The current header is merged or replaced with the 2 buttons at the top of the tab.
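
Outside of KNIME, the same idea of key/value request headers can be sketched with Python’s standard library. The header values below are illustrative assumptions, not the content of the node’s templates, and the request is only built, never sent.

```python
from urllib.request import Request

# Hypothetical header parameters as key/value pairs (the "kind" field is omitted here).
headers = {
    "Accept": "application/xml",
    "User-agent": "example-client",
}

# Build (but do not send) a GET request carrying these headers.
req = Request("http://example.org/api/resource", headers=headers)

print(req.get_method())          # a request without a body defaults to GET
print(req.get_header("Accept"))
```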

To keep things simple, and because the WHO REST service does not require them, we shipped no extra headers in the request.

Response Headers
The response object that comes back will also contain headers. Normally only the status and content-type are retrieved. All other headers are usually ignored. However, we could import all of them through the flag named “Extract all Headers” at the very top of the “Response Headers” tab in the configuration window.

If you prefer not to extract all headers from the response, but just some, you can set the key names in the tab table. The values associated with these keys will be extracted from the response object and placed in data columns of the output table. The names of these data columns are also set in the “Response Headers” tab.
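
The mapping from header keys to output columns can be sketched like this. The function `extract_headers`, the sample headers, and the column name are all invented for illustration; the sketch only assumes that header names are matched case-insensitively, as is usual in HTTP.

```python
from typing import Dict, Mapping, Optional

def extract_headers(response_headers: Mapping[str, str],
                    wanted: Mapping[str, str],
                    extract_all: bool = False) -> Dict[str, Optional[str]]:
    """Return output columns built from response headers.

    `wanted` maps a header key to the name of the output column.
    """
    if extract_all:
        # One column per header, named after the header itself.
        return dict(response_headers)
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {key.lower(): value for key, value in response_headers.items()}
    return {column: lowered.get(key.lower()) for key, column in wanted.items()}

# Made-up response headers for the example.
sample_headers = {"Content-Type": "application/xml", "Server": "ExampleServer"}

columns = extract_headers(sample_headers, {"content-type": "Content type"})
print(columns)
```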

Here too, we decided to keep the default list and not to make things too complex.

Output Table
If we now execute the node, the request URL from the “Connection Settings” tab is sent to the REST service, the response comes back, and it is displayed at the output port of the node. Usually the response comes back in XML or JSON format.

We also chose to extract the Status and content-type of the request from the response header (Figure 4). So the node output table will consist of a JSON or XML response object, a Status code, and a content-type value specifying whether the response came back as XML or as JSON. Additional response headers could be extracted by adding the corresponding keys in the “Response Headers” tab of the configuration window.

Whether the response format is XML or JSON is usually decided by the REST service itself. However, the “GET Request” node can automatically identify that and place the body of the response in a data cell of appropriate type. In case no automatic conversion is possible, binary cells will be created.
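
A simplified version of this kind of decision, based on the content-type header, might look like the sketch below. The exact rules the node applies are not described here, so the function `cell_type_for` and its matching logic are assumptions for illustration only.

```python
def cell_type_for(content_type: str) -> str:
    """Pick a cell type from a Content-Type value (assumed, simplified rules)."""
    # Drop parameters such as "; charset=UTF-8" and normalize the case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json" or media_type.endswith("+json"):
        return "JSON"
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "XML"
    # Fallback when no conversion applies.
    return "binary"

for value in ("application/xml; charset=UTF-8", "application/json", "image/png"):
    print(value, "->", cell_type_for(value))
```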

The output data table in this case contains only one data row, since only one request was sent. If we were supplying a list of dynamic URL requests through an input data column, the output data table would contain as many data rows as requests sent.

Parsing the Response Body through the XPath Node
If the response body comes back as JSON, the JSON to Table node can be used to convert the body content into a KNIME data table. If the response body comes back as XML, the XPath node can parse the different parts of the XML cell. The WHO Data API returns responses in XML format, so we will talk here about the XPath node.

The XPath node requires the column containing the XML objects and the XPath to extract values and parameters from such XML objects. If you have worked a bit with XML objects, you probably know how hard it is to define the right path. In order to help us a bit with the XPath definition, the configuration window of the XPath node provides a preview of the XML structure as soon as the XML input column has been set.

The XML structure preview is interactive. By clicking or double-clicking an item, its path is shown in green immediately under the XPath summary table. By clicking the button “Add XPath”, the XPath is automatically added to the list of existing XPaths, and the corresponding items in the XML structure are extracted during node execution.

In our case, through this procedure, we have selected: GHO, GHECAUSES, AGEGROUP, MGHEREG, SEX, YEAR, and Numeric. Those variables then get extracted from the response during the execution of the XPath node and presented at the node output port.
When parsing, it is also possible to add the namespace for the root element. This can be done in the “Namespaces” tab in the configuration window.
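
To give a feeling for what such path-based extraction does, here is a small Python sketch using the standard library’s limited XPath support. The XML snippet is invented for this example and does not reproduce the real WHO response structure; the `Response` and `Fact` element names and all values are placeholders. Only the field names are taken from the list above.

```python
import xml.etree.ElementTree as ET

# Invented sample document: NOT the real WHO schema, values are placeholders.
SAMPLE_XML = """
<Response>
  <Fact>
    <GHO>INDICATOR</GHO><GHECAUSES>CAUSE_A</GHECAUSES><AGEGROUP>AGE_1</AGEGROUP>
    <MGHEREG>REGION</MGHEREG><SEX>SEX_1</SEX><YEAR>YEAR_1</YEAR><Numeric>1.0</Numeric>
  </Fact>
  <Fact>
    <GHO>INDICATOR</GHO><GHECAUSES>CAUSE_B</GHECAUSES><AGEGROUP>AGE_2</AGEGROUP>
    <MGHEREG>REGION</MGHEREG><SEX>SEX_2</SEX><YEAR>YEAR_1</YEAR><Numeric>2.0</Numeric>
  </Fact>
</Response>
"""

# The items selected in the XPath node configuration.
FIELDS = ["GHO", "GHECAUSES", "AGEGROUP", "MGHEREG", "SEX", "YEAR", "Numeric"]

def extract_rows(xml_text: str) -> list:
    """Return one dict per <Fact> element, one key per selected field."""
    root = ET.fromstring(xml_text)
    rows = []
    for fact in root.findall("./Fact"):  # path to the repeating element
        rows.append({name: fact.findtext(name) for name in FIELDS})
    return rows

rows = extract_rows(SAMPLE_XML)
for row in rows:
    print(row)
```

Each selected path becomes a column and each repeating element becomes a row, which is the same shape as the table produced by the XPath node.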

The output table of this workflow contains 4740 data rows and 9 data columns. The 9 data columns come from the number of items selected through the XPath node, while the 4740 data rows are the number of records found by the WHO REST service at our request.



Conclusions

In this blog post I have shown you in detail how to query a REST service using the “GET Request” node, one of the newest additions in KNIME Analytics Platform 3.2, and how to parse the XML response using the XPath node from the XML category.
The final workflow can be found in Figure 7.


To conclude, I would like to remind you that most of the web runs on REST services. The “GET Request” node opens the door to a new horizon of possibilities for data access and integration.
