Ristoski, P., Bizer, C., & Paulheim, H. (2015). Mining the Web of Linked Data with RapidMiner.
Web Semantics: Science, Services and Agents on the World Wide Web,
35, 142-151.
Lots of data from different domains is published as Linked Open Data (LOD). While there are quite a few browsers
for such data, as well as intelligent tools for particular purposes, a versatile tool for deriving additional knowledge by
mining the Web of Linked Data is still missing. In this system paper, we introduce the RapidMiner Linked Open Data
extension. The extension hooks into the powerful data mining and analysis platform RapidMiner, and offers operators
for accessing Linked Open Data in RapidMiner, allowing for using it in sophisticated data analysis workflows without
the need for expert knowledge in SPARQL or RDF. The extension allows for autonomously exploring the Web of
Data by following links, thereby discovering relevant datasets on the fly, as well as for integrating overlapping data
found in different datasets. As an example, we show how statistical data from the World Bank on scientific publications,
published as an RDF data cube, can be automatically linked to further datasets and analyzed using additional
background knowledge from ten different LOD datasets.
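
The link-following idea described above can be illustrated with a minimal sketch. This is not the extension's implementation or API: the toy triples, the `ex:` identifiers, the placeholder values, and the helper names `follow_links` and `collect_features` are all hypothetical, and real use would fetch triples from remote datasets rather than from an in-memory list. The sketch only shows how following `owl:sameAs` links from one entity can reach the same entity in other datasets, whose attributes can then be merged as background knowledge.

```python
from collections import deque

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical toy triples standing in for three separate LOD datasets.
TRIPLES = [
    ("ex:worldbank/Germany", OWL_SAME_AS, "ex:datasetA/Germany"),
    ("ex:datasetA/Germany", OWL_SAME_AS, "ex:datasetB/Germany"),
    ("ex:datasetA/Germany", "ex:attributeA", "value-a"),
    ("ex:datasetB/Germany", "ex:attributeB", "value-b"),
]


def follow_links(start, triples, max_hops=2):
    """Breadth-first traversal over owl:sameAs links, up to max_hops hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for s, p, o in triples:
            if s == node and p == OWL_SAME_AS and o not in seen:
                seen.add(o)
                queue.append((o, hops + 1))
    return seen


def collect_features(entities, triples):
    """Merge all non-sameAs attributes of the linked entities into one dict."""
    return {p: o for s, p, o in triples if s in entities and p != OWL_SAME_AS}


linked = follow_links("ex:worldbank/Germany", TRIPLES)
features = collect_features(linked, TRIPLES)
```

Here `max_hops` bounds how far the traversal moves away from the starting entity; with `max_hops=1` only the directly linked dataset would be reached.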