WInte.r - a web data integration framework

Lehmberg, Oliver ; Brinkmann, Alexander ; Bizer, Christian

Document Type: Conference or workshop publication
Year of publication: 2017
Book title: ISWC-P&D-Industry 2017 : Proceedings of the ISWC 2017 Posters & Demonstrations and Industry Tracks co-located with 16th International Semantic Web Conference (ISWC 2017) Vienna, Austria, October 23rd to 25th, 2017
Journal or publication series: CEUR Workshop Proceedings
Volume: 1963
Page range: Paper 506
Conference title: 16th International Semantic Web Conference (ISWC 2017)
Location of the conference venue: Vienna, Austria
Date of the conference: October 23-25, 2017
Editor: Nikitina, Nadeschda
Place of publication: Aachen
Publishing house: RWTH
ISSN: 1613-0073
Publication language: English
Institution: School of Business Informatics and Mathematics > Wirtschaftsinformatik V (Bizer)
Subject: 004 Computer science, internet
Keywords (English): Data Integration, Schema Matching, Identity Resolution, Data Fusion, Web Data
Abstract: The Web provides a plethora of structured data, such as semantic annotations in web pages, data from HTML tables, datasets from open data portals, or linked data from the Linked Open Data Cloud. For many use cases, it is necessary to integrate such web data with existing local datasets. This integration entails schema matching, identity resolution, and data fusion. As an alternative to using a combination of partial or ad hoc solutions, this poster presents the Web Data Integration Framework (WInte.r), which supports end-to-end data integration by providing algorithms and building blocks for data preprocessing, schema matching, identity resolution, and data fusion. While fully usable out of the box, the framework is highly customisable and allows for the composition of sophisticated integration architectures such as T2K Match, which is used to match millions of web tables against DBpedia. A second use case for which WInte.r was employed is the task of stitching (combining) web tables from the same web site into larger tables as a preprocessing step before matching. The WInte.r framework is written in Java and is available as open source under the Apache 2.0 license.

