Fusing time-dependent web table data

Oulabi, Yaser; Meusel, Robert; Bizer, Christian

DOI: https://doi.org/10.1145/2932194.2932197
URL: http://doi.acm.org/10.1145/2932194.2932197
Document Type: Conference or workshop publication
Year of publication: 2016
Book title: WebDB '16 : Proceedings of the 19th International Workshop on Web and Databases, San Francisco, CA, USA, June 26, 2016 : co-located with ACM SIGMOD 2016
Page range: Article 3, 1-7
Conference title: 19th International Workshop on Web and Databases, WebDB '16
Location of the conference venue: San Francisco, CA
Date of the conference: June 26th, 2016
Place of publication: New York, NY
Publishing house: ACM
ISBN: 978-1-4503-4310-7
Publication language: English
Institution: School of Business Informatics and Mathematics > Wirtschaftsinformatik V (Bizer)
Subject: 004 Computer science, internet
Keywords (English): Data Fusion, Conflict Resolution, Web Tables, Web Data
Abstract: A subset of the HTML tables on the Web contains relational data. The data in these tables covers a multitude of topics and is thus very useful for complementing or validating cross-domain knowledge bases, such as DBpedia, YAGO, or the Google Knowledge Graph. A large fraction of the data in these knowledge bases is time-dependent, meaning that the correctness of an attribute value depends on a point in time. Fusing data from web tables in order to determine correct values for time-dependent attributes is challenging, as most web tables do not contain timestamp information. One way to deal with this sparsity is to exploit timestamps that appear at different locations on the web page surrounding the table. But as these timestamps might not apply to the web table value in question, this approach introduces noise. This paper investigates the extent to which the performance of data fusion strategies that rely on voting, PageRank, and Knowledge-Based Trust can be improved by incorporating noisy and sparse timestamp information. For this, we present a machine-learning-based approach that considers different types of noisy timestamps in the data fusion process, and we experiment with propagating timestamp information between web tables in order to overcome sparsity. We evaluate the data fusion strategies using a large public corpus of web tables and a public gold standard of time-dependent attribute values. We find that our methods effectively choose and weigh timestamp information per attribute and reduce sparsity using propagation. By incorporating timestamp information into data fusion strategies that previously did not exploit temporal meta information, we increase the F1-measure by 5% on average.
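The abstract's core idea, that timestamps found near a table can re-weight the votes cast by web tables during data fusion, can be illustrated with a minimal sketch. This is not the authors' machine-learning-based method: the Claim structure, the geometric decay by year distance, and the flat default weight for values without a timestamp are hypothetical choices made only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    """One web table's value for a time-dependent attribute (hypothetical structure)."""
    value: str
    timestamp: Optional[int] = None  # year found on the page near the table, if any


def time_weighted_vote(claims, target_year, decay=0.5, no_ts_weight=0.25):
    """Return the value with the highest summed vote weight for target_year.

    Hypothetical weighting: a claim whose timestamp equals target_year gets
    weight 1.0, shrinking by the factor `decay` per year of distance.
    Claims without a timestamp get the flat weight `no_ts_weight`.
    With decay=1.0 and no_ts_weight=1.0 this reduces to plain majority voting.
    """
    scores = defaultdict(float)
    for claim in claims:
        if claim.timestamp is None:
            weight = no_ts_weight
        else:
            weight = decay ** abs(claim.timestamp - target_year)
        scores[claim.value] += weight
    if not scores:
        raise ValueError("no claims to fuse")
    # Highest total weight wins; ties go to the value seen first.
    return max(scores, key=scores.get)


# Hypothetical population claims from five web tables, two without timestamps.
claims = [
    Claim("3.4M", 2010),
    Claim("3.4M"),
    Claim("3.4M"),
    Claim("3.6M", 2015),
    Claim("3.6M", 2014),
]

print(time_weighted_vote(claims, 2015))                              # → 3.6M
print(time_weighted_vote(claims, 2010))                              # → 3.4M
print(time_weighted_vote(claims, 2015, decay=1.0, no_ts_weight=1.0))  # → 3.4M (plain majority)
```

In this toy example, plain majority voting picks the older value for every target year, while the timestamp-aware weighting picks the value whose timestamps lie closest to the year being asked about.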

This entry is part of the University Bibliography.
