Eliminating fuzzy duplicates in crowdsourced lexical resources


Kiselev, Yuri ; Ustalov, Dmitry ; Porshnev, Sergey



URL: https://ub-madoc.bib.uni-mannheim.de/43369
URN: urn:nbn:de:bsz:180-madoc-433699
Document Type: Conference or workshop publication
Year of publication: 2016
Book title: Proceedings of the Eighth Global WordNet Conference (GWC-16) : January 27-30, Bucharest, Romania
Page range: 161-167
Conference title: Global WordNet Conference 2016
Location of the conference venue: Bucharest, Romania
Date of the conference: January 27-30, 2016
Editor: Barbu Mititelu, Verginica
Place of publication: Bucharest
Publishing house: Global WordNet Association
ISBN: 978-606-714-239-6, 978-973-0-20728-6
Publication language: English
Institution: School of Business Informatics and Mathematics > Wirtschaftsinformatik III (Ponzetto 2016-)
Subject: 004 Computer science, internet
Abstract: Collaborative creation of lexical resources is a trending approach to building high-quality thesauri in a short time span at a remarkably low price. The key idea is to invite non-expert participants to express and share their knowledge with the aim of constructing a resource. However, this approach tends to be noisy and error-prone, which makes data cleansing a highly topical task. In this paper, we study different techniques for synset deduplication, including machine- and crowd-based ones. Finally, we put forward an approach that can solve the deduplication problem fully automatically, with quality comparable to that of the expert-based approach.

This document is provided by the publication server of the Mannheim University Library.

This record was not published during employment at the University of Mannheim; it is an external publication.



