Extending a multilingual Lexical Resource by bootstrapping Named Entity Classification using Wikipedia's Category System


Knopp, Johannes



URL: https://ub-madoc.bib.uni-mannheim.de/29542
Additional URL: http://www.aclweb.org/anthology-new/W/W11/W11-3607...
URN: urn:nbn:de:bsz:180-madoc-295426
Document Type: Conference or workshop publication
Year of publication: 2011
Book title: Proceedings of the Fifth International Workshop On Cross Lingual Information Access
Page range: 35-43
Date of the conference: 8-13 Nov 2011
Place of publication: Chiang Mai, Thailand
Publishing house: Asian Federation of Natural Language Processing
Publication language: English
Institution: School of Business Informatics and Mathematics > Praktische Informatik II (Stuckenschmidt 2009-)
Subject: 004 Computer science, internet
Classification: CCS:
Individual keywords (German): Named Entities, Wikipedia, HeiNER, NERC
Abstract: Named Entity Recognition and Classification (NERC) is a well-studied NLP task that is typically approached with machine learning algorithms. These algorithms rely on training data, which is usually expensive to create. The high costs result in a lack of NERC training data for many languages. An approach to creating a multilingual NE corpus was presented in Wentland et al. (2008). The resulting resource, called HeiNER, describes a valuable number of NEs but does not include their types. We present a bootstrap approach based on Wikipedia’s category system that classifies more than two million of the NEs contained in HeiNER and thereby improves the resource’s quality.
Additional information: Online resource

This entry is part of the university bibliography.

The document is provided by the publication server of the Mannheim University Library.



