Literature Record - Detail View

 
Author: Leeman-Munk, Samuel Paul
Title: Morphosyntactic Neural Analysis for Generalized Lexical Normalization
Source: (2016), (126 pages)
Ph.D. Dissertation, North Carolina State University
Language: English
Document type: print; online; monograph
ISBN: 978-1-3696-3848-6
Keywords: Thesis; Dissertation; Morphology (Languages); Syntax; Content Analysis; Data Analysis; Lexicology; Assistive Technology; Electronic Equipment; Optics; Evaluation Methods; Computational Linguistics; Translation; Models; Engineering
Abstract: The phenomenal growth of social media, web forums, and online reviews has spurred a growing interest in automated analysis of user-generated text. At the same time, a proliferation of voice recordings and efforts to archive cultural heritage documents are fueling demand for effective automatic speech recognition (ASR) and optical character recognition (OCR). These sources of text all have two qualities in common: they are high in volume, and they frequently diverge from standard language in their surface forms, making them difficult to analyze using conventional methods. To address these challenges, we either need to update our analysis methods to be robust to noisy text, or we need to design a technique to convert such text into a predetermined standard form, or "normalize" it. This document introduces an instance of the latter approach. Many techniques have been proposed to normalize ASR, OCR, and Twitter data, but they have always been treated as separate tasks despite having much in common. To our knowledge, the work presented here is the first to unite these tasks under a single umbrella task of generalized lexical normalization and develop an approach to this task based on deep learning. We introduce two architectures for this purpose. The first uses a simple feed-forward neural network to perform Twitter normalization. This approach is context-insensitive and achieved third place in the Lexical Normalization of English Tweets Challenge conducted with the ACL Workshop on Noisy User Text at the 2015 Annual Meeting of the Association for Computational Linguistics. Our second architecture is an extension of the first that, using concepts from neural machine translation, adds a gated bidirectional recurrent neural network to use the context in which a word appears as well as the characters in the word itself to normalize both Twitter and other sources of noisy text.
We evaluate this second architecture on optical character recognition post-processing, automatic speech recognition post-processing, and Twitter text normalization. In comparison with specialized tools for OCR postprocessing and Twitter normalization, we find that our model performs comparably on each of these tasks to the competing model specialized for it and significantly outperforms the model specialized for the other task. This indicates the ability of our model to learn to normalize different types of noise from data, and suggests that it could similarly learn to be effective on other unseen types of noise without the need for expensive feature engineering. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone: 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.] (As Provided).
Notes: ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Recorded by: ERIC (Education Resources Information Center), Washington, DC
Updated: 2020/1/01