Central Library, Indian Institute of Technology Delhi

Discriminative learning for speech recognition : theory and practice

He, Xiaodong, 1973-

Discriminative learning for speech recognition [electronic resource] : theory and practice / Xiaodong He and Li Deng. - San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool Publishers, c2008. - 1 electronic text (vii, 112 p. : ill.) : digital file. - (Synthesis lectures on speech and audio processing, ISSN 1932-1678 ; #4).

Part of: Synthesis digital library of engineering and computer science. Series from website.

Includes bibliographical references (p. 107-110).

Introduction and background -- What is discriminative learning? -- What is speech recognition? -- Roles of discriminative learning in speech recognition -- Background: basic probability distributions -- Background: basic optimization concepts and techniques -- Organization of the book -- Statistical speech recognition: a tutorial -- Language modeling -- Acoustic modeling and HMMs -- Discriminative learning: a unified objective function -- A unified discriminative training criterion -- MMI and its unified form -- MCE and its unified form -- Minimum phone/word error and its unified form -- Discussions and comparisons -- Discriminative learning algorithm for exponential-family distributions -- Exponential-family models for classification -- Construction of auxiliary functions -- GT learning for exponential-family distributions -- Estimation formulas for two exponential-family distributions -- Discriminative learning algorithm for hidden Markov model -- Estimation formulas for discrete HMM -- Estimation formulas for CDHMM -- Relationship with gradient-based methods -- Setting constant D for GT-based optimization -- Practical implementation of discriminative learning -- Computing Δγ(i, r, t) in growth-transform formulas -- Computing Δγ(i, r, t) using lattices -- Arbitrary exponent scaling in MCE implementation -- Arbitrary slope in defining MCE cost function -- Selected experimental results -- Experimental results on small ASR tasks TIDIGITS -- Telephony LV-ASR applications -- Epilogue -- Summary of book contents -- Summary of contributions -- Remaining theoretical issue and future direction.

Abstract freely available; full-text restricted to subscribers or individual document purchasers.

Compendex; INSPEC; Google Scholar; Google Book Search

In this book, we introduce the background and mainstream methods of probabilistic modeling and discriminative parameter optimization for speech recognition. The specific models treated in depth include the widely used exponential-family distributions and the hidden Markov model. A detailed study is presented on unifying the common objective functions for discriminative learning in speech recognition, namely maximum mutual information (MMI), minimum classification error (MCE), and minimum phone/word error (MPE/MWE). The unification is presented, with rigorous mathematical analysis, in a common rational-function form. This common form enables the use of the growth transformation (or extended Baum-Welch) optimization framework in discriminative learning of model parameters. In addition to the necessary background and tutorial material on the subject, we include technical details on the derivation of the parameter optimization formulas for exponential-family distributions, discrete hidden Markov models (HMMs), and continuous-density HMMs in discriminative learning. Selected experimental results obtained firsthand by the authors are presented to show that discriminative learning can lead to superior speech recognition performance over conventional parameter learning. Details on major algorithmic implementation issues of practical significance are provided to enable practitioners to directly reduce the theory in the earlier part of the book to engineering practice.

Mode of access: World Wide Web.
System requirements: Adobe Acrobat Reader.

ISBN: 9781598293098 (ebook); 9781598293081 (pbk.)

DOI: 10.2200/S00134ED1V01Y200807SAP004

Automatic speech recognition--Statistical methods.

TK7895.S65 / H43 2008
