Feature-based hybrid strategies for gradient descent optimization in end-to-end speech recognition

dc.authorid: Dokuz, Yesim/0000-0001-7202-2899
dc.contributor.author: Dokuz, Yesim
dc.contributor.author: Tufekci, Zekeriya
dc.date.accessioned: 2024-11-07T13:31:26Z
dc.date.available: 2024-11-07T13:31:26Z
dc.date.issued: 2022
dc.department: Niğde Ömer Halisdemir Üniversitesi
dc.description.abstract: With the increasing popularity of deep learning, deep learning architectures are being utilized in speech recognition. Deep learning based speech recognition has become the state-of-the-art approach for speech recognition tasks due to its outstanding performance over other methods. Generally, deep learning architectures are trained with a variant of gradient descent optimization. Mini-batch gradient descent is a variant of gradient descent optimization that updates network parameters after processing a number of training instances. One limitation of mini-batch gradient descent is the random selection of mini-batch samples from the training set. This is not preferred in speech recognition, which requires the training features to cover all possible variations in speech databases. In this study, hybrid mini-batch sample selection strategies are proposed to overcome this limitation. The proposed strategies jointly use the gender and accent features of speech databases to select mini-batch samples when training deep learning architectures. Experimental results show that using a hybrid of gender and accent features achieves better speech recognition performance than using only one feature. The proposed hybrid mini-batch sample selection strategies would also benefit other application areas that have metadata information, including image recognition and machine vision.
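The hybrid mini-batch selection described in the abstract can be pictured with a short sketch. The snippet below is only an illustrative sampler, not the authors' implementation: the dictionary keys "gender" and "accent" and the function name hybrid_minibatches are hypothetical names introduced for this example. It draws an approximately equal number of samples from every gender-accent group for each mini-batch instead of sampling purely at random.

import random
from collections import defaultdict

def hybrid_minibatches(dataset, batch_size, seed=0):
    """Yield mini-batches drawn evenly from every (gender, accent) group
    instead of purely at random (illustrative sketch only)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for sample in dataset:                       # bucket samples by metadata
        groups[(sample["gender"], sample["accent"])].append(sample)
    for bucket in groups.values():
        rng.shuffle(bucket)

    keys = list(groups)
    per_group = max(1, batch_size // len(keys))  # quota per group per batch
    while any(groups.values()):
        batch = []
        for key in keys:
            batch.extend(groups[key][:per_group])
            groups[key] = groups[key][per_group:]
        rng.shuffle(batch)                       # mix groups inside the batch
        yield batch

# Toy usage with metadata-tagged utterances (acoustic features omitted):
data = [{"gender": g, "accent": a, "features": None}
        for g in ("female", "male") for a in ("us", "uk") for _ in range(8)]
for batch in hybrid_minibatches(data, batch_size=8):
    print(len(batch), {(s["gender"], s["accent"]) for s in batch})

In a real training loop, each yielded batch would be fed to the end-to-end model (e.g., an LSTM-based network) in place of randomly drawn mini-batches.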
dc.identifier.doi: 10.1007/s11042-022-12304-5
dc.identifier.endpage: 9988
dc.identifier.issn: 1380-7501
dc.identifier.issn: 1573-7721
dc.identifier.issue: 7
dc.identifier.scopus: 2-s2.0-85124713489
dc.identifier.scopusquality: Q1
dc.identifier.startpage: 9969
dc.identifier.uri: https://doi.org/10.1007/s11042-022-12304-5
dc.identifier.uri: https://hdl.handle.net/11480/14847
dc.identifier.volume: 81
dc.identifier.wos: WOS:000756497800020
dc.identifier.wosquality: Q2
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: Springer
dc.relation.ispartof: Multimedia Tools and Applications
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.snmz: KA_20241106
dc.subject: Speech recognition
dc.subject: Deep learning
dc.subject: Mini-batch gradient descent
dc.subject: Hybrid sample selection strategies
dc.subject: LSTM
dc.title: Feature-based hybrid strategies for gradient descent optimization in end-to-end speech recognition
dc.type: Article
