Title: Feature-based hybrid strategies for gradient descent optimization in end-to-end speech recognition
Authors: Dokuz, Yesim; Tufekci, Zekeriya
Type: Article
Issue year: 2022
Date available: 2024-11-07
ISSN: 1380-7501; 1573-7721
DOI: https://doi.org/10.1007/s11042-022-12304-5
Handle: https://hdl.handle.net/11480/14847
Volume: 81; Issue: 7; Pages: 9969-9988
Scopus ID: 2-s2.0-85124713489 (Q1)
WOS ID: WOS:000756497800020 (Q2)
Language: en
Access: info:eu-repo/semantics/closedAccess
Keywords: Speech recognition; Deep learning; Mini-batch gradient descent; Hybrid sample selection strategies; LSTM

Abstract: With the increasing popularity of deep learning, deep learning architectures are being utilized in speech recognition. Deep learning based speech recognition has become the state-of-the-art approach for speech recognition tasks due to its outstanding performance over other methods. Generally, deep learning architectures are trained with a variant of gradient descent optimization. Mini-batch gradient descent is a variant of gradient descent that updates network parameters after processing a small number of training instances. One limitation of mini-batch gradient descent is the random selection of mini-batch samples from the training set. This is undesirable in speech recognition, which requires the training features to cover all possible variations in speech databases. In this study, hybrid mini-batch sample selection strategies are proposed to overcome this limitation. The proposed hybrid strategies combine gender and accent features of speech databases to select mini-batch samples when training deep learning architectures. Experimental results show that using a hybrid of gender and accent features yields better speech recognition performance than using either feature alone. The proposed hybrid mini-batch sample selection strategies would also benefit other application areas that have metadata information, including image recognition and machine vision.
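The abstract describes selecting mini-batch samples using gender and accent metadata rather than uniform random draws. The paper's exact strategies are not detailed in the abstract, so the following is only a minimal sketch of one plausible hybrid strategy: stratifying each mini-batch round-robin over (gender, accent) groups so every combination is represented before any group repeats. The `gender`/`accent` field names and the `hybrid_minibatch` helper are hypothetical, not taken from the paper.

```python
import random
from collections import defaultdict

def hybrid_minibatch(samples, batch_size, seed=None):
    """Draw one mini-batch stratified over (gender, accent) groups.

    samples: list of dicts carrying 'gender' and 'accent' metadata
    (hypothetical field names; real corpora label these differently).
    """
    rng = random.Random(seed)

    # Bucket the training set by the hybrid (gender, accent) key.
    groups = defaultdict(list)
    for s in samples:
        groups[(s["gender"], s["accent"])].append(s)

    # Shuffle within each group, then draw round-robin across groups
    # so each combination contributes before any group repeats.
    group_lists = [rng.sample(g, len(g)) for g in groups.values()]
    batch = []
    i = 0
    while len(batch) < batch_size and any(group_lists):
        g = group_lists[i % len(group_lists)]
        if g:
            batch.append(g.pop())
        i += 1
    return batch
```

With a batch size equal to the number of (gender, accent) combinations, each batch contains exactly one sample per combination; larger batches cycle through the groups again. A plain random-sampling baseline would instead call `rng.sample(samples, batch_size)` with no metadata awareness.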