Yazar "Tüfekçi, Zekeriya" seçeneğine göre listele
Listeleniyor 1 - 2 / 2
Item: A Review on Deep Learning Architectures for Speech Recognition (2020). Dokuz, Yeşim; Tüfekçi, Zekeriya.
Deep learning is a branch of machine learning that uses algorithms which model datasets with deep architectures composed of many processing layers. With the popularity and successful applications of deep learning architectures, they are being used in speech recognition as well. Researchers have utilized these architectures for speech recognition and its applications, such as speech emotion recognition, voice activity detection, and speaker recognition and verification, to better map speech inputs to outputs and to reduce the error rates of speech recognition systems. Many studies in the literature use deep learning architectures for speech recognition systems. These studies show that deep learning architectures benefit many speech recognition areas, reducing error rates and improving performance. In this study, we first explain the speech recognition problem and the steps of speech recognition. Then, we analyze the studies related to deep learning based speech recognition. In particular, the deep learning architectures of Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs), as well as hybrid approaches that combine these architectures, are evaluated, and the literature on these architectures for speech recognition and its application areas is investigated. As a result, we observed that RNNs are the most widely used and most powerful deep learning architecture in terms of error rates and speech recognition performance. CNNs are another successful architecture, with results close to those of RNNs in terms of error rates and speech recognition performance. We also observed that new deep architectures, whether hybrids of DNNs, CNNs, and RNNs or other deep learning architectures, are attracting attention, show increasing performance, and can reduce error rates in speech recognition.

Item: Investigation of the Effect of LSTM Hyperparameters on Speech Recognition Performance (2020). Dokuz, Yeşim; Tüfekçi, Zekeriya.
With recent advances in hardware technologies and computational methods, computers have become powerful enough to tackle difficult tasks such as speech recognition and image processing. Speech recognition is the task of extracting a text representation of a speech signal using computational or analytical methods. It is a challenging problem due to variations in accents and languages, the need for powerful hardware, the large datasets required to build accurate models, and environmental factors that affect signal quality. Recently, with the increasing processing power of hardware such as Graphics Processing Units, deep learning methods have become the prevalent, state-of-the-art approach to speech recognition, especially Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks, a variant of RNNs. In the literature, RNNs and LSTMs are used for speech recognition and its applications with various hyperparameters, i.e. the number of layers, the number of hidden units, and the batch size. However, how these parameter values were selected, and whether they can be reused in future studies, has not been investigated.
In this study, we investigated the effect of LSTM hyperparameters on speech recognition performance in terms of error rates and architecture cost. Each parameter is investigated separately while the other parameters are held constant, and its effect is observed on a speech corpus. Experimental results show that, for the selected number of training instances, each parameter has specific values that yield lower error rates and better speech recognition performance. The study shows that, before selecting appropriate values for the LSTM parameters, several experiments should be performed on the speech corpus to find the most suitable value for each parameter.
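The abstract describes a one-parameter-at-a-time study of LSTM hyperparameters (number of layers, hidden units, batch size). The sketch below is an illustration only of how such a sweep could be organized, assuming Keras, synthetic MFCC-like features, a small word-classification task, and arbitrary grid values; it is not the authors' corpus, hyperparameter ranges, or implementation.

```python
# Minimal sketch of a one-parameter-at-a-time LSTM hyperparameter study.
# Data, task, and grid values are assumptions for demonstration only.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical stand-in data: 1000 utterances, 100 frames of 13 MFCC-like
# features each, labelled with one of 10 word classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 100, 13)).astype("float32")
y = rng.integers(0, 10, size=1000)

def build_lstm(num_layers, hidden_units, num_classes=10):
    """Stack `num_layers` LSTM layers with `hidden_units` units each."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(100, 13)))
    for i in range(num_layers):
        # All but the last LSTM layer must return full sequences.
        model.add(layers.LSTM(hidden_units,
                              return_sequences=(i < num_layers - 1)))
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Baseline values; each parameter is varied in turn while the others stay fixed.
baseline = {"num_layers": 2, "hidden_units": 128, "batch_size": 32}
grid = {"num_layers": [1, 2, 3],
        "hidden_units": [64, 128, 256],
        "batch_size": [16, 32, 64]}

for name, values in grid.items():
    for value in values:
        params = {**baseline, name: value}
        model = build_lstm(params["num_layers"], params["hidden_units"])
        history = model.fit(X, y,
                            batch_size=params["batch_size"],
                            epochs=3,
                            validation_split=0.2,
                            verbose=0)
        print(f"{name}={value}: "
              f"val_accuracy={history.history['val_accuracy'][-1]:.3f}")
```

On a real speech corpus the same loop structure applies, with the synthetic arrays replaced by extracted features and the final validation metric (or word error rate) compared across the values of each parameter.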