Abstract: Considering the power-hungry nature of speech processing, a keyword spotting (KWS) unit, used to detect multiple spoken words, is often integrated as a front-end layer. KWS systems are ...
Abstract: This study proposes an innovative speech translation method based on Pix2PixGAN, which maps the Mel spectrograms of speech produced by deaf individuals to those of normal-hearing individuals ...
A major direction of deep learning in audio, especially in generative models, is to use frequency-domain features, because directly modeling the raw time-domain signal is hard. But this requires an extra process to ...
This repository contains the code to generate images that sound: special spectrograms that can be viewed as images and played as sound. Note: our method does not have a high success rate since it's ...