nnAudio 0.2.0
Welcome to nnAudio 0.2.0. This version changes the syntax for creating spectrogram layers so that stft_layer.to(device) can be used. It is also more stable than the previous version because it is more compatible with other torch modules.
nnAudio is an audio processing toolbox that uses PyTorch convolutional neural networks as its backend. As a result, spectrograms can be generated from audio on-the-fly during neural network training, and the Fourier kernels (or CQT kernels) can be trained. Kapre follows a similar concept: it also uses 1D convolutional neural networks to extract spectrograms, but is based on Keras.
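The idea of computing a spectrogram through 1D convolution can be sketched as follows. This is only an illustration under stated assumptions, not nnAudio's implementation: it uses the Python standard library instead of PyTorch, applies no window function, and the helper names (fourier_kernels, conv1d, stft_magnitude) are hypothetical and not part of the nnAudio API.

```python
import math


def fourier_kernels(n_fft):
    """Build one cosine and one sine kernel per frequency bin (0 .. n_fft // 2).

    Hypothetical helper for illustration; no window function is applied.
    """
    bins = n_fft // 2 + 1
    cos_k = [[math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(bins)]
    sin_k = [[math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(bins)]
    return cos_k, sin_k


def conv1d(signal, kernel, stride):
    """Strided, unpadded cross-correlation: the operation a 1D conv layer computes."""
    n = len(kernel)
    return [
        sum(signal[start + i] * kernel[i] for i in range(n))
        for start in range(0, len(signal) - n + 1, stride)
    ]


def stft_magnitude(signal, n_fft, hop_length):
    """Magnitude spectrogram as a list indexed [freq_bin][time_frame]."""
    cos_k, sin_k = fourier_kernels(n_fft)
    spec = []
    for ck, sk in zip(cos_k, sin_k):
        real = conv1d(signal, ck, hop_length)  # real part of each frame
        imag = conv1d(signal, sk, hop_length)  # imaginary part (sign does not affect magnitude)
        spec.append([math.hypot(r, i) for r, i in zip(real, imag)])
    return spec


# A sine wave that falls exactly on frequency bin 4 when n_fft = 32.
signal = [math.sin(2 * math.pi * 4 * n / 32) for n in range(128)]
spec = stft_magnitude(signal, n_fft=32, hop_length=16)
peak_bin = max(range(len(spec)), key=lambda k: spec[k][0])
print(peak_bin)  # → 4
```

Because the cosine and sine kernels here are just the weights of a convolution, a framework such as PyTorch can hold them as layer parameters and update them by backpropagation, which is what makes the kernels trainable.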
Other GPU audio processing tools are torchaudio and tf.signal. However, they do not use the neural network approach, and hence their Fourier basis cannot be trained. As of PyTorch 1.6.0, torchaudio is still very difficult to install under Windows due to sox. nnAudio is more compatible across different operating systems since it relies mostly on PyTorch convolutional neural networks. The name nnAudio comes from torch.nn.
The implementation details of nnAudio have also been published in IEEE Access; readers who are interested can consult the paper.
The source code for nnAudio can be found on GitHub.