nnAudio.Spectrogram.MelSpectrogram
- class nnAudio.Spectrogram.MelSpectrogram(sr=22050, n_fft=2048, n_mels=128, hop_length=512, window='hann', center=True, pad_mode='reflect', power=2.0, htk=False, fmin=0.0, fmax=None, norm=1, trainable_mel=False, trainable_STFT=False, verbose=True, device='cpu', **kwargs)
Bases: torch.nn.modules.module.Module
This class calculates the Mel spectrogram of the input signal. The input signal should have one of the following shapes:
1. (len_audio)
2. (num_audio, len_audio)
3. (num_audio, 1, len_audio)
The correct shape is inferred automatically if the input follows one of these three shapes. Most of the arguments follow the conventions of librosa. This class inherits from torch.nn.Module, so it is used in the same way as any other torch.nn.Module.
- Parameters
sr (int) – The sampling rate of the input audio. It is used to calculate the correct fmin and fmax. Setting the correct sampling rate is essential for mapping the Mel bins to the correct frequencies.
n_fft (int) – The window size for the STFT. Default value is 2048.
n_mels (int) – The number of Mel filter banks. The filter banks map the n_fft frequency bins to Mel bins. Default value is 128.
hop_length (int) – The hop (or stride) size. Default value is 512.
window (str) – The windowing function for the STFT. It uses scipy.signal.get_window; please refer to the SciPy documentation for the available windowing functions. Default value is ‘hann’.
center (bool) – Whether to place the STFT kernel at the center of the time step. If False, the time index is the beginning of the STFT kernel; if True, the time index is the center of the STFT kernel. Default value is True.
pad_mode (str) – The padding method. Default value is ‘reflect’.
htk (bool) – When False, the Mel scale is quasi-logarithmic. When True, the Mel scale is logarithmic. Default value is False.
fmin (int) – The starting frequency for the lowest Mel filter bank.
fmax (int) – The ending frequency for the highest Mel filter bank.
trainable_mel (bool) – Determines whether the Mel filter banks are trainable. If True, the gradients for the Mel filter banks are also calculated and the Mel filter banks are updated during model training. Default value is False.
trainable_STFT (bool) – Determines whether the STFT kernels are trainable. If True, the gradients for the STFT kernels are also calculated and the STFT kernels are updated during model training. Default value is False.
verbose (bool) – If True, it shows layer information. If False, it suppresses all prints.
device (str) – The device on which to initialize this layer. Default value is ‘cpu’.
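The difference between the two htk settings can be illustrated with a small standalone sketch. It assumes the class follows the librosa Mel-scale convention (Slaney-style when htk is False, the HTK formula when htk is True) and that fmax=None falls back to the Nyquist frequency sr / 2, as librosa does. The helper names hz_to_mel and default_fmax are hypothetical and are not part of nnAudio.

```python
import math


def hz_to_mel(freq_hz: float, htk: bool = False) -> float:
    """Convert a frequency in Hz to mels (librosa-style convention, assumed here)."""
    if htk:
        # HTK formula: logarithmic over the whole frequency range.
        return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

    # Slaney-style scale: linear below 1 kHz, logarithmic above it.
    f_sp = 200.0 / 3.0               # Hz per mel in the linear region
    min_log_hz = 1000.0              # where the logarithmic region starts
    min_log_mel = min_log_hz / f_sp  # 15 mels at 1 kHz
    logstep = math.log(6.4) / 27.0   # step size in the logarithmic region

    if freq_hz < min_log_hz:
        return freq_hz / f_sp
    return min_log_mel + math.log(freq_hz / min_log_hz) / logstep


def default_fmax(sr: int, fmax=None) -> float:
    """Assumption: fmax=None means the Nyquist frequency sr / 2."""
    return sr / 2.0 if fmax is None else float(fmax)
```

With htk=False the scale is linear up to 1 kHz, which is why the documentation calls it quasi-logarithmic; with htk=True a single logarithmic formula covers the whole range.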
- Returns
spectrogram – A tensor of spectrograms with shape (num_samples, freq_bins, time_steps).
- Return type
torch.tensor
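To anticipate the output shape before running the layer, the following sketch estimates (num_samples, freq_bins, time_steps) from the constructor arguments. It assumes the usual STFT framing convention (padding of n_fft // 2 on each side when center is True, one frame every hop_length samples) and that freq_bins equals n_mels. The function mel_output_shape is a hypothetical helper, not an nnAudio API.

```python
def mel_output_shape(num_audio, len_audio, n_fft=2048, hop_length=512,
                     n_mels=128, center=True):
    """Estimate the output shape (num_samples, freq_bins, time_steps).

    Assumes standard STFT framing; this is a sketch, not the library's code.
    """
    # With center=True the signal is padded by n_fft // 2 on both sides.
    padded_len = len_audio + 2 * (n_fft // 2) if center else len_audio
    if padded_len < n_fft:
        raise ValueError("input is shorter than one STFT window")

    # One frame per hop, counting only windows that fit entirely.
    time_steps = (padded_len - n_fft) // hop_length + 1
    return (num_audio, n_mels, time_steps)
```

Under these assumptions, four one-second clips at the default settings (len_audio=22050) give (4, 128, 44), and the same input with center=False gives (4, 128, 40).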
Examples
>>> from nnAudio import Spectrogram
>>> spec_layer = Spectrogram.MelSpectrogram()
>>> specs = spec_layer(x)  # x: waveform tensor in one of the accepted input shapes
Methods
__init__ – Initializes internal Module state, shared by both nn.Module and ScriptModule.
forward – Convert a batch of waveforms to Mel spectrograms.
- forward(x)
Convert a batch of waveforms to Mel spectrograms.
- Parameters
x (torch tensor) – The input signal should have one of the following shapes:
1. (len_audio)
2. (num_audio, len_audio)
3. (num_audio, 1, len_audio)
It will be automatically broadcast to the right shape.
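The shape handling described above can be sketched on plain shape tuples. The sketch assumes that the common internal layout is (num_audio, 1, len_audio), which is the third accepted shape; the function normalized_input_shape is hypothetical and only mirrors the rule stated in the text, not the library's actual implementation.

```python
def normalized_input_shape(shape):
    """Map an accepted input shape to (num_audio, 1, len_audio).

    Assumption: the layer works on a 3-D layout with one channel.
    """
    shape = tuple(shape)
    if len(shape) == 1:
        # (len_audio) -> a single clip with one channel
        return (1, 1, shape[0])
    if len(shape) == 2:
        # (num_audio, len_audio) -> insert the channel axis
        return (shape[0], 1, shape[1])
    if len(shape) == 3 and shape[1] == 1:
        # Already in the target layout
        return shape
    raise ValueError(f"unsupported input shape: {shape}")
```

Any other shape, for example a stereo batch of shape (num_audio, 2, len_audio), is rejected by this sketch because it is not one of the three documented layouts.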