nnAudio.Spectrogram.MFCC
- class nnAudio.Spectrogram.MFCC(sr=22050, n_mfcc=20, norm='ortho', verbose=True, ref=1.0, amin=1e-10, top_db=80.0, **kwargs)
Bases: torch.nn.modules.module.Module
This class calculates the Mel-frequency cepstral coefficients (MFCCs) of the input signal. It first extracts Mel spectrograms from the audio clips and then applies the discrete cosine transform (DCT) to obtain the final MFCCs. Because of this, the Mel spectrogram part can be made trainable using trainable_mel and trainable_STFT. Only the type-II DCT is supported at the moment. The input signal should have one of the following shapes:
(len_audio)
(num_audio, len_audio)
(num_audio, 1, len_audio)
The correct shape is inferred automatically if the input follows one of these three shapes. Most of the arguments follow the librosa convention. This class inherits from torch.nn.Module, so it is used in the same way as any torch.nn.Module.
- Parameters
sr (int) – The sampling rate of the input audio. It is used to calculate the correct fmin and fmax. Setting the correct sampling rate is essential for obtaining the correct frequencies.
n_mfcc (int) – The number of Mel-frequency cepstral coefficients.
norm (string) – Normalization for the DCT basis. The default value is 'ortho'.
**kwargs – Other arguments for Melspectrogram, such as n_fft, n_mels, hop_length, and window.
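The signature also accepts ref, amin, and top_db, which the parameter list above does not describe. Assuming they follow the convention of librosa's power_to_db (an assumption based on the note that most arguments follow librosa, not something this page confirms), the Mel spectrogram would be converted to decibels roughly as sketched below. The helper works on plain Python lists for illustration only and is not nnAudio's implementation.

```python
import math

def power_to_db(power, ref=1.0, amin=1e-10, top_db=80.0):
    """Convert power values to decibels, following the librosa convention (assumed)."""
    # Clamp to amin so that log10 never receives zero.
    db = [10.0 * math.log10(max(amin, p)) - 10.0 * math.log10(max(amin, ref))
          for p in power]
    if top_db is not None:
        # Limit the dynamic range to top_db decibels below the peak value.
        floor = max(db) - top_db
        db = [max(v, floor) for v in db]
    return db

# Toy power values: the smallest one is clamped by amin and then by top_db.
db_values = power_to_db([1.0, 0.1, 1e-12])
```

With the defaults, the last value is raised to 80 dB below the peak, which is the role top_db plays in this convention.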
- Returns
MFCCs – A tensor of MFCCs with shape (num_samples, n_mfcc, time_steps).
- Return type
torch.tensor
Examples
>>> from nnAudio import Spectrogram
>>> spec_layer = Spectrogram.MFCC()
>>> mfcc = spec_layer(x)  # x: waveform tensor in one of the supported shapes
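The last step of the pipeline described above, a type-II DCT with norm='ortho' applied to each log-Mel frame, can be sketched in plain Python. This is a minimal illustration of the transform itself; the function names are hypothetical, and it does not reproduce how nnAudio computes the DCT internally.

```python
import math

def dct_ii_ortho(values):
    """Orthonormal type-II DCT of a 1-D list of numbers."""
    n = len(values)
    out = []
    for k in range(n):
        total = sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                    for i, v in enumerate(values))
        # 'ortho' scaling makes the transform energy-preserving.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out

def mfcc_from_log_mel_frame(log_mel_frame, n_mfcc=20):
    """Keep the first n_mfcc DCT coefficients of one log-Mel frame (hypothetical helper)."""
    return dct_ii_ortho(log_mel_frame)[:n_mfcc]

# A flat frame puts all of its energy into coefficient 0.
coeffs = mfcc_from_log_mel_frame([1.0, 1.0, 1.0, 1.0], n_mfcc=2)
```

In the layer itself this transform would run along the Mel axis for every time step, which is why the output keeps a time_steps dimension.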
Methods
__init__ – Initializes internal Module state, shared by both nn.Module and ScriptModule.
extra_repr – Set the extra representation of the module.
forward – Convert a batch of waveforms to MFCC.
- extra_repr() → str
Set the extra representation of the module
To print customized extra information, you should reimplement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x)
Convert a batch of waveforms to MFCC.
- Parameters
x (torch tensor) – The input signal should have one of the following shapes:
1. (len_audio)
2. (num_audio, len_audio)
3. (num_audio, 1, len_audio)
It will be automatically broadcast to the right shape.
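The shape handling described above can be pictured as follows. This is a minimal sketch that works on shape tuples only; normalized_shape is a hypothetical helper, not part of the nnAudio API, and the layer's actual behaviour may differ in detail, for example in how it reports unsupported shapes.

```python
def normalized_shape(shape):
    """Map a supported input shape to (num_audio, 1, len_audio)."""
    if len(shape) == 1:
        # (len_audio) -> a single clip with one channel
        return (1, 1, shape[0])
    if len(shape) == 2:
        # (num_audio, len_audio) -> insert the channel axis
        return (shape[0], 1, shape[1])
    if len(shape) == 3 and shape[1] == 1:
        # Already in the expected layout
        return tuple(shape)
    raise ValueError(f"Unsupported input shape: {shape}")

# One second of audio at sr=22050 in each of the three supported layouts.
shapes = [normalized_shape(s) for s in [(22050,), (4, 22050), (4, 1, 22050)]]
```

All three layouts end up with a batch axis, a single channel axis, and the sample axis last.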