nnAudio.Spectrogram.CQT1992
- class nnAudio.Spectrogram.CQT1992(sr=22050, hop_length=512, fmin=220, fmax=None, n_bins=84, trainable_STFT=False, trainable_CQT=False, bins_per_octave=12, filter_scale=1, output_format='Magnitude', norm=1, window='hann', center=True, pad_mode='reflect')
Bases: torch.nn.modules.module.Module
This algorithm uses the method proposed in [1], which runs extremely slowly if low frequencies (below 220 Hz) are included in the frequency bins. Please refer to CQT1992v2() for a more computationally efficient and memory-efficient version.
[1] Brown, Judith C., and Miller Puckette. “An efficient algorithm for the calculation of a constant Q transform.” (1992).
This class calculates the CQT of the input signal. The input signal should have one of the following shapes:
1. (len_audio)
2. (num_audio, len_audio)
3. (num_audio, 1, len_audio)
The correct shape is inferred automatically if the input has one of these three shapes. Most of the arguments follow the librosa convention. This class inherits from torch.nn.Module, so it is used in the same way as torch.nn.Module.
- Parameters
sr (int) – The sampling rate of the input audio. It is used to calculate the correct fmin and fmax. Setting the correct sampling rate is essential for obtaining the correct bin frequencies.
hop_length (int) – The hop (or stride) size. Default value is 512.
fmin (float) – The frequency of the lowest CQT bin. Default is 220 Hz, as given in the class signature, which corresponds to the note A3.
fmax (float) – The frequency of the highest CQT bin. Default is None, in which case the highest CQT bin is inferred from n_bins and bins_per_octave. If fmax is not None, the argument n_bins is ignored and the number of bins is calculated automatically.
n_bins (int) – The total number of CQT bins. Default is 84. Ignored if fmax is not None.
bins_per_octave (int) – Number of bins per octave. Default is 12.
trainable_STFT (bool) – Determines whether the time-to-frequency-domain transformation kernel for the input audio is trainable. Default is False.
trainable_CQT (bool) – Determines whether the frequency-domain CQT kernel is trainable. Default is False.
norm (int) – Normalization for the CQT kernels. 1 means L1 normalization, and 2 means L2 normalization. Default is 1, which is the same normalization used in librosa.
window (str) – The windowing function for the CQT. It uses scipy.signal.get_window; please refer to the scipy documentation for possible windowing functions. Default value is ‘hann’.
center (bool) – Whether to place the CQT kernel at the center of the time step. If False, the time index is the beginning of the CQT kernel; if True, the time index is the center of the CQT kernel. Default value is True.
pad_mode (str) – The padding method. Default value is ‘reflect’.
trainable (bool) – Determines whether the CQT kernels are trainable. If True, the gradients for the CQT kernels will also be calculated and the CQT kernels will be updated during model training. Default value is False.
output_format (str) – Determines the return type. Magnitude returns the magnitude of the CQT result, shape = (num_samples, freq_bins, time_steps); Complex returns the CQT result as complex numbers, shape = (num_samples, freq_bins, time_steps, 2); Phase returns the phase of the CQT result, shape = (num_samples, freq_bins, time_steps, 2). Complex numbers are stored as (real, imag) in the last axis. Default value is ‘Magnitude’.
verbose (bool) – If True, it shows layer information. If False, it suppresses all prints.
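To make the relationship between fmin, fmax, n_bins and bins_per_octave concrete, here is a minimal sketch in plain Python. It assumes geometrically spaced bin centers, f_k = fmin * 2^(k / bins_per_octave), which is the usual CQT convention. The helper names (cqt_frequencies, n_bins_from_fmax) and the ceiling rule used to derive the bin count from fmax are illustrative assumptions, not functions or behavior confirmed by this page.

```python
import math


def cqt_frequencies(fmin, n_bins, bins_per_octave=12):
    """Center frequency of each bin, assuming geometric spacing (hypothetical helper)."""
    return [fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def n_bins_from_fmax(fmin, fmax, bins_per_octave=12):
    """Bin count needed to reach fmax; the ceiling rule is an assumption."""
    return math.ceil(bins_per_octave * math.log2(fmax / fmin))


# Values taken from the class signature: sr=22050, fmin=220, n_bins=84.
sr = 22050
nyquist = sr / 2
freqs = cqt_frequencies(fmin=220, n_bins=84)

print(f"bin 0:  {freqs[0]:.1f} Hz")
print(f"bin 12: {freqs[12]:.1f} Hz (one octave above bin 0)")
print(f"top bin above Nyquist: {freqs[-1] > nyquist}")

# Bin count obtained when fmax is supplied instead of n_bins.
print(n_bins_from_fmax(fmin=220, fmax=nyquist))
```

Under these assumptions, 84 bins starting at 220 Hz reach well above sr / 2 for a 22050 Hz sampling rate, so n_bins or fmax usually has to be chosen with the sampling rate in mind. How the library itself handles such a configuration is not stated on this page.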
- Returns
spectrogram (torch.tensor) – A tensor of spectrograms. shape = (num_samples, freq_bins, time_steps) if output_format='Magnitude'; shape = (num_samples, freq_bins, time_steps, 2) if output_format='Complex' or 'Phase'.
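The return shapes above can be sketched with ordinary strided-convolution arithmetic. This is a rough illustration only: cqt_output_shape and its kernel_width argument are hypothetical names, and the padding rule for center=True (kernel_width // 2 samples on each side) is an assumption rather than something this page specifies.

```python
def cqt_output_shape(num_samples, len_audio, n_bins, hop_length=512,
                     kernel_width=2048, center=True, output_format="Magnitude"):
    """Expected output shape under standard strided-convolution arithmetic.

    kernel_width is a hypothetical stand-in for the transform kernel length.
    """
    # Assumption: center=True pads kernel_width // 2 samples on each side.
    padded = len_audio + 2 * (kernel_width // 2) if center else len_audio
    time_steps = (padded - kernel_width) // hop_length + 1
    if output_format == "Magnitude":
        return (num_samples, n_bins, time_steps)
    if output_format in ("Complex", "Phase"):
        # The last axis holds a pair of values per bin and time step.
        return (num_samples, n_bins, time_steps, 2)
    raise ValueError(f"Unknown output_format: {output_format!r}")


# One second of audio at 22050 Hz, batch of 4, 60 frequency bins.
print(cqt_output_shape(4, 22050, 60))                           # (4, 60, 44)
print(cqt_output_shape(4, 22050, 60, output_format="Complex"))  # (4, 60, 44, 2)
print(cqt_output_shape(4, 22050, 60, center=False))             # (4, 60, 40)
```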
Examples
>>> spec_layer = Spectrogram.CQT1992v2()
>>> specs = spec_layer(x)
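The warning at the top of this page about low frequencies can be made concrete with a small calculation. In a constant-Q transform, the kernel for a bin at frequency f spans roughly Q * sr / f samples, with Q = filter_scale / (2^(1 / bins_per_octave) - 1), so the lowest bin needs the longest kernel. The sketch below uses hypothetical helper names, and rounding the FFT size up to a power of two is an assumption about a typical implementation, not a documented detail of this class.

```python
import math


def kernel_length(sr, freq, bins_per_octave=12, filter_scale=1):
    """Samples needed for a constant-Q kernel at `freq` (hypothetical helper)."""
    q = filter_scale / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    return math.ceil(q * sr / freq)


def next_power_of_two(n):
    """Smallest power of two >= n, a common FFT-size choice (assumed here)."""
    return 1 << (n - 1).bit_length()


sr = 22050
for fmin in (220.0, 32.70):
    longest = kernel_length(sr, fmin)  # the lowest bin has the longest kernel
    print(f"fmin={fmin:>6.2f} Hz -> kernel {longest} samples, "
          f"FFT size {next_power_of_two(longest)}")
```

Under these assumptions, lowering fmin from 220 Hz to 32.70 Hz grows the FFT size from 2048 to 16384 samples, which suggests why including low-frequency bins makes this method slow and memory hungry.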
Methods
__init__ – Initializes internal Module state, shared by both nn.Module and ScriptModule.
extra_repr – Set the extra representation of the module.
forward – Convert a batch of waveforms to CQT spectrograms.
- extra_repr() → str
Set the extra representation of the module
To print customized extra information, you should reimplement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x, output_format=None, normalization_type='librosa')
Convert a batch of waveforms to CQT spectrograms.
- Parameters
x (torch tensor) – Input signal should have one of the following shapes:
1. (len_audio)
2. (num_audio, len_audio)
3. (num_audio, 1, len_audio)
It will be automatically broadcast to the right shape.
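The shape handling described for x can be illustrated on plain shape tuples. This is a minimal sketch assuming the target layout is (num_audio, 1, len_audio). The name broadcast_shape is hypothetical, and raising an error for other shapes is an assumption, since this page does not say what happens to unsupported inputs.

```python
def broadcast_shape(shape):
    """Map an accepted input shape to (num_audio, 1, len_audio) (hypothetical helper)."""
    shape = tuple(shape)
    if len(shape) == 1:                    # (len_audio)
        return (1, 1, shape[0])
    if len(shape) == 2:                    # (num_audio, len_audio)
        return (shape[0], 1, shape[1])
    if len(shape) == 3 and shape[1] == 1:  # (num_audio, 1, len_audio)
        return shape
    # Assumption: anything else is rejected rather than silently reshaped.
    raise ValueError(f"Unsupported input shape: {shape}")


for s in [(22050,), (4, 22050), (4, 1, 22050)]:
    print(s, "->", broadcast_shape(s))
```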