nnAudio.Spectrogram.CQT1992
- class nnAudio.Spectrogram.CQT1992(sr=22050, hop_length=512, fmin=220, fmax=None, n_bins=84, trainable_STFT=False, trainable_CQT=False, bins_per_octave=12, filter_scale=1, output_format='Magnitude', norm=1, window='hann', center=True, pad_mode='reflect')
Bases:
torch.nn.modules.module.Module
This algorithm uses the method proposed in [1], which runs extremely slowly if low frequencies (below 220 Hz) are included in the frequency bins. Please refer to CQT1992v2() for a more computationally and memory-efficient version.
[1] Brown, Judith C. and Miller Puckette. “An efficient algorithm for the calculation of a constant Q transform.” (1992).
This function calculates the CQT of the input signal. The input signal should be in one of the following shapes.
(len_audio)
(num_audio, len_audio)
(num_audio, 1, len_audio)
The correct shape will be inferred automatically if the input follows one of these 3 shapes. Most of the arguments follow the convention from librosa. This class inherits from torch.nn.Module; therefore, its usage is the same as torch.nn.Module.
- Parameters
sr (int) – The sampling rate for the input audio. It is used to calculate the correct fmin and fmax. Setting the correct sampling rate is very important for calculating the correct frequency.
hop_length (int) – The hop (or stride) size. Default value is 512.
fmin (float) – The frequency for the lowest CQT bin. Default is 220 Hz, which corresponds to the note A3.
fmax (float) – The frequency for the highest CQT bin. Default is None, in which case the highest CQT bin is inferred from n_bins and bins_per_octave. If fmax is not None, the argument n_bins will be ignored and n_bins will be calculated automatically.
n_bins (int) – The total number of CQT bins. Default is 84. Will be ignored if fmax is not None.
bins_per_octave (int) – Number of bins per octave. Default is 12.
trainable_STFT (bool) – Determines whether the time-to-frequency domain transformation kernel for the input audio is trainable. Default is False.
trainable_CQT (bool) – Determines whether the frequency domain CQT kernel is trainable. Default is False.
norm (int) – Normalization for the CQT kernels. 1 means L1 normalization, and 2 means L2 normalization. Default is 1, which is the same as the normalization used in librosa.
window (str) – The windowing function for CQT. It uses scipy.signal.get_window; please refer to the scipy documentation for possible windowing functions. The default value is ‘hann’.
center (bool) – Whether to put the CQT kernel at the center of the time step. If False, the time index is the beginning of the CQT kernel; if True, the time index is the center of the CQT kernel. Default value is True.
pad_mode (str) – The padding method. Default value is ‘reflect’.
trainable (bool) – Determines whether the CQT kernels are trainable. If True, the gradients for the CQT kernels will also be calculated and the CQT kernels will be updated during model training. Default value is False.
output_format (str) – Determines the return type. Magnitude will return the magnitude of the CQT result, shape = (num_samples, freq_bins, time_steps); Complex will return the CQT result as complex numbers, shape = (num_samples, freq_bins, time_steps, 2); Phase will return the phase of the CQT result, shape = (num_samples, freq_bins, time_steps, 2). The complex number is stored as (real, imag) in the last axis. Default value is ‘Magnitude’.
verbose (bool) – If True, it shows layer information. If False, it suppresses all prints.
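The relationship between fmin, fmax, n_bins and bins_per_octave, and the reason a low fmin is costly, can be illustrated in plain Python. This is a minimal sketch, assuming the textbook constant-Q definitions (geometric bin spacing, and a kernel length of Q * sr / f samples with Q = filter_scale / (2^(1 / bins_per_octave) - 1)). The helper names cqt_frequencies and kernel_length are hypothetical and are not part of nnAudio, and the rounding used here may differ from the library's.

```python
import math

def cqt_frequencies(fmin, n_bins=84, bins_per_octave=12, fmax=None):
    """Centre frequencies of the CQT bins (hypothetical helper).

    Assumes geometric spacing: f_k = fmin * 2 ** (k / bins_per_octave).
    If fmax is given, n_bins is ignored and recomputed from fmin and fmax.
    """
    if fmax is not None:
        # Number of bins needed to span fmin..fmax (assumed rounding: ceil).
        n_bins = math.ceil(bins_per_octave * math.log2(fmax / fmin))
    return [fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

def kernel_length(sr, freq, bins_per_octave=12, filter_scale=1):
    """Length in samples of the CQT kernel centred on freq (assumed formula)."""
    q = filter_scale / (2 ** (1 / bins_per_octave) - 1)
    return math.ceil(q * sr / freq)

# Two octaves starting at 220 Hz: bin 12 sits one octave above fmin.
freqs = cqt_frequencies(fmin=220, n_bins=24)
print(len(freqs), freqs[0], freqs[12])

# When fmax is set, n_bins is derived from it and the argument is ignored.
print(len(cqt_frequencies(fmin=220, n_bins=84, fmax=880)))

# The lowest bin needs the longest kernel, so lowering fmin raises the cost.
for fmin in (220.0, 110.0, 55.0, 32.70):
    print(fmin, kernel_length(sr=22050, freq=fmin))
```

Under these assumptions the kernel length grows in inverse proportion to the lowest frequency, which is consistent with the warning above about including bins below 220 Hz.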
- Returns
spectrogram (torch.tensor) – It returns a tensor of spectrograms. shape = (num_samples, freq_bins, time_steps) if output_format='Magnitude'; shape = (num_samples, freq_bins, time_steps, 2) if output_format='Complex' or 'Phase'.
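The shapes above can be predicted ahead of time with a small helper. The sketch below is an illustration only: it assumes the usual strided-convolution framing (with center=True the signal is padded by half the kernel width on each side), and both the function name cqt_output_shape and the kernel_width value of 2048 are hypothetical choices, not values taken from nnAudio.

```python
def cqt_output_shape(num_samples, freq_bins, len_audio, hop_length=512,
                     output_format="Magnitude", center=True, kernel_width=2048):
    """Predict the output tensor shape (hypothetical helper, not part of nnAudio).

    Assumes strided-convolution framing: with center=True the signal is padded
    by kernel_width // 2 on both sides, otherwise no padding is applied.
    """
    padded = len_audio + 2 * (kernel_width // 2) if center else len_audio
    time_steps = (padded - kernel_width) // hop_length + 1
    if output_format == "Magnitude":
        return (num_samples, freq_bins, time_steps)
    if output_format in ("Complex", "Phase"):
        # The extra axis of size 2 holds the two components of each value.
        return (num_samples, freq_bins, time_steps, 2)
    raise ValueError(f"Unknown output_format: {output_format!r}")

# One second of audio at 22050 Hz, batch of 4, 84 bins.
print(cqt_output_shape(4, 84, 22050))                           # (4, 84, 44)
print(cqt_output_shape(4, 84, 22050, output_format="Complex"))  # (4, 84, 44, 2)
```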
Examples
>>> spec_layer = Spectrogram.CQT1992v2()
>>> specs = spec_layer(x)
Methods
__init__ – Initializes internal Module state, shared by both nn.Module and ScriptModule.
extra_repr – Set the extra representation of the module.
forward – Convert a batch of waveforms to CQT spectrograms.
- extra_repr() → str
Set the extra representation of the module
To print customized extra information, you should reimplement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x, output_format=None, normalization_type='librosa')
Convert a batch of waveforms to CQT spectrograms.
- Parameters
x (torch tensor) – Input signal should be in one of the following shapes.
(len_audio)
(num_audio, len_audio)
(num_audio, 1, len_audio)
It will be automatically broadcast to the right shape.
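The broadcasting of the three accepted layouts can be pictured as a mapping onto a single (num_audio, 1, len_audio) layout. The following is a minimal sketch operating on shape tuples only; normalized_shape is a hypothetical helper written for illustration and is not the library's own function.

```python
def normalized_shape(shape):
    """Map an accepted input shape to (num_audio, 1, len_audio).

    Hypothetical helper working on shape tuples only; it mirrors the three
    accepted layouts listed above and rejects anything else.
    """
    if len(shape) == 1:                       # (len_audio,)
        return (1, 1, shape[0])
    if len(shape) == 2:                       # (num_audio, len_audio)
        return (shape[0], 1, shape[1])
    if len(shape) == 3 and shape[1] == 1:     # (num_audio, 1, len_audio)
        return tuple(shape)
    raise ValueError(
        "Only (len_audio), (num_audio, len_audio) or "
        "(num_audio, 1, len_audio) are supported"
    )

for shape in [(22050,), (8, 22050), (8, 1, 22050)]:
    print(shape, "->", normalized_shape(shape))
```

With a real tensor, the same idea amounts to adding singleton dimensions until the input has a batch axis and a channel axis.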