nnAudio.Spectrogram.CQT1992v2
- class nnAudio.Spectrogram.CQT1992v2(sr=22050, hop_length=512, fmin=32.7, fmax=None, n_bins=84, bins_per_octave=12, norm=1, window='hann', center=True, pad_mode='reflect', trainable=False, output_format='Magnitude', verbose=True)
Bases: torch.nn.modules.module.Module
This class computes the constant-Q transform (CQT) of the input signal. The input signal should be in one of the following shapes.
(len_audio)
(num_audio, len_audio)
(num_audio, 1, len_audio)
The correct shape is inferred automatically if the input follows one of these three shapes. Most of the arguments follow the conventions of librosa. This class inherits from torch.nn.Module, so it is used the same way as any torch.nn.Module.
This algorithm uses the method proposed in [1]. I slightly modified it so that it runs faster than the original 1992 algorithm, which is why I call it version 2.
[1] Brown, Judith C. and Miller Puckette. “An efficient algorithm for the calculation of a constant Q transform.” (1992).
- Parameters
sr (int) – The sampling rate for the input audio. It is used to calculate the correct fmin and fmax. Setting the correct sampling rate is very important for calculating the correct frequency.
hop_length (int) – The hop (or stride) size. Default value is 512.
fmin (float) – The frequency for the lowest CQT bin. Default is 32.70 Hz, which corresponds to the note C1.
fmax (float) – The frequency for the highest CQT bin. Default is None, in which case the highest CQT bin is inferred from n_bins and bins_per_octave. If fmax is not None, the argument n_bins is ignored and n_bins is calculated automatically.
n_bins (int) – The total number of CQT bins. Default is 84. Will be ignored if fmax is not None.
bins_per_octave (int) – Number of bins per octave. Default is 12.
norm (int) – Normalization for the CQT kernels. 1 means L1 normalization, and 2 means L2 normalization. Default is 1, which is the same as the normalization used in librosa.
window (str) – The windowing function for CQT. It uses scipy.signal.get_window; please refer to the scipy documentation for possible windowing functions. The default value is ‘hann’.
center (bool) – Whether to put the CQT kernel at the center of the time step. If False, the time index is the beginning of the CQT kernel; if True, the time index is the center of the CQT kernel. Default value is True.
pad_mode (str) – The padding method. Default value is ‘reflect’.
trainable (bool) – Determine if the CQT kernels are trainable or not. If True, the gradients for the CQT kernels will also be calculated and the CQT kernels will be updated during model training. Default value is False.
output_format (str) – Determine the return type. Magnitude will return the magnitude of the CQT result, shape = (num_samples, freq_bins, time_steps); Complex will return the CQT result as complex numbers, shape = (num_samples, freq_bins, time_steps, 2); Phase will return the phase of the CQT result, shape = (num_samples, freq_bins, time_steps, 2). The complex number is stored as (real, imag) in the last axis. Default value is ‘Magnitude’.
verbose (bool) – If True, it shows layer information. If False, it suppresses all prints.
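The relationship between fmin, n_bins, bins_per_octave, and fmax described above can be illustrated with a small sketch. It uses plain Python rather than nnAudio itself. The geometric spacing of bin frequencies is the standard CQT definition; the helper n_bins_from_fmax is a hypothetical name, and the ceiling rounding it applies is an assumption about how n_bins might be derived from fmax, not a statement of the library's exact behavior.

```python
import math

def cqt_bin_frequencies(fmin=32.7, n_bins=84, bins_per_octave=12):
    """Center frequency of each CQT bin, spaced geometrically from fmin."""
    return [fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

def n_bins_from_fmax(fmin, fmax, bins_per_octave=12):
    """Hypothetical helper: bins needed to span fmin..fmax (assumes ceiling rounding)."""
    return math.ceil(bins_per_octave * math.log2(fmax / fmin))

freqs = cqt_bin_frequencies()
print(len(freqs))   # 84 bins, i.e. 7 octaves at 12 bins per octave
print(freqs[12])    # the bin one octave above fmin, i.e. 2 * fmin
print(freqs[-1])    # highest bin frequency implied by the defaults

# Seven octaves above fmin needs 7 * 12 = 84 bins under this assumption.
print(n_bins_from_fmax(32.7, 32.7 * 128))
```

With the default arguments the highest bin lies just under 7 octaves above fmin, so sr should be high enough that this frequency stays below the Nyquist limit sr / 2.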
- Returns
spectrogram (torch.tensor) – It returns a tensor of spectrograms. shape = (num_samples, freq_bins, time_steps) if output_format='Magnitude'; shape = (num_samples, freq_bins, time_steps, 2) if output_format='Complex' or 'Phase'.
Examples
>>> from nnAudio import Spectrogram
>>> spec_layer = Spectrogram.CQT1992v2()
>>> specs = spec_layer(x)
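For output_format='Complex', the last axis holds (real, imag) pairs. The sketch below shows how a magnitude could be recovered from that layout. It uses nested Python lists as a stand-in for a tensor so that it runs without torch; the function name and the sample values are hypothetical, and the result is not guaranteed to match the library's own 'Magnitude' output exactly.

```python
import math

def magnitude_from_complex(spec):
    """Collapse the trailing (real, imag) axis into a magnitude.

    `spec` is a nested list shaped (num_samples, freq_bins, time_steps, 2).
    """
    return [
        [[math.hypot(re, im) for re, im in freq_row] for freq_row in sample]
        for sample in spec
    ]

# Hypothetical values: 1 sample, 2 frequency bins, 2 time steps.
complex_spec = [[[(3.0, 4.0), (0.0, 1.0)],
                 [(1.0, 0.0), (0.0, 0.0)]]]
magnitude = magnitude_from_complex(complex_spec)
print(magnitude)  # [[[5.0, 1.0], [1.0, 0.0]]]
```

On a real tensor the same idea would be a single expression over the last axis, for example torch.sqrt(specs[..., 0] ** 2 + specs[..., 1] ** 2).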
Methods
__init__ – Initializes internal Module state, shared by both nn.Module and ScriptModule.
forward – Convert a batch of waveforms to CQT spectrograms.
forward_manual – Method for debugging.
- forward(x, output_format=None)
Convert a batch of waveforms to CQT spectrograms.
- Parameters
x (torch tensor) – Input signal should be in one of the following shapes.
(len_audio)
(num_audio, len_audio)
(num_audio, 1, len_audio)
It will be automatically broadcast to the right shape.
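The shape handling described above can be sketched as a function over shape tuples. This is an illustration only, in plain Python so that it runs without torch; the name broadcast_shape and the choice to raise ValueError for unsupported shapes are assumptions, not the library's documented behavior.

```python
def broadcast_shape(shape):
    """Map an accepted input shape to (num_audio, 1, len_audio)."""
    if len(shape) == 1:
        # (len_audio) -> a single clip with one channel
        return (1, 1, shape[0])
    if len(shape) == 2:
        # (num_audio, len_audio) -> insert the channel axis
        return (shape[0], 1, shape[1])
    if len(shape) == 3 and shape[1] == 1:
        # (num_audio, 1, len_audio) is already in the target layout
        return tuple(shape)
    raise ValueError(f"unsupported input shape: {shape}")

print(broadcast_shape((22050,)))       # (1, 1, 22050)
print(broadcast_shape((8, 22050)))     # (8, 1, 22050)
print(broadcast_shape((8, 1, 22050)))  # (8, 1, 22050)
```

All three accepted inputs end up in the same three-dimensional layout, which is why the output always carries a leading num_samples axis even for a single one-dimensional waveform.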
- forward_manual(x)
Method for debugging