nnAudio.features.cqt.CQT2010

class nnAudio.features.cqt.CQT2010(sr=22050, hop_length=512, fmin=32.7, fmax=None, n_bins=84, bins_per_octave=12, norm=True, basis_norm=1, window='hann', pad_mode='reflect', trainable_STFT=False, filter_scale=1, trainable_CQT=False, output_format='Magnitude', earlydownsample=True, verbose=True)

Bases: torch.nn.modules.module.Module

This algorithm is using the resampling method proposed in [1]. Instead of convoluting the STFT results with a gigantic CQT kernel covering the full frequency spectrum, we make a small CQT kernel covering only the top octave. Then we keep downsampling the input audio by a factor of 2 to convoluting it with the small CQT kernel. Everytime the input audio is downsampled, the CQT relative to the downsampled input is equavalent to the next lower octave. The kernel creation process is still same as the 1992 algorithm. Therefore, we can reuse the code from the 1992 alogrithm [2] [1] Schörkhuber, Christian. “CONSTANT-Q TRANSFORM TOOLBOX FOR MUSIC PROCESSING.” (2010). [2] Brown, Judith C.C. and Miller Puckette. “An efficient algorithm for the calculation of a constant Q transform.” (1992). early downsampling factor is to downsample the input audio to reduce the CQT kernel size. The result with and without early downsampling are more or less the same except in the very low frequency region where freq < 40Hz.

Methods

__init__

Initializes internal Module state, shared by both nn.Module and ScriptModule.

extra_repr

Set the extra representation of the module

forward

Convert a batch of waveforms to CQT spectrograms.

extra_repr()str

Set the extra representation of the module

To print customized extra information, you should reimplement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(x, output_format=None, normalization_type='librosa')

Convert a batch of waveforms to CQT spectrograms.

Parameters

x (torch tensor) –

Input signal should be in either of the following shapes.

  1. (len_audio)

  2. (num_audio, len_audio)

3. (num_audio, 1, len_audio) It will be automatically broadcast to the right shape