nnAudio.Spectrogram.CQT2010

class nnAudio.Spectrogram.CQT2010(sr=22050, hop_length=512, fmin=32.7, fmax=None, n_bins=84, bins_per_octave=12, norm=True, basis_norm=1, window='hann', pad_mode='reflect', earlydownsample=True, verbose=True, device='cpu')

Bases: torch.nn.modules.module.Module

This algorithm uses the resampling method proposed in [1]. Instead of convolving the STFT results with a gigantic CQT kernel covering the full frequency spectrum, we make a small CQT kernel covering only the top octave. Then we repeatedly downsample the input audio by a factor of 2 and convolve it with the small CQT kernel. Every time the input audio is downsampled, the CQT relative to the downsampled input is equivalent to the next lower octave.
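The octave-by-octave idea can be checked numerically with a minimal sketch. It is not the library's code: the helper names `cqt_frequencies` and `normalized_octaves` are hypothetical, and it assumes `n_bins` is a multiple of `bins_per_octave`. It shows that, once the sampling rate is halved, each lower octave occupies the same normalized frequencies (frequency divided by sampling rate) as the top octave, which is why a single top-octave kernel can be reused.

```python
def cqt_frequencies(fmin, n_bins, bins_per_octave):
    """Centre frequencies of geometrically spaced CQT bins."""
    return [fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def normalized_octaves(sr, fmin=32.7, n_bins=84, bins_per_octave=12):
    """Group bins into octaves (top octave first) and express each bin
    as a fraction of the sampling rate in effect for that octave."""
    freqs = cqt_frequencies(fmin, n_bins, bins_per_octave)
    n_octaves = n_bins // bins_per_octave  # assumes an exact multiple
    octaves = []
    current_sr = float(sr)
    for i in range(n_octaves):
        start = n_bins - (i + 1) * bins_per_octave
        octave = freqs[start:start + bins_per_octave]
        octaves.append([f / current_sr for f in octave])
        current_sr /= 2.0  # downsample by 2 before the next lower octave
    return octaves


octaves = normalized_octaves(sr=22050)
top = octaves[0]
for lower in octaves[1:]:
    # Every lower octave lines up with the top octave after downsampling.
    assert all(abs(a - b) < 1e-12 for a, b in zip(top, lower))
```

With the default arguments this yields seven octaves of twelve bins each, all sharing the same normalized frequencies.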

The kernel creation process is still the same as in the 1992 algorithm. Therefore, we can reuse the code from the 1992 algorithm [2].

[1] Schörkhuber, Christian. “CONSTANT-Q TRANSFORM TOOLBOX FOR MUSIC PROCESSING.” (2010).

[2] Brown, Judith C. and Miller Puckette. “An efficient algorithm for the calculation of a constant Q transform.” (1992).

Early downsampling downsamples the input audio before the transform in order to reduce the CQT kernel size. The results with and without early downsampling are more or less the same, except in the very low frequency region where freq < 40 Hz.
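A rough sketch of why lowering the sampling rate shrinks the kernel is given here. It assumes the usual constant-Q relation, filter length = ceil(Q · sr / f) with Q = 1 / (2^(1/bins_per_octave) − 1). The helper `filter_length` is hypothetical, and the check that the highest bin stays below the new Nyquist frequency is a simplification, not the criterion the library applies.

```python
import math


def filter_length(sr, freq, bins_per_octave=12):
    """Samples needed by the time-domain filter of one CQT bin
    (assumed constant-Q relation, not taken from the library)."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    return math.ceil(q * sr / freq)


sr, fmin, n_bins, bins_per_octave = 22050, 32.7, 84, 12

# Highest bin overall, and the lowest bin of the top octave
# (the latter has the longest filter in the top-octave kernel).
top_bin = fmin * 2.0 ** ((n_bins - 1) / bins_per_octave)
lowest_of_top_octave = fmin * 2.0 ** ((n_bins - bins_per_octave) / bins_per_octave)

# Simplified safety check: after halving sr, the new Nyquist is sr / 4.
can_halve = top_bin < sr / 4

full = filter_length(sr, lowest_of_top_octave)
halved = filter_length(sr / 2, lowest_of_top_octave)
print(can_halve, full, halved)
```

Under these assumptions the longest filter in the top-octave kernel is roughly halved each time the input is downsampled by 2 before the transform.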

Methods

__init__

Initializes internal Module state, shared by both nn.Module and ScriptModule.

early_downsample

Return the new sampling rate and hop length after early downsampling.

early_downsample_count

Compute the number of early downsampling operations

forward

Convert a batch of waveforms to CQT spectrograms.

get_cqt

Multiply the STFT result with the CQT kernel. See the 1992 CQT paper [2] for how to multiply the STFT result with the CQT kernel.

get_early_downsample_params

early_downsample(sr, hop_length, n_octaves, nyquist, filter_cutoff)

Return the new sampling rate and hop length after early downsampling.

early_downsample_count(nyquist, filter_cutoff, hop_length, n_octaves)

Compute the number of early downsampling operations

forward(x)

Convert a batch of waveforms to CQT spectrograms.

Parameters

x (torch tensor) –

Input signal should be in either of the following shapes.

  1. (len_audio)

  2. (num_audio, len_audio)

  3. (num_audio, 1, len_audio)

It will be automatically broadcast to the right shape.
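The three accepted layouts can be summarized with a small shape-only sketch. The helper `broadcast_shape` is hypothetical and works on plain tuples rather than tensors. It assumes the target layout is (num_audio, 1, len_audio), as the third option suggests.

```python
def broadcast_shape(shape):
    """Map an accepted input shape to (num_audio, 1, len_audio).

    Hypothetical helper for illustration; it only manipulates shape tuples.
    """
    shape = tuple(shape)
    if len(shape) == 1:            # (len_audio)
        return (1, 1, shape[0])
    if len(shape) == 2:            # (num_audio, len_audio)
        return (shape[0], 1, shape[1])
    if len(shape) == 3 and shape[1] == 1:
        return shape               # already (num_audio, 1, len_audio)
    raise ValueError(f"unsupported input shape: {shape}")


print(broadcast_shape((22050,)))        # single waveform
print(broadcast_shape((4, 22050)))      # batch of four waveforms
print(broadcast_shape((4, 1, 22050)))   # already in the target layout
```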

get_cqt(x, hop_length, padding)

Multiply the STFT result with the CQT kernel. See the 1992 CQT paper [2] for how to multiply the STFT result with the CQT kernel.

[2] Brown, Judith C. and Miller Puckette. “An efficient algorithm for the calculation of a constant Q transform.” (1992).