nnAudio.utils.create_fourier_kernels

nnAudio.utils.create_fourier_kernels(n_fft, win_length=None, freq_bins=None, fmin=50, fmax=6000, sr=44100, freq_scale='linear', window='hann', verbose=True)

This function creates the Fourier kernels for STFT, Melspectrogram and CQT. Most of the parameters follow librosa conventions. Part of the code comes from pytorch_musicnet (https://github.com/jthickstun/pytorch_musicnet).

Parameters

n_fft (int) – The window size
freq_bins (int) – Number of frequency bins. Default is None, which means n_fft//2+1 bins
fmin (int) – The starting frequency for the lowest frequency bin. If freq_scale is 'no', this argument is ignored.
fmax (int) – The ending frequency for the highest frequency bin. If freq_scale is 'no', this argument is ignored.
sr (int) – The sampling rate of the input audio. It is used to place fmin and fmax at the correct frequencies, so an incorrect sampling rate produces incorrect bin frequencies.
freq_scale ('linear', 'log', 'log2', or 'no') – Determines the spacing between frequency bins. When 'linear', 'log' or 'log2' is used, the bin spacing is controlled by fmin and fmax. If 'no' is used, the bins start at 0 Hz and end at the Nyquist frequency with linear spacing.
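
To make the effect of freq_scale concrete, here is a minimal sketch of how bin frequencies could be laid out for the 'no', 'linear' and 'log' settings. The helper name sketch_bin_frequencies is hypothetical and is not part of nnAudio. It assumes fmax acts as an exclusive upper edge for 'linear' and 'log'; the library may place the last bin differently. 'log2' is omitted.

```python
import math


def sketch_bin_frequencies(n_fft, freq_bins=None, fmin=50, fmax=6000,
                           sr=44100, freq_scale="linear"):
    """Hypothetical helper: return the frequency in Hz assigned to each bin."""
    if freq_bins is None:
        freq_bins = n_fft // 2 + 1  # default stated in the parameter list

    if freq_scale == "no":
        # Plain DFT bins: 0 Hz upward in steps of sr / n_fft.
        # fmin and fmax are ignored in this mode.
        return [k * sr / n_fft for k in range(freq_bins)]

    if freq_scale == "linear":
        # Assumption: equal steps in Hz starting at fmin, fmax exclusive.
        step = (fmax - fmin) / freq_bins
        return [fmin + k * step for k in range(freq_bins)]

    if freq_scale == "log":
        # Assumption: equal frequency ratios starting at fmin, fmax exclusive.
        log_step = math.log(fmax / fmin) / freq_bins
        return [fmin * math.exp(k * log_step) for k in range(freq_bins)]

    raise ValueError(f"unsupported freq_scale in this sketch: {freq_scale!r}")


no_scale = sketch_bin_frequencies(2048, sr=44100, freq_scale="no")
linear = sketch_bin_frequencies(2048, freq_bins=4, fmin=100, fmax=500,
                                freq_scale="linear")
log = sketch_bin_frequencies(2048, freq_bins=2, fmin=100, fmax=400,
                             freq_scale="log")
```

With freq_scale='no' and the default freq_bins, the last bin of this sketch lands on sr/2, which matches the "end at Nyquist frequency" description above.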

Returns

wsin (numpy.array) – Imaginary Fourier Kernel with the shape (freq_bins, 1, n_fft)
wcos (numpy.array) – Real Fourier Kernel with the shape (freq_bins, 1, n_fft)
bins2freq (list) – Maps each frequency bin to its frequency in Hz.
binslist (list) – The normalized frequency k in the digital domain. This is the k that appears in the Discrete Fourier Transform equation.
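
For reference, the standard Discrete Fourier Transform of a length-N frame x[n], with N equal to n_fft, is usually written as

$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-2\pi i k n / N}$$

The sketch below builds unwindowed sine and cosine kernels with the documented (freq_bins, 1, n_fft) shape and checks them against numpy.fft.rfft. It is an illustration under stated assumptions, not the library's implementation: the helper name sketch_fourier_kernels is hypothetical, only the integer-k case of freq_scale='no' is covered, and the window argument of the real function is left out.

```python
import numpy as np


def sketch_fourier_kernels(n_fft, freq_bins=None):
    """Hypothetical, unwindowed stand-in for the freq_scale='no' case."""
    if freq_bins is None:
        freq_bins = n_fft // 2 + 1
    n = np.arange(n_fft)                       # sample index within the frame
    k = np.arange(freq_bins).reshape(-1, 1)    # normalized frequency k per bin
    phase = 2 * np.pi * k * n / n_fft          # shape (freq_bins, n_fft)
    wsin = np.sin(phase)[:, np.newaxis, :]     # shape (freq_bins, 1, n_fft)
    wcos = np.cos(phase)[:, np.newaxis, :]     # shape (freq_bins, 1, n_fft)
    return wsin, wcos


rng = np.random.default_rng(0)
frame = rng.standard_normal(64)

wsin, wcos = sketch_fourier_kernels(64)

# e^{-i*theta} = cos(theta) - i*sin(theta), so the imaginary part is negated.
real_part = wcos[:, 0, :] @ frame
imag_part = -(wsin[:, 0, :] @ frame)

reference = np.fft.rfft(frame)
```

The (freq_bins, 1, n_fft) layout has the same ordering as the (out_channels, in_channels, kernel_size) weight of a 1-D convolution, so each bin can be treated as one filter applied across the signal.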