nnAudio.utils.create_fourier_kernels

nnAudio.utils.create_fourier_kernels(n_fft, win_length=None, freq_bins=None, fmin=50, fmax=6000, sr=44100, freq_scale='linear', window='hann', verbose=True)

This function creates the Fourier kernels used by STFT, Melspectrogram and CQT. Most of the parameters follow librosa conventions. Part of the code comes from pytorch_musicnet: https://github.com/jthickstun/pytorch_musicnet

Parameters
  • n_fft (int) – The window size

  • freq_bins (int) – Number of frequency bins. Default is None, which means n_fft//2+1 bins

  • fmin (int) – The starting frequency for the lowest frequency bin. If freq_scale is 'no', this argument has no effect.

  • fmax (int) – The ending frequency for the highest frequency bin. If freq_scale is 'no', this argument has no effect.

  • sr (int) – The sampling rate for the input audio. It is used to calculate the correct fmin and fmax. Setting the correct sampling rate is very important for calculating the correct frequency.

  • freq_scale ('linear', 'log', 'log2', or 'no') – Determines the spacing between frequency bins. When 'linear', 'log' or 'log2' is used, the bin spacing is controlled by fmin and fmax. If 'no' is used, the bins start at 0 Hz and end at the Nyquist frequency with linear spacing.
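
The 'no' setting described above can be illustrated with a short sketch. This is not nnAudio's implementation: the helper name linear_bin_frequencies is hypothetical, and it assumes the standard DFT relation in which bin k of an n_fft-point transform sits at k * sr / n_fft Hz.

```python
def linear_bin_frequencies(n_fft, sr, freq_bins=None):
    """Hypothetical helper: frequency in Hz of each bin for linear 0..Nyquist spacing."""
    if freq_bins is None:
        # Default documented above: n_fft // 2 + 1 bins
        freq_bins = n_fft // 2 + 1
    # Assumption: bin k corresponds to k * sr / n_fft Hz (standard DFT relation)
    return [k * sr / n_fft for k in range(freq_bins)]


freqs = linear_bin_frequencies(n_fft=2048, sr=44100)
print(len(freqs), freqs[0], freqs[-1])  # 1025 0.0 22050.0
```

With the default number of bins, the last bin lands exactly on the Nyquist frequency (sr / 2), which is why an incorrect sr shifts every bin frequency by the same factor.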

Returns

  • wsin (numpy.ndarray) – Imaginary Fourier kernel with the shape (freq_bins, 1, n_fft)

  • wcos (numpy.ndarray) – Real Fourier kernel with the shape (freq_bins, 1, n_fft)

  • bins2freq (list) – Mapping each frequency bin to frequency in Hz.

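  • binslist (list) – The normalized frequency k in the digital domain. This k is the frequency index in the Discrete Fourier Transform equation $$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-2\pi i k n / N}$$ where N is the transform length (n_fft).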
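
As a rough illustration of what the returned kernels represent, the following NumPy sketch builds unwindowed sine and cosine kernels for integer bin indices k and checks them against numpy.fft.rfft. It is a minimal sketch, not nnAudio's code: the function name sketch_fourier_kernels is hypothetical, no window function is applied, and the sign of the imaginary part follows the usual DFT convention, which is an assumption about how the sine kernel is meant to be used.

```python
import numpy as np


def sketch_fourier_kernels(n_fft, freq_bins=None):
    """Hypothetical helper: unwindowed sine/cosine kernels for integer bins."""
    if freq_bins is None:
        freq_bins = n_fft // 2 + 1
    n = np.arange(n_fft)                # sample index within the frame
    k = np.arange(freq_bins)[:, None]   # bin index as a column vector
    phase = 2 * np.pi * k * n / n_fft   # shape (freq_bins, n_fft)
    wsin = np.sin(phase)[:, None, :]    # shape (freq_bins, 1, n_fft)
    wcos = np.cos(phase)[:, None, :]    # shape (freq_bins, 1, n_fft)
    return wsin, wcos


n_fft = 64
wsin, wcos = sketch_fourier_kernels(n_fft)

# Project one frame of random audio onto the kernels.
rng = np.random.default_rng(0)
x = rng.standard_normal(n_fft)
real = wcos[:, 0, :] @ x
imag = -(wsin[:, 0, :] @ x)  # the DFT exponent is negative, hence the minus sign

reference = np.fft.rfft(x)
print(wsin.shape, wcos.shape)
print(np.allclose(real, reference.real), np.allclose(imag, reference.imag))
```

The middle axis of size 1 gives the kernels the (out_channels, in_channels, kernel_size) layout that a 1D convolution over single-channel audio expects, so sliding them along a waveform yields one transform frame per position.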