nnAudio.utils.create_fourier_kernels
- nnAudio.utils.create_fourier_kernels(n_fft, win_length=None, freq_bins=None, fmin=50, fmax=6000, sr=44100, freq_scale='linear', window='hann', verbose=True)
This function creates the Fourier kernels for STFT, Melspectrogram and CQT. Most of the parameters follow librosa conventions. Part of the code comes from pytorch_musicnet (https://github.com/jthickstun/pytorch_musicnet).
- Parameters
n_fft (int) – The window size.
freq_bins (int) – Number of frequency bins. Default is None, which means n_fft//2+1 bins.
fmin (int) – The starting frequency for the lowest frequency bin. If freq_scale is 'no', this argument does nothing.
fmax (int) – The ending frequency for the highest frequency bin. If freq_scale is 'no', this argument does nothing.
sr (int) – The sampling rate for the input audio. It is used to calculate the correct fmin and fmax. Setting the correct sampling rate is very important for calculating the correct frequency.
freq_scale ('linear', 'log', or 'no') – Determines the spacing between frequency bins. When 'linear' or 'log' is used, the bin spacing can be controlled by fmin and fmax. If 'no' is used, the bins start at 0 Hz and end at the Nyquist frequency with linear spacing.
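To make the freq_scale='no' case concrete, here is a minimal NumPy sketch of what such kernels can look like. It is an illustration under stated assumptions, not nnAudio's implementation: the helper name sketch_fourier_kernels is hypothetical, no window function is applied, and only the documented kernel shape (freq_bins, 1, n_fft) and the 0 Hz to Nyquist bin range are taken from this page.

```python
import numpy as np

def sketch_fourier_kernels(n_fft, sr=44100):
    """Hypothetical helper: unwindowed kernels for the freq_scale='no' case."""
    freq_bins = n_fft // 2 + 1              # documented default when freq_bins is None
    n = np.arange(n_fft)                    # sample index within one frame
    k = np.arange(freq_bins)                # integer DFT bin index
    phase = 2 * np.pi * k[:, None] * n[None, :] / n_fft
    wcos = np.cos(phase)[:, None, :]        # real kernel, shape (freq_bins, 1, n_fft)
    wsin = np.sin(phase)[:, None, :]        # imaginary kernel, same shape
    bins2freq = (k * sr / n_fft).tolist()   # frequency of each bin in Hz
    return wsin, wcos, bins2freq

n_fft, sr = 16, 16000
wsin, wcos, bins2freq = sketch_fourier_kernels(n_fft, sr)

# Project one frame onto the kernels and compare with NumPy's real FFT.
frame = np.random.default_rng(0).standard_normal(n_fft)
real = wcos[:, 0, :] @ frame
imag = -(wsin[:, 0, :] @ frame)             # the DFT uses exp(-i*theta) = cos - i*sin
reference = np.fft.rfft(frame)

print(np.allclose(real + 1j * imag, reference))
print(bins2freq[0], bins2freq[-1])          # first and last bin frequency in Hz
```

In this sketch the first bin sits at 0 Hz and the last at sr/2, which is why sr must match the audio for the bin frequencies to be meaningful. The middle axis of size 1 is kept only to mirror the documented shape.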
- Returns
wsin (numpy.array) – Imaginary Fourier Kernel with the shape (freq_bins, 1, n_fft).
wcos (numpy.array) – Real Fourier Kernel with the shape (freq_bins, 1, n_fft).
bins2freq (list) – Mapping each frequency bin to frequency in Hz.
binslist (list) – The normalized frequency k in digital domain. This k is in the Discrete Fourier Transform equation $$