nnAudio.features.stft.STFT
- class nnAudio.features.stft.STFT(n_fft=2048, win_length=None, freq_bins=None, hop_length=None, window='hann', freq_scale='no', center=True, pad_mode='reflect', iSTFT=False, fmin=50, fmax=6000, sr=22050, trainable=False, output_format='Complex', verbose=True)
Bases: nnAudio.features.stft.STFTBase
This class computes the short-time Fourier transform (STFT) of the input signal. The input signal should have one of the following shapes:
(len_audio)
(num_audio, len_audio)
(num_audio, 1, len_audio)
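The shape handling can be pictured with a small sketch. It assumes that every accepted input is brought to the three-dimensional layout (num_audio, 1, len_audio) before the transform is applied. The helper name `normalized_shape` is hypothetical and not part of nnAudio, and rejecting other 3-D shapes is a choice made for this sketch only.

```python
def normalized_shape(shape):
    """Map an accepted input shape to (num_audio, 1, len_audio).

    Hypothetical helper for illustration; not part of nnAudio.
    """
    shape = tuple(shape)
    if len(shape) == 1:                      # (len_audio,)
        return (1, 1, shape[0])
    if len(shape) == 2:                      # (num_audio, len_audio)
        return (shape[0], 1, shape[1])
    if len(shape) == 3 and shape[1] == 1:    # (num_audio, 1, len_audio)
        return shape
    # Assumption of this sketch: anything else is rejected.
    raise ValueError(f"unsupported input shape: {shape}")


print(normalized_shape((22050,)))       # (1, 1, 22050)
print(normalized_shape((4, 22050)))     # (4, 1, 22050)
print(normalized_shape((4, 1, 22050)))  # (4, 1, 22050)
```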
The correct shape is inferred automatically if the input follows one of these three shapes. Most of the arguments follow the librosa convention. This class inherits from nn.Module, so it is used in the same way as any other nn.Module.
- Parameters
n_fft (int) – Size of the Fourier transform. Default value is 2048.
win_length (int) – The size of the window frame and STFT filter. Default: None (treated as equal to n_fft).
freq_bins (int) – Number of frequency bins. Default is None, which means n_fft//2+1 bins.
hop_length (int) – The hop (or stride) size. Default value is None, which is equivalent to n_fft//4.
window (str) – The windowing function for the STFT. It uses scipy.signal.get_window; please refer to the scipy documentation for possible windowing functions. The default value is 'hann'.
freq_scale ('linear', 'log', 'log2' or 'no') – Determines the spacing between frequency bins. When 'linear', 'log' or 'log2' is used, the bin spacing can be controlled by fmin and fmax. If 'no' is used, the bins start at 0 Hz and end at the Nyquist frequency with linear spacing.
center (bool) – Whether to put the STFT kernel at the center of the time step. If False, the time index is the beginning of the STFT kernel; if True, the time index is the center of the STFT kernel. Default value is True.
pad_mode (str) – The padding method. Default value is 'reflect'.
iSTFT (bool) – Whether to activate the iSTFT module. By default, it is False to save GPU memory. Note: the iSTFT kernel is not trainable. If you want a trainable iSTFT, use the iSTFT module.
fmin (int) – The starting frequency for the lowest frequency bin. If freq_scale is 'no', this argument has no effect.
fmax (int) – The ending frequency for the highest frequency bin. If freq_scale is 'no', this argument has no effect.
sr (int) – The sampling rate of the input audio. It is used to calculate the correct fmin and fmax. Setting the correct sampling rate is very important for calculating the correct frequency.
trainable (bool) – Determines whether the STFT kernels are trainable. If True, the gradients for the STFT kernels are also calculated and the STFT kernels are updated during model training. Default value is False.
output_format (str) – Controls the spectrogram output type: Magnitude, Complex, or Phase. The output_format can also be changed when calling the forward method.
verbose (bool) – If True, layer information is printed. If False, all prints are suppressed.
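The None defaults described above can be made concrete with a short sketch. It only restates the rules from the parameter list (win_length falls back to n_fft, freq_bins to n_fft//2+1, hop_length to n_fft//4) and the freq_scale='no' case, where bins run linearly from 0 Hz to the Nyquist frequency. The helpers `resolve_defaults` and `bin_frequencies` are hypothetical names for illustration, not nnAudio functions, and the spacing used for 'linear', 'log' and 'log2' is deliberately not covered.

```python
def resolve_defaults(n_fft=2048, win_length=None, freq_bins=None, hop_length=None):
    """Fill in the documented fallbacks for arguments left as None.

    Hypothetical helper for illustration; not part of nnAudio.
    """
    if win_length is None:
        win_length = n_fft            # window as long as the transform
    if freq_bins is None:
        freq_bins = n_fft // 2 + 1    # one-sided spectrum
    if hop_length is None:
        hop_length = n_fft // 4       # 75% overlap between frames
    return {"win_length": win_length, "freq_bins": freq_bins, "hop_length": hop_length}


def bin_frequencies(n_fft=2048, sr=22050):
    """Bin frequencies in Hz for freq_scale='no': linear from 0 Hz to Nyquist."""
    return [k * sr / n_fft for k in range(n_fft // 2 + 1)]


cfg = resolve_defaults()
freqs = bin_frequencies()
print(cfg)                             # {'win_length': 2048, 'freq_bins': 1025, 'hop_length': 512}
print(freqs[0], freqs[1], freqs[-1])   # 0.0 10.7666015625 11025.0
```

With the default sr=22050 and n_fft=2048, neighbouring bins are about 10.77 Hz apart, which is why setting sr correctly matters when reading frequencies off the output.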
- Returns
spectrogram – A tensor of spectrograms.
shape = (num_samples, freq_bins, time_steps) if output_format='Magnitude';
shape = (num_samples, freq_bins, time_steps, 2) if output_format='Complex' or 'Phase'.
- Return type
torch.tensor
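The documented shapes can be estimated ahead of time. The sketch below assumes the usual strided-convolution frame count: with center=True the signal is padded by n_fft//2 samples on each side, and frames are then taken every hop_length samples. That formula is an assumption of this example rather than something stated on this page, and the helper names are hypothetical.

```python
def num_time_steps(len_audio, n_fft=2048, hop_length=None, center=True):
    """Estimate the number of STFT frames (assumed formula, see lead-in)."""
    if hop_length is None:
        hop_length = n_fft // 4
    # center=True pads n_fft//2 samples on both sides before framing.
    padded = len_audio + 2 * (n_fft // 2) if center else len_audio
    return (padded - n_fft) // hop_length + 1


def output_shape(num_audio, len_audio, n_fft=2048, hop_length=None,
                 center=True, output_format="Complex"):
    """Expected output shape for the default freq_bins = n_fft//2+1."""
    steps = num_time_steps(len_audio, n_fft, hop_length, center)
    shape = (num_audio, n_fft // 2 + 1, steps)
    if output_format == "Magnitude":
        return shape
    # 'Complex' and 'Phase' are documented with a trailing axis of size 2.
    return shape + (2,)


# Four one-second clips at 22050 Hz with the default settings.
print(output_shape(4, 22050, output_format="Magnitude"))  # (4, 1025, 44)
print(output_shape(4, 22050, output_format="Complex"))    # (4, 1025, 44, 2)
print(num_time_steps(22050, center=False))                # 40
```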
Examples
>>> from nnAudio import Spectrogram
>>> spec_layer = Spectrogram.STFT()
>>> specs = spec_layer(x)
Methods
__init__ – Initializes internal Module state, shared by both nn.Module and ScriptModule.
extra_repr – Set the extra representation of the module.
forward – Convert a batch of waveforms to spectrograms.
inverse – Same as the iSTFT() class; converts spectrograms back to waveforms.
- extra_repr() → str
Set the extra representation of the module
To print customized extra information, you should reimplement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x, output_format=None)
Convert a batch of waveforms to spectrograms.
- Parameters
x (torch tensor) – Input signal, in one of the following shapes. It is automatically broadcast to the right shape.
(len_audio)
(num_audio, len_audio)
(num_audio, 1, len_audio)
output_format (str) – Controls the type of spectrogram to be returned. Can be Magnitude, Complex, or Phase. Default value is Complex.
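How the output formats relate to each other can be shown on a single time-frequency cell. The sketch assumes that the trailing axis of size 2 in the Complex output holds the real and imaginary parts, in that order. That layout is an assumption of this example, and `to_magnitude_and_phase` is a hypothetical helper rather than an nnAudio function.

```python
import math


def to_magnitude_and_phase(real, imag):
    """Convert one (real, imag) cell to (magnitude, phase in radians).

    Hypothetical helper; assumes the pair is ordered (real, imag).
    """
    magnitude = math.hypot(real, imag)   # sqrt(real**2 + imag**2)
    phase = math.atan2(imag, real)       # angle in (-pi, pi]
    return magnitude, phase


mag, phase = to_magnitude_and_phase(3.0, 4.0)
print(mag)    # 5.0
print(phase)  # angle of the point (3, 4), a little under 1 radian
```

Requesting Magnitude directly avoids carrying the extra axis when the phase is not needed.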
- inverse(X, onesided=True, length=None, refresh_win=True)
This function is the same as the iSTFT() class: it converts spectrograms back to waveforms. It only works for complex-valued spectrograms. If you have magnitude spectrograms, please use Griffin_Lim().
- Parameters
onesided (bool) – If your spectrograms only have n_fft//2+1 frequency bins, use onesided=True; otherwise use onesided=False.
length (int) – To make sure the inverse STFT output has the same length as the original waveform, set length to your intended waveform length. By default, length=None, which removes n_fft//2 samples from the start and the end of the output.
refresh_win (bool) – Whether to recalculate the window sum square. If your input has a fixed number of time steps, you can increase the speed by setting refresh_win=False. Otherwise keep refresh_win=True.
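The effect of length on the reconstructed waveform can be sketched with plain lists. The length=None branch follows the description above (n_fft//2 samples are dropped at each end). The branch for an explicit length is an assumption of this example: it keeps that many samples starting right after the leading padding. `trim_inverse_output` is a hypothetical helper, not the library's code.

```python
def trim_inverse_output(samples, n_fft=2048, length=None):
    """Trim the padded overlap-add result to the final waveform.

    Hypothetical helper for illustration; not part of nnAudio.
    """
    pad = n_fft // 2
    if length is None:
        # Documented default: drop n_fft//2 samples at both ends.
        return samples[pad:len(samples) - pad]
    # Assumption: keep `length` samples after the leading padding.
    return samples[pad:pad + length]


raw = list(range(20))                       # stand-in for a padded reconstruction
print(trim_inverse_output(raw, n_fft=8))    # the 12 samples from 4 to 15
print(trim_inverse_output(raw, n_fft=8, length=10))  # the 10 samples from 4 to 13
```

Passing the original waveform length matters when that length is not a multiple of hop_length, because the default trimming alone may not reproduce it exactly.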