nnAudio.Spectrogram.iSTFT
- class nnAudio.Spectrogram.iSTFT(n_fft=2048, win_length=None, freq_bins=None, hop_length=None, window='hann', freq_scale='no', center=True, fmin=50, fmax=6000, sr=22050, trainable_kernels=False, trainable_window=False, verbose=True, refresh_win=True)
 Bases:
torch.nn.modules.module.Module
This class converts spectrograms back to waveforms. It only works for complex-valued spectrograms. If you have magnitude spectrograms, please use Griffin_Lim() instead. The parameters (e.g. n_fft, window) need to be the same as those of the forward STFT in order to obtain the correct inverse. If trainability is not required, it is recommended to use the inverse method under the STFT class to save GPU/RAM memory.
When trainable=True and freq_scale!='no', there is no guarantee that the inverse is perfect; please use with extra care.
- Parameters
 n_fft (int) – The window size. Default value is 2048.
 freq_bins (int) – Number of frequency bins. Default is None, which means n_fft//2+1 bins. Please make sure the value is the same as the forward STFT.
 hop_length (int) – The hop (or stride) size. Default value is None, which is equivalent to n_fft//4. Please make sure the value is the same as the forward STFT.
 window (str) – The windowing function for iSTFT. It uses scipy.signal.get_window; please refer to the scipy documentation for possible windowing functions. The default value is ‘hann’. Please make sure the value is the same as the forward STFT.
 freq_scale ('linear', 'log', or 'no') – Determines the spacing between frequency bins. When linear or log is used, the bin spacing can be controlled by fmin and fmax. If ‘no’ is used, the bins start at 0 Hz and end at the Nyquist frequency with linear spacing. Please make sure the value is the same as the forward STFT.
 center (bool) – Whether to place the iSTFT kernel at the center of the time step. If False, the time index is the beginning of the iSTFT kernel; if True, the time index is the center of the iSTFT kernel. Default value is True. Please make sure the value is the same as the forward STFT.
 fmin (int) – The starting frequency for the lowest frequency bin. If freq_scale is no, this argument does nothing. Please make sure the value is the same as the forward STFT.
 fmax (int) – The ending frequency for the highest frequency bin. If freq_scale is no, this argument does nothing. Please make sure the value is the same as the forward STFT.
 sr (int) – The sampling rate of the input audio. It is used to calculate the correct fmin and fmax. Setting the correct sampling rate is very important for calculating the correct frequency.
 trainable_kernels (bool) – Determines whether the STFT kernels are trainable. If True, the gradients for the STFT kernels will also be calculated and the STFT kernels will be updated during model training. Default value is False.
 trainable_window (bool) – Determines whether the window function is trainable. Default value is False.
 verbose (bool) – If True, it shows layer information. If False, it suppresses all prints.
 device (str) – Choose which device to initialize this layer on. Default value is ‘cpu’.
- Returns
 spectrogram – It returns a batch of waveforms.
- Return type
 torch.tensor
Examples
>>> spec_layer = Spectrogram.iSTFT()
>>> specs = spec_layer(x)
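The requirement that the inverse use the same n_fft, hop length and window as the forward transform can be illustrated without nnAudio. The following is a minimal pure-Python sketch of a centred, one-sided STFT and its overlap-add inverse. It is not nnAudio's implementation, and the names hann, stft and istft are hypothetical helpers introduced only for this illustration.

```python
import cmath
import math

def hann(n_fft):
    # Periodic Hann window (the form scipy.signal.get_window('hann', n) returns by default).
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]

def stft(x, n_fft, hop):
    # center=True style: reflect-pad n_fft//2 samples on each side.
    pad = n_fft // 2
    padded = x[pad:0:-1] + x + x[-2:-pad - 2:-1]
    win = hann(n_fft)
    frames = []
    for start in range(0, len(padded) - n_fft + 1, hop):
        seg = [padded[start + n] * win[n] for n in range(n_fft)]
        # One-sided DFT: keep only n_fft//2 + 1 bins.
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            for k in range(n_fft // 2 + 1)
        ])
    return frames

def istft(frames, n_fft, hop, length=None):
    win = hann(n_fft)
    total = n_fft + hop * (len(frames) - 1)
    out = [0.0] * total
    wsum = [0.0] * total
    for t, spec in enumerate(frames):
        # Rebuild the full spectrum from the one-sided bins via conjugate symmetry.
        full = spec + [spec[n_fft - k].conjugate() for k in range(n_fft // 2 + 1, n_fft)]
        for n in range(n_fft):
            val = sum(full[k] * cmath.exp(2j * math.pi * k * n / n_fft)
                      for k in range(n_fft)).real / n_fft
            out[t * hop + n] += val * win[n]      # windowed overlap-add
            wsum[t * hop + n] += win[n] ** 2      # accumulated squared window
    # Normalise by the squared-window sum wherever it is non-zero.
    out = [o / w if w > 1e-8 else o for o, w in zip(out, wsum)]
    pad = n_fft // 2
    out = out[pad:]                               # drop the leading padding
    if length is not None:
        return out[:length]
    return out[:len(out) - pad]                   # drop the trailing padding

x = [math.sin(0.3 * n) for n in range(32)]
frames = stft(x, n_fft=8, hop=2)
y = istft(frames, n_fft=8, hop=2, length=len(x))
max_err = max(abs(a - b) for a, b in zip(x, y))
```

With matching settings the reconstruction error max_err is at floating-point level. Calling istft with a different hop or n_fft than the forward pass misaligns the overlap-add and no longer recovers x.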
Methods
__init__ Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(X, onesided=False, length=None, refresh_win=None)
 If your spectrograms only have n_fft//2+1 frequency bins, please use onesided=True; otherwise use onesided=False.
 To make sure the inverse STFT has the same output length as the original waveform, please set length to your intended waveform length. By default, length=None, which removes n_fft//2 samples from the start and the end of the output. If your input spectrograms X are of the same length, please use refresh_win=None to increase computational speed.
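The effect of length on the output size can be worked out with simple arithmetic. The sketch below assumes a plain overlap-add of n_frames frames, a centred forward STFT, and an even n_fft. The helper istft_output_length is hypothetical and not part of nnAudio.

```python
def istft_output_length(n_frames, n_fft, hop_length=None, length=None):
    """Expected number of output samples (hypothetical helper, not part of nnAudio)."""
    if hop_length is None:
        hop_length = n_fft // 4                  # documented default
    if length is not None:
        return length                            # output is cut to the requested length
    # Assumption: overlap-adding n_frames frames spans this many samples.
    overlap_added = n_fft + hop_length * (n_frames - 1)
    # length=None removes n_fft//2 samples from the start and from the end.
    return overlap_added - 2 * (n_fft // 2)

n_samples = 22050                                # one second at sr=22050
n_fft, hop = 2048, 512
n_frames = 1 + n_samples // hop                  # assumption: centred forward STFT
default_len = istft_output_length(n_frames, n_fft, hop)
fixed_len = istft_output_length(n_frames, n_fft, hop, length=n_samples)
```

Under these assumptions default_len is 22016, which is 34 samples shorter than the original 22050-sample waveform, whereas passing length=22050 restores the original size.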