nnAudio.features.stft.iSTFT

class nnAudio.features.stft.iSTFT(n_fft=2048, win_length=None, freq_bins=None, hop_length=None, window='hann', freq_scale='no', center=True, fmin=50, fmax=6000, sr=22050, trainable_kernels=False, trainable_window=False, verbose=True, refresh_win=True)

Bases: nnAudio.features.stft.STFTBase

This class converts spectrograms back to waveforms. It only works with complex-valued spectrograms; if you only have magnitude spectrograms, please use Griffin_Lim() instead. The parameters (e.g. n_fft, window) must be the same as those of the forward STFT in order to obtain the correct inverse. If trainability is not required, it is recommended to use the inverse method of the STFT class to save GPU/RAM memory.

When trainable=True and freq_scale!='no', the inverse is not guaranteed to be perfect; please use it with extra care.
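To illustrate why the inverse needs the complex spectrogram and the same window as the forward transform, here is a minimal pure-Python sketch of window-normalized overlap-add inversion. It is not nnAudio's implementation: `toy_stft`, `toy_istft`, `hann` and `max_abs_error` are hypothetical helpers, the toy keeps all n_fft bins (the two-sided case), and zero padding stands in for centering.

```python
import cmath
import math

def hann(n):
    # Periodic Hann window of length n.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def toy_stft(x, n_fft, hop, window):
    # Zero-pad n_fft//2 samples on both sides, then take a windowed DFT per frame.
    pad = n_fft // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad
    frames = []
    for start in range(0, len(xp) - n_fft + 1, hop):
        seg = [xp[start + n] * window[n] for n in range(n_fft)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            for k in range(n_fft)
        ])
    return frames

def toy_istft(frames, n_fft, hop, window, length):
    # Inverse DFT per frame, overlap-add with the window, then divide by the
    # overlap-added squared window and trim the padding.
    pad = n_fft // 2
    total = n_fft + hop * (len(frames) - 1)
    acc = [0.0] * total
    norm = [0.0] * total
    for t, spec in enumerate(frames):
        for n in range(n_fft):
            sample = sum(
                spec[k] * cmath.exp(2j * math.pi * k * n / n_fft) for k in range(n_fft)
            ).real / n_fft
            acc[t * hop + n] += sample * window[n]
            norm[t * hop + n] += window[n] ** 2
    y = [a / w if w > 1e-8 else 0.0 for a, w in zip(acc, norm)]
    return y[pad:pad + length]

def max_abs_error(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

n_fft, hop = 8, 2
x = [math.sin(0.3 * i) for i in range(32)]
spec = toy_stft(x, n_fft, hop, hann(n_fft))

# Same window on both sides: the waveform is recovered up to rounding error.
same = toy_istft(spec, n_fft, hop, hann(n_fft), len(x))
# Different window for the inverse (rectangular here): reconstruction is wrong.
mismatched = toy_istft(spec, n_fft, hop, [1.0] * n_fft, len(x))

err_same = max_abs_error(x, same)
err_mismatched = max_abs_error(x, mismatched)
print(err_same < 1e-9)       # True
print(err_mismatched > 0.1)  # True
```

In this toy setting the mismatched window scales the signal rather than reproducing it, which is one concrete way a parameter mismatch can show up.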

Parameters
  • n_fft (int) – The window size. Default value is 2048.

  • freq_bins (int) – Number of frequency bins. Default is None, which means n_fft//2+1 bins. Please make sure the value is the same as the forward STFT.

  • hop_length (int) – The hop (or stride) size. Default value is None which is equivalent to n_fft//4. Please make sure the value is the same as the forward STFT.

  • window (str) – The windowing function for iSTFT. It uses scipy.signal.get_window, please refer to scipy documentation for possible windowing functions. The default value is ‘hann’. Please make sure the value is the same as the forward STFT.

  • freq_scale ('linear', 'log', or 'no') – Determine the spacing between each frequency bin. When linear or log is used, the bin spacing can be controlled by fmin and fmax. If ‘no’ is used, the bin will start at 0Hz and end at Nyquist frequency with linear spacing. Please make sure the value is the same as the forward STFT.

  • center (bool) – Whether to put the iSTFT kernel at the center of the time-step. If False, the time index is the beginning of the iSTFT kernel; if True, the time index is the center of the iSTFT kernel. Default value is True. Please make sure the value is the same as the forward STFT.

  • fmin (int) – The starting frequency for the lowest frequency bin. If freq_scale is 'no', this argument does nothing. Please make sure the value is the same as the forward STFT.

  • fmax (int) – The ending frequency for the highest frequency bin. If freq_scale is 'no', this argument does nothing. Please make sure the value is the same as the forward STFT.

  • sr (int) – The sampling rate for the input audio. It is used to calculate the correct fmin and fmax. Setting the correct sampling rate is very important for calculating the correct frequency.

  • trainable_kernels (bool) – Determine if the STFT kernels are trainable or not. If True, the gradients for the STFT kernels will also be calculated and the STFT kernels will be updated during model training. Default value is False.

  • trainable_window (bool) – Determine if the window function is trainable or not. Default value is False.

  • verbose (bool) – If True, it shows layer information. If False, it suppresses all prints.
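
The None defaults described above can be sketched as a small helper. `resolve_istft_defaults` is a hypothetical name used only for illustration; it mirrors the documented defaults and is not part of nnAudio.

```python
def resolve_istft_defaults(n_fft=2048, freq_bins=None, hop_length=None):
    """Hypothetical helper mirroring the documented defaults (not nnAudio code)."""
    if freq_bins is None:
        # Documented default: n_fft//2+1 frequency bins.
        freq_bins = n_fft // 2 + 1
    if hop_length is None:
        # Documented default: a hop of n_fft//4 samples.
        hop_length = n_fft // 4
    return freq_bins, hop_length

print(resolve_istft_defaults())           # (1025, 512)
print(resolve_istft_defaults(n_fft=512))  # (257, 128)
```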

Returns

spectrogram – It returns a batch of waveforms.

Return type

torch.tensor

Examples

>>> from nnAudio import Spectrogram
>>> spec_layer = Spectrogram.iSTFT()
>>> waveforms = spec_layer(X)  # X: a batch of complex-valued spectrograms

Methods

__init__

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward

If your spectrograms only have n_fft//2+1 frequency bins, please use onesided=True; otherwise use onesided=False. To make sure the inverse STFT has the same output length as the original waveform, please set length to your intended waveform length.

forward(X, onesided=False, length=None, refresh_win=None)

If your spectrograms only have n_fft//2+1 frequency bins, please use onesided=True; otherwise use onesided=False. To make sure the inverse STFT has the same output length as the original waveform, please set length to your intended waveform length. By default, length=None, which will remove n_fft//2 samples from the start and the end of the output. If your input spectrograms X are of the same length, please use refresh_win=None to increase computational speed.
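
The effect of length can be sketched with some arithmetic. This assumes the standard overlap-add length of n_fft + hop_length * (n_frames - 1) before trimming, and a centered forward STFT that yields 1 + n_samples // hop_length frames; both are assumptions about typical STFT behaviour rather than statements taken from the nnAudio source, and `expected_istft_length` is a hypothetical helper.

```python
def expected_istft_length(n_frames, n_fft=2048, hop_length=None, length=None):
    """Hypothetical estimate of the output length in samples (not nnAudio code)."""
    if hop_length is None:
        hop_length = n_fft // 4
    if length is not None:
        # An explicit length requests exactly that many samples.
        return length
    # Assumed overlap-add length before trimming.
    full = n_fft + hop_length * (n_frames - 1)
    # length=None: n_fft//2 samples are removed from the start and the end.
    return full - 2 * (n_fft // 2)

n_samples = 22050                 # one second of audio at sr=22050
n_frames = 1 + n_samples // 512   # assumed frame count for a centered forward STFT
print(n_frames)                                           # 44
print(expected_istft_length(n_frames))                    # 22016
print(expected_istft_length(n_frames, length=n_samples))  # 22050
```

Under these assumptions the default output is 34 samples shorter than the original waveform, which is why passing length is suggested when the lengths must match.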