nnAudio.Spectrogram.Gammatonegram

class nnAudio.Spectrogram.Gammatonegram(sr=44100, n_fft=2048, n_bins=64, hop_length=512, window='hann', center=True, pad_mode='reflect', power=2.0, htk=False, fmin=20.0, fmax=None, norm=1, trainable_bins=False, trainable_STFT=False, verbose=True)

Bases: torch.nn.modules.module.Module

This class computes the Gammatonegram of the input signal. The input signal should have one of the following shapes: 1. (len_audio), 2. (num_audio, len_audio), 3. (num_audio, 1, len_audio). The correct shape is inferred automatically if the input follows one of these three shapes. This class inherits from torch.nn.Module, so it is used like any other torch.nn.Module.
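The shape inference described above can be illustrated with a small sketch. The helper `infer_batched_shape` below is hypothetical and is not part of nnAudio; it only shows, on plain shape tuples, how the three accepted layouts could be mapped to a common (num_audio, 1, len_audio) layout. How the library performs this step internally is not specified here.

```python
def infer_batched_shape(shape):
    """Map an accepted input shape to (num_audio, 1, len_audio).

    Hypothetical helper for illustration only; it works on shape tuples,
    not on tensors, and is not nnAudio's own implementation.
    """
    shape = tuple(shape)
    if len(shape) == 1:
        # (len_audio) -> a single clip with one channel
        return (1, 1, shape[0])
    if len(shape) == 2:
        # (num_audio, len_audio) -> add the channel axis
        return (shape[0], 1, shape[1])
    if len(shape) == 3 and shape[1] == 1:
        # (num_audio, 1, len_audio) is already in the target layout
        return shape
    raise ValueError(f"unsupported input shape: {shape}")


single = infer_batched_shape((44100,))
batch = infer_batched_shape((8, 44100))
batch_with_channel = infer_batched_shape((8, 1, 44100))
```

Any other layout, such as stereo input of shape (num_audio, 2, len_audio), is rejected by this sketch because it is not one of the three shapes listed above.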

Parameters
  • sr (int) – The sampling rate of the input audio. It is used to calculate the correct fmin and fmax. Setting the correct sampling rate is essential for obtaining correct frequencies. Default value is 44100.

  • n_fft (int) – The window size for the STFT. Default value is 2048.

  • n_bins (int) – The number of Gammatone filter banks. The filter banks map the n_fft frequency bins to Gammatone bins. Default value is 64.

  • hop_length (int) – The hop (or stride) size. Default value is 512.

  • window (str) – The windowing function for the STFT. It uses scipy.signal.get_window; please refer to the scipy documentation for the available windowing functions. Default value is ‘hann’.

  • center (bool) – Whether to put the STFT kernel at the center of the time step. If False, the time index is the beginning of the STFT kernel; if True, the time index is the center of the STFT kernel. Default value is True.

  • pad_mode (str) – The padding method. Default value is ‘reflect’.

  • htk (bool) – If False, the Mel scale is quasi-logarithmic. If True, the Mel scale is logarithmic. Default value is False.

  • fmin (float) – The starting frequency for the lowest Gammatone filter bank. Default value is 20.0.

  • fmax (float) – The ending frequency for the highest Gammatone filter bank. Default value is None.

  • trainable_bins (bool) – Determines whether the Gammatone filter banks are trainable. If True, the gradients for the Gammatone filter banks are also calculated and the filter banks are updated during model training. Default value is False.

  • trainable_STFT (bool) – Determines whether the STFT kernels are trainable. If True, the gradients for the STFT kernels are also calculated and the STFT kernels are updated during model training. Default value is False.

  • verbose (bool) – If True, layer information is printed. If False, all prints are suppressed. Default value is True.
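To make the role of the filter banks more concrete, here is a minimal sketch of projecting a power spectrogram onto a small set of bins with a matrix product. It assumes the filter bank has shape (n_bins, n_fft // 2 + 1), which is the usual layout for filter banks applied to a one-sided STFT; the function `apply_filter_bank` and the toy weights are hypothetical and are not taken from nnAudio.

```python
def apply_filter_bank(filter_bank, power_spec):
    """Project a (freq, time) power spectrogram onto filter-bank bins.

    filter_bank: list of n_bins rows, each with n_freq weights (assumed layout).
    power_spec:  list of n_freq rows, each with n_time values.
    Returns a list of n_bins rows, each with n_time values.
    """
    n_freq = len(power_spec)
    n_time = len(power_spec[0])
    projected = []
    for weights in filter_bank:
        if len(weights) != n_freq:
            raise ValueError("filter bank and spectrogram disagree on n_freq")
        # Weighted sum over frequency bins for every time step
        row = [
            sum(weights[f] * power_spec[f][t] for f in range(n_freq))
            for t in range(n_time)
        ]
        projected.append(row)
    return projected


# Toy sizes: n_fft = 4 gives n_fft // 2 + 1 = 3 frequency bins,
# projected onto 2 output bins over 2 time steps.
toy_bank = [
    [1.0, 0.5, 0.0],
    [0.0, 0.5, 1.0],
]
toy_spec = [
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
]
toy_out = apply_filter_bank(toy_bank, toy_spec)
```

With the defaults in the signature, the same product would reduce 1025 frequency bins (2048 // 2 + 1) to 64 Gammatone bins per time step, under the shape assumption stated above.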

Returns

spectrogram – A tensor of spectrograms with shape (num_samples, freq_bins, time_steps).

Return type

torch.Tensor
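The time_steps dimension of the returned tensor can be estimated from len_audio, n_fft, hop_length and center. The sketch below uses standard STFT framing arithmetic and assumes that center=True pads n_fft // 2 samples on each side of the signal; the helper `expected_time_steps` is hypothetical, and the exact behaviour should be confirmed against the layer's actual output.

```python
def expected_time_steps(len_audio, n_fft=2048, hop_length=512, center=True):
    """Estimate the number of STFT frames for a signal of len_audio samples.

    Assumes center=True pads n_fft // 2 samples on both sides (an assumption,
    not confirmed by this page) and that frames are taken every hop_length
    samples without partial frames at the end.
    """
    padded = len_audio + 2 * (n_fft // 2) if center else len_audio
    if padded < n_fft:
        raise ValueError("signal is shorter than one STFT window")
    return (padded - n_fft) // hop_length + 1


# One second of audio at the default sampling rate of 44100 Hz
steps_centered = expected_time_steps(44100)
steps_uncentered = expected_time_steps(44100, center=False)

# Expected output shape for a batch of 8 clips with the default n_bins=64
expected_shape = (8, 64, steps_centered)
```

Under these assumptions, one second of audio at 44100 Hz yields 87 frames with center=True and 83 frames with center=False.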

Examples

>>> from nnAudio import Spectrogram
>>> spec_layer = Spectrogram.Gammatonegram()
>>> specs = spec_layer(x)  # x: audio tensor, e.g. of shape (num_audio, len_audio)

Methods

__init__

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward