nnAudio.Spectrogram.Combined_Frequency_Periodicity

class nnAudio.Spectrogram.Combined_Frequency_Periodicity(fr=2, fs=16000, hop_length=320, window_size=2049, fc=80, tc=0.001, g=[0.24, 0.6, 1], NumPerOct=48)

Bases: torch.nn.modules.module.Module

Vectorized version of the code in https://github.com/leo-so/VocalMelodyExtPatchCNN/blob/master/MelodyExt.py. The feature is described in ‘Combining Spectral and Temporal Representations for Multipitch Estimation of Polyphonic Music’, https://ieeexplore.ieee.org/document/7118691.

Parameters
  • fr (int) – Frequency resolution. A higher number gives a coarser resolution; the finest resolution occurs when fr=1. The default value is 2.

  • fs (int) – Sample rate of the input audio clips. The default value is 16000.

  • hop_length (int) – The hop (or stride) size. The default value is 320.

  • window_size (int) – Same as n_fft in the other Spectrogram classes. The default value is 2049.

  • fc (int) – Starting frequency. For example, fc=80 means that Z starts at 80 Hz. The default value is 80.

  • tc (float) – Inverse of the ending frequency. For example, tc=1/8000 means that Z ends at 8000 Hz. The default value is 0.001, which means that Z ends at 1000 Hz.

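  • g (list) – Coefficients for the non-linear activation function. len(g) should equal the number of activation layers, and each element of g is the coefficient for one layer. The default value is [0.24, 0.6, 1].

  • NumPerOct (int) – Number of bins per octave on the log-frequency axis. The default value is 48.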

  • device (str) – Device on which to initialize this layer. The default value is ‘cpu’.
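
The parameters above imply a few derived quantities. The following is a minimal sketch in plain Python, not the layer's own code: it computes the hop duration from fs and hop_length, the frequency band from fc and tc, and a set of geometrically spaced centre frequencies with NumPerOct bins per octave. The helper name describe_cfp_params and the geometric bin spacing are assumptions made for illustration; the layer's create_logfreq_matrix may place its bins differently.

```python
import math

def describe_cfp_params(fs=16000, hop_length=320, fc=80, tc=0.001, NumPerOct=48):
    """Hypothetical helper: derive quantities implied by the constructor arguments."""
    hop_seconds = hop_length / fs      # time between consecutive frames
    f_start = fc                       # lowest frequency kept in Z (Hz)
    f_stop = 1.0 / tc                  # highest frequency kept in Z (Hz)
    octaves = math.log2(f_stop / f_start)
    # Assumption: bins are geometrically spaced, NumPerOct per octave, starting at fc.
    n_bins = math.floor(octaves * NumPerOct) + 1
    centres = [f_start * 2 ** (k / NumPerOct) for k in range(n_bins)]
    return hop_seconds, f_start, f_stop, centres

hop_seconds, f_start, f_stop, centres = describe_cfp_params()
print(f"hop = {hop_seconds * 1000:.0f} ms, band = {f_start}-{f_stop:.0f} Hz, {len(centres)} bins")
```

With the default arguments, the hop is 320 / 16000 seconds and the band runs from fc to 1 / tc. Passing tc=1/8000 instead would extend the upper edge to 8000 Hz.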

Returns

  • Z (torch.tensor) – The combined frequency and periodicity feature. It is equivalent to tfrLF * tfrLQ.

  • tfrL0 (torch.tensor) – STFT output

  • tfrLF (torch.tensor) – Frequency Feature

  • tfrLQ (torch.tensor) – Period Feature
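
To illustrate why Z is formed as the product tfrLF * tfrLQ, here is a toy sketch in plain Python with invented values (not output of the layer). It assumes, following the referenced paper, that the frequency feature tends to peak at the fundamental and its harmonics while the period feature tends to peak at the fundamental and its sub-harmonics, so an element-wise product keeps only the bins where both agree.

```python
# Toy feature vectors for a single frame, indexed by frequency bin (values are invented).
bin_hz = [50, 100, 200, 300, 400]
tfrLF = [0.0, 0.0, 0.9, 0.0, 0.5]   # peaks at 200 Hz (fundamental) and 400 Hz (harmonic)
tfrLQ = [0.0, 0.6, 0.8, 0.0, 0.0]   # peaks at 200 Hz (fundamental) and 100 Hz (sub-harmonic)

# Element-wise product, the same operation that * performs on two tensors of equal shape.
Z = [f * q for f, q in zip(tfrLF, tfrLQ)]

# Only bins where both features are non-zero survive in Z.
surviving = [hz for hz, z in zip(bin_hz, Z) if z > 0]
print(surviving)
```

In this toy frame only the 200 Hz bin is non-zero in both features, so it is the only bin left in Z.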

Examples

>>> from nnAudio import Spectrogram
>>> spec_layer = Spectrogram.Combined_Frequency_Periodicity()
>>> # x is an audio tensor sampled at fs (16000 Hz by default)
>>> Z, tfrL0, tfrLF, tfrLQ = spec_layer(x)

Methods

__init__

Initializes internal Module state, shared by both nn.Module and ScriptModule.

create_logfreq_matrix

forward

nonlinear_func