IJCNN 2021 Demo Page

This demo page is for the paper Revisiting Onsets and Frames Model with Additive Attention. High-resolution figures and the audio samples for the transcription results can be found here. Source code for the paper is available at https://github.com/KinWaiCheuk/IJCNN2021.github.io

Model Architecture

Left: Onsets and Frames model with an additive attention mechanism
Right: Linear model with an additive attention mechanism

For the Onsets and Frames model, the attention mechanism attends to only one of the three features: $\boldsymbol{x_{\text{spec}}}$, $\boldsymbol{\hat{y}_{\text{onset}}}$, or $\boldsymbol{\hat{y}_{\text{feat}}}$.
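A minimal PyTorch sketch of a Bahdanau-style additive attention block of this kind is given below. The class name, the choice of query sequence, and the tensor layout are illustrative assumptions, not the paper's exact implementation; `attn_size` plays the role of the attention dimension $D$. See the repository for the real code.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style additive attention (illustrative sketch).

    Scores every frame of one feature sequence (x_spec, y_onset, or y_feat)
    against every frame of a query sequence, then returns the attention
    weights and the attended context.
    """

    def __init__(self, query_dim, feat_dim, attn_size):
        super().__init__()
        self.W_q = nn.Linear(query_dim, attn_size, bias=False)
        self.W_k = nn.Linear(feat_dim, attn_size, bias=False)
        self.v = nn.Linear(attn_size, 1, bias=False)

    def forward(self, query, features):
        # query:    (batch, T, query_dim)  e.g. the frame-stack hidden states
        # features: (batch, T, feat_dim)   the feature being attended on
        scores = self.v(torch.tanh(
            self.W_q(query).unsqueeze(2) + self.W_k(features).unsqueeze(1)
        )).squeeze(-1)                                  # (batch, T, T)
        weights = torch.softmax(scores, dim=-1)         # each row sums to 1
        context = torch.bmm(weights, features)          # (batch, T, feat_dim)
        return context, weights
```

In this formulation the returned `weights` tensor is a $T \times T$ matrix per example, which can be rendered as a two-dimensional attention map.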

Transcription Results

The transcription results corresponding to the four sample spectrograms above are shown here. The piano rolls generated by each model are converted to MIDI files, and the WAV files are rendered from the MIDI files using Garritan Personal Orchestra: Concert D Grand Piano.
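The piano-roll-to-MIDI step can be sketched as follows. This is not the paper's exact post-processing: the pretty_midi library, the threshold, and the frame rate are assumptions for illustration.

```python
import pretty_midi

def piano_roll_to_midi(piano_roll, frame_rate=31.25, threshold=0.5, velocity=80):
    """Convert a (T, 88) frame-level piano roll into a pretty_midi object.

    piano_roll: frame activations in [0, 1]; frame_rate in frames per second.
    Consecutive active frames of a pitch are merged into a single note.
    """
    midi = pretty_midi.PrettyMIDI()
    piano = pretty_midi.Instrument(program=0)  # acoustic grand piano
    active = piano_roll > threshold
    for pitch in range(88):
        onset = None
        for t in range(active.shape[0]):
            if active[t, pitch] and onset is None:
                onset = t
            elif not active[t, pitch] and onset is not None:
                piano.notes.append(pretty_midi.Note(
                    velocity=velocity, pitch=pitch + 21,  # MIDI note 21 = A0
                    start=onset / frame_rate, end=t / frame_rate))
                onset = None
        if onset is not None:  # note still sounding at the end of the roll
            piano.notes.append(pretty_midi.Note(
                velocity=velocity, pitch=pitch + 21,
                start=onset / frame_rate, end=active.shape[0] / frame_rate))
    midi.instruments.append(piano)
    return midi

# midi = piano_roll_to_midi(model_output)  # hypothetical model output
# midi.write('spec1.mid')                  # then render to WAV with the piano library
```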

Original Audio

Ground Truth 1:
Ground Truth 2:
Ground Truth 3:
Ground Truth 4:

Onsets & Frames Model

w/ Everything | w/o BiLSTM | w/o Inference | w/o $F_{\text{onset}}$
Spec1:
Spec2:
Spec3:
Spec4:

Onsets & Frames Model with Additive Attention

w/ Everything | w/o BiLSTM | w/o Inference | w/o $F_{\text{onset}}$
Spec1:
Spec2:
Spec3:
Spec4:

Linear Model

$D=5$ w/ inference | $D=5$ w/o inference | $D=0$ w/ inference | $D=0$ w/o inference
Spec1:
Spec2:
Spec3:
Spec4:

Attention Maps

Onsets and Frames Model with Attention $D=30$

This is Figure 2 in the paper. Right-click an image and open it in a new tab to view it at full resolution. A minimal sketch for plotting such attention maps follows the list below.

Row 1: Attending on $\boldsymbol{x_{\text{spec}}} \in [0,1]^{T\times N}$
Row 2: Attending on $\boldsymbol{\hat{y}_{\text{onset}}} \in [0,1]^{T\times 88}$
Row 3: Attending on $\boldsymbol{\hat{y}_{\text{feat}}} \in [0,1]^{T\times 88}$
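A minimal matplotlib sketch for rendering one of these maps from an attention-weight matrix, such as the `weights` returned by the attention sketch above; the function name, colormap, and axis labels are illustrative.

```python
import matplotlib.pyplot as plt

def plot_attention_map(weights, title="Attention on x_spec"):
    """Render a 2-D attention-weight matrix as an image.

    weights: a (T, T) array; row t shows which frames are attended to
    when the model predicts frame t.
    """
    fig, ax = plt.subplots(figsize=(4, 4))
    im = ax.imshow(weights, origin="lower", aspect="auto", cmap="magma")
    ax.set_xlabel("Attended frame")
    ax.set_ylabel("Output frame")
    ax.set_title(title)
    fig.colorbar(im, ax=ax)
    return fig
```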

Onsets and Frames Model with Varying Attention Size

From top row to bottom row: $D=60$, $D=30$, $D=20$, $D=5$

Linear Model with Varying Attention Size

From top row to bottom row: $D=60$, $D=30$, $D=20$, $D=5$