This demo page accompanies the paper Revisiting Onsets and Frames Model with Additive Attention. High-resolution figures and audio samples for the transcription results can be found here. Source code for the paper is available at https://github.com/KinWaiCheuk/IJCNN2021.github.io
Left: Onsets and Frames model with an additive attention mechanism
Right: Linear model with an additive attention mechanism
For the Onsets and Frames model, the attention mechanism attends to only one of the three features: $\boldsymbol{x}_{\text{spec}}$, $\hat{\boldsymbol{y}}_{\text{onset}}$, or $\hat{\boldsymbol{y}}_{\text{feat}}$
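The additive attention mechanism scores each time step of the attended feature sequence with a small feed-forward network and takes a softmax-weighted sum. A minimal NumPy sketch of this Bahdanau-style form (the weight names `W_q`, `W_k`, `v` and the dimensions are illustrative, not the paper's exact parameterization), attending over a feature sequence such as $\hat{\boldsymbol{y}}_{\text{onset}} \in [0,1]^{T\times 88}$:

```python
import numpy as np

def additive_attention(query, keys, W_q, W_k, v):
    """Additive attention: score_t = v^T tanh(W_q q + W_k k_t),
    followed by a softmax over the T time steps and a weighted sum of keys."""
    scores = np.tanh(query @ W_q + keys @ W_k) @ v   # shape (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax over time
    context = weights @ keys                         # weighted sum, shape (d_k,)
    return context, weights

# Illustrative usage: attend over a T x 88 onset-probability sequence.
rng = np.random.default_rng(0)
T, d_k, d_q, d_a = 100, 88, 32, 16
keys = rng.random((T, d_k))                # stand-in for y_onset
query = rng.standard_normal(d_q)           # stand-in for the decoder state
context, weights = additive_attention(
    query, keys,
    rng.standard_normal((d_q, d_a)),
    rng.standard_normal((d_k, d_a)),
    rng.standard_normal(d_a),
)
```

The attention weights form a distribution over the $T$ frames, so the context vector is a convex combination of the attended feature's time steps.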

The transcription results corresponding to the four sample spectrograms above are shown here. Piano rolls generated by the model are converted to MIDI files, and the WAV files are rendered from the MIDI files using Garritan Personal Orchestra: Concert D Grand Piano
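The piano-roll-to-MIDI step can be sketched as follows: threshold the frame activations, then scan each pitch column for contiguous active runs to recover note onsets and offsets. This is a minimal NumPy sketch (the frame rate `fps`, threshold, and lowest MIDI pitch are illustrative assumptions, not the paper's exact settings; the actual audio rendering uses Garritan Personal Orchestra):

```python
import numpy as np

def roll_to_notes(roll, threshold=0.5, fps=31.25, lowest_pitch=21):
    """Convert a (T, 88) frame-activation piano roll into a list of
    (midi_pitch, onset_seconds, offset_seconds) note events."""
    active = roll >= threshold
    notes = []
    for p in range(active.shape[1]):
        col = active[:, p].astype(int)
        # Pad with zeros so runs touching the edges still produce both edges.
        edges = np.diff(np.concatenate(([0], col, [0])))
        onsets = np.where(edges == 1)[0]    # frame where a run starts
        offsets = np.where(edges == -1)[0]  # frame just after a run ends
        for on, off in zip(onsets, offsets):
            notes.append((lowest_pitch + p, on / fps, off / fps))
    return notes

# Illustrative usage: one middle-C note active for frames 2..4 at 10 fps.
roll = np.zeros((10, 88))
roll[2:5, 39] = 1.0        # pitch index 39 -> MIDI pitch 21 + 39 = 60
notes = roll_to_notes(roll, fps=10)
```

Each note event can then be written out with any MIDI library before synthesis.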
Ground Truth 1:
Ground Truth 2:
Ground Truth 3:
Ground Truth 4:
| | w/ Everything | w/o BiLSTM | w/o Inference | w/o $F_{\text{onset}}$ |
|---|---|---|---|---|
| Spec1 | (audio) | (audio) | (audio) | (audio) |
| Spec2 | (audio) | (audio) | (audio) | (audio) |
| Spec3 | (audio) | (audio) | (audio) | (audio) |
| Spec4 | (audio) | (audio) | (audio) | (audio) |
| | w/ Everything | w/o BiLSTM | w/o Inference | w/o $F_{\text{onset}}$ |
|---|---|---|---|---|
| Spec1 | (audio) | (audio) | (audio) | (audio) |
| Spec2 | (audio) | (audio) | (audio) | (audio) |
| Spec3 | (audio) | (audio) | (audio) | (audio) |
| Spec4 | (audio) | (audio) | (audio) | (audio) |
| | $D=5$ w/ inference | $D=5$ w/o inference | $D=0$ w/ inference | $D=0$ w/o inference |
|---|---|---|---|---|
| Spec1 | (audio) | (audio) | (audio) | (audio) |
| Spec2 | (audio) | (audio) | (audio) | (audio) |
| Spec3 | (audio) | (audio) | (audio) | (audio) |
| Spec4 | (audio) | (audio) | (audio) | (audio) |
This is Figure 2 in the paper. Right-click each image to view it in full resolution in a new tab.

Row 1: Attending on $\boldsymbol{x}_{\text{spec}} \in [0,1]^{T\times N}$
Row 2: Attending on $\hat{\boldsymbol{y}}_{\text{onset}} \in [0,1]^{T\times 88}$
Row 3: Attending on $\hat{\boldsymbol{y}}_{\text{feat}} \in [0,1]^{T\times 88}$
From top row to bottom row: $D=60$, $D=30$, $D=20$, $D=5$

From top row to bottom row: $D=60$, $D=30$, $D=20$, $D=5$
