
[bugfix] firered conv2dsubsampling4 and transformer cmvn for padded inputs #2806

Open
shen9712 wants to merge 5 commits into wenet-e2e:main from shen9712:main

Conversation

@shen9712

  1. Incorrect mask computation in conv2dsubsampling4

Current implementation:

mask = x_mask[:, :, :-2:2][:, :, :-2:2]

This slicing-based mask downsampling is inaccurate: for padded sequences it can over-count valid frames and does not match the actual output lengths of the conv2d subsampling.

I propose to recompute the mask from sequence lengths instead:

# recompute the output lengths of the two stride-2 convs, then rebuild the mask
x_lens = torch.floor((torch.floor((x_lens - 1) / 2) - 1) / 2).to(x_lens.dtype)
mask = make_non_pad_mask(x_lens).unsqueeze(1)

This matches the real length transformation of conv2dsubsampling4 and avoids accumulated alignment errors.
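A minimal sketch of the mismatch (using a simplified stand-in for wenet's `make_non_pad_mask`, here taking an explicit `max_len`; names are illustrative). Each stride-2, kernel-3 conv maps a length `L` to `floor((L - 1) / 2)`, and the slicing-based mask over-counts valid frames for padded sequences:

```python
import torch

def make_non_pad_mask(lengths: torch.Tensor, max_len: int) -> torch.Tensor:
    # True for valid positions, False for padding (simplified helper for this sketch)
    seq = torch.arange(max_len, device=lengths.device)
    return seq.unsqueeze(0) < lengths.unsqueeze(1)

T = 16
x_lens = torch.tensor([16, 11, 8])
x_mask = make_non_pad_mask(x_lens, T).unsqueeze(1)  # (B, 1, T)

# current slicing-based downsampling of the mask
sliced = x_mask[:, :, :-2:2][:, :, :-2:2]

# proposed: recompute lengths through the two stride-2 convs
out_lens = torch.floor((torch.floor((x_lens - 1) / 2) - 1) / 2).to(x_lens.dtype)
recomputed = make_non_pad_mask(out_lens, sliced.size(-1)).unsqueeze(1)

print(sliced.sum(-1).squeeze().tolist())      # valid counts from slicing
print(recomputed.sum(-1).squeeze().tolist())  # true conv output lengths
```

For the fully valid sequence (length 16) both agree, but for the padded sequences (lengths 11 and 8) the sliced mask keeps more positions than the convolutions actually produce.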

  2. CMVN is applied to padded positions before convolution

Current code:

if self.global_cmvn is not None:
    xs = self.global_cmvn(xs)
xs, pos_emb, masks = self.embed(xs, masks)

Here CMVN is applied to padded frames as well. Since the embedding module contains convolutions with right context, padded positions (which become non-zero after CMVN) leak into valid frames, producing incorrect features.

CMVN should ignore padded positions, or padded frames should be explicitly zeroed after CMVN before convolution.

if self.global_cmvn is not None:
    xs = self.global_cmvn(xs)
# zero out padded frames so CMVN-shifted padding cannot leak through the conv
xs = xs * masks.transpose(1, 2)
xs, pos_emb, masks = self.embed(xs, masks)
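The leakage can be demonstrated with a toy 1-D analogue of the embedding conv (kernel 3, one frame of right context); the module names and values below are illustrative, not wenet's actual code:

```python
import torch

conv = torch.nn.Conv1d(1, 1, kernel_size=3, padding=1, bias=False)
with torch.no_grad():
    conv.weight.fill_(1.0)  # sum over a 3-frame window

x = torch.zeros(1, 1, 8)
x[:, :, :5] = 1.0          # 5 valid frames, 3 padded zeros
mask = torch.zeros(1, 1, 8)
mask[:, :, :5] = 1.0

# global-CMVN-style normalization: padded zeros become (0 - mean) / std != 0
mean, std = 0.5, 2.0
cmvn = (x - mean) / std

leaky = conv(cmvn)          # padded position 5 leaks into valid frame 4
clean = conv(cmvn * mask)   # padding zeroed after CMVN, as proposed

print(leaky[0, 0, 4].item(), clean[0, 0, 4].item())
```

The outputs at fully interior valid frames agree, but the last valid frame differs because its window includes a padded position that CMVN shifted away from zero.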
