BUG in the logarithm detection algorithm - unreliable DES score #55

@geroldcsendes

The issue originates from the guess_is_log function of pdex. This function checks whether the sum of preprocessed counts is below 15, under the assumption that e¹⁵ − 1 ≈ 3.26M counts per cell is an unlikely value.

However, the function incorrectly assumes that the sum of counts is log-transformed, whereas in reality, each gene’s count is log-transformed individually.

Example:
Assume your median UMI count per cell is 10k (the challenge uses 50k+). If a cell has 1 count for each of 10,000 genes and is median-normalized, you get 1 normalized count per gene. Applying log1p yields 0.69 for each gene. Summing these up, as guess_is_log does, returns 6931, which is far above the threshold of 15, so the data is (incorrectly) detected as non-log-transformed. Consequently, your log-transformed data is treated as non-log-transformed. This happens even if you submit count data through cell-eval: cell-eval correctly guesses that the data is at count level (it uses its own function), log-normalizes it, and then passes it on to pdex.
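The arithmetic above can be reproduced with a short NumPy sketch. Note that `sum_based_is_log` is a hypothetical stand-in written only from the description in this issue (sum of a cell's values compared against 15); it is not pdex's actual code, and the variable names are illustrative.

```python
import numpy as np


def sum_based_is_log(cell_expr, threshold=15.0):
    """Hypothetical stand-in for the heuristic described in this issue.

    NOT pdex's actual implementation: it only mirrors the described idea of
    calling data log-transformed when a cell's summed values stay below 15.
    """
    return float(np.sum(cell_expr)) < threshold


# e^15 - 1: the per-cell count total that the threshold treats as implausible.
implied_max_counts = np.expm1(15.0)

# Toy cell from the example: 1 count in each of 10,000 genes (10k UMIs total).
n_genes = 10_000
median_umi = 10_000
raw_counts = np.ones(n_genes)

# Median normalization: scale the cell to the median UMI count (1.0 per gene).
normalized = raw_counts / raw_counts.sum() * median_umi

# log1p is applied to each gene individually (about 0.693 per gene).
logged = np.log1p(normalized)

# Summing the per-gene log values gives about 6931, far above 15.
total = logged.sum()
looks_logged = sum_based_is_log(logged)

print(f"sum of log1p values: {total:.2f}")
print(f"detected as log-transformed: {looks_logged}")
```

Under these assumptions the check returns False even though every value in `logged` was produced by log1p, which is the misclassification described above.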

Impact:
This will almost certainly distort fold-change estimates—and, crucially, their rankings. This is particularly harmful when your model detects more DEGs than the ground truth, as the top N DEGs (ranked by absolute fold change) will be misordered. Such models are unfairly penalized in the DES score due to this bug.

This problem was reported in previous issues:

We find that the guess_is_log function is not merely suboptimal but fundamentally inappropriate. This is especially problematic for the VCC challenge, where cell-eval is used for evaluation and there is no way to specify pdex-kwargs directly, because submissions are evaluated on the server.

Thanks a lot for looking into our bug report. We will report this to cell-eval as well.
