Correlation Coefficient Calculation #36

I ran the `test_pretrained.py` script to calculate the correlation coefficient on a validation sample and got `0.5963`, as expected. However, when I inspected the target and the predictions, each had shape `(896, 5313)`, i.e. the batch dimension was missing. The `pearson_corr_coef` function computes the correlation over `dim=1`, so for unbatched input the reported `0.5963` actually measures correlation across the 5313 tracks (the different cell lines) at each position, rather than across the 896 track positions within each cell line. When the batch dimension is added with `unsqueeze(0)`, the correlation is calculated over track positions and yields `0.4721`.

Correlation over track positions is the way Enformer reports it, so would it make sense to update the README and `test_pretrained.py` with this procedure? Also, were the reported correlation coefficients of `0.625` and `0.65` on the train/test sets calculated on samples with a missing batch dimension? If so, they would need to be recalculated. Am I missing something?
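To make the axis issue concrete, here is a toy sketch. It assumes that `pearson_corr_coef` centers both tensors along `dim` and then takes the cosine similarity along that same axis. `pearson_sketch` is my own stand-in written for illustration, not the library function, and the tensor sizes are shrunk from `(896, 5313)` to `(8, 5)`.

```python
import torch
import torch.nn.functional as F

def pearson_sketch(x, y, dim=1):
    # Hypothetical stand-in for the library's pearson_corr_coef:
    # center along `dim`, then cosine similarity along `dim`.
    x_centered = x - x.mean(dim=dim, keepdim=True)
    y_centered = y - y.mean(dim=dim, keepdim=True)
    return F.cosine_similarity(x_centered, y_centered, dim=dim)

torch.manual_seed(0)
positions, tracks = 8, 5  # toy sizes standing in for 896 and 5313
target = torch.randn(positions, tracks)
pred = target + 0.5 * torch.randn(positions, tracks)

# Unbatched: dim=1 is the track axis, so there is one value per position.
across_tracks = pearson_sketch(pred, target, dim=1)

# Batched: dim=1 is the position axis, so there is one value per track.
across_positions = pearson_sketch(pred.unsqueeze(0), target.unsqueeze(0), dim=1)

print(across_tracks.shape)     # torch.Size([8])
print(across_positions.shape)  # torch.Size([1, 5])
print(across_tracks.mean(), across_positions.mean())
```

The two means are averages of different quantities, which is why the reported number changes once the batch dimension is present.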

Here is the modified `test_pretrained.py` script I have used:

```py
import torch
from enformer_pytorch import Enformer

enformer = Enformer.from_pretrained('EleutherAI/enformer-official-rough').cuda()
enformer.eval()

data = torch.load('./data/test-sample.pt')
seq, target = data['sequence'].cuda(), data['target'].cuda()
print(seq.shape) # torch.Size([131072, 4])
print(target.shape) # torch.Size([896, 5313])
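# add the batch dimension so that dim=1 is the position axis
seq = seq.unsqueeze(0)        # (1, 131072, 4)
target = target.unsqueeze(0)  # (1, 896, 5313)

# Note: without the unsqueeze calls above, you will find the prediction shape is also `torch.Size([896, 5313])`.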

with torch.no_grad():
    corr_coef = enformer(
        seq,
        target = target,
        return_corr_coef = True,
        head = 'human'
    )

print(corr_coef) # tensor([0.4721], device='cuda:0')
assert corr_coef > 0.1
```
