Data leakage problem in your model

The design of your adjacency matrices in `adj_mats_orig` and the way you split the train/test sets will cause a huge **data leakage problem** in your training. The validation and test sets are created independently for `gene_adj` and `gene_adj.transpose(copy=True)`, so the edges in the validation/test sets of `gene_adj` are actually included in the training set of `gene_adj.transpose(copy=True)`.
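To make the concern concrete, here is a minimal sketch on a toy graph. It is not the Decagon code: `split_edges` is a hypothetical helper that holds out a random fraction of edges, and the random symmetric matrix only stands in for `gene_adj`. It assumes the two matrices are split with independent random draws, as described above.

```python
import numpy as np
import scipy.sparse as sp

def split_edges(adj, test_frac, rng):
    """Hold out a random fraction of the nonzero entries of `adj` (hypothetical helper)."""
    rows, cols = adj.nonzero()
    idx = rng.permutation(len(rows))
    n_test = int(len(rows) * test_frac)
    test = set(zip(rows[idx[:n_test]].tolist(), cols[idx[:n_test]].tolist()))
    train = set(zip(rows[idx[n_test:]].tolist(), cols[idx[n_test:]].tolist()))
    return train, test

rng = np.random.default_rng(0)

# Toy symmetric gene-gene graph standing in for gene_adj.
upper = sp.triu(sp.random(50, 50, density=0.2, random_state=0), k=1)
gene_adj = ((upper + upper.T) > 0).astype(int).tocsr()

# Independent splits for the matrix and its transpose.
train_a, test_a = split_edges(gene_adj, 0.2, rng)
train_b, test_b = split_edges(gene_adj.transpose(copy=True).tocsr(), 0.2, rng)

# A held-out edge (i, j) leaks if (j, i) is a training edge of the transpose.
leaked = {(i, j) for (i, j) in test_a if (j, i) in train_b}
leak_rate = len(leaked) / len(test_a)
print(f"{leak_rate:.0%} of held-out edges appear in the other training set")
```

With a 20% hold-out, most held-out edges should be expected to reappear in the other matrix's training set, because each reversed edge is drawn into that training set independently.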

The same problem applies to the train/validation sets of `gene_drug_adj` and `drug_gene_adj`: the validation edges from `gene_drug_adj` are actually used for training in `drug_gene_adj`, and vice versa.
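One possible way to avoid this, sketched under my own assumptions rather than taken from the repository, is to split the edges of one matrix once and derive the split for its transpose by swapping the indices. `split_once_and_mirror` is a hypothetical name, and the random matrix only stands in for `gene_drug_adj`; a validation set could be carved out in the same way.

```python
import numpy as np
import scipy.sparse as sp

def split_once_and_mirror(adj, test_frac, seed=0):
    """Split the edges of `adj` once; get the transpose's split by swapping indices."""
    rng = np.random.default_rng(seed)
    coo = sp.coo_matrix(adj)
    edges = np.column_stack((coo.row, coo.col))
    rng.shuffle(edges)  # shuffles the rows (edges) in place
    n_test = int(len(edges) * test_frac)
    test, train = edges[:n_test], edges[n_test:]
    # Reversing the columns turns each (gene, drug) pair into (drug, gene).
    return (train, test), (train[:, ::-1], test[:, ::-1])

# Toy bipartite matrix standing in for gene_drug_adj.
gene_drug_adj = sp.random(30, 20, density=0.1, random_state=1, format="csr")
(gd_train, gd_test), (dg_train, dg_test) = split_once_and_mirror(gene_drug_adj, 0.2)

# No held-out gene-drug edge should show up, reversed, among drug-gene training edges.
dg_train_set = set(map(tuple, dg_train.tolist()))
leaked = [e for e in gd_test.tolist() if (e[1], e[0]) in dg_train_set]
```

Because both directions share one random draw, a held-out edge is held out in both matrices.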

Could you please clarify?
Thanks!

_Originally posted by @hurleyLi in https://github.com/marinkaz/decagon/issues/7#issuecomment-519645774_

data leakage problem in your model #9
