The design of your adjacency matrix `adj_mats_orig` and the way you split the train/test set will cause a huge data leakage problem in your training. Your validation and test sets are created independently for `gene_adj` and `gene_adj.transpose(copy=True)`, so the edges in the validation / test set of `gene_adj` are actually included in the training set of `gene_adj.transpose(copy=True)`.
The same problem applies to the train / validation split between `gene_drug_adj` and `drug_gene_adj`: the validation edges from `gene_drug_adj` are actually used for training in `drug_gene_adj`, and vice versa.
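To make the overlap concrete, here is a minimal numpy sketch (not the repository's code). `split_edges` and `count_leaked` are hypothetical helpers, and a small random symmetric matrix stands in for `gene_adj`. It splits the matrix and its transpose independently, then checks each held-out edge (i, j) of `gene_adj` against the training edges of the transpose, where the same interaction is stored at (j, i):

```python
import numpy as np

def split_edges(adj, test_frac, rng):
    """Randomly split the nonzero entries (i, j) of adj into train and test edges.

    Hypothetical helper: each matrix is split on its own, mimicking the
    independent splitting described above.
    """
    rows, cols = np.nonzero(adj)
    edges = np.stack([rows, cols], axis=1)
    perm = rng.permutation(len(edges))
    n_test = int(round(test_frac * len(edges)))
    return edges[perm[n_test:]], edges[perm[:n_test]]  # (train, test)

def count_leaked(test_edges, transposed_train_edges):
    """Count test edges (i, j) whose mirror (j, i) is a training edge of the transposed matrix."""
    train_set = {(int(a), int(b)) for a, b in transposed_train_edges}
    return sum((int(j), int(i)) in train_set for i, j in test_edges)

rng = np.random.default_rng(0)
upper = np.triu(rng.random((40, 40)) < 0.1, k=1)
gene_adj = (upper | upper.T).astype(int)  # toy symmetric gene-gene matrix
gene_adj_t = gene_adj.T.copy()            # stands in for gene_adj.transpose(copy=True)

_, test = split_edges(gene_adj, 0.1, rng)
train_t, _ = split_edges(gene_adj_t, 0.1, rng)
print(f"{count_leaked(test, train_t)} of {len(test)} held-out edges are trained on")
```

With two independent splits, roughly a (1 - test_frac) share of the held-out edges of one matrix ends up in the training set of its transpose.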
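One way to avoid this, sketched under the assumption that `drug_gene_adj` is exactly the transpose of `gene_drug_adj`: draw the split once on `gene_drug_adj`, then derive the split for `drug_gene_adj` by swapping each index pair, so an edge held out in one direction is held out in the other as well. `split_pair` is a hypothetical helper, and only a train / test split is shown for brevity:

```python
import numpy as np

def split_pair(adj, test_frac, rng):
    """Split adj's edges once and reuse the same split for adj.T (hypothetical helper).

    Returns ((train, test) for adj, (train, test) for adj.T), where the
    transposed edges are the same edges with (i, j) swapped to (j, i).
    """
    rows, cols = np.nonzero(adj)
    edges = np.stack([rows, cols], axis=1)
    perm = rng.permutation(len(edges))
    n_test = int(round(test_frac * len(edges)))
    test, train = edges[perm[:n_test]], edges[perm[n_test:]]
    # Swap the two columns so each edge (i, j) becomes (j, i) in the transposed matrix
    return (train, test), (train[:, ::-1], test[:, ::-1])

rng = np.random.default_rng(1)
gene_drug = (rng.random((30, 10)) < 0.3).astype(int)  # toy gene x drug matrix
(gd_train, gd_test), (dg_train, dg_test) = split_pair(gene_drug, 0.2, rng)
```

For a symmetric matrix such as `gene_adj`, the same idea applies, but the split would be drawn over unordered pairs (for example, the upper triangle) so that (i, j) and (j, i) always land in the same set.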
Could you please clarify?
Thanks!
Originally posted by @hurleyLi in #7 (comment)