Issue description
We have a dataset in which each record has either one label or multiple labels.
To verify the label model's predictions, we filtered the original data down to the records with only one label. Running LabelModel.fit on this single-label data gave an accuracy of more than 90%.
But when we ran LabelModel.fit on the whole dataset, the accuracy on those same single-label data points dropped drastically, to 30%.
Code example/repro steps
I was able to reproduce the bug with a generated label matrix: https://github.com/srimugunthan/snorkeldebugging/blob/master/snorkeldebug.ipynb
Although the accuracy drop on the generated data is not drastic, it illustrates the scenario.
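
For readers who do not want to open the notebook, here is a minimal sketch of the comparison being made. It is not the notebook's code. It assumes that "one label" means exactly one labeling function voted on the record (Snorkel marks an abstain as -1), and the helper names `single_label_mask` and `subset_accuracy` are hypothetical. The label matrix and both prediction arrays are toy placeholders chosen to show the mechanics, not real model output.

```python
import numpy as np

ABSTAIN = -1  # Snorkel's convention for "this labeling function did not vote"


def single_label_mask(L):
    """Return a boolean mask of rows where exactly one labeling function voted.

    Assumption: this is what "records with only one label" means in the report.
    """
    L = np.asarray(L)
    return (L != ABSTAIN).sum(axis=1) == 1


def subset_accuracy(preds, y_true, mask):
    """Accuracy computed only over the rows selected by `mask`."""
    preds, y_true = np.asarray(preds), np.asarray(y_true)
    return float((preds[mask] == y_true[mask]).mean())


# Toy label matrix: 5 records, 3 labeling functions (placeholder data).
L = np.array([
    [0, -1, -1],   # one vote
    [-1, 1, -1],   # one vote
    [0, 1, -1],    # two votes
    [1, 1, 0],     # three votes
    [-1, -1, 1],   # one vote
])
y_true = np.array([0, 1, 0, 1, 1])

mask = single_label_mask(L)

# Placeholder predictions standing in for two separately fitted models:
# one fit on L[mask] only, one fit on all of L. In a real run these would
# come from the label model's predict step.
preds_subset_fit = np.array([0, 1, 0, 1, 1])
preds_full_fit = np.array([0, 0, 0, 1, 1])

# Both accuracies are measured on the SAME single-label rows.
acc_subset_fit = subset_accuracy(preds_subset_fit, y_true, mask)
acc_full_fit = subset_accuracy(preds_full_fit, y_true, mask)
print(acc_subset_fit, acc_full_fit)
```

In the real setup, the two prediction arrays would come from fitting one LabelModel on `L[mask]` and another on all of `L`, then calling predict on each. The point is that both accuracies are evaluated on the same masked rows, so any gap comes from what the model was fit on.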
Expected behavior
The subset of data with single labels should have the same accuracy whether the model is fit on that subset alone or on the whole dataset.
System info
Snorkel 0.9.3 on Linux.