labelmodel.fit on a superset of data changes predictions of subset #1581

@srimugunthan

Issue description

We have a dataset in which each record has either one label or multiple labels.
To verify the label model predictions, we filtered the original data down to the records with only one label. Running labelmodel.fit on this single-label data gave an accuracy of more than 90%.

But when we ran labelmodel.fit on the whole dataset, the accuracy on those same single-label records dropped drastically, to 30%.

Code example/repro steps

I was able to reproduce the bug with a generated label matrix: https://github.com/srimugunthan/snorkeldebugging/blob/master/snorkeldebug.ipynb
The accuracy drop on the generated data is not as drastic, but it illustrates the scenario.
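
For readers who cannot open the notebook, here is a minimal sketch of the comparison being described. It is not the notebook's code. It assumes "single label" means that exactly one labeling function voted on a record (the issue may mean something else), it generates a hypothetical synthetic binary label matrix, and it uses a plain majority vote as a stand-in model so that it runs without Snorkel. All function names here are illustrative.

```python
import numpy as np

ABSTAIN = -1  # conventional "no vote" value in a label matrix


def make_label_matrix(n=2000, n_lfs=5, seed=0):
    """Hypothetical synthetic binary task: each LF has its own coverage and accuracy."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    coverage = rng.uniform(0.2, 0.6, size=n_lfs)
    accuracy = rng.uniform(0.6, 0.9, size=n_lfs)
    L = np.full((n, n_lfs), ABSTAIN)
    for j in range(n_lfs):
        fires = rng.random(n) < coverage[j]      # rows where LF j votes
        correct = rng.random(n) < accuracy[j]    # rows where its vote is right
        votes = np.where(correct, y, 1 - y)
        L[fires, j] = votes[fires]
    return L, y


def majority_vote_fit_predict(L_train, L_eval):
    """Stand-in model: ignores L_train; ties and all-abstain rows get ABSTAIN."""
    pos = (L_eval == 1).sum(axis=1)
    neg = (L_eval == 0).sum(axis=1)
    return np.where(pos > neg, 1, np.where(neg > pos, 0, ABSTAIN))


def compare_subset_accuracy(fit_predict, L, y):
    """Accuracy on single-vote rows when fitting on the subset vs. on all rows."""
    single = (L != ABSTAIN).sum(axis=1) == 1
    L_single, y_single = L[single], y[single]
    acc_subset_fit = (fit_predict(L_single, L_single) == y_single).mean()
    acc_full_fit = (fit_predict(L, L_single) == y_single).mean()
    return acc_subset_fit, acc_full_fit


L, y = make_label_matrix()
acc_subset_fit, acc_full_fit = compare_subset_accuracy(majority_vote_fit_predict, L, y)
print(f"fit on subset: {acc_subset_fit:.3f}  fit on all data: {acc_full_fit:.3f}")

# With Snorkel installed, a wrapper along these lines could replace the stand-in.
# The import path and arguments are recalled from the 0.9.x API and should be
# checked against the installed version:
#   from snorkel.labeling import LabelModel
#   def label_model_fit_predict(L_train, L_eval):
#       lm = LabelModel(cardinality=2, verbose=False)
#       lm.fit(L_train, n_epochs=500, seed=123)
#       return lm.predict(L_eval)
```

The majority-vote stand-in gives identical numbers in both settings because it learns nothing from the training matrix. A label model with learned parameters can give different numbers, since the parameters it estimates depend on which rows it is fit on; that difference is what this issue reports.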

Expected behavior

The single-label subset should have the same accuracy whether labelmodel.fit is run on the subset alone or on the whole dataset.

System info

Snorkel 0.9.3 on Linux.
