Issue description
I'm using PandasParallelLFApplier to apply labeling functions to a pandas DataFrame with 5000 rows. The resulting label matrix differs from the one PandasLFApplier produces for the same data.
Code example/repro steps
Using PandasParallelLFApplier
from snorkel.labeling.apply.dask import PandasParallelLFApplier

# Apply the LFs to the first 5000 rows of the unlabeled training data
applier = PandasParallelLFApplier(lfs)
topic_labeling = applier.apply(df[:5000])
topic_labeling
output: array([
[-1, -1, -1, -1],
[-1, -1, -1, -1],
[-1, -1, -1, -1],
...,
[-1, -1, -1, -1],
[-1, -1, -1, -1],
[-1, -1, -1, 1]])
Same code using PandasLFApplier
output: array([
[-1, -1, -1, -1],
[-1, 1, -1, -1],
[-1, -1, -1, -1],
...,
[-1, -1, -1, -1],
[-1, -1, -1, -1],
[-1, -1, -1, 1]])
The second row differs between the two outputs.
Expected behavior
I would expect both appliers to return the same result. Labeling coverage and overlaps are the same for both, so the difference has to be in the order of the rows.
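One way to check the row-order hypothesis is to compare the two label matrices as multisets of rows and then find the first position where they disagree. The following is only a minimal sketch: the helper names `same_rows_ignoring_order` and `first_mismatch` are hypothetical (not part of Snorkel), and the small lists are toy stand-ins for the two real outputs, not the actual data from this issue.

```python
from collections import Counter


def same_rows_ignoring_order(L_a, L_b):
    """Return True if both label matrices hold the same rows with the same counts."""
    rows_a = Counter(tuple(int(v) for v in row) for row in L_a)
    rows_b = Counter(tuple(int(v) for v in row) for row in L_b)
    return rows_a == rows_b


def first_mismatch(L_a, L_b):
    """Return the index of the first row that differs, or None if all rows match."""
    for i, (row_a, row_b) in enumerate(zip(L_a, L_b)):
        if tuple(int(v) for v in row_a) != tuple(int(v) for v in row_b):
            return i
    return None


# Toy stand-ins (assumed values) for the PandasLFApplier and
# PandasParallelLFApplier outputs: same rows, different order.
L_serial = [[-1, -1, -1, -1], [-1, 1, -1, -1], [-1, -1, -1, 1]]
L_parallel = [[-1, -1, -1, -1], [-1, -1, -1, 1], [-1, 1, -1, -1]]

print(same_rows_ignoring_order(L_serial, L_parallel))
print(first_mismatch(L_serial, L_parallel))
```

Both helpers accept nested lists or NumPy arrays, so the two real matrices could be passed in directly. If the first check returns True while the second returns an index, the labels themselves agree and only the row order differs.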
System info
- How you installed Snorkel (conda, pip, source):
- Build command you used (if compiling from source):
- OS:
- Python version: 3.6.8
- Snorkel version: 0.9.3
- Versions of any other relevant libraries:
dask==2.8.1
pandas==0.25.3
numpy==1.16.4