PandasParallelLFApplier does not preserve the order of the rows #1524

@hcvazquez

Issue description

I'm using PandasParallelLFApplier to apply labeling functions to a pandas DataFrame with 5000 rows. The rows of the resulting label matrix come back in a different order than with PandasLFApplier.

Code example/repro steps

Using PandasParallelLFApplier

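from snorkel.labeling.apply.dask import PandasParallelLFApplier

# lfs (list of labeling functions) and df (pandas DataFrame) are defined earlier
# Apply the LFs to the unlabeled training data
applier = PandasParallelLFApplier(lfs)
topic_labeling = applier.apply(df[:5000])
topic_labeling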
output: array([
       [-1, -1, -1, -1],
       [-1, -1, -1, -1],
       [-1, -1, -1, -1],
       ...,
       [-1, -1, -1, -1],
       [-1, -1, -1, -1],
       [-1, -1, -1,  1]])

Same code using PandasLFApplier

output: array([
       [-1, -1, -1, -1],
       [-1,  1, -1, -1],
       [-1, -1, -1, -1],
       ...,
       [-1, -1, -1, -1],
       [-1, -1, -1, -1],
       [-1, -1, -1,  1]])

The second row differs: PandasLFApplier returns [-1, 1, -1, -1], while PandasParallelLFApplier returns [-1, -1, -1, -1].

Expected behavior

I would expect both appliers to return the same label matrix. Labeling coverage and overlaps are the same for both, so the labels themselves appear to be computed correctly and the difference has to be in the order of the rows.
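One way to back up the "same rows, different order" reasoning is to compare the two label matrices as multisets of rows. The sketch below is only an illustration: the helper names `same_rows_up_to_order` and `first_mismatched_rows` are hypothetical (they are not part of Snorkel), and the two small matrices are made-up stand-ins for the real 5000-row outputs. It uses plain Python, so it should also accept the NumPy arrays returned by the appliers.

```python
from collections import Counter


def _as_tuples(matrix):
    # Convert each row to a tuple of plain ints so rows are hashable and comparable.
    return [tuple(int(v) for v in row) for row in matrix]


def same_rows_up_to_order(a, b):
    """Return True if a and b hold the same rows with the same multiplicity, in any order."""
    return Counter(_as_tuples(a)) == Counter(_as_tuples(b))


def first_mismatched_rows(a, b, limit=5):
    """Return up to `limit` positions at which the rows of a and b differ."""
    mismatches = [
        i for i, (ra, rb) in enumerate(zip(_as_tuples(a), _as_tuples(b))) if ra != rb
    ]
    return mismatches[:limit]


# Hypothetical stand-ins for the PandasLFApplier and PandasParallelLFApplier outputs.
sequential = [[-1, -1, -1, -1], [-1, 1, -1, -1], [-1, -1, -1, 1]]
parallel = [[-1, -1, -1, -1], [-1, -1, -1, 1], [-1, 1, -1, -1]]

print(same_rows_up_to_order(sequential, parallel))
print(first_mismatched_rows(sequential, parallel))
```

If the first check passes on the real outputs while the second reports mismatched positions, that would support the reading that only the row order differs, not the labels.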

System info

  • How you installed Snorkel (conda, pip, source):
  • Build command you used (if compiling from source):
  • OS:
  • Python version: 3.6.8
  • Snorkel version: 0.9.3
  • Versions of any other relevant libraries:
    dask==2.8.1
    pandas==0.25.3
    numpy==1.16.4
