Skops integration: Load tabular classification and regression models from the hub #2126

Merged: freddyaboulton merged 7 commits into main from 2015-load-skops-models on Sep 6, 2022

@freddyaboulton (Collaborator) commented Aug 30, 2022

Description

Adds the ability to load tabular classification and regression models from the hub and turn them into demos.

Closes: #2015

Tabular Regression

import gradio as gr

gr.Interface.load("models/skops-ci/test-3255bd22-bb75-4641-8655-824cd25d140f").launch()

Users can edit the dataframe to get new predictions

[Screenshot: tabular_regression_demo]

Tabular classification

import gradio as gr

gr.Interface.load("models/scikit-learn/tabular-playground").launch()

[Screenshot]

Demo with missing data in the input widget

import gradio as gr

gr.Interface.load("models/demo-org/tabular-playground").launch()

[Screenshot]

What happens when the API request fails

[Screenshot]

Checklist:

  • I have performed a self-review of my own code
  • My code follows the style guidelines of this project
  • I have commented my code in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@github-actions
Contributor

All the demos for this PR have been deployed at https://huggingface.co/spaces/gradio-pr-deploys/pr-2126-all-demos

@BenjaminBossan commented Aug 31, 2022

For skops models, instead of reading the sample data from the README, it would also be possible to read the config.json, which also contains the sample data (or is there any reason not to @adrinjalali?). That way, you can get rid of the regex and pyyaml dependency.

Example: https://huggingface.co/scikit-learn/tabular-playground/blob/main/config.json
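
A minimal sketch of that alternative, assuming the skops `config.json` stores its sample rows under `sklearn.example_input` as a mapping from column name to values. That layout is an assumption here, `extract_example_input` is a hypothetical helper, and the JSON contents are made up for illustration:

```python
import json
from typing import Any, Dict, List


def extract_example_input(config_text: str) -> Dict[str, List[Any]]:
    """Return the sample data from a skops-style config.json, or {} if absent."""
    config = json.loads(config_text)
    # Assumed layout: {"sklearn": {"example_input": {column: [values, ...]}}}
    return config.get("sklearn", {}).get("example_input", {})


# Made-up config contents, for illustration only.
sample_config = json.dumps(
    {
        "sklearn": {
            "columns": ["a", "b"],
            "example_input": {"a": [1, 2], "b": [3.5, 4.0]},
        }
    }
)

print(extract_example_input(sample_config))
```

Only the standard-library `json` module is needed on this path, which is the appeal of the suggestion. The trade-off is that it would only cover repos that ship such a file.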

@adrinjalali

As far as skops is concerned, we only deal with the config.json file, so we don't have any pyyaml or regex work in it. Those are only in the README file since the widget expects them.

@freddyaboulton
Collaborator Author

Good point about the config.json vs README @BenjaminBossan @adrinjalali! Since reading the input data from the README works for models uploaded both with and without skops (e.g. https://huggingface.co/julien-c/wine-quality), I think I will stick with that.

@freddyaboulton freddyaboulton force-pushed the 2015-load-skops-models branch from c6285d2 to 06db951 Compare August 31, 2022 22:48
Comment thread gradio/external.py
    example_yaml = next(yaml.safe_load_all(readme.text[: yaml_regex.span()[-1]]))
    example_data = example_yaml.get("widget", {}).get("structuredData", {})
    if not example_data:
        raise ValueError(
Collaborator Author

My reasoning for erroring if there is no example data in the repo is that without it we'd display a bare dataframe as input, and it'd be cumbersome for users to type out all the feature names and inputs. Cumbersome enough that it defeats the shareability of gradio demos.

Contributor

Regardless of whether the feature names are provided, each feature has its own value range or feature type anyway, so it doesn't make sense even if you provide everything. What would make sense is for people to load the interface with a dynamic dataframe and still provide an example themselves in the interface.

Collaborator Author

Heads up: I ended up filing the issue we talked about, #2155. Once this is fixed it may be possible to show an empty dataframe and have users type in all the values themselves.
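
The check discussed in this thread can be sketched roughly as below. `require_example_data` is a hypothetical helper (not the function in gradio/external.py), the error message is illustrative, and the sketch assumes the README metadata has already been parsed into a dict:

```python
from typing import Any, Dict, List


def require_example_data(
    metadata: Dict[str, Any], model_name: str
) -> Dict[str, List[Any]]:
    """Return widget.structuredData from parsed README metadata, or raise."""
    widget = metadata.get("widget") or {}
    example_data = widget.get("structuredData") or {}
    if not example_data:
        # Fail early instead of showing a bare dataframe with no column names.
        raise ValueError(
            f"No example data found in the README of {model_name}; "
            "cannot build the input dataframe."
        )
    return example_data


# Made-up metadata for illustration only.
print(require_example_data({"widget": {"structuredData": {"alcohol": [9.4]}}}, "org/model"))
```

Raising here trades flexibility for usability: a repo without example data produces an error rather than a demo that asks users to type every feature name by hand.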

Comment thread gradio/external.py


def get_tabular_examples(model_name) -> Dict[str, List[float]]:
    readme = requests.get(f"https://huggingface.co/{model_name}/resolve/main/README.md")
Collaborator Author

We can either get the example data from the README or the config.json but the config.json will only have the example data if the model was uploaded with skops.

I think it would be better if gradio could create a demo for any tabular model and not just those created with skops. Downside is that it introduces a pyyaml dependency.

In the future, once the skops config.json file contains richer metadata about feature types (categorical vs null, etc.), we can read from the config.json if it's present.
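
For context on the README route, here is a minimal sketch of isolating the YAML metadata block at the top of a model card before parsing it. The pattern and the `front_matter` helper are assumptions made for illustration and are not the regex used in this PR; the README text is made up:

```python
import re
from typing import Optional

# Hypothetical pattern: a block delimited by `---` lines at the very top of the file.
FRONT_MATTER = re.compile(r"\A---\s*\n(.*?)\n---\s*(?:\n|\Z)", re.DOTALL)


def front_matter(readme_text: str) -> Optional[str]:
    """Return the raw YAML metadata block of a README, or None if there is none."""
    match = FRONT_MATTER.match(readme_text)
    return match.group(1) if match else None


# Made-up README contents for illustration only.
readme = (
    "---\n"
    "tags:\n"
    "- tabular-regression\n"
    "widget:\n"
    "  structuredData:\n"
    "    alcohol:\n"
    "    - 9.4\n"
    "---\n"
    "# Model card\n"
)

print(front_matter(readme))
```

The returned string would then be handed to a YAML parser such as `yaml.safe_load`, which is where the pyyaml dependency mentioned above comes from.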

Contributor

It was something we didn't want to have in the model card specifically
(cc @adrinjalali, who is working on having dtypes atm).
Maybe you could check for both? @freddyaboulton

Comment thread test/test_external.py
]


@pytest.mark.parametrize(
Collaborator Author

Should cover most of the weirdness from malformed READMEs. Haven't actually come across a repo with bad README data but it's possible.

@freddyaboulton freddyaboulton marked this pull request as ready for review September 1, 2022 14:27
@freddyaboulton
Collaborator Author

@BenjaminBossan @adrinjalali @merveenoyan Can't officially tag you as a reviewer but feel free to give this a look when you get a chance!

Contributor

@merveenoyan merveenoyan left a comment

Not much feedback, but some perspective to help you out 🙂

Comment thread test/test_external.py


def test_cols_to_rows():
    assert cols_to_rows({"a": [1, 2, "NaN"], "b": [1, "NaN", 3]}) == (
Contributor

Just a small question: if a cell is left empty, how is it handled? Do you impute "NaN" directly? (How is it sent to inference?)

Collaborator Author

Right now it would be left empty, but that doesn't work for the inference API, so I'll replace it with "NaN" instead! Thank you
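
To make the conversion in this thread concrete, here is a small sketch of turning column-oriented example data into dataframe rows and back, filling empty cells with "NaN" on the way out. The function names and the exact handling of empty cells are assumptions for illustration, not the implementation in gradio/external.py:

```python
from typing import Any, Dict, List, Tuple

MISSING = "NaN"  # placeholder assumed to be accepted by the inference API


def columns_to_rows(
    columns: Dict[str, List[Any]]
) -> Tuple[List[str], List[List[Any]]]:
    """Convert {column: values} into (headers, rows), padding short columns."""
    headers = list(columns)
    n_rows = max((len(values) for values in columns.values()), default=0)
    rows = [
        [columns[h][i] if i < len(columns[h]) else MISSING for h in headers]
        for i in range(n_rows)
    ]
    return headers, rows


def rows_to_columns(headers: List[str], rows: List[List[Any]]) -> Dict[str, List[Any]]:
    """Convert rows back into {column: values}, replacing empty cells."""
    columns: Dict[str, List[Any]] = {h: [] for h in headers}
    for row in rows:
        for header, cell in zip(headers, row):
            columns[header].append(MISSING if cell in ("", None) else cell)
    return columns


headers, rows = columns_to_rows({"a": [1, 2, "NaN"], "b": [1, "NaN", 3]})
print(headers, rows)
print(rows_to_columns(["a", "b"], [[1, ""], [None, 3]]))
```

Rows suit the dataframe widget, while the column mapping matches the shape of the example data in the README, so a round trip between the two is needed whenever the user edits a cell.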

@BenjaminBossan BenjaminBossan left a comment

LGTM overall. I know too little about what we can expect from the input data to judge whether cols_to_rows and rows_to_cols cover all edge cases, but I couldn't spot anything obviously incorrect.

Contributor

@merveenoyan merveenoyan left a comment

Thanks a lot for this PR, I will make sure this is well adopted 🙂❤️

@merveenoyan
Contributor

@freddyaboulton when would this get merged? I'm planning to do a blog post on skops and want to include gradio :)

@freddyaboulton freddyaboulton merged commit eb81fa2 into main Sep 6, 2022
@freddyaboulton freddyaboulton deleted the 2015-load-skops-models branch September 6, 2022 13:15
@freddyaboulton
Collaborator Author

@merveenoyan just merged! Looking forward to the post :)

Development

Successfully merging this pull request may close these issues.

Create Gradio Demo from skops pipeline

4 participants