Skops integration: Load tabular classification and regression models from the hub by freddyaboulton · Pull Request #2126 · gradio-app/gradio

freddyaboulton · 2022-08-30T17:20:23Z

Description

Adds the ability to load tabular classification and regression models from the hub and turn them into demos.

Closes: #2015

Tabular Regression

import gradio as gr

gr.Interface.load("models/skops-ci/test-3255bd22-bb75-4641-8655-824cd25d140f").launch()

Users can edit the dataframe to get new predictions

Tabular classification

import gradio as gr

gr.Interface.load("models/scikit-learn/tabular-playground").launch()

Demo with missing data in the input widget

import gradio as gr

gr.Interface.load("models/demo-org/tabular-playground").launch()

What happens when the API request fails

Checklist:

I have performed a self-review of my own code
My code follows the style guidelines of this project
I have commented my code in hard-to-understand areas
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

github-actions · 2022-08-30T17:22:33Z

All the demos for this PR have been deployed at https://huggingface.co/spaces/gradio-pr-deploys/pr-2126-all-demos

BenjaminBossan · 2022-08-31T09:42:13Z

For skops models, instead of reading the sample data from the README, it would also be possible to read the config.json, which also contains the sample data (or is there any reason not to @adrinjalali?). That way, you can get rid of the regex and pyyaml dependency.

Example: https://huggingface.co/scikit-learn/tabular-playground/blob/main/config.json

adrinjalali · 2022-08-31T13:51:09Z

As far as skops is concerned, we only deal with the config.json file, so there is no pyyaml or regex work on our side. Those are only needed for the README file, since the widget expects the data there.

freddyaboulton · 2022-08-31T14:05:10Z

Good point about the config.json vs README @BenjaminBossan @adrinjalali! Since reading the input data from the README works for models uploaded both with and without skops (e.g. https://huggingface.co/julien-c/wine-quality), I think I will stick with that.

freddyaboulton · 2022-09-01T14:19:32Z

+        example_yaml = next(yaml.safe_load_all(readme.text[: yaml_regex.span()[-1]]))
+        example_data = example_yaml.get("widget", {}).get("structuredData", {})
+    if not example_data:
+        raise ValueError(


My reasoning for erroring if there is no example data in the repo is that without it we'd display a bare dataframe as input, and it'd be cumbersome for users to type out all the feature names and inputs. Cumbersome enough that it defeats the shareability of gradio demos.

Regardless of the feature names being provided, each feature has its own value range or feature type anyway, so it doesn't make sense even if you provide everything. What would make sense would be people calling it and loading the interface with a dynamic dataframe and still providing an example themselves in the interface.

Heads up: I ended up filing the issue we talked about, #2155. Once this is fixed it may be possible to show an empty dataframe and have users type in all the values themselves.

freddyaboulton · 2022-09-01T14:23:12Z



+def get_tabular_examples(model_name) -> Dict[str, List[float]]:
+    readme = requests.get(f"https://huggingface.co/{model_name}/resolve/main/README.md")


We can get the example data from either the README or the config.json, but the config.json will only have the example data if the model was uploaded with skops.

I think it would be better if gradio could create a demo for any tabular model and not just those created with skops. The downside is that it introduces a pyyaml dependency.

In the future, once the skops config.json file contains richer metadata about feature types (categorical vs null, etc.), we can read from the config.json if it's present.

It was something we didn't want to have in the model card specifically
(cc @adrinjalali, who is working on having dtypes atm).
Maybe you could check for both? @freddyaboulton
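A minimal sketch of what "checking both" could look like, assuming the README front matter and the config.json have already been parsed into dictionaries. The function name `pick_example_data` is hypothetical and not part of this PR. The `widget` → `structuredData` path comes from the diff above; the `sklearn` → `example_input` path is an assumption about the skops config layout and should be checked against a real config.json.

```python
from typing import Any, Dict, List, Optional

ExampleData = Dict[str, List[Any]]


def _nested_dict(source: Optional[Dict[str, Any]], *keys: str) -> Dict[str, Any]:
    """Walk nested dict keys, returning {} as soon as a level is missing or not a dict."""
    current: Any = source
    for key in keys:
        if not isinstance(current, dict):
            return {}
        current = current.get(key)
    return current if isinstance(current, dict) else {}


def pick_example_data(
    readme_metadata: Optional[Dict[str, Any]],
    skops_config: Optional[Dict[str, Any]],
) -> ExampleData:
    """Hypothetical helper: prefer README widget data, fall back to config.json."""
    # Path taken from the diff: widget -> structuredData.
    from_readme = _nested_dict(readme_metadata, "widget", "structuredData")
    if from_readme:
        return from_readme
    # Assumed skops layout: sklearn -> example_input.
    from_config = _nested_dict(skops_config, "sklearn", "example_input")
    if from_config:
        return from_config
    raise ValueError("No example data found in README.md or config.json")
```

Preferring the README keeps the behaviour described above for models uploaded without skops, while the fallback only matters when the widget block is absent.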

freddyaboulton · 2022-09-01T14:25:18Z

+        ]
+
+
+@pytest.mark.parametrize(


Should cover most of the weirdness from malformed READMEs. Haven't actually come across a repo with bad README data but it's possible.
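To illustrate the kind of malformed README these tests are meant to cover, here is a rough sketch of isolating the YAML front matter with a regex before handing it to a YAML parser. It assumes the usual `---` delimited block at the top of the file; the name `extract_front_matter` and the exact pattern are hypothetical and are not the code in this PR.

```python
import re
from typing import Optional

# Assumed layout: '---' on the first line, YAML text, then a closing '---' line.
FRONT_MATTER = re.compile(
    r"\A---[ \t]*\r?\n(.*?)\r?\n---[ \t]*(?:\r?\n|\Z)",
    re.DOTALL,
)


def extract_front_matter(readme_text: str) -> Optional[str]:
    """Return the raw YAML between the delimiters, or None if the block is absent or unterminated."""
    match = FRONT_MATTER.search(readme_text)
    return match.group(1) if match else None
```

Returning `None` for a missing or unterminated block lets the caller raise one clear error instead of passing arbitrary markdown to the YAML parser.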

freddyaboulton · 2022-09-01T14:29:09Z

@BenjaminBossan @adrinjalali @merveenoyan Can't officially tag you as a reviewer but feel free to give this a look when you get a chance!

merveenoyan

Not much in the way of feedback, just some perspective to help you out 🙂

merveenoyan · 2022-09-01T14:40:26Z

+        example_yaml = next(yaml.safe_load_all(readme.text[: yaml_regex.span()[-1]]))
+        example_data = example_yaml.get("widget", {}).get("structuredData", {})
+    if not example_data:
+        raise ValueError(


Regardless of the feature names being provided, each feature has its own value range or feature type anyway, so it doesn't make sense even if you provide everything. What would make sense would be people calling it and loading the interface with a dynamic dataframe and still providing an example themselves in the interface.

merveenoyan · 2022-09-01T14:44:54Z



+def test_cols_to_rows():
+    assert cols_to_rows({"a": [1, 2, "NaN"], "b": [1, "NaN", 3]}) == (


Just a small question, if there's a cell left empty, how is it handled? Do you impute "NaN" directly? (how is it sent to inference?)

Right now it would be left empty, but that doesn't work for the inference API, so I'll replace it with "NaN" instead! Thank you
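A rough sketch of the conversion being discussed, with empty cells replaced by the string "NaN". The function names match the ones mentioned in this thread, but the bodies below are an assumption written for illustration, not the implementation in this PR, and the `MISSING` constant is hypothetical.

```python
from typing import Any, Dict, List, Tuple

MISSING = "NaN"  # placeholder for empty cells, as proposed in the comment above


def _fill(value: Any) -> Any:
    """Replace an empty cell (empty string or None) with the placeholder."""
    return MISSING if value is None or value == "" else value


def cols_to_rows(example_data: Dict[str, List[Any]]) -> Tuple[List[str], List[List[Any]]]:
    """Turn column-wise data into (headers, rows), padding short columns."""
    headers = list(example_data.keys())
    n_rows = max((len(example_data[h] or []) for h in headers), default=0)
    rows = []
    for i in range(n_rows):
        row = []
        for h in headers:
            col = example_data[h] or []
            row.append(_fill(col[i]) if i < len(col) else MISSING)
        rows.append(row)
    return headers, rows


def rows_to_cols(headers: List[str], rows: List[List[Any]]) -> Dict[str, List[Any]]:
    """Turn edited dataframe rows back into column-wise data."""
    return {h: [_fill(row[j]) for row in rows] for j, h in enumerate(headers)}
```

For example, `cols_to_rows({"a": [1, ""], "b": [None]})` yields the headers `["a", "b"]` and the rows `[[1, "NaN"], ["NaN", "NaN"]]`.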

BenjaminBossan

LGTM overall. I know too little about what we can expect from the input data to judge whether cols_to_rows and rows_to_cols cover all edge cases, but I couldn't spot anything obviously incorrect.

merveenoyan

Thanks a lot for this PR, I will make sure this is well adopted 🙂❤️

merveenoyan · 2022-09-05T09:04:32Z



+def get_tabular_examples(model_name) -> Dict[str, List[float]]:
+    readme = requests.get(f"https://huggingface.co/{model_name}/resolve/main/README.md")


It was something we didn't want to have in the model card specifically
(cc @adrinjalali, who is working on having dtypes atm).
Maybe you could check for both? @freddyaboulton

merveenoyan · 2022-09-06T12:49:11Z

@freddyaboulton when will this get merged? I'm planning to do a blog post on skops and want to include gradio :)

freddyaboulton · 2022-09-06T13:16:04Z

@merveenoyan just merged! Looking forward to the post :)

freddyaboulton mentioned this pull request Aug 30, 2022

Create Gradio Demo from skops pipeline #2015

Closed

1 task

freddyaboulton force-pushed the 2015-load-skops-models branch from 0a62f8f to c6285d2 Compare August 30, 2022 18:57

freddyaboulton added 2 commits August 31, 2022 16:54

MVP of skops integration

4566c86

Add unit tests

06db951

freddyaboulton force-pushed the 2015-load-skops-models branch from c6285d2 to 06db951 Compare August 31, 2022 22:48

One more case

aae0305

freddyaboulton commented Sep 1, 2022


freddyaboulton requested review from abidlabs and dawoodkhan82 September 1, 2022 14:26

freddyaboulton marked this pull request as ready for review September 1, 2022 14:27

merveenoyan reviewed Sep 1, 2022


freddyaboulton added 2 commits September 1, 2022 13:14

Fix NaNs in widget data

798fbcb

Remove breakpoint

c0b4265

freddyaboulton mentioned this pull request Sep 1, 2022

Dataframe column headers are reset when you add a new column #2155

Closed

1 task

Fix typo

54c5424

BenjaminBossan approved these changes Sep 2, 2022


merveenoyan approved these changes Sep 5, 2022


Merge branch 'main' into 2015-load-skops-models

6a29df7

freddyaboulton merged commit eb81fa2 into main Sep 6, 2022

freddyaboulton deleted the 2015-load-skops-models branch September 6, 2022 13:15


