Skops integration: Load tabular classification and regression models from the hub#2126
Conversation
|
All the demos for this PR have been deployed at https://huggingface.co/spaces/gradio-pr-deploys/pr-2126-all-demos |
0a62f8f to
c6285d2
Compare
|
For skops models, instead of reading the sample data from the README, it would also be possible to read the Example: https://huggingface.co/scikit-learn/tabular-playground/blob/main/config.json |
|
as long as |
|
Good point about the config.json vs README @BenjaminBossan @adrinjalali ! Since reading the input data from the README will work for models that were uploaded with and without skops, e.g. (https://huggingface.co/julien-c/wine-quality), I think I will stick with that. |
c6285d2 to
06db951
Compare
| example_yaml = next(yaml.safe_load_all(readme.text[: yaml_regex.span()[-1]])) | ||
| example_data = example_yaml.get("widget", {}).get("structuredData", {}) | ||
| if not example_data: | ||
| raise ValueError( |
There was a problem hiding this comment.
My reasoning for error-ing if there is not example data in the repo is that without it we'd display a bare dataframe as input and it'd be cumbersome for users to type out all the feature names and inputs. Cumbersome enough that it defeats the shareability of gradio demos.
There was a problem hiding this comment.
Regardless of the feature names being provided, each feature has it's own value range or feature type anyway, so it doesn't make sense even if you provide everything. What would make sense would be people calling it and loading the interface with dynamic dataframe and still provide an example themselves in the interface.
There was a problem hiding this comment.
Heads up: ended up filing the issue we talked about #2155 . Once this is fixed it may be possible to show an empty dataframe and have users type in all the values themselves.
|
|
||
|
|
||
| def get_tabular_examples(model_name) -> Dict[str, List[float]]: | ||
| readme = requests.get(f"https://huggingface.co/{model_name}/resolve/main/README.md") |
There was a problem hiding this comment.
We can either get the example data from the README or the config.json but the config.json will only have the example data if the model was uploaded with skops.
I think it would be better if gradio could create a demo for any tabular model and not just those created with skops. Downside is that it introduces a pyyaml dependency.
In the future, once the skops config json file contains richer metadata about feature types (categorical vs null) etc we can read from the config.json if it's present.
There was a problem hiding this comment.
it was something we didn't want to have in model card specifically
(cc @adrinjalali is working on having dtypes atm)
Maybe you could check for both? @freddyaboulton
| ] | ||
|
|
||
|
|
||
| @pytest.mark.parametrize( |
There was a problem hiding this comment.
Should cover most of the weirdness from malformed READMEs. Haven't actually come across a repo with bad README data but it's possible.
|
@BenjaminBossan @adrinjalali @merveenoyan Can't officially tag you as a reviewer but feel free to give this a look when you get a chance! |
merveenoyan
left a comment
There was a problem hiding this comment.
Not much of a feedback but to help you out with perspective 🙂
| example_yaml = next(yaml.safe_load_all(readme.text[: yaml_regex.span()[-1]])) | ||
| example_data = example_yaml.get("widget", {}).get("structuredData", {}) | ||
| if not example_data: | ||
| raise ValueError( |
There was a problem hiding this comment.
Regardless of the feature names being provided, each feature has it's own value range or feature type anyway, so it doesn't make sense even if you provide everything. What would make sense would be people calling it and loading the interface with dynamic dataframe and still provide an example themselves in the interface.
|
|
||
|
|
||
| def test_cols_to_rows(): | ||
| assert cols_to_rows({"a": [1, 2, "NaN"], "b": [1, "NaN", 3]}) == ( |
There was a problem hiding this comment.
Just a small question, if there's a cell left empty, how is it handled? Do you impute "NaN" directly? (how is it sent to inference?)
There was a problem hiding this comment.
Right now it would be left empty but that doesn't work for the inference API so I'll replace with "NaN" instead! Thank you
BenjaminBossan
left a comment
There was a problem hiding this comment.
LGTM overall. I know too little about what we can expect from the input data to judge whether cols_to_rows and rows_to_cols cover all edge cases, but I couldn't spot anything obviously incorrect.
merveenoyan
left a comment
There was a problem hiding this comment.
Thanks a lot for this PR, I will make sure this is well adopted 🙂❤️
|
|
||
|
|
||
| def get_tabular_examples(model_name) -> Dict[str, List[float]]: | ||
| readme = requests.get(f"https://huggingface.co/{model_name}/resolve/main/README.md") |
There was a problem hiding this comment.
it was something we didn't want to have in model card specifically
(cc @adrinjalali is working on having dtypes atm)
Maybe you could check for both? @freddyaboulton
|
@freddyaboulton when would this get merged? I'm planning to do a blog post on skops want to include gradio :) |
|
@merveenoyan just merged! Looking forward to the post :) |
Description
Ability to load tabular classification and regression models from the hub and turn it into a demo.
Closes: #2015
Tabular Regression
Users can edit the dataframe to get new predictions
Tabular classification
Demo with missing data in the input widget
What happens when api request fails
Checklist: