A Guide on Configuring a Gradio Queue for High-Volume Traffic #2558
Conversation
All the demos for this PR have been deployed at https://huggingface.co/spaces/gradio-pr-deploys/pr-2558-all-demos
freddyaboulton left a comment
Awesome guide @abidlabs! I love the structure and concrete recommendations. Noticed a couple of typos but that's about it.
Is it worth mentioning `api_open`, in the sense that "closing" the API may help with scaling by making sure the queue isn't skipped?
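If such a section gets added, a minimal sketch of what closing the API might look like (the `predict` function here is just a placeholder, not code from the guide):

```python
import gradio as gr

def predict(text):
    # Placeholder prediction function for illustration.
    return text.upper()

demo = gr.Interface(predict, "textbox", "textbox")
# With api_open=False, the REST endpoints are closed, so programmatic
# requests cannot skip the queue and hit the prediction route directly.
demo.queue(api_open=False)
demo.launch()
```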
> If you write your function to process a batch of samples, Gradio will automatically batch incoming requests together and pass them into your function as a batch of samples. You need to set `batch` to `True` (by default it is `False`) and set a `max_batch_size` (by default it is `4`) based on the maximum number of samples your function is able to handle. These two parameters can be passed into `gr.Interface()` or to an event in Blocks such as `.click()`.
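As a minimal sketch of the batching pattern described above (the `flip_batch` function and the `max_batch_size` value here are illustrative, not taken from the guide):

```python
import gradio as gr

def flip_batch(texts):
    # With batch=True, Gradio passes a list of up to max_batch_size
    # inputs; return one list per output component, each the same
    # length as the input batch.
    flipped = [t[::-1] for t in texts]
    return [flipped]

demo = gr.Interface(flip_batch, "textbox", "textbox",
                    batch=True, max_batch_size=16)
demo.queue()  # batching only takes effect when the queue is enabled
demo.launch()
```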
> While setting a batch is conceptually similar to having workers process requests in parallel, it is often *faster* than setting the `concurrency_count` for deep learning models. The downside is that you might need to adapt your function a little bit to accept batches of samples instead of individual samples.
Do we have an answer as to whether `concurrency_count` plays well on GPUs?
I wonder if we should mention that `concurrency_count` is better suited to IO-bound demos and `batch_size` is better suited to CPU/GPU-bound demos. You kind of hint at that below, where you say that a high batch size likely means that the concurrency count should be set to 1.
I assigned a GPU to this Space and it seems to work just fine: https://huggingface.co/spaces/abidlabs/image-classifier. Specifically, I was able to process requests in parallel and cut the latency in half on average by using `concurrency_count=2`!
A user just needs to keep in mind that their GPU memory might be different from their CPU memory, so they need to ensure that multiple workers will not OOM their GPU. I'll add a note in the hardware section.
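For illustration, a minimal sketch of what running two parallel workers looks like (the `classify` function here is a stand-in, not the actual code from the linked Space):

```python
import gradio as gr

def classify(image):
    # Stand-in for a real model call; an actual demo would run
    # inference on the image here.
    return {"cat": 0.7, "dog": 0.3}

demo = gr.Interface(classify, gr.Image(), gr.Label())
# Two workers pull requests off the queue in parallel. Each worker
# needs its own activation memory, so check that two concurrent
# inferences fit in GPU memory before raising this further.
demo.queue(concurrency_count=2)
demo.launch()
```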
Thanks for the review @freddyaboulton! Will fix the typos and add a section on `api_open`.
@aliabd I'm getting a weird behavior when I try to add my Guide to the […]. What's weird is that if I remove my additional guide from the […]. Even though the Guides are: quickstart, key_features, sharing_your_app, interface_state, reactive_interfaces, advanced_interface_features.

Do you know what might be going on?
> So why not set this parameter much higher? Keep in mind that since requests are processed in parallel, each request will consume memory to store the data and weights for processing. This means that you might get out-of-memory errors if you increase the `concurrency_count` too high.
> **Recommendation**: Increase the `concurrency_count` parameter as high as you can until you hit memory limits on your machine. You can [read about Hugging Face Spaces machine specs here](https://huggingface.co/docs/hub/spaces-overview).
Hello @abidlabs, a beautiful guide as always, good work!
IMO this recommendation is not a good one, because increasing concurrency does not directly translate into better performance, for various reasons such as the costs associated with context switching and GIL limitations. So I would suggest changing it to something like:
"Increase the concurrency as long as you see improvement in the throughput."
Ah I see, thanks for the suggestion @farukozderim! I'll update the Guide to reflect that
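To make that advice concrete, here is a rough, hypothetical sketch of measuring throughput against a running demo; it assumes the `gradio_client` package, a demo served locally at the default URL, and the default `/predict` endpoint (none of these specifics are from the thread):

```python
import time
from gradio_client import Client  # pip install gradio_client

# Assumed local demo URL; adjust the URL, inputs, and api_name to your app.
client = Client("http://localhost:7860")

n_requests = 32
start = time.time()
# submit() is non-blocking, so all requests enter the queue together and
# are spread across however many workers concurrency_count allows.
jobs = [client.submit("hello", api_name="/predict") for _ in range(n_requests)]
for job in jobs:
    job.result()  # block until each request finishes
print(f"{n_requests / (time.time() - start):.2f} requests/sec")
```

Re-running a measurement like this after each change to `concurrency_count` shows whether the extra workers are still paying off.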

