How to increase inference speed #2089
Unanswered
Dammerzone asked this question in Q&A
Replies: 1 comment
Did you find the solution?
Hello guys,
I'm running a local version of the latest Gemma 3 Colab notebook on a Jetson AGX Orin board with 64 GB of memory.
Everything works fine and uses only 17 GB of reserved VRAM, but token generation is quite slow even though I call the for_inference() method.
My question is: is there a way to speed up token generation by increasing the VRAM allocation, since I only use 17 GB of the 61 GB available?
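Before changing any memory settings, it helps to have a reproducible tokens-per-second baseline so that each tweak can be compared fairly. The sketch below is a minimal, library-agnostic timer. It assumes a hypothetical zero-argument callable `generate_fn` that runs one generation and returns the number of newly generated tokens; the name `measure_throughput` is also illustrative, not part of any library.

```python
import time
from typing import Callable


def measure_throughput(generate_fn: Callable[[], int], warmup: int = 1, runs: int = 3) -> float:
    """Return the average number of generated tokens per second.

    `generate_fn` is a hypothetical callable: it runs one generation and
    returns how many NEW tokens were produced (prompt tokens excluded).
    """
    # Warm-up runs absorb one-time costs such as kernel setup and cache allocation,
    # so they are executed but not timed.
    for _ in range(warmup):
        generate_fn()

    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        total_tokens += generate_fn()
        total_seconds += time.perf_counter() - start

    if total_seconds == 0:
        raise ValueError("generation took no measurable time")
    return total_tokens / total_seconds
```

With a Hugging Face-style model, `generate_fn` could, for example, call `model.generate(**inputs, max_new_tokens=128)` and return the output length minus the prompt length. On a GPU it is safer to call `torch.cuda.synchronize()` before returning so the timer does not stop before the device has finished its work.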
Here is how I load my model: