Add cancel() method to interrupt a stream #733
simonchatts wants to merge 1 commit into abetlen:main
Conversation

Please accept this PR, @abetlen.

Actually, I found an issue with this method: it only cancels after a token has been generated, so if the LLM is slow or gets stuck processing the prompt, this doesn't cancel it. We need a better method.

I'm coming back to this because I need to figure out a better way to interrupt generation programmatically. For a console-based scenario it's easy in Python: I just wrap the code in `try ... except KeyboardInterrupt:` and can press Ctrl+C at any point to gracefully interrupt the LLM. But with a front-end user interface, I haven't managed to make it work properly, say with a "Stop generating" button that calls a Python function, because of the issue I mentioned in the previous post. @abetlen, sorry to bother you again, but do you have any suggestions/ideas on how to accomplish this?
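
For readers landing here, below is a minimal sketch of the two approaches described above, assuming the current streaming API. The model path and the `stop_event` wiring are illustrative and not part of this PR:

```python
import threading
from llama_cpp import Llama

llm = Llama(model_path="./models/model.gguf")  # illustrative model path
stop_event = threading.Event()                 # e.g. set by a "Stop generating" button

def generate(prompt: str) -> str:
    out = []
    try:
        for chunk in llm.create_completion(prompt, stream=True):
            if stop_event.is_set():
                break  # only checked between tokens, hence the problem described above
            out.append(chunk["choices"][0]["text"])
    except KeyboardInterrupt:
        pass  # Ctrl+C covers the console case
    return "".join(out)
```

Like the proposed `cancel()`, the flag is only consulted between tokens, so it cannot interrupt a model that is still busy with prompt processing.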

Force-pushed from 8c93cf8 to cc0fe43.

Why not add it now and improve it later if a better solution comes along? For now this would work in most cases.

Has anyone found a reasonable solution for this? Or am I the only one not willing to either wait until the model finishes or kill the job and lose the context?

Any chance this gets merged for now?

It indeed blocks until the first token is produced, but cancelling it after that is trivial. The other similar issue is cancelling a model that is loading.

The gpt4all Python bindings offer a similar mechanism, which allows stopping at the next token.

+1, can we merge this?

Take a look at ggml-org/llama.cpp#10509, which should permanently solve this problem on llama.cpp's side.

Fixes #599.
Thanks for all your work on this project!
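
As a rough sketch of how the proposed `cancel()` method might be used, assuming it is exposed on the `Llama` object: the model path, the timer, and the exact call site below are illustrative, so check the diff for the actual API.

```python
import threading
from llama_cpp import Llama

llm = Llama(model_path="./models/model.gguf")  # illustrative model path

# Hypothetical: a UI handler, signal handler, or (here) a timer calls cancel()
# from another thread while the main thread consumes the stream.
threading.Timer(5.0, llm.cancel).start()

# As discussed in the conversation, the stream only stops once the next token
# is produced; a model stuck in prompt processing is not interrupted.
for chunk in llm.create_completion("Write a long story.", stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
```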