I've been experimenting with llm-subtrans and love the results so far.
My current problem is that I often use this kind of software when I need subtitles for hard-to-find foreign movies or TV shows.
Sometimes they may be Chinese or Korean, other times I may have subtitles from a dub in a different foreign language.
Often that means my subtitles were generated via OCR or from audio using something like Whisper.
That means they won't be perfect.
The way I've been working around this is by changing the instructions for the LLM to ignore lines with clear misidentifications by the OCR software or Whisper. For example, when OCR'd subs come out in a different script, I tell it to return no translation for non-Roman lettering.
However, empty lines obviously trigger retries. I'd love a flag that disables retrying on an empty line or on a special character (or string). If I can instruct the LLM to respond with "", I can either filter those very easily afterwards or you could do so yourself.
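To illustrate the "filter afterwards" part, here is roughly what I'd run on my side over the translated .srt. It's only a sketch: the `[[SKIP]]` marker and the `drop_sentinel_blocks` name are made up for this example and are not anything llm-subtrans provides, and it assumes plain SRT cues (index line, timing line, text lines) separated by blank lines.

```python
import re

# Hypothetical marker the LLM would be told to return for junk lines.
# This is an assumption for the example, not an llm-subtrans feature.
SENTINEL = "[[SKIP]]"


def drop_sentinel_blocks(srt_text: str, sentinel: str = SENTINEL) -> str:
    """Remove SRT cues whose text is empty or equals the sentinel, then renumber."""
    blocks = re.split(r"\n\s*\n", srt_text.strip())
    kept = []
    for block in blocks:
        lines = block.splitlines()
        # A cue needs at least an index line and a timing line.
        if len(lines) < 2:
            continue
        text = "\n".join(lines[2:]).strip()
        if not text or text == sentinel:
            continue
        kept.append((lines[1], text))

    # Renumber the surviving cues from 1 so the file stays consistent.
    out = [f"{i}\n{timing}\n{text}" for i, (timing, text) in enumerate(kept, start=1)]
    return "\n\n".join(out) + "\n" if out else ""
```

With something like this, a "don't retry on this string" option would be enough for my workflow, since the cleanup itself is trivial.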
I understand that I could write my own code to pre-process some of those easier-to-handle cases, but just letting the LLM do it has been working for me too.
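For reference, the kind of pre-processing I mean would be something like the check below for the non-Roman case. Again just a sketch with a made-up function name (`is_non_roman`); it relies on Unicode character names from Python's standard `unicodedata` module, and it only covers the simple "wrong script" case, not garbled text in the right script.

```python
import unicodedata


def is_non_roman(text: str) -> bool:
    """Return True if the line contains letters but none of them are Latin.

    Lines with no letters at all (numbers, punctuation) are not flagged.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    # Unicode names for Latin letters contain "LATIN",
    # e.g. "LATIN SMALL LETTER E WITH ACUTE".
    return not any("LATIN" in unicodedata.name(ch, "") for ch in letters)


lines = ["Hello there", "你好", "1984", "café"]
suspicious = [line for line in lines if is_non_roman(line)]
```

That handles the obvious cases, but the LLM also catches the fuzzier ones, which is why I'd still like the no-retry option.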