Distinguishing model failure vs task inadmissibility in evaluation #1089
Replies: 2 comments
I would treat those as separate outcomes. A model failure says the system had a well-posed task and produced the wrong or low-quality result. Task inadmissibility says the evaluation target itself was not coherent enough to judge meaningfully.
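To make the split concrete, here is a minimal sketch of how the two outcomes could be recorded separately. It is plain Python and does not use any Kiln API; `Outcome`, `EvalRecord`, `well_posed`, `classify`, `summarize`, and the 0.5 threshold are all hypothetical names and values chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    MODEL_FAILURE = "model_failure"
    TASK_INADMISSIBLE = "task_inadmissible"


@dataclass
class EvalRecord:
    task_id: str
    well_posed: bool  # hypothetical flag: set by a reviewer or a validity check
    score: float      # judge score in [0, 1]; ignored when the task is ill-posed


def classify(record: EvalRecord, pass_threshold: float = 0.5) -> Outcome:
    # Admissibility is checked first, so an ill-posed task is never
    # counted against the model regardless of its score.
    if not record.well_posed:
        return Outcome.TASK_INADMISSIBLE
    if record.score >= pass_threshold:
        return Outcome.PASS
    return Outcome.MODEL_FAILURE


def summarize(records: list[EvalRecord]) -> dict:
    counts = Counter(classify(r) for r in records)
    # Only well-posed tasks enter the denominator of the failure rate.
    judged = counts[Outcome.PASS] + counts[Outcome.MODEL_FAILURE]
    rate = counts[Outcome.MODEL_FAILURE] / judged if judged else None
    return {
        "counts": {o.value: counts[o] for o in Outcome},
        "model_failure_rate": rate,
    }


records = [
    EvalRecord("t1", well_posed=True, score=0.9),
    EvalRecord("t2", well_posed=True, score=0.2),
    EvalRecord("t3", well_posed=False, score=0.0),
]
summary = summarize(records)
```

The design choice here is that inadmissible tasks are counted on their own and left out of the failure-rate denominator, so a badly specified task shows up as a problem with the eval set rather than with the model.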
Great point, @aniruddhaadak80. I’m actually working on similar evaluation patterns over at my profile (github.com), and separating 'model failure' from 'task inadmissibility' has been a game-changer for debugging prompts.
In some evaluation workflows, a task can become ill-posed rather than simply failed.
For example:
Do you distinguish between model failure and task inadmissibility during evaluation?
I’m curious whether this is tracked explicitly in Kiln workflows or treated as a regular failure.