Distinguishing model failure vs task inadmissibility in evaluation #1089

finkeissen · 2026-02-26T12:06:07Z

In some evaluation workflows, a task can become ill-posed rather than simply failed.

For example:

- missing or contradictory context,
- ambiguous labeling criteria,
- evaluation signals that cannot be interpreted meaningfully.

Do you distinguish between model failure and task inadmissibility during evaluation?

I’m curious whether this is tracked explicitly in Kiln workflows or treated as a regular failure.

aniruddhaadak80 · 2026-03-10T05:44:09Z

I would treat those as separate outcomes. A model failure says the system had a well-posed task and produced the wrong or low-quality result. Task inadmissibility says the evaluation target itself was not coherent enough to judge meaningfully.

If both end up in the same bucket, the signal gets noisy very quickly. You lose the ability to distinguish model weakness from dataset or rubric weakness, which matters a lot when people start making tuning decisions from the evaluation results.
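To make the separation concrete, here is a minimal sketch of a three-way outcome. It is not Kiln's API: `Outcome`, `EvalItem`, `classify`, and the 0.5 threshold are all hypothetical names and values chosen for illustration. The admissibility checks are deliberately crude stand-ins for "missing context", "no usable rubric", and "uninterpretable signal".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PASS = "pass"
    MODEL_FAILURE = "model_failure"
    INADMISSIBLE = "inadmissible"


@dataclass
class EvalItem:
    # Hypothetical record shape, not a Kiln data model.
    context: Optional[str]   # input context given to the model
    rubric: Optional[str]    # labeling criteria the judge applies
    score: Optional[float]   # judge score in [0, 1]; None if uninterpretable


def classify(item: EvalItem, threshold: float = 0.5) -> Outcome:
    # Check admissibility first: an ill-posed task is never counted
    # as a model failure, whatever the model produced.
    if not item.context or not item.rubric or item.score is None:
        return Outcome.INADMISSIBLE
    # Only well-posed tasks are judged against the threshold.
    if item.score >= threshold:
        return Outcome.PASS
    return Outcome.MODEL_FAILURE
```

The ordering is the point of the design: the admissibility check runs before any score comparison, so a low score on an incoherent task cannot leak into the model-failure bucket.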


finkeissen · 2026-03-14T12:28:00Z

Author

Great point, @aniruddhaadak80. I’m actually working on similar evaluation patterns over at my profile (github.com), and separating 'model failure' from 'task inadmissibility' has been a game-changer for debugging prompts.
In Kiln, having an explicit 'Inadmissible' label would prevent these cases from unfairly dragging down model performance metrics. Is this something that could be added to the Kiln workflow or schema?
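As a rough illustration of the effect on metrics, the sketch below compares a pass rate computed over all items with one computed over admissible items only. The label strings and the `summarize` helper are assumptions made up for this example. They do not reflect an existing Kiln schema.

```python
from collections import Counter
from typing import Dict, List, Optional


def summarize(labels: List[str]) -> Dict[str, Optional[float]]:
    # Labels are assumed to be "pass", "fail", or "inadmissible".
    counts = Counter(labels)
    total = len(labels)
    admissible = total - counts["inadmissible"]
    return {
        # Inadmissible items counted as failures (the single-bucket view).
        "naive_pass_rate": counts["pass"] / total if total else None,
        # Inadmissible items excluded from the denominator.
        "pass_rate": counts["pass"] / admissible if admissible else None,
        # Reported separately as a dataset/rubric quality signal.
        "inadmissible_rate": counts["inadmissible"] / total if total else None,
    }


labels = ["pass"] * 6 + ["fail"] * 2 + ["inadmissible"] * 2
report = summarize(labels)
print(report)
```

With six passes, two failures, and two inadmissible items, the single-bucket pass rate is 0.6, while the rate over admissible items is 0.75. Reporting the inadmissible rate alongside keeps the excluded items visible instead of silently dropping them.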

