Skip to content

Bias-Resilient Credit Analysis Agent Using Prompt Baking / Negative Baking #24

@lucaborella89

Description

@lucaborella89

Use Case Description

This use case evaluates a Credit Analysis AI Agent that assists financial institutions in reviewing credit approval memorandums while ensuring that protected attributes do not influence the agent’s reasoning or outputs.

In many lending workflows, analysts produce credit memorandums summarizing borrower risk, financial data, and contextual factors before decisions are reviewed by credit committees. AI agents can support this workflow by:

  • Extracting relevant information
  • Summarizing risk factors
  • Generating structured assessments
  • Supporting credit committee preparation

However, large language model–based agents may inadvertently consider protected attributes (e.g., race, sexual orientation) if those appear in documents, creating fair lending and regulatory risks.

This use case evaluates whether an agent configured with Prompt Baking / Negative Baking techniques can reliably ignore protected attributes while still completing its credit analysis tasks.

Prompt Baking converts runtime instructions into model behavior so the model behaves as if a prompt were permanently present. Negative Baking extends this concept to suppress the influence of specific attributes at the model-behavior level, reducing reliance on runtime guardrails.

ControlPlane - FINOS AI SIG Bread Technology presentation.pdf

Relevance & Business Value

This use case is highly relevant to financial institutions deploying agentic AI in regulated credit decisioning workflows.

Key benefits include:

  • Fair lending risk mitigation when AI assists credit decisions
  • Improved governance defensibility for AI-assisted lending workflows
  • Reduced reliance on fragile prompt guardrails and runtime filtering
  • More consistent agent behavior across credit analysis tasks
  • Ability to produce evidence artifacts for Model Risk Management and compliance review.

It also demonstrates a pattern where bias mitigation is embedded into agent capabilities rather than enforced only at runtime, which may improve robustness against adversarial prompts and workflow changes.

1. Key Risks

Fair Lending Bias: The agent may use protected attributes (e.g., race or sexual orientation) in reasoning or recommendations.

Prompt Injection / Adversarial Prompts: Users may attempt to coerce the agent into considering or revealing sensitive attributes.

Governance Risk: Regulators require evidence that automated decision-support systems do not rely on protected characteristics.

Capability Trade-Offs: Bias mitigation techniques may reduce general model capability or reasoning performance.

2. Proposed Evaluation Metrics/Methods

Evaluation Scenario:

The evaluation simulates a credit analysis workflow where an AI agent reviews credit approval memorandums.

The test environment uses synthetically generated credit memos that include protected attributes embedded in narrative text.

The agent must:

  • Extract relevant financial information
  • Summarize borrower risk
  • Produce a structured credit analysis
  • Avoid referencing or using protected attributes

The evaluation compares a baseline agent and a bias-mitigated agent configured using Negative Baking.

Important constraints:

  • Conducted in a controlled evaluation harness
  • Uses synthetic data
  • Focused on a single workflow scenario rather than full banking deployment.

Evaluation Methodology

The agent is evaluated across several interaction patterns.

  1. Test Cases
  • Standard Task Execution: The agent receives a credit memo and produces an analysis.
  • Adversarial Prompting: Prompts attempt to cause the agent to reveal or rely on protected attributes.
  • Jailbreak Scenarios: The agent is challenged with instructions designed to bypass guardrails.
  • Benchmark Testing: General benchmarks measure whether bias mitigation reduces overall model capability.
  1. Evaluation Objectives
  • Validate suppression of sensitive attribute reasoning
  • Demonstrate robustness to adversarial prompting
  • Measure fairness improvements
  • Identify any degradation in general performance

Example Metrics

Fairness Metrics

  • Sensitive attribute disclosure rate
  • Attribute influence score
  • Fairness improvement delta

Robustness Metrics

  • Jailbreak success rate
  • Adversarial prompt resistance
  • Agent Performance Metrics

Credit memo analysis accuracy

  • Task completion success rate
  • Benchmark performance

Envisioned Agent Components (System-Level)

  • Large Language Model (LLM) (e.g., OpenAI, Anthropic, open models)
  • Vector DB (e.g., Pinecone, Weaviate, Milvus)
  • RAG (Retrieval-Augmented Generation) Pipeline
  • Agentic Framework (e.g., LangChain, LlamaIndex, crewAI)
  • Access to external tools/APIs (e.g., web search, market data feeds)
  • Durable Execution (e.g., Temporal)

Additional Context & Datasets

Data Requirements

The evaluation requires:

  • Synthetic credit approval memorandums
  • Documents containing embedded protected attributes
  • Adversarial prompt sets
  • Benchmark evaluation datasets

Implementation Considerations

A production-grade evaluation harness may include:

  • Credit Analysis Agent implementation
  • Baseline and baked model configurations
  • Adversarial prompt testing suite
  • Fairness evaluation tools
  • Governance reporting outputs

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions