๐Ÿฆ Raccoon

LLM ์•ฑ์„ ์œ„ํ•œ pytest โ€” ํ„ฐ๋ฏธ๋„์—์„œ AI ํŒŒ์ดํ”„๋ผ์ธ์„ ํ…Œ์ŠคํŠธํ•˜์„ธ์š”.

npm install -g raccoon-testkit
raccoon llm-test raccoon-llm.config.json
# โ†’ โœ“ RAG ์‘๋‹ต ํ’ˆ์งˆ [87%]  โœ“ ํ™˜๊ฐ ๊ฐ์ง€ [94%]  โœ“ ํ•œ๊ตญ์–ด ํ‰๊ฐ€ [91%]

IDE ํ”Œ๋Ÿฌ๊ทธ์ธ ๋ถˆํ•„์š”. ๋ธŒ๋ผ์šฐ์ € ํƒญ ๋ถˆํ•„์š”. ํ„ฐ๋ฏธ๋„๊ณผ ํ”„๋กœ๋•์…˜ ์ˆ˜์ค€์˜ LLM ํ…Œ์ŠคํŠธ ์ปค๋ฒ„๋ฆฌ์ง€๋งŒ์œผ๋กœ ์ถฉ๋ถ„ํ•ฉ๋‹ˆ๋‹ค.


The pytest for LLM apps โ€” test your AI pipelines straight from your terminal.

npm install -g raccoon-testkit
raccoon llm-test raccoon-llm.config.json
# โ†’ โœ“ RAG response quality [87%]  โœ“ Hallucination check [94%]  โœ“ Korean eval [91%]

No IDE plugin. No browser tab. Just your terminal and production-grade LLM test coverage.


Why Raccoon?

Building LLM apps is hard. Testing them is harder. Raccoon reads your AI pipeline code, understands your prompt chains and LLM calls, and evaluates real behavior, so you can ship with confidence and catch regressions before your users do.

  • Korean/Asian language evaluation built-in: Korean G-Eval, speech-level (존댓말) checks, code-switching support
  • Source-grounded hallucination detection against reference documents
  • CLI-first: ready for CI/CD, pre-commit hooks, and GitHub Actions
  • Multi-language: JavaScript/TypeScript (Jest) and Python (pytest)
  • Fast: powered by Claude AI, evaluations complete in seconds

Quick Start

Install

npm install -g raccoon-testkit

Initialize (optional)

raccoon init

Creates a .raccoonrc.json config file in your project root.

Run LLM tests

# Run LLM app tests (auto-detects raccoon-llm.config.json)
raccoon llm-test

# Point at a specific config file
raccoon llm-test my-tests.json

# Verbose output
raccoon llm-test --verbose

# Korean-optimized mode
raccoon llm-test --lang ko

Generate unit tests

raccoon test src/llm/chat.ts
raccoon test src/

raccoon llm-test: the LLM app evaluation engine

Config example (raccoon-llm.config.json)

{
  "model": "quality",
  "tests": [
    {
      "name": "RAG 응답 품질 평가",
      "type": "prompt-regression",
      "prompt": "다음 문서를 기반으로 질문에 답하세요: {{context}}\n\n질문: {{question}}",
      "variables": {
        "context": "쿠팡은 2010년 설립된 한국 최대 이커머스 플랫폼입니다.",
        "question": "쿠팡은 언제 설립되었나요?"
      },
      "promptRegression": {
        "goldenKeywords": ["2010", "설립", "쿠팡"],
        "minKeywordScore": 0.8
      }
    },
    {
      "name": "환각 감지 - 제품 정보",
      "type": "hallucination-detection",
      "prompt": "카카오페이의 주요 기능을 설명해주세요.",
      "hallucinationDetection": {
        "sourceDocument": "카카오페이는 카카오의 금융 서비스로, 간편결제, 송금, 투자, 보험 기능을 제공합니다.",
        "minGroundingScore": 0.7
      }
    },
    {
      "name": "한국어 고객 응대 품질 (G-Eval)",
      "type": "korean-g-eval",
      "prompt": "고객이 '결제가 실패했어요'라고 했을 때 고객센터 AI로서 답변해주세요.",
      "koreanGEval": {
        "criteria": ["coherence", "fluency", "relevance"],
        "minScore": 0.75
      }
    }
  ]
}
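For intuition, the prompt-regression check in the first test above amounts to a keyword-coverage score that must clear minKeywordScore. A minimal sketch of that scoring idea (illustrative only; this helper is not part of Raccoon's API, and the real scorer may weigh keywords differently):

```python
def keyword_score(response: str, golden_keywords: list[str]) -> float:
    """Return the fraction of golden keywords found in the model response."""
    if not golden_keywords:
        return 1.0  # nothing required, trivially passes
    hits = sum(1 for keyword in golden_keywords if keyword in response)
    return hits / len(golden_keywords)

# A response mentioning all three golden keywords clears the 0.8 threshold.
response = "쿠팡은 2010년에 설립된 이커머스 플랫폼입니다."
score = keyword_score(response, ["2010", "설립", "쿠팡"])
passed = score >= 0.8  # minKeywordScore from the config above
```

Because the score is a simple ratio, a regression that drops one of three keywords (score 0.67) would fail the 0.8 threshold.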

Test Types

Type                        Description
prompt-regression           Detect quality regressions when prompts change
hallucination-detection     Source-grounded hallucination check
quality-assertion           Format, length, and content validation
korean-g-eval               Korean G-Eval metrics (coherence, fluency, relevance)
korean-hallucination        Korean-optimized hallucination detection
korean-culture-check        Speech-level (존댓말) consistency check
multilingual-faithfulness   KO↔EN translation faithfulness
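As a rough mental model for hallucination-detection: the response is scored for how well it is supported by sourceDocument, and the score must clear minGroundingScore. The naive token-overlap sketch below only illustrates the idea; Raccoon's actual grounding evaluation is not this simple lexical check:

```python
def grounding_score(response: str, source: str) -> float:
    """Fraction of response tokens that also occur in the source document.

    A crude lexical proxy for grounding: content with no support in the
    source pulls the score down, flagging a possible hallucination.
    """
    response_tokens = response.split()
    if not response_tokens:
        return 0.0
    source_tokens = set(source.split())
    supported = sum(1 for token in response_tokens if token in source_tokens)
    return supported / len(response_tokens)

source = "KakaoPay is a financial service offering payments, transfers, and insurance."
grounded = grounding_score(
    "KakaoPay is a financial service offering transfers, and insurance.", source
)
hallucinated = grounding_score(
    "KakaoPay sells groceries and smartphones worldwide.", source
)
passed = grounded >= 0.7  # minGroundingScore from the config example
```

The fully supported answer scores high, while the invented claim about groceries and smartphones falls below the 0.7 threshold.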

CLI Options

Flag              Description
--model fast      Faster model, lower quality
--model quality   Best model (default)
--verbose         Verbose output with response previews
--lang ko         Korean-optimized evaluation
--upload          Upload results to the Raccoon dashboard
--label <name>    Label for uploaded results

Real-World Examples

RAG Chatbot Testing

Test RAG pipeline quality the way companies like Kakao, Naver, and Coupang do in production.

raccoon llm-test examples/rag-chatbot/raccoon-llm.config.json --verbose

→ See examples/rag-chatbot/

Korean LLM Evaluation

Full Korean evaluation suite: G-Eval, hallucination detection, speech-level (존댓말) checks, and translation faithfulness.

raccoon llm-test examples/llm-test-ko/raccoon-llm.config.json --lang ko --verbose

→ See examples/llm-test-ko/
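The speech-level consistency idea behind korean-culture-check can be approximated by classifying each sentence ending as polite or plain. The heuristic below (polite endings such as -요/-니다) is purely illustrative and not Raccoon's actual check:

```python
import re

# Common polite/formal Korean sentence endings (illustrative subset).
POLITE_ENDINGS = ("요", "니다", "니까")

def speech_levels(text: str) -> set[str]:
    """Classify each sentence as 'polite' or 'plain' by its ending."""
    levels = set()
    for sentence in re.split(r"[.!?]\s*", text):
        sentence = sentence.strip()
        if not sentence:
            continue
        levels.add("polite" if sentence.endswith(POLITE_ENDINGS) else "plain")
    return levels

# A reply that stays polite throughout is consistent; mixing polite and
# plain endings in one reply signals a speech-level violation.
consistent = len(speech_levels("안녕하세요. 주문이 접수되었습니다.")) == 1
mixed = len(speech_levels("안녕하세요. 주문 접수됨.")) > 1
```

A real checker needs morphological analysis to handle irregular endings; the point here is only that consistency reduces to "one speech level per response".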


raccoon test: automatic test generation

$ raccoon test src/

๐Ÿฆ raccoon test

Found 6 file(s) to process

โœ“ src/llm/chat.ts          โ†’ tests/llm/chat.test.ts
โœ“ src/llm/rag-pipeline.ts  โ†’ tests/llm/rag-pipeline.test.ts
โœ“ src/prompts/system.ts    โ†’ tests/prompts/system.test.ts
โœ“ src/api/routes.ts        โ†’ tests/api/routes.test.ts
โœ“ src/models/user.ts       โ†’ tests/models/user.test.ts
  โ†ฉ src/cli.ts             โ†’ tests/cli.test.ts (already exists)

โœ“ Generated 5 test file(s)
  Skipped 1 file(s)

  Free tier: 5/5 used today

Flag              Description
--model fast      Faster model
--model quality   Best quality (default)
--out-dir <dir>   Custom output directory
--force           Overwrite existing test files

Pricing

Plan   Price         Generations
Free   $0            5/day
Pro    $29/mo        Unlimited
Team   $19/seat/mo   Unlimited, minimum 5 seats

raccoon upgrade

Or set your key directly:

export RACCOON_API_KEY=rn_pro_...

์ง€์› ์–ธ์–ด ๋ฐ ํ”„๋ ˆ์ž„์›Œํฌ / Supported Languages & Frameworks

์–ธ์–ด / Language ํ…Œ์ŠคํŠธ ํ”„๋ ˆ์ž„์›Œํฌ / Test Frameworks
TypeScript Jest, Vitest
JavaScript Jest, Vitest
Python pytest

์„ค์ • / Configuration (.raccoonrc.json)

{
  "model": "quality",
  "outDir": "__tests__",
  "raccoonApiKey": "rn_pro_..."
}

CI/CD Integration

# GitHub Actions: catch LLM regressions on every PR
- name: Run LLM regression tests
  run: raccoon llm-test raccoon-llm.config.json
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    RACCOON_API_KEY: ${{ secrets.RACCOON_API_KEY }}

vs. ๊ฒฝ์Ÿ์‚ฌ / vs. Alternatives

Raccoon DeepEval RAGAS Promptfoo
ํ•œ๊ตญ์–ด ํŠนํ™” / Korean-native โœ… โŒ โŒ โŒ
์กด๋Œ“๋ง ์ฒดํฌ / Speech-level โœ… โŒ โŒ โŒ
CLI ์šฐ์„  / CLI-first โœ… Partial โŒ โœ…
ํ™˜๊ฐ ๊ฐ์ง€ / Hallucination โœ… โœ… โœ… Partial
RAG ํ‰๊ฐ€ / RAG eval โœ… โœ… โœ… Partial
๊ฐ€๊ฒฉ / Price $29/mo $99/mo OSS $99/mo
์„ค์น˜ / Install npm i -g pip pip npm

Feedback

Bug reports, feature requests, and Korean evaluation ideas welcome!

→ Submit feedback on GitHub Issues

Especially looking for:

  • Missing Korean LLM evaluation cases
  • Real-world prompt regression patterns
  • Additional Asian languages to support

Development

git clone https://github.com/raccoon-sh/raccoon-cli
cd raccoon-cli
npm install
npm test
npm run dev -- llm-test examples/llm-test/raccoon-llm.config.json

Made for Korean and Asian AI teams who ship LLM apps that work.
