The pytest for LLM apps: test your AI pipelines straight from your terminal.

npm install -g raccoon-testkit
raccoon llm-test raccoon-llm.config.json
# → ✓ RAG response quality [87%] ✓ Hallucination check [94%] ✓ Korean eval [91%]

No IDE plugin. No browser tab. Just your terminal and production-grade LLM test coverage.
Building LLM apps is hard. Testing them is harder. Raccoon reads your AI pipeline code, understands your prompt chains and LLM calls, and evaluates real behavior, so you can ship with confidence and catch regressions before your users do.
- Korean and Asian language evaluation built in: Korean G-Eval, speech-level (존댓말) checks, code-switching support
- Hallucination detection: source-grounded evaluation
- CLI-first: runs in CI/CD, pre-commit hooks, and GitHub Actions
- Multi-language: JavaScript/TypeScript (Jest) and Python (pytest)
- Fast: powered by Claude AI, evaluates in seconds
npm install -g raccoon-testkit
raccoon init

Creates a .raccoonrc.json config file in your project root.
# Run LLM app tests (auto-detects raccoon-llm.config.json)
raccoon llm-test

# Use a specific config file
raccoon llm-test my-tests.json

# Verbose output
raccoon llm-test --verbose

# Korean-optimized mode
raccoon llm-test --lang ko

# Generate unit tests for a single file
raccoon test src/llm/chat.ts
raccoon test src/

Example raccoon-llm.config.json:

{
  "model": "quality",
  "tests": [
    {
      "name": "RAG response quality",
      "type": "prompt-regression",
      "prompt": "다음 문서를 기반으로 질문에 답하세요: {{context}}\n\n질문: {{question}}",
      "variables": {
        "context": "쿠팡은 2010년 설립된 한국 최대 이커머스 플랫폼입니다.",
        "question": "쿠팡은 언제 설립되었나요?"
      },
      "promptRegression": {
        "goldenKeywords": ["2010", "설립", "쿠팡"],
        "minKeywordScore": 0.8
      }
    },
    {
      "name": "Hallucination detection (false information)",
      "type": "hallucination-detection",
      "prompt": "카카오페이의 주요 기능을 설명해주세요.",
      "hallucinationDetection": {
        "sourceDocument": "카카오페이는 카카오의 금융 서비스로, 간편결제, 송금, 투자, 보험 기능을 제공합니다.",
        "minGroundingScore": 0.7
      }
    },
    {
      "name": "Korean customer-support response quality (G-Eval)",
      "type": "korean-g-eval",
      "prompt": "고객이 '결제가 실패했어요'라고 했을 때 고객센터 AI로서 답변해주세요.",
      "koreanGEval": {
        "criteria": ["coherence", "fluency", "relevance"],
        "minScore": 0.75
      }
    }
  ]
}

| Type | Description |
|---|---|
| `prompt-regression` | Detect quality regressions when prompts change |
| `hallucination-detection` | Source-grounded hallucination check |
| `quality-assertion` | Format, length, and content validation |
| `korean-g-eval` | Korean G-Eval metrics (coherence, fluency, relevance) |
| `korean-hallucination` | Korean-optimized hallucination detection |
| `korean-culture-check` | Speech-level (존댓말) consistency check |
| `multilingual-faithfulness` | KO↔EN translation faithfulness |
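As a rough intuition for what `prompt-regression` checks with the `goldenKeywords` and `minKeywordScore` fields in the config above, here is a toy keyword-coverage scorer. This is not Raccoon's actual implementation (its scoring is internal and LLM-assisted); the function names and logic below are our own illustrative sketch.

```python
# Toy sketch: a golden-keyword coverage score for prompt-regression tests.
# Hypothetical helper names; not part of the raccoon-testkit API.

def keyword_score(response: str, golden_keywords: list[str]) -> float:
    """Fraction of golden keywords that appear in the model response."""
    if not golden_keywords:
        return 1.0
    hits = sum(1 for kw in golden_keywords if kw in response)
    return hits / len(golden_keywords)

def passes(response: str, golden_keywords: list[str], min_score: float) -> bool:
    """A test passes when keyword coverage meets the configured threshold."""
    return keyword_score(response, golden_keywords) >= min_score

response = "쿠팡은 2010년에 설립되었습니다."
print(keyword_score(response, ["2010", "설립", "쿠팡"]))  # prints 1.0
```

When a prompt edit causes the model to drop a golden fact (say, the founding year), the coverage score falls below `minKeywordScore` and the regression is flagged.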
| Flag | Description |
|---|---|
| `--model fast` | Faster model, lower quality |
| `--model quality` | Best model (default) |
| `--verbose` | Verbose output with response previews |
| `--lang ko` | Korean-optimized evaluation |
| `--upload` | Upload results to the Raccoon dashboard |
| `--label <name>` | Label for uploaded results |
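The idea behind `hallucination-detection`'s `minGroundingScore` is to measure how much of a response is supported by the source document. Raccoon's real evaluator is LLM-based; the token-overlap sketch below is only a toy approximation of the concept, with names we made up for illustration.

```python
# Toy sketch: source-grounded scoring for hallucination detection.
# Hypothetical helper; the real evaluator uses an LLM judge, not token overlap.

def grounding_score(response: str, source: str) -> float:
    """Fraction of response tokens that also occur in the source document."""
    resp_tokens = response.split()
    if not resp_tokens:
        return 1.0
    source_tokens = set(source.split())
    supported = sum(1 for tok in resp_tokens if tok in source_tokens)
    return supported / len(resp_tokens)

source = "카카오페이는 간편결제, 송금, 투자, 보험 기능을 제공합니다."
grounded = "카카오페이는 송금, 투자, 보험 기능을 제공합니다."
print(grounding_score(grounded, source) >= 0.7)  # prints True
```

A response that introduces claims absent from the source scores low and fails the configured threshold.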
Test RAG pipeline quality the way Kakao, Naver, and Coupang do in production.
raccoon llm-test examples/rag-chatbot/raccoon-llm.config.json --verbose

→ See examples/rag-chatbot/
Full Korean evaluation suite: G-Eval, hallucination detection, speech-level (존댓말) check, and translation faithfulness.
raccoon llm-test examples/llm-test-ko/raccoon-llm.config.json --lang ko --verbose

→ See examples/llm-test-ko/
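The `korean-culture-check` type verifies that a reply keeps a consistent speech level (존댓말 vs 반말). A production evaluator would use morphological analysis; the sketch below only inspects common sentence endings, with heuristics and names that are ours, not Raccoon's.

```python
# Toy sketch: speech-level (존댓말) consistency via sentence endings.
# Hypothetical heuristic; a real check needs proper Korean morphology.

POLITE_ENDINGS = ("니다.", "요.", "니까?", "나요?", "세요.")
PLAIN_ENDINGS = ("다.", "냐?", "지.", "어.", "야.")

def speech_levels(text: str) -> set:
    """Collect the speech levels observed across sentences in the text."""
    levels = set()
    for sentence in text.replace("?", "?|").replace(".", ".|").split("|"):
        s = sentence.strip()
        if not s:
            continue
        if s.endswith(POLITE_ENDINGS):   # check polite first: "니다." also ends in "다."
            levels.add("polite")
        elif s.endswith(PLAIN_ENDINGS):
            levels.add("plain")
    return levels

def consistent(text: str) -> bool:
    """A reply is consistent when it uses at most one speech level."""
    return len(speech_levels(text)) <= 1

print(consistent("결제가 실패했습니다. 환불을 도와드리겠습니다."))  # prints True
print(consistent("결제가 실패했습니다. 내가 확인했다."))            # prints False
```

Mixing a polite opening with a blunt plain-form follow-up, as in the second example, is exactly the kind of tone break this check is meant to catch.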
$ raccoon test src/

🦝 raccoon test
Found 6 file(s) to process

✓ src/llm/chat.ts → tests/llm/chat.test.ts
✓ src/llm/rag-pipeline.ts → tests/llm/rag-pipeline.test.ts
✓ src/prompts/system.ts → tests/prompts/system.test.ts
✓ src/api/routes.ts → tests/api/routes.test.ts
✓ src/models/user.ts → tests/models/user.test.ts
⏭ src/cli.ts → tests/cli.test.ts (already exists)

✓ Generated 5 test file(s)
Skipped 1 file(s)
Free tier: 5/5 used today
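For the Python target, the generated files follow pytest conventions. As a purely illustrative sketch of the shape of such output (the `slugify` helper below is a made-up stand-in defined inline so the example runs on its own; real output depends on your source code):

```python
# Illustrative only: roughly the shape of a pytest file `raccoon test`
# might generate for a small helper. `slugify` is a hypothetical stand-in.
import re

def slugify(title: str) -> str:
    """Stand-in for a function that would normally live in your source tree."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

class TestSlugify:
    def test_basic_title(self):
        assert slugify("Hello World") == "hello-world"

    def test_strips_punctuation(self):
        assert slugify("LLM apps: hard to test!") == "llm-apps-hard-to-test"

    def test_empty_string(self):
        assert slugify("") == ""
```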
| Flag | Description |
|---|---|
| `--model fast` | Faster model |
| `--model quality` | Best quality (default) |
| `--out-dir <dir>` | Custom output directory |
| `--force` | Overwrite existing test files |
| Plan | Price | Generations |
|---|---|---|
| Free | $0 | 5/day |
| Pro | $29/mo | Unlimited |
| Team | $19/seat/mo | Unlimited, min 5 seats |
raccoon upgrade

Or set your key directly:

export RACCOON_API_KEY=rn_pro_...

| Language | Test Frameworks |
|---|---|
| TypeScript | Jest, Vitest |
| JavaScript | Jest, Vitest |
| Python | pytest |
{
  "model": "quality",
  "outDir": "__tests__",
  "raccoonApiKey": "rn_pro_..."
}

# GitHub Actions: catch LLM regressions on every PR
- name: Run LLM regression tests
run: raccoon llm-test raccoon-llm.config.json
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    RACCOON_API_KEY: ${{ secrets.RACCOON_API_KEY }}

| | Raccoon | DeepEval | RAGAS | Promptfoo |
|---|---|---|---|---|
| Korean-native | ✅ | ❌ | ❌ | ❌ |
| Speech-level (존댓말) check | ✅ | ❌ | ❌ | ❌ |
| CLI-first | ✅ | Partial | ❌ | ✅ |
| Hallucination detection | ✅ | ✅ | ✅ | Partial |
| RAG eval | ✅ | ✅ | ✅ | Partial |
| Price | $29/mo | $99/mo | OSS | $99/mo |
| Install | `npm i -g` | pip | pip | npm |
Bug reports, feature requests, and Korean evaluation ideas are welcome!

→ Submit feedback on GitHub Issues

Especially looking for:
- Missing Korean LLM evaluation cases
- Real-world prompt regression patterns
- Additional Asian languages to support
git clone https://github.com/raccoon-sh/raccoon-cli
cd raccoon-cli
npm install
npm test
npm run dev -- llm-test examples/llm-test/raccoon-llm.config.json

Made for Korean and Asian AI teams who ship LLM apps that work.