This project implements an AI-powered chatbot that answers questions based on company policy documents using Retrieval-Augmented Generation (RAG). The chatbot uses a company policy PDF file as its knowledge base and provides accurate, context-aware answers that are strictly grounded in the policy content.
Unlike general-purpose AI chatbots, this system ensures all responses are based solely on the provided policy documents, making it reliable for company-specific queries. It combines modern AI technologies like OpenAI's language models with vector embeddings and semantic search to deliver relevant and accurate information.
Interactive chat interface showing a query about company policies with source citations
Interactive Web Interface - Clean, user-friendly chat UI built with Streamlit
Source Citations - Every answer includes references to the specific policy documents
Conversation History - Maintains context throughout the chat session
Semantic Search - Uses AI embeddings to find the most relevant policy sections
Grounded Responses - Answers are based only on the policy documents, reducing hallucinations
- Document Processing: Policy PDF files are loaded and split into manageable chunks
- Embedding Generation: Each chunk is converted into vector embeddings using OpenAI
- Vector Storage: Embeddings are stored in a Chroma vector database for efficient retrieval
- Query Processing: User questions are converted to embeddings and matched with relevant document chunks
- Answer Generation: The most relevant chunks are sent to GPT-5 to generate accurate, contextual answers
- Install dependencies:
pip install -r requirements.txt- Create a
.envfile with your OpenAI API key:
OPENAI_API_KEY=your_api_key_here
- Add your policy PDF files to the
data/folder
Run this first to process your policy documents:
python create_database.pyThis will:
- Load all PDF files from the
data/folder - Split them into chunks
- Create embeddings using OpenAI
- Store them in a Chroma vector database
Launch the chatbot interface:
streamlit run app.pyThis will open a web browser with an interactive chat interface where you can ask questions about your policies.
Interactive Chat Interface - Clean, modern UI for asking questions Source Citations - See which documents the answers come from Chat History - Keep track of your conversation Semantic Search - Finds relevant information using AI embeddings
- LangChain - For RAG pipeline
- OpenAI - For embeddings and chat completion
- Chroma - Vector database
- Streamlit - Web UI framework
