Hey there! We are so excited that you're interested in Onyx.
- Contribution Opportunities
- Contribution Process
- Development Setup
- Running the Application
- macOS-Specific Notes
- Engineering Best Practices
- Release Process
- Getting Help
- Enterprise Edition Contributions
The GitHub Issues page is a great place to look for and share contribution ideas.
If you have your own feature that you would like to build, please create an issue and community members can provide feedback and upvote if they feel a common need.
To contribute, please follow the "fork and pull request" workflow.
Create a GitHub issue and see if there are upvotes. If you feel the feature is sufficiently value-additive and you would like approval to contribute it to the repo, tag Yuhong to review.
If you do not get a response within a week, feel free to email yuhong@onyx.app and include the issue in the message.
Not all small features and enhancements will be accepted as there is a balance between feature richness and bloat. We strive to provide the best user experience possible so we have to be intentional about what we include in the app.
The Onyx team will either provide a design doc and PRD for the feature or request one from you, the contributor. The scope and detail of the design will depend on the individual feature.
If you are contributing features to Onyx Enterprise Edition, you are required to sign the IP Assignment Agreement.
Your features must pass all tests and all comments must be addressed prior to merging.
If we approve an issue, we are promising you the following:
- Your work will receive timely attention and we will put aside other important items to ensure you are not blocked.
- You will receive necessary coaching on eng quality, system design, etc. to ensure the feature is completed well.
- The Onyx team will pull resources and bandwidth from design, PM, and engineering to ensure that you have all the resources to build the feature to the quality required for merging.
Because this is a large investment from our team, we ask that you:
- Thoroughly read all the requirements of the design docs, engineering best practices, and try to minimize overhead for the Onyx team.
- Complete the feature in a timely manner to reduce context switching and an ongoing resource pull from the Onyx team.
Onyx, being a fully functional app, relies on some external software, specifically:
- Postgres (Relational DB)
- OpenSearch (Vector DB/Search Engine)
- Redis (Cache)
- MinIO (File Store)
- Nginx (Not needed for development flows generally)
Note: This guide provides instructions to build and run Onyx locally from source with Docker containers providing the above external software. We believe this combination is easier for development purposes. If you prefer to use pre-built container images, see Running in Docker below.
- Python 3.11 — If using a lower version, modifications will have to be made to the code. Higher versions may have library compatibility issues.
- Docker — Required for running external services (Postgres, OpenSearch, Redis, MinIO).
- Node.js v22 — We recommend using nvm to manage Node installations.
We use uv and recommend creating a virtual environment.
uv venv .venv --python 3.11
source .venv/bin/activate
For Windows, activate the virtual environment using Command Prompt:
.venv\Scripts\activate
If using PowerShell, the command differs slightly:
.venv\Scripts\Activate.ps1
Install the required Python dependencies:
uv sync --all-extras
Install Playwright for Python (headless browser required by the Web Connector):
uv run playwright install
Install and activate Node.js 22 with nvm:
nvm install 22 && nvm use 22
node -v # verify your active version
Navigate to onyx/web and run:
npm i
Set up pre-commit hooks (black / reorder-python-imports):
uv run pre-commit install
We also use mypy for static type checking. Onyx is fully type-annotated, and we want to keep it that way! To run the mypy checks manually:
uv run mypy . # from onyx/backend
We use prettier for formatting. The desired version will be installed via npm i from the onyx/web directory. To run the formatter:
npx prettier --write . # from onyx/web
Pre-commit will also run prettier automatically on files you've recently touched. If prettier reformats a file, your commit will fail; re-stage your changes and commit again.
We highly recommend using VSCode's debugger for development.
- Copy .vscode/env_template.txt to .vscode/.env
- Fill in the necessary environment variables in .vscode/.env
Before starting, make sure the Docker Daemon is running.
- Open the Debug view in VSCode (Cmd+Shift+D on macOS)
- From the dropdown at the top, select "Clear and Restart External Volumes and Containers" and press the green play button
- From the dropdown at the top, select "Run All Onyx Services" and press the green play button
- Navigate to http://localhost:3000 in your browser to start using the app
- Set breakpoints by clicking to the left of line numbers to help debug while the app is running
- Use the debug toolbar to step through code, inspect variables, etc.
Note: "Clear and Restart External Volumes and Containers" will reset your Postgres and OpenSearch (relational-db and index). Only run this if you are okay with wiping your data.
Features:
- Hot reload is enabled for the web server and API servers
- Python debugging is configured with debugpy
- Environment variables are loaded from .vscode/.env
- Console output is organized in the integrated terminal with labeled tabs
You will need Docker installed to run these containers.
Navigate to onyx/deployment/docker_compose, then start up Postgres/OpenSearch/Redis/MinIO with:
docker compose -f docker-compose.yml -f docker-compose.dev.yml up -d index relational_db cache minio
(index refers to OpenSearch, relational_db refers to Postgres, and cache refers to Redis)
To start the frontend, navigate to onyx/web and run:
npm run dev
Next, start the model server, which runs the local NLP models. Navigate to onyx/backend and run:
uvicorn model_server.main:app --reload --port 9000
For Windows (for compatibility with both PowerShell and Command Prompt):
powershell -Command "uvicorn model_server.main:app --reload --port 9000"
The first time running Onyx, you will need to run the DB migrations for Postgres. After the first time, this is no longer required unless the DB models change.
Navigate to onyx/backend and with the venv active, run:
alembic upgrade head
Next, start the task queue, which orchestrates the background jobs. Still in onyx/backend, run:
python ./scripts/dev_run_background_jobs.py
To run the backend API server, navigate back to onyx/backend and run:
AUTH_TYPE=basic uvicorn onyx.main:app --reload --port 8080
For Windows (for compatibility with both PowerShell and Command Prompt):
powershell -Command "
$env:AUTH_TYPE='basic'
uvicorn onyx.main:app --reload --port 8080
"
Note: If you need finer logging, add the additional environment variable LOG_LEVEL=DEBUG to the relevant services.
You should now have 4 servers running:
- Web server
- Backend API
- Model server
- Background jobs
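As a quick sanity check that the servers above are listening, a minimal sketch along these lines may help. The ports (3000, 8080, 9000) come from the commands in this guide; the function names and service labels are hypothetical, and the background jobs process is not covered because it does not expose a port in the steps above.

```python
import socket

# Ports taken from the commands in this guide; the labels are informal.
EXPECTED_SERVICES = {
    "web server": 3000,
    "backend API": 8080,
    "model server": 9000,
}


def is_port_open(host: str, port: int, timeout_seconds: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False


def report_local_services(host: str = "localhost") -> dict[str, bool]:
    """Map each expected service name to whether its port is reachable."""
    return {
        name: is_port_open(host, port) for name, port in EXPECTED_SERVICES.items()
    }


if __name__ == "__main__":
    for service_name, reachable in report_local_services().items():
        print(f"{service_name}: {'up' if reachable else 'not reachable'}")
```

A port that accepts connections only shows that some process is listening there; it does not prove the service is healthy.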
Now, visit http://localhost:3000 in your browser. You should see the Onyx onboarding wizard where you can connect your external LLM provider to Onyx.
You've successfully set up a local Onyx instance!
You can run the full Onyx application stack from pre-built images including all external software dependencies.
Navigate to onyx/deployment/docker_compose and run:
docker compose up -d
After Docker pulls and starts these containers, navigate to http://localhost:3000 to use Onyx.
If you want to make changes to Onyx and run those changes in Docker, you can also build a local version of the Onyx container images that incorporates your changes:
docker compose up -d --build
Ensure Homebrew is already set up, then install Python 3.11:
brew install python@3.11
Add Python 3.11 to your path by adding the following line to ~/.zshrc:
export PATH="$(brew --prefix)/opt/python@3.11/libexec/bin:$PATH"
Note: You will need to open a new terminal for the path change above to take effect.
On macOS, you will need to install Docker Desktop and ensure it is running before continuing with the docker commands.
macOS will likely require you to remove some quarantine attributes on some of the hooks for them to execute properly. After installing pre-commit, run the following command:
sudo xattr -r -d com.apple.quarantine ~/.cache/pre-commit
These are also the practices we adhere to internally as a team. We love to build in the open and to uplevel our community and each other by being transparent.
- Use 1-way vs 2-way doors. For 2-way doors, move faster and iterate. For 1-way doors, be more deliberate.
- Consistency > being "right." Prefer consistent patterns across the codebase. If something is truly bad, fix it everywhere.
- Fix what you touch (selectively).
- Don't feel obligated to fix every best-practice issue you notice.
- Don't introduce new bad practices.
- If your change touches code that violates best practices, fix it as part of the change.
- Don't tack features on. When adding functionality, restructure logically as needed to avoid muddying interfaces and accumulating tech debt.
Add clear comments:
- At logical boundaries (e.g., interfaces) so the reader doesn't need to dig 10 layers deeper.
- Wherever assumptions are made or something non-obvious/unexpected is done.
- For complicated flows/functions.
- Wherever it saves time (e.g., nontrivial regex patterns).
- Fail loudly rather than silently skipping work.
- Example: raise and let exceptions propagate instead of silently dropping a document.
- Don't overuse try/except.
- Put try/except at the correct logical level.
- Do not mask exceptions unless it is clearly appropriate.
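To illustrate the error-handling guidelines above, here is a minimal sketch contrasting a broad try/except that silently drops a document with a version that lets the exception propagate. DocumentParseError, parse_document, and both parse_documents_* functions are hypothetical names for this example, not Onyx APIs.

```python
class DocumentParseError(Exception):
    """Hypothetical error type for this sketch."""


def parse_document(raw_document: dict[str, str]) -> str:
    """Return the document text, raising if the required field is missing."""
    if "text" not in raw_document:
        raise DocumentParseError(f"Document {raw_document.get('id')} has no text")
    return raw_document["text"]


def parse_documents_silently(raw_documents: list[dict[str, str]]) -> list[str]:
    """Anti-pattern: a broad except hides which documents were dropped."""
    parsed_texts = []
    for raw_document in raw_documents:
        try:
            parsed_texts.append(parse_document(raw_document))
        except Exception:
            continue  # the bad document disappears without a trace
    return parsed_texts


def parse_documents_loudly(raw_documents: list[dict[str, str]]) -> list[str]:
    """Preferred: no try/except at this level, so the caller sees the failure."""
    return [parse_document(raw_document) for raw_document in raw_documents]
```

If a caller genuinely needs to continue past a bad document, that decision can live in one try/except at the level that owns the retry or reporting logic, rather than in every helper.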
- Everything should be as strictly typed as possible.
- Use cast for annoying/loose-typed interfaces (e.g., results of run_functions_tuples_in_parallel).
- Only cast when the type checker sees Any or types are too loose.
- Prefer types that are easy to read.
- Avoid dense types like dict[tuple[str, str], list[list[float]]].
- Prefer domain models, e.g.:
- EmbeddingModel(provider_name, model_name) as a Pydantic model
- dict[EmbeddingModel, list[EmbeddingVector]]
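The typing guidelines above could look roughly like the sketch below. It is an assumption-laden illustration: load_embeddings_loosely_typed and count_vectors are hypothetical helpers, and a NamedTuple stands in for the Pydantic model the guideline recommends, only so that the example runs on the standard library.

```python
from typing import Any, NamedTuple, cast

EmbeddingVector = list[float]


class EmbeddingModel(NamedTuple):
    """Stdlib stand-in for a Pydantic model; hashable, so usable as a dict key."""

    provider_name: str
    model_name: str


def load_embeddings_loosely_typed() -> Any:
    """Hypothetical loosely typed helper; the type checker only sees Any."""
    return {EmbeddingModel("provider-a", "model-small"): [[0.1, 0.2], [0.3, 0.4]]}


def count_vectors(
    embeddings_by_model: dict[EmbeddingModel, list[EmbeddingVector]],
) -> int:
    """Readable signature: the domain types say what the keys and values are."""
    return sum(len(vectors) for vectors in embeddings_by_model.values())


# cast is used only because the helper above returns Any.
embeddings_by_model = cast(
    dict[EmbeddingModel, list[EmbeddingVector]], load_embeddings_loosely_typed()
)
```

Note that cast performs no runtime check; it only tells the type checker what to assume, which is why the guideline limits it to loosely typed interfaces.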
- Keep clear logical boundaries for state containers and objects.
- A config object should never contain things like a db_session.
- Avoid state containers that are overly nested, or huge + flat (use judgment).
- Prefer composition and functional style over inheritance/OOP.
- Prefer no mutation unless there's a strong reason.
- State objects should be intentional and explicit, ideally nonmutating.
- Use interfaces/objects to create clear separation of responsibility.
- Prefer simplicity when there's no clear gain.
- Avoid overcomplicated mechanisms like semaphores.
- Prefer hash maps (dicts) over tree structures unless there's a strong reason.
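One possible reading of the state and boundary guidelines above, as a small sketch: the config holds plain settings only, the storage dependency is passed in through an interface, and changing the config produces a new object. All names here are hypothetical, and a frozen dataclass stands in for a frozen Pydantic model only to keep the example on the standard library.

```python
from dataclasses import dataclass, replace
from typing import Protocol


@dataclass(frozen=True)
class IndexingConfig:
    """Plain settings only; no db_session or other live resources."""

    batch_size: int
    skip_images: bool


class DocumentStore(Protocol):
    """Interface that separates 'where documents go' from indexing logic."""

    def save(self, document_text: str) -> None: ...


class InMemoryDocumentStore:
    """Hypothetical store used here in place of a real database."""

    def __init__(self) -> None:
        self.saved_documents: list[str] = []

    def save(self, document_text: str) -> None:
        self.saved_documents.append(document_text)


def index_documents(
    document_texts: list[str], config: IndexingConfig, store: DocumentStore
) -> int:
    """Save at most config.batch_size documents and return how many were saved."""
    selected_texts = document_texts[: config.batch_size]
    for document_text in selected_texts:
        store.save(document_text)
    return len(selected_texts)


base_config = IndexingConfig(batch_size=2, skip_images=True)
# replace() returns a new object; base_config itself is unchanged.
larger_config = replace(base_config, batch_size=3)
```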
- Name variables carefully and intentionally.
- Prefer long, explicit names when undecided.
- Avoid single-character variables except for small, self-contained utilities (or not at all).
- Keep the same object/name consistent through the call stack and within functions when reasonable.
- Good: for token in tokens:
- Bad: for msg in tokens: (if iterating tokens)
- Function names should bias toward long + descriptive for codebase search.
- IntelliSense can miss call sites; search works best with unique names.
- Prefer self-contained correctness — don't rely on callers to "use it right" if you can make misuse hard.
- Avoid redundancies: if a function takes an arg, it shouldn't also take a state object that contains that same arg.
- No dead code (unless there's a very good reason).
- No commented-out code in main or feature branches (unless there's a very good reason).
- No duplicate logic:
- Don't copy/paste into branches when shared logic can live above the conditional.
- If you're afraid to touch the original, you don't understand it well enough.
- LLMs often create subtle duplicate logic — review carefully and remove it.
- Avoid "nearly identical" objects that confuse when to use which.
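A small sketch of the duplicate-logic point above: the same normalization copied into both branches versus a shared step that lives above the conditional. The function names are hypothetical.

```python
def normalize_title(raw_title: str) -> str:
    """Collapse repeated whitespace and title-case the result."""
    return " ".join(raw_title.split()).title()


def build_summary_duplicated(raw_title: str, is_draft: bool) -> str:
    """Anti-pattern: the same normalization is copied into both branches."""
    if is_draft:
        title = " ".join(raw_title.split()).title()
        return f"[DRAFT] {title}"
    else:
        title = " ".join(raw_title.split()).title()
        return title


def build_summary(raw_title: str, is_draft: bool) -> str:
    """Preferred: shared logic lives above the conditional."""
    title = normalize_title(raw_title)
    return f"[DRAFT] {title}" if is_draft else title
```

Both versions return the same strings today; the difference is that a later change to the normalization only has to be made in one place in the second version.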
- Avoid extremely long functions with chained logic:
- Encapsulate steps into helpers for readability, even if not reused.
- "Pythonic" multi-step expressions are OK in moderation; don't trade clarity for cleverness.
- Avoid holding resources for extended periods (DB sessions, locks/semaphores).
- Validate objects on creation and right before use.
- Connector code (data to Onyx documents):
- Any in-memory structure that can grow without bound based on input must be periodically size-checked.
- If a connector is OOMing (often shows up as "missing celery tasks"), this is one of the first things to check.
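One way to keep a connector's in-memory buffer bounded is to yield fixed-size batches instead of collecting everything first. This is a minimal sketch under assumptions: batch_documents and MAX_BATCH_SIZE are hypothetical, and documents are plain strings rather than Onyx document objects.

```python
from collections.abc import Iterable, Iterator

MAX_BATCH_SIZE = 3  # illustrative limit, not a value from the Onyx codebase


def batch_documents(
    documents: Iterable[str], max_batch_size: int = MAX_BATCH_SIZE
) -> Iterator[list[str]]:
    """Yield documents in bounded batches so the buffer never grows with input size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    current_batch: list[str] = []
    for document in documents:
        current_batch.append(document)
        # Size check on every append: flush as soon as the limit is reached.
        if len(current_batch) >= max_batch_size:
            yield current_batch
            current_batch = []
    if current_batch:
        yield current_batch
```

Because this is a generator, the ValueError for an invalid batch size is raised when iteration starts, not when batch_documents is called.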
- Async and event loops:
- Never introduce new async/event loop Python code, and make existing async code synchronous where that is practical.
- Writing async code without 100% understanding the code and having a concrete reason to do so is likely to introduce bugs and not add any meaningful performance gains.
- Pydantic + data models: models.py files.
- DB interface functions (excluding lazy loading): db/ directory.
- LLM prompts: prompts/ directory, roughly mirroring the code layout that uses them.
- API routes: server/ directory.
- Prefer Pydantic over dataclasses.
- If absolutely required, allow arbitrary types via the model config (arbitrary_types_allowed in Pydantic).
- Prefer explicit None over sentinel empty strings (usually; depends on intent).
- Prefer explicit identifiers: use string enums instead of integer codes.
- Avoid magic numbers (co-location is good when necessary). Always avoid magic strings.
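The data conventions above might look like this in practice. IndexingStatus and describe_attempt are hypothetical names chosen for the sketch, not identifiers from the Onyx codebase.

```python
from enum import Enum
from typing import Optional


class IndexingStatus(str, Enum):
    """Hypothetical statuses; string values stay readable in logs and storage."""

    NOT_STARTED = "not_started"
    IN_PROGRESS = "in_progress"
    SUCCESS = "success"


def describe_attempt(
    status: IndexingStatus, error_message: Optional[str] = None
) -> str:
    """None means 'no error'; an empty string would be ambiguous."""
    if error_message is None:
        return f"status={status.value}"
    return f"status={status.value} error={error_message}"
```

A string enum also removes the magic strings: callers write IndexingStatus.SUCCESS instead of retyping "success" at each call site.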
- Log messages where they are created.
- Don't propagate log messages around just to log them elsewhere.
- Don't use private attributes/methods/properties from other classes/modules.
- "Private" is private — respect that boundary.
- Lazy loading is often bad at scale, especially across multiple list relationships.
- Be careful when accessing SQLAlchemy object attributes:
- It can help avoid redundant DB queries,
- but it can also fail if accessed outside an active session,
- and lazy loading can add hidden DB dependencies to otherwise "simple" functions.
- Reference: https://www.reddit.com/r/SQLAlchemy/comments/138f248/joinedload_vs_selectinload/
- PRs should contain no more than 500 lines of real change.
- Merge to main frequently. Avoid long-lived feature branches — they create merge conflicts and integration pain.
- Use feature flags for incremental rollout.
- Large features should be merged in small, shippable increments behind a flag.
- This allows continuous integration without exposing incomplete functionality.
- Keep flags short-lived. Once a feature is fully rolled out, remove the flag and dead code paths promptly.
- Flag at the right level. Prefer flagging at API/UI entry points rather than deep in business logic.
- Test both flag states. Ensure the codebase works correctly with the flag on and off.
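As a rough sketch of flagging at the entry point and testing both states, consider the following. The environment variable name and all function names are hypothetical; how Onyx actually stores and reads feature flags is not specified here.

```python
import os

NEW_SEARCH_FLAG_ENV_VAR = "ENABLE_NEW_SEARCH"  # hypothetical flag name
FLAG_ENABLED_VALUE = "true"


def is_new_search_enabled() -> bool:
    """Read the flag at call time so tests can toggle it."""
    flag_value = os.environ.get(NEW_SEARCH_FLAG_ENV_VAR, "")
    return flag_value.lower() == FLAG_ENABLED_VALUE


def legacy_search(query: str) -> list[str]:
    return [f"legacy:{query}"]


def new_search(query: str) -> list[str]:
    return [f"new:{query}"]


def handle_search_request(query: str) -> list[str]:
    """Entry point: the only place that checks the flag."""
    if is_new_search_enabled():
        return new_search(query)
    return legacy_search(query)
```

Keeping the check in handle_search_request means that removing the flag later touches one function: delete the branch and the legacy path.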
- Any TODOs you add in the code must be accompanied by either the name/username of the owner of that TODO, or an issue number for an issue referencing that piece of work.
- Avoid module-level logic that runs on import, which leads to import-time side effects. Essentially every piece of meaningful logic should exist within some function that has to be explicitly invoked. Acceptable exceptions may include loading environment variables or setting up loggers.
- If you find yourself needing something like this, you may want that logic to exist in a file dedicated for manual execution (contains if __name__ == "__main__":) which should not be imported by anything else.
- Do not conflate Python scripts you intend to run from the command line (contains if __name__ == "__main__":) with modules you intend to import from elsewhere. If for some unlikely reason they have to be the same file, any logic specific to executing the file (including imports) should be contained in the if __name__ == "__main__": block.
- Generally these executable files exist in backend/scripts/.
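A minimal sketch of a file that stays safe to import: logger setup is the only module-level work, the logic lives in a function, and everything specific to running the file (including its import) sits inside the main guard. build_greeting is a hypothetical function used only for illustration.

```python
import logging

# Setting up a logger at import time is one of the accepted exceptions.
logger = logging.getLogger(__name__)


def build_greeting(name: str) -> str:
    """Importable logic: nothing happens until this is explicitly called."""
    return f"Hello, {name}!"


if __name__ == "__main__":
    # Script-only import and logic stay inside the guard.
    import sys

    requested_name = sys.argv[1] if len(sys.argv) > 1 else "Onyx"
    print(build_greeting(requested_name))
```

Importing this module from elsewhere creates the logger and defines the function, and nothing more; the printing only happens when the file is executed directly.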
Onyx loosely follows the SemVer versioning standard. A set of Docker containers will be pushed automatically to DockerHub with every tag. You can see the containers here.
We have support channels and generally interesting discussions on our Discord.
See you there!
If you are contributing features to Onyx Enterprise Edition (code under any ee/ directory), you are required to sign the IP Assignment Agreement (PDF version).