Merged
11 changes: 11 additions & 0 deletions change/change-1c99b11b-149a-4dd4-bb8f-a4d23d98b2cf.json
@@ -0,0 +1,11 @@
{
"changes": [
{
"type": "patch",
"comment": "Update readme with link to large repos performance info",
"packageName": "beachball",
"email": "elcraig@microsoft.com",
"dependentChangeType": "patch"
}
]
}
1 change: 1 addition & 0 deletions docs/.vuepress/config.ts
@@ -31,6 +31,7 @@ export default defineUserConfig({
'/concepts/groups',
'/concepts/ci-integration',
'/concepts/ai-integration',
'/concepts/large-repos',
],
},
{
2 changes: 1 addition & 1 deletion docs/cli/options.md
@@ -25,6 +25,6 @@ The options below apply to most CLI commands.
| `--since` | | | only consider changes or change files since this git ref (branch name, commit SHA) |
| `--verbose` | | | prints additional information to the console |

[1]: ../overview/configuration#determining-the-target-branch-and-remote
[1]: ../overview/configuration#specifying-the-target-branch-and-remote
[2]: https://www.npmjs.com/package/cosmiconfig
[3]: ../overview/configuration#scoping
103 changes: 103 additions & 0 deletions docs/concepts/large-repos.md
@@ -0,0 +1,103 @@
---
tags:
- overview
category: doc
---

# Optimizing performance in large repos

Beachball has several options that can help improve performance in large to very large monorepos.

All the code snippets below reference `beachball.config.js`. The snippets omit some boilerplate for brevity, but the full config should look something like this (declaring the typed `config` constant separately provides IntelliSense):

```js
/** @type {Partial<import('beachball').RepoOptions>} */
const config = {
// your options
};
module.exports = config;
```

## Specifying the remote branch

If the `branch` option is not specified, or it doesn't include a remote (omitting the remote is recommended for GitHub repos because of forks), Beachball has to determine the correct remote for comparison using git operations and potentially the `"repository"` field in `package.json`. You can reduce these git operations by [providing certain settings](../overview/configuration#specifying-the-target-branch-and-remote). The improvement is most noticeable for `beachball change` and `beachball check`.

## Concurrency

### Publish and hooks

**`concurrency`** (default: `1`) controls the maximum number of concurrent write operations during publish, including hook calls and `npm publish`. The default of `1` is conservative — if you don't use hooks, or your hooks are safe to run in parallel, increasing this can speed up publishing:

```js
const config = {
concurrency: 5,
};
```

Note that beachball respects topological order (package dependency order) regardless of this setting, so packages that depend on each other will still be published sequentially.
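
To make the interaction between `concurrency` and topological order concrete, here is a minimal sketch. It is a simplified batch model written for this page, not Beachball's actual scheduler, and `planPublishBatches` and the package names are hypothetical. It assumes every listed dependency is another package in the same repo.

```js
// Simplified illustration (not Beachball's real scheduler): group packages into
// sequential batches. A package only becomes eligible once all of its in-repo
// dependencies were published in an earlier batch, and each batch is capped at
// `concurrency` packages.
function planPublishBatches(dependencies, concurrency) {
  const limit = Math.max(1, concurrency);
  const remaining = new Set(Object.keys(dependencies));
  const published = new Set();
  const batches = [];

  while (remaining.size > 0) {
    const ready = [...remaining].filter((name) =>
      dependencies[name].every((dep) => published.has(dep))
    );
    if (ready.length === 0) {
      throw new Error('Dependency cycle or unknown dependency detected');
    }
    const batch = ready.slice(0, limit);
    for (const name of batch) {
      remaining.delete(name);
      published.add(name);
    }
    batches.push(batch);
  }
  return batches;
}

// Hypothetical packages: cli -> core -> utils, plus an independent docs package.
const deps = { utils: [], core: ['utils'], cli: ['core'], docs: [] };

console.log(planPublishBatches(deps, 5));
// [ [ 'utils', 'docs' ], [ 'core' ], [ 'cli' ] ]
console.log(planPublishBatches(deps, 1));
// [ [ 'utils' ], [ 'core' ], [ 'cli' ], [ 'docs' ] ]
```

In this model, raising the limit only helps when independent packages exist: the `utils` → `core` → `cli` chain stays sequential at any setting.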

### npm registry read

When syncing or publishing, beachball fetches version information from the npm registry for each package. In large monorepos with many packages, this can be slow.

**`npmReadConcurrency`** (default: `5`) controls how many registry reads happen at once. Increasing this can significantly speed up the fetch step:

```js
const config = {
npmReadConcurrency: 10,
};
```
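
As a rough mental model of what a read-concurrency limit does, the sketch below runs an async worker over a list with at most `limit` calls in flight. It is a generic pattern, not Beachball's internal code; `mapWithConcurrency` and the worker are hypothetical names.

```js
// Generic bounded-concurrency map (illustrative only, not Beachball's implementation).
// At most `limit` worker calls are in flight at any time; results keep input order.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

With a worker that looks up one package's versions, a higher limit shortens the total wait roughly in proportion, until the registry or network becomes the bottleneck.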

## Reducing git repository size

Beachball's changelogs and change files can have a [shockingly large impact](https://github.com/microsoft/beachball/issues/978) on git repository size. Some of the related issues have been improved directly in git and/or Azure DevOps, but it's still highly recommended to enable some of these settings in a large repo.

### Disable `CHANGELOG.json` if unused

If you don't have a workflow that uses `CHANGELOG.json` (the most common case), set **`generateChangelog: 'md'`** to only generate `CHANGELOG.md`.
After enabling this, you must **manually** delete any existing `CHANGELOG.json` files.

```js
const config = {
generateChangelog: 'md',
};
```

It's also possible to disable changelog generation entirely with `generateChangelog: false`, though this defeats one of the main points of the tool.

### Limit number of versions in changelog

Set **`changelog.maxVersions`** to limit how many versions are included in each package's changelog. This prevents the changelog's history from growing indefinitely. Older versions will still be available from git history, and a note will be added directing people to look there.

```js
const config = {
// You can experiment with values
changelog: { maxVersions: 100 },
};
```

### Add hash to changelog file names

Enable **`changelog.uniqueFilenames`** to add a unique suffix to changelog filenames, based on the hash of the package name: e.g. `CHANGELOG-d7d39c3f.md`/`.json`. [Increasing filename uniqueness](https://github.com/microsoft/beachball/pull/996) can improve git performance. Git itself has since improved in this area, but enabling the option still doesn't hurt.

When this is initially enabled, any existing changelog files will be renamed. If the package name (and therefore the hash) changes, renaming the file should also be handled automatically.

```js
const config = {
changelog: { uniqueFilenames: true },
};
```
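
To illustrate how a name-based suffix stays stable across runs, here is a minimal sketch. The hash algorithm (MD5 truncated to 8 hex characters) and the `uniqueChangelogName` helper are assumptions made for this example; this page only states that the suffix is based on a hash of the package name, so the real suffix for a given package may differ.

```js
const crypto = require('crypto');

// Assumption: MD5 truncated to 8 hex characters. Beachball's actual hash may differ.
function uniqueChangelogName(packageName, extension) {
  const suffix = crypto
    .createHash('md5')
    .update(packageName)
    .digest('hex')
    .slice(0, 8);
  return `CHANGELOG-${suffix}.${extension}`;
}

// The same package name always maps to the same filename.
console.log(uniqueChangelogName('@my-scope/my-package', 'md'));
```

Because the suffix depends only on the package name, the filename changes only when the package is renamed.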

## Skipping change commit hashes

By default, beachball records the git commit hash for each change in `CHANGELOG.json`, which adds overhead during bumping. You can disable this with **`changelog.includeCommitHashes`**:

```js
const config = {
changelog: { includeCommitHashes: false },
};
```

## Selectively skipping remote fetch

By default, beachball fetches from the remote before comparing changes. If there's a specific situation where you're **certain** the local branch is already up to date, or you're willing to accept that tradeoff for performance, you can skip the fetch with `--no-fetch` (or by conditionally setting `fetch: false` in the config).
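
One way to set `fetch` conditionally is to derive it from an environment variable in `beachball.config.js`. This is only a sketch: `SKIP_BEACHBALL_FETCH` is a hypothetical variable name invented for this example, not something Beachball reads itself.

```js
// Hypothetical: SKIP_BEACHBALL_FETCH is an environment variable name made up for
// this example. Set it to "true" only in jobs that have just fetched the target branch.
function shouldFetch(env) {
  return env.SKIP_BEACHBALL_FETCH !== 'true';
}

/** @type {Partial<import('beachball').RepoOptions>} */
const config = {
  fetch: shouldFetch(process.env),
};
module.exports = config;
```

Keeping the decision in a small function makes the default explicit: fetching stays on unless the variable is deliberately set.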
45 changes: 35 additions & 10 deletions docs/overview/configuration.md
@@ -109,7 +109,7 @@ For the latest full list of supported options, see `RepoOptions` [in this file](
[2]: https://github.com/microsoft/beachball/blob/main/src/types/ChangelogOptions.ts
[3]: ../concepts/groups#version-groups
[4]: https://github.com/microsoft/beachball/blob/main/src/types/BeachballOptions.ts
[5]: #determining-the-target-branch-and-remote
[5]: #specifying-the-target-branch-and-remote
[6]: #glob-matching

### Glob matching
@@ -129,6 +129,7 @@ This option takes a list of patterns which are matched against package paths. Pa
Example: with this config, `beachball` will only consider packages under `packages/foo` (excluding `packages/foo/bar`).

```json
// in beachball.config.js or root package.json "beachball"
{
"scope": ["packages/foo/*", "!packages/foo/bar"]
}
@@ -138,27 +139,51 @@ On the command line, this could be specified as `--scope 'packages/foo/*' --scop

> Note: if you have multiple sets of packages in the repo with different scopes, `groupChanges` is not supported.

### Determining the target branch and remote
### Specifying the target branch and remote

The `branch` option is the official target branch to compare against when determining changes.

In GitHub repos where contributions may come from forks, you should use the **name only (no remote)** and specify `repository` in the repo root `package.json`. This allows finding the official remote by matching the URL (most formats are supported), regardless of what the user decided to call the remote. For example:
#### Repos which may have forks

If you have a public GitHub repo or another situation where **any** contributions might come from a fork, `branch` should use the **name only (no remote)** (since users can choose arbitrary names for their own remote and the official remote):

```json
// in beachball.config.js or root package.json "beachball"
{
"branch": "main"
}
```

To ensure Beachball can reliably determine which local remote name corresponds to the official remote, set `repository` in the repo root `package.json`:

```json
// repo root package.json
{
"name": "my-repo",
"repository": {
"type": "git",
"url": "https://github.com/my-org/my-repo"
},
"beachball": {
"branch": "main"
// your repository URL here (most formats are supported)
"url": "https://github.com/microsoft/beachball"
}
}
```

In private repos that use a single remote with branches instead of forks, you can either include a remote name (e.g. `branch: 'origin/main'`) if you're certain everyone will use the same remote name, or only include the branch name and specify `repository` as above.
#### Repos with a single remote (mostly Azure DevOps)

If **all** your contributors use branches on a single remote (as opposed to forking), you can specify the remote name as part of the `branch` setting. This is almost always the model used in internal Azure DevOps repos. (Do NOT use this approach for public GitHub repos.)

```json
// in beachball.config.js or root package.json "beachball"
{
"branch": "origin/main"
}
```

As a safety fallback, it's still recommended to set `repository` in your repo root `package.json` as detailed above.

#### How Beachball determines the branch and remote

If `branch` isn't specified, the default branch name is the system default branch name (`main` or `master`).

If `branch` doesn't include a remote and it can't be determined from `package.json` `repository`, the fallback remote is `upstream` if defined, `origin` if defined, or the first defined remote.
If `branch` doesn't include a remote (or isn't specified), Beachball will first look for a remote matching `package.json` `repository`. If that's not found, the fallback remote is `upstream` if defined, `origin` if defined, or the first defined remote.

All the fallback logic involves git operations, so especially in large repos, it's best to give Beachball a hint using one of the approaches specified above for efficiency.
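
The fallback order described above can be sketched as follows. This is an illustration of the documented order only, not Beachball's source: `pickRemote` and `normalizeRepoUrl` are hypothetical helpers, the URL normalization is a simplified assumption, and the real logic reads remotes via git rather than from a plain object.

```js
// Simplified URL normalization (an assumption for this sketch): reduce common
// git URL formats to "host/owner/repo" so they can be compared.
function normalizeRepoUrl(url) {
  return url
    .trim()
    .toLowerCase()
    .replace(/^git\+/, '') // git+https://...
    .replace(/^[a-z]+:\/\/([^@/]+@)?/, '') // https://, ssh://git@
    .replace(/^[^@/]+@([^:/]+):/, '$1/') // git@github.com:owner/repo
    .replace(/\/+$/, '')
    .replace(/\.git$/, '');
}

// remotes: map of remote name -> URL, e.g. parsed from `git remote -v`.
function pickRemote(remotes, repositoryUrl) {
  const names = Object.keys(remotes);
  if (repositoryUrl) {
    // 1. Prefer the remote whose URL matches package.json "repository".
    const target = normalizeRepoUrl(repositoryUrl);
    const match = names.find((name) => normalizeRepoUrl(remotes[name]) === target);
    if (match) return match;
  }
  // 2. Otherwise fall back to upstream, then origin, then the first remote.
  if (names.includes('upstream')) return 'upstream';
  if (names.includes('origin')) return 'origin';
  return names[0];
}

// A fork checkout where the contributor named the official remote "official".
const remotes = {
  origin: 'git@github.com:someone/my-repo.git',
  official: 'https://github.com/my-org/my-repo.git',
};
console.log(pickRemote(remotes, 'https://github.com/my-org/my-repo')); // official
console.log(pickRemote(remotes, undefined)); // origin
```

The example shows why setting `repository` matters for forks: without it, the fallback picks the contributor's own `origin` rather than the official remote.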
4 changes: 4 additions & 0 deletions packages/beachball/README.md
@@ -110,6 +110,10 @@ beachball publish -r http://localhost:4873 -t beta

In large monorepos, the process of fetching versions for sync or before publishing can be time-consuming due to the high number of packages. To optimize performance, you can override the concurrency for fetching from the registry by setting `options.npmReadConcurrency` (default: 5). You can also increase concurrency for hook calls and publish operations via `options.concurrency` (default: 1; respects topological order).

### Optimizing for large monorepos

If you have a large to very large monorepo, there are several configuration options and strategies that can help improve Beachball's performance. For details, see the [large repos guide](https://microsoft.github.io/beachball/concepts/large-repos.html).

### API surface

Beachball **does not** have a public API beyond the provided [options](https://microsoft.github.io/beachball/overview/configuration.html). Usage of private APIs is not supported and may break at any time.