Merged
Changes from 2 commits
57 changes: 57 additions & 0 deletions github-metrics/.drone.yml
@@ -0,0 +1,57 @@
---
kind: pipeline
type: kubernetes
name: github-metrics

trigger:
  branch:
    - main
  event:
    - push

steps:
  - name: check-changes
    image: alpine/git
    commands:
      - |
        # Check if any files in the github-metrics/ directory changed
        git diff --name-only $DRONE_COMMIT_BEFORE $DRONE_COMMIT_AFTER | grep -q "^github-metrics/" && echo "Changes detected" || (echo "No changes in github-metrics/, skipping" && exit 78)

  - name: test
    image: node:20-alpine
    commands:
      - cd github-metrics
      - npm ci
      - node --version
      - npm --version
      - echo "Validating package.json and dependencies..."

  - name: publish
    image: plugins/kaniko-ecr
    settings:
      create_repository: true
      registry: 795250896452.dkr.ecr.us-east-1.amazonaws.com
      repo: docs/github-metrics
      tags:
        - git-${DRONE_COMMIT_SHA:0:7}
        - latest
      access_key:
        from_secret: ecr_access_key
      secret_key:
        from_secret: ecr_secret_key
      context: github-metrics
      dockerfile: github-metrics/Dockerfile

  - name: deploy
    image: quay.io/mongodb/drone-helm:v3
    settings:
      chart: mongodb/cronjobs
      chart_version: 1.21.2
      add_repos: [ mongodb=https://10gen.github.io/helm-charts ]
      namespace: docs
      release: github-metrics
      values: image.tag=git-${DRONE_COMMIT_SHA:0:7},image.repository=795250896452.dkr.ecr.us-east-1.amazonaws.com/docs/github-metrics
      values_files: [ 'github-metrics/cronjobs.yml' ]
      api_server: https://api.prod.corp.mongodb.com
      kubernetes_token:
        from_secret: kubernetes_token
29 changes: 29 additions & 0 deletions github-metrics/Dockerfile
@@ -0,0 +1,29 @@
FROM node:20-alpine

# Set working directory
WORKDIR /app

# Copy package files first (for better Docker layer caching)
COPY package.json package-lock.json ./

# Install dependencies (use ci for reproducible builds)
RUN npm ci --only=production

# Copy the rest of the application files
COPY . .

# Create a non-root user for security best practices
RUN addgroup -g 1001 -S nodejs && \
adduser -S nodejs -u 1001 && \
chown -R nodejs:nodejs /app

# Switch to non-root user
USER nodejs

# Set NODE_ENV to production
ENV NODE_ENV=production

# Command to run the application
# This will be executed by the Kubernetes CronJob
CMD ["node", "index.js"]

126 changes: 123 additions & 3 deletions github-metrics/README.md
@@ -2,9 +2,9 @@

This directory contains tooling to enable us to track various GitHub project metrics programmatically.

Currently, it contains a PoC for a simple pipeline to pull metrics from GitHub into MongoDB Atlas.
This tool runs as a Kubernetes CronJob on Kanopy, automatically collecting metrics from GitHub every 14 days and storing them in MongoDB Atlas.

Planned future work:

- Add logic to work with pulled maintenance metrics once available in the test repo
- Set up Atlas Charts to visualize the data
@@ -119,7 +119,7 @@ For this project, as a MongoDB org member, you must also auth your PAT with SSO.
npm install
```

3. **Manually run the utility**

From the root of the directory, run the following command to run the utility:

@@ -132,3 +132,123 @@
```
A document was inserted into mongodb_docs-notebooks with the _id: 678197a0ffe1539ff213bd86
```

## Automated Deployment (Kanopy CronJob)

This tool is deployed as a Kubernetes CronJob on Kanopy that runs automatically every 14 days.

### Deployment Architecture

The deployment consists of three main components:

1. **Dockerfile**: Containerizes the Node.js application
2. **cronjobs.yml**: Helm values file that configures the CronJob schedule and resources
3. **.drone.yml**: CI/CD pipeline that builds, publishes, and deploys the application

### CronJob Schedule

The cronjob is **scheduled to run weekly on Mondays at 8:00 AM UTC** (`0 8 * * 1`), but the application includes smart logic to prevent running too frequently:

- The cronjob triggers every Monday
- The application checks if 14 days have passed since the last successful run
- If less than 14 days have passed, the job exits early without collecting metrics
- If 14 days or more have passed, it collects metrics and updates the timestamp

The last run timestamp is stored in a persistent volume (`/data/last-run.json`) that survives between cronjob executions.
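Based on `updateLastRun()` in `check-last-run.js`, the state file holds the last run as an ISO-8601 string plus its epoch-milliseconds equivalent. The values below are illustrative only:

```json
{
  "lastRun": "2025-01-06T08:00:12.345Z",
  "timestamp": 1736150412345
}
```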

### Required Kubernetes Secrets

The cronjob requires two Kubernetes secrets to be created in the `docs` namespace:

1. **github-token**: Contains the GitHub Personal Access Token
```bash
kubectl create secret generic github-token \
--from-literal=GITHUB_TOKEN='your-github-token' \
-n docs
```

2. **atlas-connection-string**: Contains the MongoDB Atlas connection string
```bash
kubectl create secret generic atlas-connection-string \
--from-literal=ATLAS_CONNECTION_STRING='your-connection-string' \
-n docs
```

> **Note**: These secrets should already exist in the production environment. Contact the DevDocs team if you need to create or update them.
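Because the CronJob fails at runtime if either secret is missing, a startup guard in `index.js` can surface the problem early. The helper below is a hypothetical sketch (not in the current code); the variable names mirror the `globalEnvSecrets` mapping in `cronjobs.yml`:

```javascript
// Hypothetical startup guard; env var names mirror globalEnvSecrets in cronjobs.yml.
function missingVars(env, required) {
  // Return the names of required variables that are unset or empty
  return required.filter((name) => !env[name]);
}

const required = ['GITHUB_TOKEN', 'ATLAS_CONNECTION_STRING'];

// In index.js this would be called as missingVars(process.env, required);
// a stub object is used here for illustration.
console.log(missingVars({ GITHUB_TOKEN: 'example' }, required));
```

In production the caller would log the missing names and exit non-zero so the Job is marked failed rather than silently doing nothing.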

### Deployment Process

The deployment is fully automated via a single Drone pipeline (`github-metrics`) that runs on pushes to `main`:

1. **check-changes**: Skips the rest of the pipeline if no files in the `github-metrics/` directory changed
2. **test**: Validates dependencies with `npm ci` in a `node:20-alpine` container
3. **publish**: Builds the Docker image with Kaniko and pushes it to ECR (`795250896452.dkr.ecr.us-east-1.amazonaws.com/docs/github-metrics`), tagged with the short git commit SHA and `latest`
4. **deploy**: Deploys to the production Kanopy cluster with Helm, using the `mongodb/cronjobs` chart (version 1.21.2) in the `docs` namespace

### Manual Deployment

To manually trigger a deployment:

1. Push changes to the `main` branch
2. Drone will automatically run the test, build, and deploy pipelines

### Manually Triggering the CronJob

To manually run the cronjob outside of its schedule:

```bash
# Find the cronjob
kubectl get cronjobs -n docs

# Create a one-time job from the cronjob
kubectl create job --from=cronjob/github-metrics-collection \
github-metrics-manual-$(date +%s) -n docs

# Check the job status
kubectl get jobs -n docs

# View logs
kubectl logs -n docs job/github-metrics-manual-<timestamp>
```

### Monitoring

To check the status of the cronjob:

```bash
# View cronjob details
kubectl get cronjob github-metrics-collection -n docs

# View recent job runs
kubectl get jobs -n docs | grep github-metrics

# View logs from the most recent run
kubectl logs -n docs -l job-name=<job-name>

# Check the last run timestamp (requires exec into a pod)
kubectl exec -n docs <pod-name> -- cat /data/last-run.json
```

The logs will show whether the job ran or was skipped:
- `⏭️ Skipping run - only X days since last run (need 14)` - Job skipped, not enough time passed
- `✅ Proceeding with run - X days since last run` - Job is collecting metrics

### Configuration Changes

To modify the cronjob configuration:

1. **Change schedule**: Edit `cronjobs.yml` and update the `schedule` field
2. **Change resources**: Edit `cronjobs.yml` and update the `resources` section
3. **Change repositories tracked**: Edit `repo-details.json`

After making changes, commit and push to the `main` branch. Drone will automatically deploy the updates.
99 changes: 99 additions & 0 deletions github-metrics/check-last-run.js
@@ -0,0 +1,99 @@
import fs from 'fs';
import path from 'path';

// Path to the state file (mounted from persistent volume)
const STATE_FILE_PATH = process.env.STATE_FILE_PATH || '/data/last-run.json';
> **Collaborator:** Hmm, it looks like we're trying to read from `process.env` here, but I don't see where we're setting these vars in the env setup?


// Minimum days between runs
const MIN_DAYS_BETWEEN_RUNS = parseInt(process.env.MIN_DAYS_BETWEEN_RUNS || '14', 10);
> **Collaborator:** Should we make this 13 instead? I'm wondering if there's any possibility of weirdness related to the timing of runs. If it only runs on Mondays and it's been close to 14 days, it should probably run.


/**
 * Check if enough time has passed since the last run
 * @returns {boolean} true if should run, false if should skip
 */
export function shouldRun() {
  try {
    // Check if state file exists
    if (!fs.existsSync(STATE_FILE_PATH)) {
      console.log('No previous run found. Running for the first time.');
      return true;
    }

    // Read the last run timestamp
    const stateData = JSON.parse(fs.readFileSync(STATE_FILE_PATH, 'utf8'));
    const lastRunTime = new Date(stateData.lastRun);
    const now = new Date();

    // Calculate days since last run
    const daysSinceLastRun = (now - lastRunTime) / (1000 * 60 * 60 * 24);

    console.log(`Last run: ${lastRunTime.toISOString()}`);
    console.log(`Days since last run: ${daysSinceLastRun.toFixed(2)}`);
    console.log(`Minimum days required: ${MIN_DAYS_BETWEEN_RUNS}`);

    if (daysSinceLastRun < MIN_DAYS_BETWEEN_RUNS) {
      console.log(`⏭️ Skipping run - only ${daysSinceLastRun.toFixed(2)} days since last run (need ${MIN_DAYS_BETWEEN_RUNS})`);
      return false;
    }

    console.log(`✅ Proceeding with run - ${daysSinceLastRun.toFixed(2)} days since last run`);
    return true;

  } catch (error) {
    console.error('Error checking last run time:', error.message);
    console.log('Proceeding with run due to error reading state file');
    return true; // Run if we can't read the state file
  }
}

/**
 * Update the state file with the current timestamp
 */
export function updateLastRun() {
  try {
    const now = new Date();
    const stateData = {
      lastRun: now.toISOString(),
      timestamp: now.getTime()
    };

    // Ensure the directory exists
    const dir = path.dirname(STATE_FILE_PATH);
    if (!fs.existsSync(dir)) {
      fs.mkdirSync(dir, { recursive: true });
    }

    // Write the state file
    fs.writeFileSync(STATE_FILE_PATH, JSON.stringify(stateData, null, 2), 'utf8');
> **Collaborator:** I don't know if there's any possibility of weirdness if the file already exists. Have you tested this manually with an extant file to ensure the behavior is correct, i.e. that it overwrites the file contents? I'm wondering if we should remove the old file during the update func just for safety, or if that's overkill.
>
> **Collaborator (author):** It seems like overkill, especially for this tiny doc. @krollins-mdb, does this align with common practice around `writeFileSync()`?
>
> **Collaborator:** The method completely replaces the file if it already exists, so it should be fine. I am curious about using the sync version here instead of the default async `fs.writeFile`, though. In `index.js`, the function called before this one is async and awaited; it seems a bit odd not to do the same here.
>
> **Collaborator (author):** Thanks for catching. I'll update.

    console.log(`✅ Updated last run timestamp: ${now.toISOString()}`);

  } catch (error) {
    console.error('Error updating last run time:', error.message);
    // Don't throw - we don't want to fail the job just because we can't write the state file
  }
}

/**
 * Get the last run information
 * @returns {Object|null} Object with lastRun date and timestamp, or null if no previous run
 */
export function getLastRunInfo() {

> _cbullinger marked this conversation as resolved (outdated)._

  try {
    if (!fs.existsSync(STATE_FILE_PATH)) {
      return null;
    }

    const stateData = JSON.parse(fs.readFileSync(STATE_FILE_PATH, 'utf8'));
    return {
      lastRun: new Date(stateData.lastRun),
      timestamp: stateData.timestamp
    };

  } catch (error) {
    console.error('Error reading last run info:', error.message);
    return null;
  }
}

export { MIN_DAYS_BETWEEN_RUNS };
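The gating decision above can be exercised in isolation. The sketch below re-implements just the `shouldRun()` decision so it runs standalone against a temp directory; the real module also logs its reasoning and reads `STATE_FILE_PATH` and `MIN_DAYS_BETWEEN_RUNS` from the environment:

```javascript
import fs from 'fs';
import os from 'os';
import path from 'path';

// Standalone re-implementation of the shouldRun() decision, for illustration only.
function shouldRunGivenState(stateFile, minDays) {
  if (!fs.existsSync(stateFile)) return true; // first run: no state file yet
  const { lastRun } = JSON.parse(fs.readFileSync(stateFile, 'utf8'));
  const days = (Date.now() - new Date(lastRun).getTime()) / (1000 * 60 * 60 * 24);
  return days >= minDays;
}

// Use a temp dir instead of the real /data mount
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'gh-metrics-'));
const stateFile = path.join(dir, 'last-run.json');

console.log(shouldRunGivenState(stateFile, 14)); // true: first run

fs.writeFileSync(stateFile, JSON.stringify({ lastRun: new Date().toISOString() }));
console.log(shouldRunGivenState(stateFile, 14)); // false: ran moments ago

const fifteenDaysAgo = new Date(Date.now() - 15 * 24 * 60 * 60 * 1000);
fs.writeFileSync(stateFile, JSON.stringify({ lastRun: fifteenDaysAgo.toISOString() }));
console.log(shouldRunGivenState(stateFile, 14)); // true: past the 14-day window
```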

35 changes: 35 additions & 0 deletions github-metrics/cronjobs.yml
@@ -0,0 +1,35 @@
---
# `image` can be skipped if the values are being set in your .drone.yml file
image:
  repository: 795250896452.dkr.ecr.us-east-1.amazonaws.com/docs/github-metrics
  tag: latest

# global secrets are references to k8s Secrets
globalEnvSecrets:
  GITHUB_TOKEN: github-token
  ATLAS_CONNECTION_STRING: atlas-connection-string

cronJobs:
  - name: github-metrics-collection
    # Run weekly on Mondays at 8am UTC
    # The application checks if it ran in the last 14 days and skips if so
    # Cron format: minute hour day-of-month month day-of-week
    # 0 = Sunday, 1 = Monday, etc.
    schedule: "0 8 * * 1"
    command:
      - node
      - index.js
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
    # Persistent volume to store last run timestamp
    persistence:
      enabled: true
      storageClass: "standard"
      accessMode: ReadWriteOnce
      size: 1Gi
      mountPath: /data