bandsintown/bit_dbt

bit-dbt - Bandsintown dbt Data Transformation Service

This repository contains the dbt (data build tool) project for Bandsintown's data analytics platform, integrated with AWS EMR Serverless and orchestrated via Apache Airflow.

🎯 Overview

Project: dbt Data Platform (DI-11)
Epic: Infrastructure Setup (DI-12)
Owner: Complicated Subsystem Team / Data Platform Team

This service transforms raw data from EMR ingestion pipelines into analytics-ready datasets using dbt Core, with models materialized in AWS Athena.

🏗️ Architecture

EMR Ingestion → S3 Raw Data → Athena (bandsintown_raw)
                                    ↓
                              dbt Transformations
                              (EMR Serverless)
                                    ↓
                         Athena Analytics Schema
                    (staging → intermediate → marts)
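dbt works out the build order from the ref() and source() calls in each model, so staging models are built before the intermediate and mart models that select from them. As a rough illustration of that ordering, the sketch below topologically sorts a small model graph with Python's standard-library graphlib (available from Python 3.9, the minimum version listed under Prerequisites). Apart from stg_events, the model names are hypothetical examples that follow this project's stg_/int_/dim_/fct_ prefixes.

```python
from graphlib import TopologicalSorter

# Hypothetical model graph: each model maps to the models it ref()s.
# Only stg_events appears in the project structure; the rest are illustrative.
MODEL_DEPENDENCIES = {
    "stg_events": set(),   # reads the raw events table via source()
    "stg_artists": set(),
    "int_events_with_artists": {"stg_events", "stg_artists"},
    "fct_events": {"int_events_with_artists"},
    "dim_artists": {"stg_artists"},
}


def build_order(dependencies):
    """Return model names in an order where every model follows its parents."""
    return list(TopologicalSorter(dependencies).static_order())
```

Calling build_order(MODEL_DEPENDENCIES) yields an order in which both staging models come before int_events_with_artists, which in turn comes before fct_events. dbt additionally runs independent branches in parallel, up to the configured thread count.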

📋 Prerequisites

  • Python 3.9+
  • AWS Account with appropriate IAM permissions
  • Access to Bandsintown AWS resources:
    • S3: s3://bandsintown-dbt-analytics/
    • Athena Workgroup: bandsintown-dbt-{env}
    • EMR Serverless Application
  • Airflow environment (for production deployments)

🚀 Quick Start

1. Clone and Setup

# Clone the repository
git clone git@github.com:bandsintown/bit-dbt.git
cd bit-dbt

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Configure Environment

# Copy environment template
cp .env.example .env

# Edit .env with your configuration
# Set AWS credentials, region, S3 paths, Athena workgroup, etc.

Required environment variables:

  • AWS_REGION - AWS region (e.g., us-east-1)
  • DBT_ATHENA_S3_STAGING_DIR - S3 path for Athena query results
  • DBT_ATHENA_S3_DATA_DIR - S3 path for dbt table data
  • DBT_ATHENA_DATABASE - Athena database name
  • DBT_ATHENA_WORKGROUP - Athena workgroup name (EMR Serverless enabled)
  • DBT_TARGET - Target environment (dev/staging/prod)
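A quick pre-flight check can catch a missing or mistyped variable before dbt debug does. The sketch below is a hypothetical helper (check_env is not part of this repo) that validates the variables listed above in the current process environment. dbt Core does not load .env files by itself, so the values need to be exported first, for example with `set -a; . ./.env; set +a`, assuming .env uses plain KEY=value lines.

```python
import os

# Variable names from the list above; allowed targets from DBT_TARGET's description.
REQUIRED_VARS = (
    "AWS_REGION",
    "DBT_ATHENA_S3_STAGING_DIR",
    "DBT_ATHENA_S3_DATA_DIR",
    "DBT_ATHENA_DATABASE",
    "DBT_ATHENA_WORKGROUP",
    "DBT_TARGET",
)
ALLOWED_TARGETS = {"dev", "staging", "prod"}


def check_env(env=None):
    """Return a list of problems found in the given mapping (default: os.environ)."""
    env = os.environ if env is None else env
    problems = [f"missing {name}" for name in REQUIRED_VARS if not env.get(name)]
    # Both S3 settings are expected to be full s3:// URIs.
    for name in ("DBT_ATHENA_S3_STAGING_DIR", "DBT_ATHENA_S3_DATA_DIR"):
        value = env.get(name)
        if value and not value.startswith("s3://"):
            problems.append(f"{name} should be an s3:// URI, got {value!r}")
    target = env.get("DBT_TARGET")
    if target and target not in ALLOWED_TARGETS:
        problems.append(f"DBT_TARGET must be one of {sorted(ALLOWED_TARGETS)}, got {target!r}")
    return problems


if __name__ == "__main__":
    for problem in check_env():
        print(problem)
```

An empty list means every variable is present and well-formed; it does not prove that the credentials or the workgroup actually work, which is what dbt debug checks next.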

3. Verify Connection

# Set profiles directory
export DBT_PROFILES_DIR=$(pwd)

# Test connection to Athena
dbt debug

# Expected output includes "Connection test: [OK connection ok]" and "All checks passed!"

4. Run dbt Models

# Install dbt packages (if any)
dbt deps

# Run all models
dbt run

# Run specific models
dbt run --select stg_events

# Run tests
dbt test

# Generate documentation
dbt docs generate
dbt docs serve  # View docs at http://localhost:8080

📁 Project Structure

bit-dbt/
├── models/
│   ├── staging/              # Staging models (views)
│   │   └── bandsintown_raw/
│   │       ├── src_bandsintown_raw.yml
│   │       ├── stg_events.sql
│   │       └── stg_bandsintown_raw.yml
│   ├── intermediate/         # Intermediate business logic (views)
│   └── marts/                # Final analytics tables
├── macros/                   # Custom dbt macros
├── tests/                    # Custom data tests
├── seeds/                    # CSV reference data
├── snapshots/                # SCD Type 2 snapshots
├── analyses/                 # Ad-hoc SQL queries
├── airflow/
│   └── dags/
│       └── bandsintown_dbt_dag.py
├── dbt_project.yml           # dbt project configuration
├── profiles.yml              # dbt connection profiles
├── requirements.txt          # Python dependencies
├── .env.example              # Environment variable template
├── .gitignore
└── README.md

🔧 dbt Configuration

Materialization Strategy

  • Staging (models/staging/): Views - Fast, lightweight transformations
  • Intermediate (models/intermediate/): Views - Business logic, reusable
  • Marts (models/marts/): Tables - Final consumption layer
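In dbt, a folder-level setting in dbt_project.yml applies to every model beneath that folder, a more specific folder overrides a broader one, and a config() block inside the model file takes precedence over both. The sketch below reproduces that precedence for the three layers listed above. The folder-to-materialization mapping mirrors the bullets; resolve_materialization is a hypothetical illustration, not how dbt itself is implemented.

```python
from pathlib import PurePosixPath

# Folder-level defaults from the Materialization Strategy list.
FOLDER_MATERIALIZATIONS = {
    "models/staging": "view",
    "models/intermediate": "view",
    "models/marts": "table",
}


def resolve_materialization(model_path, folder_configs=FOLDER_MATERIALIZATIONS,
                            model_config=None, default="view"):
    """Pick a materialization following dbt-style precedence:
    in-model config() first, then the most specific matching folder,
    then the project-wide default (dbt's own default is "view")."""
    if model_config and "materialized" in model_config:
        return model_config["materialized"]
    path = PurePosixPath(model_path)
    # Check folders from most to least specific (deepest path first).
    by_depth = sorted(folder_configs, key=lambda f: len(PurePosixPath(f).parts), reverse=True)
    for folder in by_depth:
        if PurePosixPath(folder) in path.parents:
            return folder_configs[folder]
    return default
```

For example, resolve_materialization("models/staging/bandsintown_raw/stg_events.sql") returns "view". In the real project these defaults would sit in the models: block of dbt_project.yml as +materialized settings nested under the project name and folder names.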

Schema Layout

bandsintown_raw         → Source data (read-only)
  └── events

bandsintown_analytics_{env}
  ├── staging           → stg_events, stg_artists, etc.
  ├── intermediate      → int_* models
  └── analytics         → dim_*, fct_* final tables
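Athena has no nested schemas: each dbt schema becomes its own Glue database. How the staging, intermediate and analytics layers map onto database names therefore depends on the project's generate_schema_name macro, which this README does not show. Assuming the repo keeps dbt's built-in default, a model with a custom schema is written to `<target schema>_<custom schema>`. The sketch below reproduces that rule together with the model-name prefixes from the layout above; the prefix-to-layer table and athena_schema_for are hypothetical illustrations, not dbt features.

```python
# Model-name prefixes from the Schema Layout above, mapped to their layer.
PREFIX_TO_LAYER = {
    "stg_": "staging",
    "int_": "intermediate",
    "dim_": "analytics",
    "fct_": "analytics",
}


def layer_for_model(model_name):
    """Return the layer implied by a model's name prefix, or None."""
    for prefix, layer in PREFIX_TO_LAYER.items():
        if model_name.startswith(prefix):
            return layer
    return None


def default_schema_name(target_schema, custom_schema=None):
    """Python version of dbt's default generate_schema_name macro:
    no custom schema -> target schema, otherwise '<target>_<custom>'."""
    if custom_schema is None:
        return target_schema
    return f"{target_schema}_{custom_schema.strip()}"


def athena_schema_for(model_name, env):
    """Hypothetical: the Glue database a model would land in under the default
    macro, assuming target schema 'bandsintown_analytics_<env>' and each layer
    configured as the model's custom schema."""
    return default_schema_name(f"bandsintown_analytics_{env}", layer_for_model(model_name))
```

Under these assumptions athena_schema_for("stg_events", "dev") gives "bandsintown_analytics_dev_staging". If the project overrides generate_schema_name (dbt also ships generate_schema_name_for_env as an alternative), the resulting names will differ.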

🔐 IAM Permissions

The EMR Serverless execution role requires:

Athena Permissions:

  • athena:StartQueryExecution
  • athena:GetQueryExecution
  • athena:GetQueryResults
  • athena:StopQueryExecution

S3 Permissions:

  • Read: s3://bandsintown-raw-data/*
  • Read/Write: s3://bandsintown-dbt-analytics/*

Glue Permissions:

  • glue:GetDatabase
  • glue:GetTable
  • glue:GetPartitions
  • glue:CreateTable
  • glue:UpdateTable
  • glue:DeleteTable

See iam-policy-template.json for the full policy.
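For orientation, the permissions listed above can be assembled into an IAM policy document as sketched below. This is a hypothetical reconstruction, not the contents of iam-policy-template.json: Athena and Glue resources are left as "*" for brevity and should be narrowed to the workgroup and Glue catalog ARNs, and the S3 actions (s3:ListBucket, s3:GetBucketLocation, s3:GetObject, s3:PutObject, s3:DeleteObject) are the usual ones for read and read/write access rather than ones taken from the template.

```python
import json

# Bucket names from the S3 Permissions list above.
RAW_BUCKET = "bandsintown-raw-data"
ANALYTICS_BUCKET = "bandsintown-dbt-analytics"

ATHENA_ACTIONS = [
    "athena:StartQueryExecution",
    "athena:GetQueryExecution",
    "athena:GetQueryResults",
    "athena:StopQueryExecution",
]
GLUE_ACTIONS = [
    "glue:GetDatabase",
    "glue:GetTable",
    "glue:GetPartitions",
    "glue:CreateTable",
    "glue:UpdateTable",
    "glue:DeleteTable",
]


def build_policy():
    """Return an IAM policy document (as a dict) for the EMR Serverless role."""
    buckets = [f"arn:aws:s3:::{RAW_BUCKET}", f"arn:aws:s3:::{ANALYTICS_BUCKET}"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            # "*" is a placeholder; scope these to workgroup and catalog ARNs.
            {"Sid": "Athena", "Effect": "Allow", "Action": ATHENA_ACTIONS, "Resource": "*"},
            {"Sid": "Glue", "Effect": "Allow", "Action": GLUE_ACTIONS, "Resource": "*"},
            # Bucket-level actions target the bucket ARN itself (no /*).
            {"Sid": "ListBuckets", "Effect": "Allow",
             "Action": ["s3:ListBucket", "s3:GetBucketLocation"], "Resource": buckets},
            # Object-level read on the raw data.
            {"Sid": "ReadRawData", "Effect": "Allow",
             "Action": ["s3:GetObject"], "Resource": [f"{buckets[0]}/*"]},
            # Object-level read/write on the dbt analytics bucket.
            {"Sid": "ReadWriteAnalytics", "Effect": "Allow",
             "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
             "Resource": [f"{buckets[1]}/*"]},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(), indent=2))
```

Object-level actions such as s3:GetObject only match object ARNs (bucket/*), while s3:ListBucket only matches the bucket ARN, which is why the S3 access is split across separate statements.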

🔄 Running dbt Transformations

Run dbt transformations directly from the command line:

Basic Workflow:

# Install dependencies
dbt deps

# Test connection
dbt debug

# Check source data freshness
dbt source freshness

# Run transformations
dbt run

# Run data quality tests
dbt test

# Generate documentation
dbt docs generate
dbt docs serve
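To run the same sequence unattended, each step can be chained so that later steps are skipped once one fails, as `&&` does in the cron example. The sketch below does this from Python with subprocess. It assumes dbt is on PATH and leaves out `dbt docs serve`, which starts a long-running web server; run_steps is a hypothetical helper, not part of the repo. Note that `dbt build` runs models, tests, seeds and snapshots in a single dependency-ordered pass, which may be simpler than separate run and test steps.

```python
import subprocess
import sys

# Steps from the Basic Workflow above, minus `dbt docs serve` (a blocking web server).
DBT_STEPS = [
    ["deps"],
    ["debug"],
    ["source", "freshness"],
    ["run"],
    ["test"],
    ["docs", "generate"],
]


def run_steps(steps, command=("dbt",)):
    """Run each step in order and stop at the first failure.

    Returns (exit_code, failed_step); failed_step is None when every step succeeded.
    """
    for step in steps:
        result = subprocess.run([*command, *step])
        if result.returncode != 0:
            return result.returncode, step
    return 0, None


def main():
    """Entry point for a scheduler: exit with dbt's status code."""
    code, failed = run_steps(DBT_STEPS)
    if failed is not None:
        print(f"dbt {' '.join(failed)} failed with exit code {code}", file=sys.stderr)
    sys.exit(code)
```

Passing a different command prefix, such as (".venv/bin/dbt",), lets the same runner use the project's virtual environment without activating it.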

Schedule with Cron (Optional):

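# Add to crontab for daily runs at 6 AM.
# cron runs commands with /bin/sh, which may not support "source", so use "." instead.
# cron also starts with a minimal environment, so the variables from .env must be exported
# (e.g. set -a && . ./.env && set +a, if .env uses plain KEY=value lines).
0 6 * * * cd /path/to/bit-dbt && . .venv/bin/activate && dbt run && dbt test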

🧪 Testing

Run All Tests

dbt test

Source Freshness

dbt source freshness

Test Specific Model

dbt test --select stg_events

📊 Data Quality

dbt tests ensure:

  • Primary keys are unique and not null
  • Foreign key relationships are valid
  • Accepted values match expected enums
  • Source data freshness (< 24 hours)
  • Custom business logic validations
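Each dbt data test is a query that selects the rows violating a rule, and the test passes when that query returns no rows. The sketch below mirrors the logic of the built-in unique, not_null, accepted_values and relationships tests, plus a source-freshness check, over in-memory rows so the rules are easy to see. Column names such as event_id and artist_id, and any status values, are hypothetical; in the project the checks are declared in YAML (presumably files such as stg_bandsintown_raw.yml) rather than written in Python.

```python
from datetime import datetime, timedelta, timezone


def not_null_failures(rows, column):
    """Rows where the column is missing or None (dbt's not_null test)."""
    return [row for row in rows if row.get(column) is None]


def unique_failures(rows, column):
    """Values appearing more than once; like dbt's unique test, nulls are ignored."""
    counts = {}
    for row in rows:
        value = row.get(column)
        if value is not None:
            counts[value] = counts.get(value, 0) + 1
    return [value for value, count in counts.items() if count > 1]


def accepted_values_failures(rows, column, accepted):
    """Non-null values outside the accepted set (dbt's accepted_values test)."""
    return [row for row in rows if row.get(column) is not None and row[column] not in accepted]


def relationships_failures(child_rows, column, parent_rows, parent_column):
    """Child rows whose non-null foreign key has no matching parent row."""
    parent_keys = {row.get(parent_column) for row in parent_rows}
    return [row for row in child_rows
            if row.get(column) is not None and row[column] not in parent_keys]


def is_fresh(max_loaded_at, max_age=timedelta(hours=24), now=None):
    """True when the newest load is younger than max_age, roughly like a
    source freshness threshold on the source's loaded_at_field."""
    now = now or datetime.now(timezone.utc)
    return now - max_loaded_at < max_age
```

An empty list from any of the failure functions corresponds to a passing dbt test.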

🚢 Deployment

Development

export DBT_TARGET=dev
dbt run

Staging

export DBT_TARGET=staging
dbt run --full-refresh

Production

Deployed automatically by the Airflow DAG once EMR ingestion completes.

IAM Permissions (Serverless)

Deploy the IAM permissions stack with:

make deploy-permissions STAGE=prod AWS_PROFILE=default AWS_REGION=us-east-1

There is also a GitHub Actions pipeline at .github/workflows/deploy-serverless-permissions.yml. It deploys automatically on changes to the IAM Serverless config and can be run manually via workflow dispatch.

Airflow dbt Runtime Payload

The Buildkite upload_s3 step uploads:

  • scripts/ to s3://bit-dbt-<env>/dags/dependencies/dbt/scripts/
  • the dbt project payload to s3://bit-dbt-<env>/dags/dependencies/dbt/project/

In Airflow/MWAA, run dbt through the helper script:

/usr/local/airflow/dags/dependencies/dbt/scripts/run_dbt.sh run
/usr/local/airflow/dags/dependencies/dbt/scripts/run_dbt.sh test

The helper script accepts additional dbt args, for example:

/usr/local/airflow/dags/dependencies/dbt/scripts/run_dbt.sh build --select tag:daily
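When these calls live in a DAG, a small helper keeps the script path in one place and quotes extra arguments safely. The sketch below builds the bash_command string for an Airflow BashOperator; run_dbt_command is a hypothetical helper, not taken from airflow/dags/bandsintown_dbt_dag.py. It prefixes the call with bash on the assumption that the script's executable bit may not survive the upload through S3, and it never lets the command end in ".sh", because BashOperator treats a bash_command ending in ".sh" as a Jinja template file to load.

```python
import shlex

# Script location from the MWAA paths above.
RUN_DBT_SCRIPT = "/usr/local/airflow/dags/dependencies/dbt/scripts/run_dbt.sh"


def run_dbt_command(*dbt_args, script=RUN_DBT_SCRIPT):
    """Build a shell command that calls run_dbt.sh with the given dbt arguments.

    Arguments are shell-quoted. A trailing space is added if the command would
    otherwise end in ".sh", so Airflow does not try to load it as a template file.
    """
    command = shlex.join(["bash", script, *dbt_args])
    if command.endswith(".sh"):
        command += " "
    return command
```

Inside a DAG this could be used as, for example, BashOperator(task_id="dbt_build_daily", bash_command=run_dbt_command("build", "--select", "tag:daily")), with BashOperator imported from airflow.operators.bash.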

📖 Documentation

Generate and view dbt documentation:

dbt docs generate
dbt docs serve

Documentation artifacts are automatically uploaded to S3 after each production run:

  • s3://bandsintown-dbt-analytics/docs/manifest.json
  • s3://bandsintown-dbt-analytics/docs/catalog.json
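manifest.json describes every node in the project, so once a copy has been downloaded (for example with `aws s3 cp s3://bandsintown-dbt-analytics/docs/manifest.json .`) it can be inspected without a warehouse connection. The sketch below lists each model with its schema and materialization. It relies only on the manifest's nodes mapping and each node's resource_type, name, schema and config.materialized fields; summarize_models and load_manifest are hypothetical helpers.

```python
import json


def load_manifest(path):
    """Read a dbt manifest.json from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_models(manifest):
    """Return sorted (name, schema, materialization) tuples for every model node.

    Tests, seeds, snapshots and other node types are skipped.
    """
    rows = []
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") == "model":
            materialized = node.get("config", {}).get("materialized")
            rows.append((node["name"], node.get("schema"), materialized))
    return sorted(rows)
```

For instance, looping over summarize_models(load_manifest("manifest.json")) and printing each tuple gives a quick check that staging models are views and mart models are tables, as described under Materialization Strategy.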

🐛 Troubleshooting

Connection Issues

# Check AWS credentials
aws sts get-caller-identity

# Verify S3 access
aws s3 ls s3://bandsintown-dbt-analytics/

# Test the Athena workgroup (replace prod with dev or staging as appropriate)
aws athena get-work-group --work-group bandsintown-dbt-prod

dbt Errors

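# Remove build artifacts (the clean-targets in dbt_project.yml, typically target/ and dbt_packages/), reinstall packages, and retry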
dbt clean
dbt deps
dbt run

# Verbose logging
dbt run --debug

# Run single model with full refresh
dbt run --select stg_events --full-refresh

🤝 Contributing

  1. Create a feature branch from main
  2. Make changes and test locally
  3. Submit PR with description and tests
  4. Obtain 2 approvals from the data platform team
  5. Merge to main triggers deployment to staging
  6. Manual promotion to production

📞 Support

Team: Data Platform / Complicated Subsystem Team
Slack: #data-platform
Email: data-platform@bandsintown.com

📚 Resources

📝 License

Proprietary - Bandsintown, Inc.


Last Updated: May 14, 2026
Version: 1.0.0
