
feat(ci): implement nextest and optimize Docker test builds #9435

Merged
gustavovalverde merged 46 commits into main from imp-caching
Aug 11, 2025

Conversation

Member

@gustavovalverde gustavovalverde commented Apr 17, 2025

Motivation

Zebra's CI/CD pipeline had several inefficiencies that were impacting build and test execution times:

  1. Fragmented Test Execution: Individual environment variables were used to control different test suites (RUN_ALL_TESTS, STATE_FAKE_ACTIVATION_HEIGHTS, SYNC_LARGE_CHECKPOINTS_EMPTY, etc.), sometimes in combination with feature gates, leading to inconsistent test configuration and complex workflow management.

  2. Feature Gate Inefficiencies: Different feature combinations between build-time and runtime were causing Rust to rebuild artifacts unnecessarily, slowing down the build process.

  3. Complex Entrypoint Logic: The entrypoint.sh script contained complex conditional logic for different test scenarios, making it harder to maintain and understand.

  4. Verbose CI Workflows: Each test type required its own environment variables or feature-gate configuration, repeated across multiple workflow files.

Fixes: #9331

Solution

This PR modernizes Zebra's test execution and Docker build system:

1. Nextest Integration with Centralized Configuration

  • Added .config/nextest.toml with 17 specialized test profiles covering all test scenarios:
    • all-tests: Runs all tests except dependency checks
    • Individual profiles for each test type: sync-full-mainnet, lwd-grpc-wallet, rpc-submit-block, etc.
    • Proper timeout configurations and success output settings per test type
  • Replaced most environment variable-based test control with a single NEXTEST_PROFILE variable
  • Centralized test filtering and scoping logic from scattered shell scripts into declarative configuration

2. Docker Build Optimization

  • Streamlined feature sets: Uses minimal features (default-release-binaries proptest-impl lightwalletd-grpc-tests zebra-checkpoints) for testing to prevent unnecessary rebuilds
  • Simplified test stage: Reduced Docker commands and improved layer caching
  • Enhanced cargo nextest integration: Added automatic nextest binary installation and optimized build process
  • Updated .dockerignore to include .config directory for nextest configuration
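
The rebuild problem described above comes from cargo treating a different feature set as a different build. A minimal sketch of one way to keep the set identical everywhere follows; the feature list is quoted from this description, while the `FEATURES` variable and the `cargo_with_features` helper are hypothetical names used only for illustration, and the helper prints the command rather than running it.

```shell
# Feature list quoted from the PR description, kept in one variable so that
# build-time and run-time commands use exactly the same set.
FEATURES="default-release-binaries proptest-impl lightwalletd-grpc-tests zebra-checkpoints"

# Hypothetical helper (not part of the PR): prints a cargo subcommand with
# the shared features passed as a single quoted argument.
cargo_with_features() {
  printf 'cargo %s --features "%s"\n' "$*" "${FEATURES}"
}
```

Calling `cargo_with_features build` and `cargo_with_features nextest run` yields two commands that differ only in the subcommand, so artifacts compiled by the first should be reusable by the second.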

3. CI/CD Workflow Modernization

  • Updated 2 major workflow files (sub-ci-unit-tests-docker.yml, sub-ci-integration-tests-gcp.yml) with 15+ test jobs converted to use nextest profiles
  • Simplified environment variable management: Replaced dozens of individual test flags with unified NEXTEST_PROFILE usage
  • Improved log streaming: Simplified deployment test monitoring by removing complex grep patterns and relying on container exit codes
  • Enhanced error handling: More reliable test result detection using container exit status
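
As a rough illustration of exit-code-based detection, the sketch below derives the result only from the status of the command it runs and never inspects the output. The `run_and_report` helper and its messages are assumptions made for this example, not code from the workflows.

```shell
# Hypothetical helper: run a test command and decide pass/fail from its
# exit status alone, instead of grepping its logs.
run_and_report() {
  if "$@"; then
    rc=0
  else
    rc=$?   # exit status of the command that just failed
  fi
  if [ "${rc}" -eq 0 ]; then
    echo "tests passed"
  else
    echo "tests failed (exit code ${rc})"
  fi
  # Propagate the original status so the calling CI step fails too.
  return "${rc}"
}
```

For a container, the same status could be obtained with `docker wait`, which blocks until the container stops and prints its exit code; whether the workflows use that exact command is not stated here.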

4. Entrypoint Simplification (Phase 1)

  • Reduced entrypoint.sh complexity: Moved test execution logic to nextest profiles
  • Added nextest integration: When NEXTEST_PROFILE is set, the entrypoint uses cargo nextest run with appropriate flags
  • Maintained backward compatibility: Existing entry points still work while new nextest path is preferred
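
The dispatch described in this list can be sketched as follows. This is not the actual `entrypoint.sh`: the `build_test_command` name and the fallback branch are assumptions, and the function prints the command it would run so the logic can be checked without cargo installed. Only `cargo nextest run --profile <name>` is taken as a known nextest invocation.

```shell
# Hypothetical sketch of the entrypoint's test dispatch.
build_test_command() {
  if [ -n "${NEXTEST_PROFILE:-}" ]; then
    # Preferred path: the profile name selects the filters and timeouts
    # declared in .config/nextest.toml.
    printf 'cargo nextest run --profile %s\n' "${NEXTEST_PROFILE}"
  else
    # Backward-compatible path (assumed): fall through to whatever
    # command was passed to the entrypoint.
    printf '%s\n' "$*"
  fi
}
```

With `NEXTEST_PROFILE=sync-full-mainnet` the function prints `cargo nextest run --profile sync-full-mainnet`; with the variable unset, `build_test_command zebrad start` prints `zebrad start`.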

5. Test Execution Improvements

  • Faster test execution: Nextest's parallel execution and smart filtering reduce test times
  • Consistent timeout handling: Proper timeout configurations per test type prevent false failures
  • Improved test output: Better progress reporting and immediate success output for long-running tests

Performance Impact

Based on CI run comparisons:

  • Significant reduction in test execution times across all test suites
  • Eliminated unnecessary Rust rebuilds caused by feature flag mismatches
  • Streamlined CI workflow execution with unified environment variable approach

Migration Path

  • Backward Compatible: Existing entrypoint.sh functionality preserved
  • Gradual Adoption: Docker tests automatically use nextest, while existing tests continue to work
  • Future-Ready: Foundation laid for further optimizations (test sharding, more granular grouping)

Testing

  • All existing test suites pass with nextest profiles
  • Docker builds complete successfully with optimized features
  • CI workflows execute correctly with new environment variables

@gustavovalverde gustavovalverde changed the title ref(docker): improve cargo caching by aligning mounts with CARGO_HOME ref(docker): improve caching by aligning mounts with CARGO_HOME Apr 17, 2025
@gustavovalverde gustavovalverde changed the title ref(docker): improve caching by aligning mounts with CARGO_HOME ref(docker): improve cache by aligning mounts with CARGO_HOME May 5, 2025
- Add feature gates to lightwalletd test infrastructure to prevent compilation errors when lightwalletd-grpc-tests is disabled
- Add feature gate to indexer test to prevent compilation errors when indexer is disabled
- Move lightwalletd-related imports and constants behind feature gates
- Wrap gRPC code generation in feature-conditional module
- Fix GitHub Actions workflow to pass features as single string
- Restore missing lightwalletd_failure_messages method with feature gate
- Add missing DATABASE_FORMAT_UPGRADE_IS_LONG import

This ensures --no-default-features builds work correctly while maintaining full functionality when features are enabled.
- Remove unused DATABASE_FORMAT_UPGRADE_IS_LONG import
- Update references to use common::cached_state::DATABASE_FORMAT_UPGRADE_IS_LONG
- Fix unused import linting warning
This commit addresses several issues related to running tests within Docker and CI environments.

The initial problem was a permissions error where `nextest` could not write to its store directory. This was resolved by adjusting `CARGO_HOME` and `CARGO_TARGET_DIR` to be relative to the user's home directory within the Docker image.

A subsequent issue was discovered where test filters in `nextest.toml` were platform-specific, causing the entire test suite to run on `x86_64` CI runners, leading to failures. The configuration has been refactored to be platform-agnostic, ensuring filters are applied correctly on all architectures.

Additionally, the `Dockerfile` has been updated to use a multi-stage build for fetching the `lightwalletd` binary, resolving multi-platform build failures. The test entrypoint script was also improved to correctly handle ignored tests and provide cleaner logs.

Finally, the GCP integration test workflow has been simplified to rely on the container's exit code for determining test success, removing fragile log parsing.
Collaborator

@conradoplg conradoplg left a comment


Looks good, added some minor suggestions.

I'll wait before approving in order to check if we want to do the next release before merging this.

Collaborator

@conradoplg conradoplg left a comment


Looks good, thanks!

@conradoplg
Collaborator

@Mergifyio requeue

@mergify
Contributor

mergify bot commented Aug 11, 2025

requeue

✅ The queue state of this pull request has been cleaned. It can be re-embarked automatically

@gustavovalverde
Member Author

I'm admin-merging to clean up the merge message.


Labels

A-devops Area: Pipelines, CI/CD and Dockerfiles A-rust Area: Updates to Rust code C-enhancement Category: This is an improvement C-feature Category: New features P-Critical 🚑

Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

CI recompiles Zebra from scratch for each CI test

5 participants