
Discovery Mode for Pre-existing Host Directories with Auto-scaling Support #309

@prabhatsingh014

Description

Use Case

I have a use case where application pods are consuming local persistent volumes backed by NVMe ephemeral instance storage on cloud provider nodes (AWS EKS, Azure AKS, GCP GKE).

My Setup

During node bootstrapping, a script runs that:

  1. Creates a RAID0 array from multiple NVMe ephemeral storage devices
  2. Formats and mounts the RAID device
  3. Creates multiple bind-mounted directories (e.g., /mnt/local-storage/vol0, /mnt/local-storage/vol1, ... /mnt/local-storage/vol20) pointing to the same underlying storage
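The bootstrap steps above can be sketched as follows. This is an illustrative fragment, not my exact script: device names, the mount point, and the volume count are assumptions, and real nodes would discover the NVMe devices dynamically (e.g. via lsblk).

```bash
#!/usr/bin/env bash
set -euo pipefail

# 1. Assemble the NVMe ephemeral devices into a RAID0 array
#    (device list is illustrative)
DEVICES=(/dev/nvme1n1 /dev/nvme2n1)
mdadm --create /dev/md0 --level=0 --raid-devices="${#DEVICES[@]}" "${DEVICES[@]}"

# 2. Format and mount the array
mkfs.ext4 -F /dev/md0
mkdir -p /mnt/raid0
mount /dev/md0 /mnt/raid0

# 3. Create bind-mounted directories that all point at the same storage
mkdir -p /mnt/raid0/shared
for i in $(seq 0 20); do
  dir="/mnt/local-storage/vol${i}"
  mkdir -p "${dir}"
  mount --bind /mnt/raid0/shared "${dir}"
done
```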

This enables a "shared cache" pattern where:

  • Multiple pods on the same node can access the same underlying data
  • Each pod gets its own PV/PVC for Kubernetes compatibility
  • Data written by one pod is immediately visible to others (shared filesystem)

Current Solution

I am currently using local-static-provisioner together with reclaimable-pv-releaser: the static provisioner creates one PV per pre-created directory, and the releaser resets Released PVs (with reclaimPolicy: Retain) back to Available without deleting the underlying data.
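Concretely, the workaround relies on a static StorageClass along the lines of the following sketch (the class name is illustrative; kubernetes.io/no-provisioner is the standard marker for statically provisioned local PVs):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-nvme-static
provisioner: kubernetes.io/no-provisioner   # static provisioning; PVs are pre-created
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain   # reclaimable-pv-releaser resets Released PVs to Available
```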

Problem

This solution does not support cluster auto-scaling.

When all pre-existing PVs on available nodes are bound:

  • New pods remain in Pending state
  • Cluster Autoscaler does not receive the proper signal (ResourceExhausted) to scale up
  • The autoscaler doesn't recognize that new nodes would provide additional capacity

Proposed Feature: Discovery Mode

I would like dynamic-localpv-provisioner to support a new provisioning mode that:

  1. Discovers pre-existing directories instead of creating new ones
  2. Allocates discovered paths to PVCs one-to-one (each directory can only be used by one PVC)
  3. Signals ResourceExhausted when all directories on available nodes are in use, triggering cluster autoscaler
  4. Works with reclaimable-pv-releaser using reclaimPolicy: Retain to recycle PVs without deleting data
  5. Enforces max PVs per node to prevent over-provisioning

Proposed StorageClass Configuration

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-nvme-discovery
provisioner: openebs.io/local
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
parameters:
  # New discovery mode parameters
  discoveryMode: "static"           # Enable discovery mode
  basePath: "/mnt/local-storage"    # Base directory to discover
  namePattern: "vol*"               # Glob pattern for directories
  slotsPerNode: "21"                # Max PVs per node (for ResourceExhausted signal)

Expected Behavior
| Scenario | Behavior |
| --- | --- |
| PVC created, Available PV exists | Bind to existing Available PV |
| PVC created, no Available PV, free directory exists | Create new PV for that directory |
| PVC created, all directories have PVs (Bound/Released) | Wait for reclaimable-pv-releaser to make one Available |
| PVC created, all PVs are Bound | Return ResourceExhausted → trigger autoscaler |
| PVC deleted (with Retain policy) | PV becomes Released → releaser makes it Available |
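From the workload side, nothing new is needed: a PVC simply references the discovery StorageClass and the provisioner assigns it one of the discovered directories. A sketch (names and the requested size are illustrative; actual capacity comes from the shared RAID0 array):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-cache
spec:
  storageClassName: local-nvme-discovery
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 100Gi
```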

Similar Use Cases

  1. ML/AI Training: Shared model cache across training pods
  2. CI/CD Build Caches: Shared build artifacts/dependencies
  3. Content Delivery: Shared media cache for serving pods
  4. Database Replicas: Shared data directory for read replicas

Environment

  • Kubernetes: 1.25+
  • Cloud Provider: AWS EKS (with NVMe instance storage)
  • Current workaround: local-static-provisioner + reclaimable-pv-releaser

Why dynamic-localpv-provisioner?

This feature would allow dynamic-localpv-provisioner to handle use cases involving:

  • Pre-existing host directories
  • Ephemeral instance storage (NVMe)
  • Cluster auto-scaling requirements
  • Shared storage patterns with bind mounts

The key differentiator from local-static-provisioner is the ability to signal ResourceExhausted to the cluster autoscaler, enabling automatic node scaling when local storage capacity is exhausted.
