Commit a57a0e6

Merge pull request #4 from cloudpilot-ai/yuhan-dev
feat: support installing wa;
2 parents 07f8974 + d74bbb0 commit a57a0e6

File tree

40 files changed: +3488 -308 lines changed

README.md

Lines changed: 13 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -30,16 +30,22 @@ Easily manage Amazon EKS clusters and workloads with CloudPilot AI's automation
3030

3131
## Example Usage
3232

33-
See the [`examples/`](examples/) directory for real-world configurations:
33+
See the [`examples/`](examples/) directory for real-world configurations.
34+
35+
**All examples default to basic installation without enabling Node Autoscaler optimization.** Optimization options (`only_install_agent`, `enable_rebalance`) are defined as variables — modify them in `terraform.tfvars` and re-apply to enable optimization when ready.
3436

3537
| Example | Description | Use Case |
3638
|---------|-------------|----------|
37-
| [`0_details`](examples/nodeautoscale/eks/0_details/) | Full-featured EKS cluster with all options | Production setup with workload templates, nodeclasses, and complete configuration |
38-
| [`1_read-only_access`](examples/nodeautoscale/eks/1_read-only_access/) | Agent-only installation | Testing or monitoring without optimization changes |
39-
| [`2_basic_rebalance`](examples/nodeautoscale/eks/2_basic_rebalance/) | Basic rebalance enabled | Simple cost optimization with workload rebalancing |
40-
| [`3_nodeclass_nodepool_rebalance`](examples/nodeautoscale/eks/3_nodeclass_nodepool_rebalance/) | Custom nodeclass/nodepool | Advanced node management with custom configurations |
41-
42-
Each example folder contains a `main.tf` and a dedicated README with usage instructions.
39+
| [`0_details`](examples/nodeautoscale/eks/0_details/) | Full-featured reference with all options | Production setup with workload templates, nodeclasses, nodepools, Workload Autoscaler, and data sources |
40+
| [`1_read-only_access`](examples/nodeautoscale/eks/1_read-only_access/) | Minimal agent-only installation | Testing or monitoring without any optimization changes |
41+
| [`2_basic_rebalance`](examples/nodeautoscale/eks/2_basic_rebalance/) | Basic rebalance configuration | Simple cost optimization with rebalancing |
42+
| [`3_nodeclass_nodepool_rebalance`](examples/nodeautoscale/eks/3_nodeclass_nodepool_rebalance/) | Custom nodeclass and nodepool | Advanced node management with instance filtering and disruption controls |
43+
44+
Each example folder contains:
45+
- `main.tf` — resource definitions
46+
- `variables.tf` — variable declarations (including optimization toggles)
47+
- `terraform.tfvars.example` — sample variable values (copy to `terraform.tfvars` to use)
48+
- `README.md` — usage instructions with a two-step workflow (install → enable optimization)
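
As a rough sketch of the second step in that workflow, a `terraform.tfvars` that turns optimization on might look like the following. The variable names come from the README text above. The values are illustrative, and so is the assumption that the examples start with `only_install_agent = true` and `enable_rebalance = false`; check both against each example's `terraform.tfvars.example`.

```terraform
# terraform.tfvars (illustrative values, not copied from the repository)

# Assumed step 1 state: agent only, no optimization.
# only_install_agent = true
# enable_rebalance   = false

# Step 2: enable optimization, then run `terraform apply` again.
only_install_agent = false
enable_rebalance   = true
```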
4349

4450
---
4551

docs/data-sources/eks_cluster.md

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
1+
---
2+
page_title: "cloudpilotai_eks_cluster Data Source - cloudpilotai"
3+
subcategory: "Node Autoscale"
4+
description: |-
5+
Retrieves information about an existing EKS cluster registered with CloudPilot AI.
6+
---
7+
8+
# cloudpilotai_eks_cluster (Data Source)
9+
10+
Retrieves read-only information about an EKS cluster that is already registered with CloudPilot AI. Use this data source to query cluster status and agent information without making any changes.
11+
12+
## Example Usage
13+
14+
```terraform
15+
data "cloudpilotai_eks_cluster" "production" {
16+
cluster_name = "production-cluster"
17+
region = "us-west-2"
18+
}
19+
20+
output "cluster_status" {
21+
value = data.cloudpilotai_eks_cluster.production.status
22+
}
23+
24+
output "agent_version" {
25+
value = data.cloudpilotai_eks_cluster.production.agent_version
26+
}
27+
```
28+
29+
## Schema
30+
31+
### Required
32+
33+
- `cluster_name` (String) — Name of the EKS cluster.
34+
- `region` (String) — AWS region where the EKS cluster is located.
35+
36+
### Optional
37+
38+
- `account_id` (String) — AWS account ID. If not provided, it is auto-detected from the current AWS CLI credentials.
39+
40+
### Read-Only
41+
42+
- `cluster_id` (String) — CloudPilot AI cluster identifier.
43+
- `cloud_provider` (String) — Cloud provider (e.g. `aws`).
44+
- `status` (String) — Current cluster status: `online`, `offline`, or `demo`.
45+
- `agent_version` (String) — Version of the CloudPilot AI agent installed.
46+
- `rebalance_enable` (Boolean) — Whether rebalancing is enabled.
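
The remaining attributes can be read the same way as `status` and `agent_version`. The following is a minimal sketch, assuming a cluster that is already registered; the cluster name, the placeholder account ID, and the output names are hypothetical.

```terraform
data "cloudpilotai_eks_cluster" "production" {
  cluster_name = "production-cluster"
  region       = "us-west-2"

  # Optional: set explicitly instead of relying on AWS CLI auto-detection.
  account_id = "123456789012" # placeholder account ID
}

# Expose the CloudPilot AI identifier so other configurations can reuse it.
output "cloudpilot_cluster_id" {
  value = data.cloudpilotai_eks_cluster.production.cluster_id
}

# Report whether rebalancing is currently enabled on the cluster.
output "rebalance_enabled" {
  value = data.cloudpilotai_eks_cluster.production.rebalance_enable
}
```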
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
1+
---
2+
page_title: "cloudpilotai_workload_autoscaler Data Source - cloudpilotai"
3+
subcategory: "Workload Autoscaler"
4+
description: |-
5+
Retrieves the Workload Autoscaler configuration for a given cluster.
6+
---
7+
8+
# cloudpilotai_workload_autoscaler (Data Source)
9+
10+
Retrieves read-only information about the Workload Autoscaler configuration on a cluster registered with CloudPilot AI. Use this data source to check whether the autoscaler is enabled and installed without making any changes.
11+
12+
## Example Usage
13+
14+
```terraform
15+
data "cloudpilotai_workload_autoscaler" "current" {
16+
cluster_id = cloudpilotai_eks_cluster.my_cluster.cluster_id
17+
}
18+
19+
output "wa_enabled" {
20+
value = data.cloudpilotai_workload_autoscaler.current.enabled
21+
}
22+
23+
output "wa_installed" {
24+
value = data.cloudpilotai_workload_autoscaler.current.installed
25+
}
26+
```
27+
28+
## Schema
29+
30+
### Required
31+
32+
- `cluster_id` (String) — The CloudPilot AI cluster ID.
33+
34+
### Read-Only
35+
36+
- `enabled` (Boolean) — Whether the Workload Autoscaler is enabled on this cluster.
37+
- `installed` (Boolean) — Whether the Workload Autoscaler is installed on this cluster.
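
When the cluster is not managed in the same configuration, the `cluster_id` could plausibly be taken from the `cloudpilotai_eks_cluster` data source instead of a resource. This is a sketch under that assumption; the resource labels and the output name are hypothetical.

```terraform
# Look up an already registered cluster without managing it.
data "cloudpilotai_eks_cluster" "existing" {
  cluster_name = "production-cluster"
  region       = "us-west-2"
}

# Feed the looked-up cluster ID into the Workload Autoscaler data source.
data "cloudpilotai_workload_autoscaler" "existing" {
  cluster_id = data.cloudpilotai_eks_cluster.existing.cluster_id
}

# True only when the autoscaler is both installed and enabled.
output "wa_ready" {
  value = (
    data.cloudpilotai_workload_autoscaler.existing.installed &&
    data.cloudpilotai_workload_autoscaler.existing.enabled
  )
}
```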

docs/index.md

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
1+
---
2+
page_title: "CloudPilot AI Provider"
3+
subcategory: ""
4+
description: |-
5+
The CloudPilot AI provider enables Terraform to manage EKS clusters and workload autoscaling with CloudPilot AI's cost optimization platform.
6+
---
7+
8+
# CloudPilot AI Provider
9+
10+
The CloudPilot AI provider enables you to manage Amazon EKS clusters and workloads through [CloudPilot AI](https://cloudpilot.ai/)'s automation and cost optimization platform.
11+
12+
## Features
13+
14+
- Provision and manage EKS clusters with CloudPilot AI integration
15+
- Automated agent and rebalance component installation
16+
- Node pool and node class management (including custom Karpenter JSON)
17+
- Workload cost optimization (rebalance, spot-friendly, min non-spot replicas)
18+
- Workload Autoscaler with recommendation and autoscaling policies
19+
- Read-only data sources for querying existing cluster and autoscaler state
20+
21+
## Prerequisites
22+
23+
- [Terraform](https://developer.hashicorp.com/terraform/install) >= 1.0
24+
- [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) configured with EKS permissions
25+
- [kubectl](https://kubernetes.io/docs/tasks/tools/) for cluster operations
26+
- A CloudPilot AI API key — see [Getting API Keys](https://docs.cloudpilot.ai/guide/getting_started/get_apikeys)
27+
28+
## Example Usage
29+
30+
```terraform
31+
provider "cloudpilotai" {
32+
api_key = var.cloudpilotai_api_key
33+
}
34+
```
35+
36+
## Authentication
37+
38+
The provider requires a CloudPilot AI API key. You can supply it in two ways:
39+
40+
- **`api_key`** — Pass the key directly (use a Terraform variable to avoid hardcoding).
41+
- **`api_key_profile`** — Path to a file containing the API key.
42+
43+
## Schema
44+
45+
### Optional
46+
47+
- `api_key` (String, Sensitive) — API key for the CloudPilot AI API.
48+
- `api_key_profile` (String) — Path to a file containing the API key.
49+
- `api_endpoint` (String) — Custom API endpoint. Defaults to `https://api.cloudpilot.ai`.
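
A slightly fuller provider configuration might declare the key as a sensitive input variable. This is a sketch only: the variable name matches the example above, the file path is a placeholder, and the endpoint shown is the documented default.

```terraform
# Declare the key as a sensitive input so Terraform redacts it in plan output.
variable "cloudpilotai_api_key" {
  type      = string
  sensitive = true
}

provider "cloudpilotai" {
  api_key = var.cloudpilotai_api_key

  # Alternative: read the key from a file instead of passing it directly.
  # api_key_profile = "/path/to/api-key-file" # placeholder path

  # Optional override; this value is the documented default.
  api_endpoint = "https://api.cloudpilot.ai"
}
```

With this layout the key can be supplied through the standard `TF_VAR_cloudpilotai_api_key` environment variable rather than being written to disk.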

docs/resources/eks_cluster.md

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
1+
---
2+
page_title: "cloudpilotai_eks_cluster Resource - cloudpilotai"
3+
subcategory: "Node Autoscale"
4+
description: |-
5+
Manages an EKS cluster with CloudPilot AI agent, rebalance components, and node configuration.
6+
---
7+
8+
# cloudpilotai_eks_cluster (Resource)
9+
10+
Manages an EKS cluster registered with CloudPilot AI. This resource handles the full lifecycle: installing the CloudPilot AI agent, configuring rebalance settings, and managing node pools and node classes.
11+
12+
## Example Usage
13+
14+
### Read-Only Agent Installation
15+
16+
```terraform
17+
resource "cloudpilotai_eks_cluster" "readonly" {
18+
cluster_name = "my-eks-cluster"
19+
region = "us-west-2"
20+
restore_node_number = 0
21+
only_install_agent = true
22+
}
23+
```
24+
25+
### Basic Rebalance
26+
27+
```terraform
28+
resource "cloudpilotai_eks_cluster" "rebalance" {
29+
cluster_name = "my-eks-cluster"
30+
region = "us-west-2"
31+
restore_node_number = 3
32+
enable_rebalance = true
33+
}
34+
```
35+
36+
### With Node Classes and Node Pools
37+
38+
```terraform
39+
resource "cloudpilotai_eks_cluster" "full" {
40+
cluster_name = "my-eks-cluster"
41+
region = "us-west-2"
42+
restore_node_number = 3
43+
enable_rebalance = true
44+
45+
nodeclasses {
46+
name = "default"
47+
system_disk_size_gib = 30
48+
}
49+
50+
nodepools {
51+
name = "default"
52+
nodeclass = "default"
53+
enable = true
54+
capacity_type = ["spot", "on-demand"]
55+
instance_arch = ["amd64"]
56+
}
57+
}
58+
```
59+
60+
## Schema
61+
62+
### Required
63+
64+
- `cluster_name` (String) — Name of the EKS cluster to be managed.
65+
- `region` (String) — AWS region where the EKS cluster is located.
66+
- `restore_node_number` (Number) — Number of nodes to restore when deleting the cluster resource. Set to 0 if no nodes need restoring.
67+
68+
### Optional
69+
70+
- `kubeconfig` (String) — Path to the kubeconfig file. If not provided, the provider generates one using AWS CLI.
71+
- `account_id` (String) — AWS account ID. Auto-detected from AWS CLI if not set.
72+
- `disable_workload_uploading` (Boolean) — Disable uploading workload information. Default: `false`.
73+
- `only_install_agent` (Boolean) — Only install the agent without rebalance. Default: `false`.
74+
- `enable_upgrade_agent` (Boolean) — Upgrade the agent on next apply. Default: `false`.
75+
- `enable_upgrade_rebalance_component` (Boolean) — Upgrade the rebalance component. Default: `false`.
76+
- `enable_rebalance` (Boolean) — Enable automatic workload rebalancing. Default: `false`.
77+
- `enable_upload_config` (Boolean) — Upload nodepool/nodeclass config to CloudPilot AI. Default: `true`.
78+
- `enable_diversity_instance_type` (Boolean) — Enable diverse instance types. Default: `false`.
79+
- `workload_templates` (List of Object) — Workload template configurations.
80+
- `workloads` (List of Object) — Workload rebalance configurations.
81+
- `nodeclass_templates` (List of Object) — NodeClass template configurations.
82+
- `nodeclasses` (List of Object) — NodeClass configurations.
83+
- `nodepool_templates` (List of Object) — NodePool template configurations.
84+
- `nodepools` (List of Object) — NodePool configurations.
85+
86+
### Read-Only
87+
88+
- `cluster_id` (String) — Unique identifier of the cluster (computed).
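
The optional flags can be combined with the basic examples above. The sketch below assumes an existing kubeconfig at a placeholder path and requests an upgrade of both components on the next apply; the resource label and output name are hypothetical.

```terraform
resource "cloudpilotai_eks_cluster" "upgrade" {
  cluster_name        = "my-eks-cluster"
  region              = "us-west-2"
  restore_node_number = 3

  # Use an existing kubeconfig instead of generating one via the AWS CLI.
  kubeconfig = "/path/to/kubeconfig" # placeholder path

  enable_rebalance = true

  # Request upgrades of the agent and the rebalance component.
  enable_upgrade_agent               = true
  enable_upgrade_rebalance_component = true
}

# The computed identifier can be passed to other resources or data sources.
output "cluster_id" {
  value = cloudpilotai_eks_cluster.upgrade.cluster_id
}
```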
89+
90+
## Import
91+
92+
This resource does not support import.
Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
1+
---
2+
page_title: "cloudpilotai_workload_autoscaler Resource - cloudpilotai"
3+
subcategory: "Workload Autoscaler"
4+
description: |-
5+
Manages the CloudPilot AI Workload Autoscaler with recommendation and autoscaling policies.
6+
---
7+
8+
# cloudpilotai_workload_autoscaler (Resource)
9+
10+
Manages the CloudPilot AI Workload Autoscaler on a Kubernetes cluster. This resource installs the autoscaler components, configures recommendation policies, and sets up autoscaling policies for workload right-sizing.
11+
12+
## Example Usage
13+
14+
```terraform
15+
resource "cloudpilotai_workload_autoscaler" "example" {
16+
cluster_id = cloudpilotai_eks_cluster.my_cluster.cluster_id
17+
kubeconfig = "/path/to/kubeconfig"
18+
19+
recommendation_policies {
20+
name = "default-rp"
21+
strategy_type = "percentile"
22+
percentile_cpu = 95
23+
percentile_memory = 95
24+
history_window_cpu = "168h"
25+
history_window_memory = "168h"
26+
evaluation_period = "1h"
27+
}
28+
29+
autoscaling_policies {
30+
name = "default-ap"
31+
enable = true
32+
recommendation_policy_name = "default-rp"
33+
34+
target_refs {
35+
api_version = "apps/v1"
36+
kind = "Deployment"
37+
}
38+
39+
update_schedules {
40+
name = "default"
41+
mode = "inplace"
42+
}
43+
}
44+
45+
enable_proactive = [
46+
{
47+
namespaces = ["my-namespace"]
48+
}
49+
]
50+
51+
disable_proactive = [
52+
{
53+
namespaces = ["kube-system"]
54+
}
55+
]
56+
}
57+
```
58+
59+
## Schema
60+
61+
### Required
62+
63+
- `cluster_id` (String) — The CloudPilot AI cluster ID to deploy Workload Autoscaler on.
64+
- `kubeconfig` (String) — Path to the kubeconfig file for the target Kubernetes cluster.
65+
66+
### Optional
67+
68+
- `storage_class` (String) — StorageClass name for VictoriaMetrics persistent volume. Default: cluster default.
69+
- `enable_node_agent` (Boolean) — Enable the Node Agent DaemonSet for per-node metrics. Default: `true`.
70+
- `recommendation_policies` (List of Object) — List of recommendation policies. See [Recommendation Policy](#recommendation-policy) below.
71+
- `autoscaling_policies` (List of Object) — List of autoscaling policies. See [Autoscaling Policy](#autoscaling-policy) below.
72+
- `enable_proactive` (List of Object) — Workload filters to enable proactive optimization. See [Proactive Filter](#proactive-filter) below.
73+
- `disable_proactive` (List of Object) — Workload filters to disable proactive optimization. See [Proactive Filter](#proactive-filter) below.
74+
75+
### Recommendation Policy
76+
77+
Each recommendation policy supports:
78+
79+
| Attribute | Type | Required | Description |
80+
|-----------|------|----------|-------------|
81+
| `name` | String | Yes | Policy name |
82+
| `strategy_type` | String | No | Strategy type (`percentile`). Default: `percentile` |
83+
| `percentile_cpu` | Number | No | CPU percentile (50-100). Default: `95` |
84+
| `percentile_memory` | Number | No | Memory percentile (50-100). Default: `95` |
85+
| `history_window_cpu` | String | Yes | CPU history window duration (e.g. `168h`) |
86+
| `history_window_memory` | String | Yes | Memory history window duration |
87+
| `evaluation_period` | String | Yes | Evaluation period duration (e.g. `1h`) |
88+
| `buffer_cpu` | String | No | CPU buffer (e.g. `10%` or `100m`) |
89+
| `buffer_memory` | String | No | Memory buffer (e.g. `10%` or `128Mi`) |
90+
| `request_min_cpu` | String | No | Minimum CPU request recommendation |
91+
| `request_min_memory` | String | No | Minimum memory request recommendation |
92+
| `request_max_cpu` | String | No | Maximum CPU request recommendation |
93+
| `request_max_memory` | String | No | Maximum memory request recommendation |
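
To illustrate the optional attributes in the table, a recommendation policy with buffers and request bounds might look like this. The attribute names come from the table; the concrete values are illustrative, and the assumption that the request bounds accept Kubernetes-style quantities (such as `50m` or `4Gi`) is not confirmed by the table itself.

```terraform
resource "cloudpilotai_workload_autoscaler" "buffered" {
  cluster_id = cloudpilotai_eks_cluster.my_cluster.cluster_id
  kubeconfig = "/path/to/kubeconfig"

  recommendation_policies {
    name                  = "buffered-rp"
    strategy_type         = "percentile"
    percentile_cpu        = 90
    percentile_memory     = 95
    history_window_cpu    = "168h"
    history_window_memory = "168h"
    evaluation_period     = "1h"

    # Headroom added on top of the recommendation (formats from the table).
    buffer_cpu    = "10%"
    buffer_memory = "128Mi"

    # Bounds on recommended requests (value formats are an assumption).
    request_min_cpu    = "50m"
    request_min_memory = "64Mi"
    request_max_cpu    = "2"
    request_max_memory = "4Gi"
  }
}
```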
94+
95+
### Autoscaling Policy
96+
97+
Each autoscaling policy supports:
98+
99+
| Attribute | Type | Required | Description |
100+
|-----------|------|----------|-------------|
101+
| `name` | String | Yes | Policy name |
102+
| `enable` | Boolean | No | Whether enabled. Default: `true` |
103+
| `recommendation_policy_name` | String | Yes | Associated recommendation policy |
104+
| `priority` | Number | No | Priority (higher wins). Default: `0` |
105+
| `update_resources` | List(String) | No | Resources to optimize (e.g. `["cpu", "memory"]`) |
106+
| `drift_threshold_cpu` | String | No | CPU drift threshold |
107+
| `drift_threshold_memory` | String | No | Memory drift threshold |
108+
| `on_policy_removal` | String | No | Behavior on removal: `off`, `recreate`, `inplace`. Default: `off` |
109+
| `target_refs` | List(Object) | No | Target workload references |
110+
| `update_schedules` | List(Object) | No | Update schedule items |
111+
| `limit_policies` | List(Object) | No | Per-resource limit policies |
112+
| `startup_boost_enabled` | Boolean | No | Enable startup resource boost. Default: `false` |
113+
| `in_place_fallback_default_policy` | String | No | Fallback policy: `recreate` or `hold` |
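
The following sketch combines several of the optional autoscaling policy attributes. All attribute names and allowed values are taken from the table above; the policy names, the priority value, and the choice of `StatefulSet` as a target are hypothetical.

```terraform
resource "cloudpilotai_workload_autoscaler" "prioritized" {
  cluster_id = cloudpilotai_eks_cluster.my_cluster.cluster_id
  kubeconfig = "/path/to/kubeconfig"

  # The autoscaling policy below refers to this policy by name.
  recommendation_policies {
    name                  = "default-rp"
    history_window_cpu    = "168h"
    history_window_memory = "168h"
    evaluation_period     = "1h"
  }

  autoscaling_policies {
    name                       = "stateful-ap"
    enable                     = true
    recommendation_policy_name = "default-rp"

    # Higher priority wins when more than one policy matches a workload.
    priority         = 10
    update_resources = ["cpu", "memory"]

    # Keep the documented default behavior when the policy is removed.
    on_policy_removal = "off"

    # Hold rather than recreate if an in-place update cannot be applied.
    in_place_fallback_default_policy = "hold"

    target_refs {
      api_version = "apps/v1"
      kind        = "StatefulSet"
    }

    update_schedules {
      name = "default"
      mode = "inplace"
    }
  }
}
```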
114+
115+
### Proactive Filter
116+
117+
Each `enable_proactive` and `disable_proactive` entry supports the same set of filter attributes:
118+
119+
| Attribute | Type | Required | Description |
120+
|-----------|------|----------|-------------|
121+
| `workload_name` | String | No | Filter by workload name (substring match) |
122+
| `namespaces` | List(String) | No | Namespaces to filter workloads |
123+
| `workload_kinds` | List(String) | No | Workload kinds (e.g. `Deployment`, `StatefulSet`) |
124+
| `autoscaling_policy_names` | List(String) | No | Filter by autoscaling policy names |
125+
| `workload_state` | String | No | Filter by workload state |
126+
| `optimization_states` | List(String) | No | Filter by optimization states |
127+
| `disable_proactive_update` | Boolean | No | Filter by whether proactive update is disabled |
128+
| `recommendation_policy_names` | List(String) | No | Filter by recommendation policy names |
129+
| `runtime_languages` | List(String) | No | Filter by container runtime languages |
130+
| `optimized` | Boolean | No | Filter by whether the workload is optimized |
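
As a sketch of how several filter attributes might be combined, the example below enables proactive optimization for two workload kinds in one namespace and disables it for workloads whose name contains a given substring. The namespace and the workload name are hypothetical, and how multiple attributes within one entry interact is not stated in the table, so treat the combination as an assumption.

```terraform
resource "cloudpilotai_workload_autoscaler" "filtered" {
  cluster_id = cloudpilotai_eks_cluster.my_cluster.cluster_id
  kubeconfig = "/path/to/kubeconfig"

  # Enable proactive optimization for selected kinds in one namespace.
  enable_proactive = [
    {
      namespaces     = ["my-namespace"]
      workload_kinds = ["Deployment", "StatefulSet"]
    }
  ]

  # Disable it for workloads whose name contains "batch" (substring match).
  disable_proactive = [
    {
      workload_name = "batch"
    }
  ]
}
```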
131+
132+
## Import
133+
134+
This resource does not support import.
