This document provides comprehensive configuration guidance for HugeGraph PD, including parameter descriptions, deployment scenarios, and production tuning recommendations.
- Configuration File Overview
- Core Configuration Parameters
- Deployment Scenarios
- Production Tuning
- Logging Configuration
- Monitoring and Metrics
PD uses the following configuration files (located in conf/ directory):
| File | Purpose |
|---|---|
| `application.yml` | Main PD configuration (gRPC, Raft, storage, etc.) |
| `log4j2.xml` | Logging configuration (log levels, appenders, rotation) |
| `verify-license.json` | License verification configuration (optional) |
```
application.yml
├── spring       # Spring Boot framework settings
├── management   # Actuator endpoints and metrics
├── logging      # Log configuration file location
├── license      # License verification (optional)
├── grpc         # gRPC server settings
├── server       # REST API server settings
├── pd           # PD-specific settings
├── raft         # Raft consensus settings
├── store        # Store node management settings
└── partition    # Partition management settings
```
Controls the gRPC server for inter-service communication.
```yaml
grpc:
  host: 127.0.0.1   # gRPC bind address
  port: 8686        # gRPC server port
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `grpc.host` | String | `127.0.0.1` | **IMPORTANT**: Must be set to the node's actual IP address (not `127.0.0.1`) for distributed deployments; Store and Server nodes connect to this address. In Docker bridge networking, set this to the container hostname (e.g., `pd0`) via the `HG_PD_GRPC_HOST` env var. |
| `grpc.port` | Integer | `8686` | gRPC server port. Ensure this port is reachable from Store and Server nodes. |
Production Notes:
- Set `grpc.host` to the node's actual IP address (e.g., `192.168.1.10`)
- Avoid `0.0.0.0`, as it may cause service discovery issues
- Ensure the firewall allows incoming connections on `grpc.port`
Controls the REST API server for management and monitoring.
```yaml
server:
  port: 8620   # REST API port
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `server.port` | Integer | `8620` | REST API port for health checks, metrics, and management operations. |
Endpoints:
- Health check: `http://<host>:8620/actuator/health`
- Metrics: `http://<host>:8620/actuator/metrics`
- Prometheus: `http://<host>:8620/actuator/prometheus`
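A readiness probe against the health endpoint above can be sketched with only the standard library (the helper name and defaults here are illustrative, not part of PD):

```python
import json
import urllib.request

def pd_is_healthy(host: str, port: int = 8620, timeout: float = 2.0) -> bool:
    """Return True iff /actuator/health reports status == "UP"."""
    url = f"http://{host}:{port}/actuator/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("status") == "UP"
    except (OSError, ValueError):  # connection failure or malformed JSON
        return False
```

A probe like this is handy in deployment scripts that must wait for PD to come up before starting Store nodes.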
Controls Raft consensus for PD cluster coordination.
```yaml
raft:
  address: 127.0.0.1:8610      # This node's Raft address
  peers-list: 127.0.0.1:8610   # All PD nodes in the cluster
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `raft.address` | String | `127.0.0.1:8610` | Raft service address for this PD node, in `<ip>:<port>` format. Must be unique across all PD nodes. |
| `raft.peers-list` | String | `127.0.0.1:8610` | Comma-separated list of all PD nodes' Raft addresses. Used for cluster formation and leader election. |
Critical Rules:
- `raft.address` must be unique for each PD node
- `raft.peers-list` must be identical on all PD nodes
- `raft.peers-list` must contain all PD nodes (including this node)
- Use actual IP addresses, not `127.0.0.1`, for multi-node clusters
- Cluster size should be odd (3, 5, 7) for optimal Raft quorum
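The odd-size rule follows from Raft's majority quorum: a cluster of n nodes needs floor(n/2) + 1 votes, so an even node count tolerates no more failures than the odd size below it. A quick illustration:

```python
# Raft quorum math: majority = floor(n/2) + 1; fault tolerance = n - majority.
def raft_fault_tolerance(n: int) -> int:
    quorum = n // 2 + 1   # votes required to elect a leader / commit an entry
    return n - quorum     # nodes that may fail while the cluster stays available

for n in (3, 4, 5, 7):
    print(n, raft_fault_tolerance(n))  # 3→1, 4→1, 5→2, 7→3
```

Note that 4 nodes tolerate only one failure, the same as 3, while adding one more vote that must be collected.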
Example (3-node cluster):

```yaml
# Node 1
raft:
  address: 192.168.1.10:8610
  peers-list: 192.168.1.10:8610,192.168.1.11:8610,192.168.1.12:8610

# Node 2
raft:
  address: 192.168.1.11:8610
  peers-list: 192.168.1.10:8610,192.168.1.11:8610,192.168.1.12:8610

# Node 3
raft:
  address: 192.168.1.12:8610
  peers-list: 192.168.1.10:8610,192.168.1.11:8610,192.168.1.12:8610
```

When deploying PD in Docker with bridge networking (e.g., docker/docker-compose-3pd-3store-3server.yml), container hostnames are used instead of IP addresses. Configuration is injected via HG_PD_* environment variables:
```yaml
# pd0 — set via HG_PD_RAFT_ADDRESS and HG_PD_RAFT_PEERS_LIST env vars
raft:
  address: pd0:8610
  peers-list: pd0:8610,pd1:8610,pd2:8610

# pd1
raft:
  address: pd1:8610
  peers-list: pd0:8610,pd1:8610,pd2:8610

# pd2
raft:
  address: pd2:8610
  peers-list: pd0:8610,pd1:8610,pd2:8610
```

The `grpc.host` must also use the container hostname (e.g., `pd0`), set via `HG_PD_GRPC_HOST`. Do not use `127.0.0.1` or `0.0.0.0` in bridge networking mode.
See docker/README.md for the full environment variable reference.
Controls PD-specific behavior.
```yaml
pd:
  data-path: ./pd_data                 # Metadata storage path
  patrol-interval: 1800                # Partition rebalancing interval (seconds)
  initial-store-count: 1               # Minimum stores for cluster availability
  initial-store-list: 127.0.0.1:8500   # Auto-activated stores
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `pd.data-path` | String | `./pd_data` | Directory for RocksDB metadata storage and Raft logs. Ensure sufficient disk space and fast I/O (SSD recommended). |
| `pd.patrol-interval` | Integer | `1800` | Interval (in seconds) for the partition health patrol and automatic rebalancing. Lower values mean more frequent checks. |
| `pd.initial-store-count` | Integer | `1` | Minimum number of Store nodes required for the cluster to be operational. Set to the expected initial store count. |
| `pd.initial-store-list` | String | `127.0.0.1:8500` | Comma-separated list of Store gRPC addresses to auto-activate on startup. Useful for bootstrapping. |
Production Recommendations:
- `pd.data-path`: Use a dedicated SSD with at least 50GB free space
- `pd.patrol-interval`:
  - Development: `300` (5 minutes) for fast testing
  - Production: `1800` (30 minutes) to reduce overhead
  - Large clusters: `3600` (1 hour)
- `pd.initial-store-count`: Set to the expected initial store count (e.g., `3` for 3 stores)
Controls how PD monitors and manages Store nodes.
```yaml
store:
  max-down-time: 172800             # Store permanent failure threshold (seconds)
  monitor_data_enabled: true        # Enable metrics collection
  monitor_data_interval: 1 minute   # Metrics collection interval
  monitor_data_retention: 1 day     # Metrics retention period
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `store.max-down-time` | Integer | `172800` | Time (in seconds) after which a Store is considered permanently offline and its partitions are reallocated. Default: 48 hours. |
| `store.monitor_data_enabled` | Boolean | `true` | Enable collection of Store metrics (CPU, memory, disk, partition count). |
| `store.monitor_data_interval` | Duration | `1 minute` | Interval for collecting Store metrics. Format: `<value> <unit>` (second, minute, hour). |
| `store.monitor_data_retention` | Duration | `1 day` | Retention period for historical metrics. Format: `<value> <unit>` (day, month, year). |
Production Recommendations:
- `store.max-down-time`:
  - Development: `300` (5 minutes) for fast failover testing
  - Production: `86400` (24 hours) to avoid false positives during maintenance
  - Conservative: `172800` (48 hours) for unstable networks
- `store.monitor_data_interval`:
  - High-frequency monitoring: `10 seconds`
  - Standard: `1 minute`
  - Low overhead: `5 minutes`
- `store.monitor_data_retention`:
  - Short-term: `1 day`
  - Standard: `7 days`
  - Long-term: `30 days` (requires more disk space)
Controls partition allocation and replication.
```yaml
partition:
  default-shard-count: 1      # Replicas per partition
  store-max-shard-count: 12   # Max shards per store
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `partition.default-shard-count` | Integer | `1` | Number of replicas per partition. Typically `3` in production for high availability. |
| `partition.store-max-shard-count` | Integer | `12` | Maximum number of partition replicas (shards) a single Store can hold. Used for initial partition allocation. |
Initial Partition Count Calculation:

```
initial_partitions = (store_count * store_max_shard_count) / default_shard_count
```
Example:
- 3 stores, `store-max-shard-count=12`, `default-shard-count=3`
- Initial partitions: `(3 * 12) / 3 = 12` partitions
- Each store hosts: `12 * 3 / 3 = 12` shards (4 partition leaders + 8 followers)
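The calculation above can be expressed as a small sketch (the helper names are my own, not PD APIs):

```python
# initial_partitions = (store_count * store_max_shard_count) / default_shard_count
def initial_partitions(store_count: int, store_max_shard_count: int,
                       default_shard_count: int) -> int:
    return (store_count * store_max_shard_count) // default_shard_count

def shards_per_store(partitions: int, default_shard_count: int,
                     store_count: int) -> int:
    # total shards = partitions * replicas, spread evenly across stores
    return partitions * default_shard_count // store_count

p = initial_partitions(3, 12, 3)
print(p)                          # 12 partitions
print(shards_per_store(p, 3, 3))  # 12 shards per store (4 leaders + 8 followers)
```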
Production Recommendations:
- `partition.default-shard-count`:
  - Development/Testing: `1` (no replication)
  - Production: `3` (standard HA configuration)
  - Critical systems: `5` (maximum fault tolerance)
- `partition.store-max-shard-count`:
  - Small deployment: `10-20`
  - Medium deployment: `50-100`
  - Large deployment: `200-500`
  - Limit based on Store disk capacity and expected data volume
Controls Spring Boot Actuator endpoints for monitoring.
```yaml
management:
  metrics:
    export:
      prometheus:
        enabled: true   # Enable Prometheus metrics export
  endpoints:
    web:
      exposure:
        include: "*"    # Expose all actuator endpoints
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `management.metrics.export.prometheus.enabled` | Boolean | `true` | Enable Prometheus-compatible metrics at `/actuator/prometheus`. |
| `management.endpoints.web.exposure.include` | String | `"*"` | Actuator endpoints to expose. `"*"` = all, or specify a comma-separated list (e.g., `"health,metrics"`). |
Minimal configuration for local development.
```yaml
grpc:
  host: 127.0.0.1
  port: 8686
server:
  port: 8620
raft:
  address: 127.0.0.1:8610
  peers-list: 127.0.0.1:8610
pd:
  data-path: ./pd_data
  patrol-interval: 300                 # Fast rebalancing for testing
  initial-store-count: 1
  initial-store-list: 127.0.0.1:8500
store:
  max-down-time: 300                   # Fast failover for testing
  monitor_data_enabled: true
  monitor_data_interval: 10 seconds
  monitor_data_retention: 1 day
partition:
  default-shard-count: 1               # No replication
  store-max-shard-count: 10
```

Characteristics:
- Single PD node (no HA)
- No replication (`default-shard-count=1`)
- Fast rebalancing for quick testing
- Suitable for development, not for production
Recommended configuration for production deployments.
```yaml
# Node 1
grpc:
  host: 192.168.1.10
  port: 8686
server:
  port: 8620
raft:
  address: 192.168.1.10:8610
  peers-list: 192.168.1.10:8610,192.168.1.11:8610,192.168.1.12:8610
pd:
  data-path: /data/pd/metadata
  patrol-interval: 1800
  initial-store-count: 3
  initial-store-list: 192.168.1.20:8500,192.168.1.21:8500,192.168.1.22:8500
store:
  max-down-time: 86400                 # 24 hours
  monitor_data_enabled: true
  monitor_data_interval: 1 minute
  monitor_data_retention: 7 days
partition:
  default-shard-count: 3               # Triple replication
  store-max-shard-count: 50
```

```yaml
# Node 2
grpc:
  host: 192.168.1.11
  port: 8686
server:
  port: 8620
raft:
  address: 192.168.1.11:8610
  peers-list: 192.168.1.10:8610,192.168.1.11:8610,192.168.1.12:8610
pd:
  data-path: /data/pd/metadata
  patrol-interval: 1800
  initial-store-count: 3
  initial-store-list: 192.168.1.20:8500,192.168.1.21:8500,192.168.1.22:8500
store:
  max-down-time: 86400
  monitor_data_enabled: true
  monitor_data_interval: 1 minute
  monitor_data_retention: 7 days
partition:
  default-shard-count: 3
  store-max-shard-count: 50
```

```yaml
# Node 3
grpc:
  host: 192.168.1.12
  port: 8686
server:
  port: 8620
raft:
  address: 192.168.1.12:8610
  peers-list: 192.168.1.10:8610,192.168.1.11:8610,192.168.1.12:8610
pd:
  data-path: /data/pd/metadata
  patrol-interval: 1800
  initial-store-count: 3
  initial-store-list: 192.168.1.20:8500,192.168.1.21:8500,192.168.1.22:8500
store:
  max-down-time: 86400
  monitor_data_enabled: true
  monitor_data_interval: 1 minute
  monitor_data_retention: 7 days
partition:
  default-shard-count: 3
  store-max-shard-count: 50
```

Characteristics:
- 3 PD nodes for high availability
- Tolerates 1 PD node failure
- Triple replication (`default-shard-count=3`)
- 3 Store nodes specified in `initial-store-list`
- Standard monitoring and metrics collection
Network Requirements:
- Low latency (<5ms) between PD nodes for Raft
- Open ports: 8620 (REST), 8686 (gRPC), 8610 (Raft)
Configuration for mission-critical deployments requiring maximum fault tolerance.
```yaml
# Node 1: 192.168.1.10
grpc:
  host: 192.168.1.10
  port: 8686
raft:
  address: 192.168.1.10:8610
  peers-list: 192.168.1.10:8610,192.168.1.11:8610,192.168.1.12:8610,192.168.1.13:8610,192.168.1.14:8610
pd:
  data-path: /data/pd/metadata
  patrol-interval: 3600                # Lower frequency for large clusters
  initial-store-count: 5
  initial-store-list: 192.168.1.20:8500,192.168.1.21:8500,192.168.1.22:8500,192.168.1.23:8500,192.168.1.24:8500
store:
  max-down-time: 172800                # 48 hours (conservative)
  monitor_data_enabled: true
  monitor_data_interval: 1 minute
  monitor_data_retention: 30 days      # Long-term retention
partition:
  default-shard-count: 3               # Or 5 for extreme HA
  store-max-shard-count: 100
```

Characteristics:
- 5 PD nodes for maximum HA
- Tolerates 2 PD node failures
- 5 Store nodes for data distribution
- Lower patrol frequency to reduce overhead
- Long-term metrics retention (30 days)
JVM options are specified via the startup script (bin/start-hugegraph-pd.sh).
```bash
# Option 1: Via startup script flag
bin/start-hugegraph-pd.sh -j "-Xmx8g -Xms8g"

# Option 2: Edit start-hugegraph-pd.sh directly
JAVA_OPTIONS="-Xmx8g -Xms8g -XX:+UseG1GC"
```

Recommendations by Cluster Size:
| Cluster Size | Partitions | Heap Size | Notes |
|---|---|---|---|
| Small (1-3 stores) | <100 | `-Xmx2g -Xms2g` | Development/testing |
| Medium (3-10 stores) | 100-1000 | `-Xmx4g -Xms4g` | Standard production |
| Large (10-50 stores) | 1000-10000 | `-Xmx8g -Xms8g` | Large production |
| X-Large (50+ stores) | 10000+ | `-Xmx16g -Xms16g` | Enterprise scale |
Key Principles:
- Set `-Xms` equal to `-Xmx` to avoid heap resizing
- Reserve at least 2GB for the OS and off-heap memory
- Monitor GC pause times and adjust accordingly
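The sizing bands in the table above can be captured as a small lookup sketch (a convenience for scripts, not official tooling; the function name is my own):

```python
# Heap-size bands from the cluster-size table: <100, 100-1000,
# 1000-10000, and 10000+ partitions map to 2/4/8/16 GB heaps.
def jvm_heap_flags(partitions: int) -> str:
    if partitions < 100:
        gb = 2        # small: development/testing
    elif partitions < 1000:
        gb = 4        # medium: standard production
    elif partitions < 10000:
        gb = 8        # large production
    else:
        gb = 16       # enterprise scale
    return f"-Xmx{gb}g -Xms{gb}g"  # Xms == Xmx avoids heap resizing

print(jvm_heap_flags(500))  # -Xmx4g -Xms4g
```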
G1GC (Default, Recommended):

```bash
bin/start-hugegraph-pd.sh -g g1 -j "-Xmx8g -Xms8g \
  -XX:MaxGCPauseMillis=200 \
  -XX:G1HeapRegionSize=16m \
  -XX:InitiatingHeapOccupancyPercent=45"
```

- `MaxGCPauseMillis`: Target GC pause time (200ms recommended)
- `G1HeapRegionSize`: Region size (16m for an 8GB heap)
- `InitiatingHeapOccupancyPercent`: Heap occupancy that triggers concurrent GC (45% recommended)
ZGC (Low-Latency, Java 11+):

```bash
bin/start-hugegraph-pd.sh -g ZGC -j "-Xmx8g -Xms8g \
  -XX:ZCollectionInterval=30"
```

- Ultra-low pause times (<10ms)
- Recommended for latency-sensitive deployments
- Requires Java 11+ (Java 15+ for production)

GC logging can be enabled with:

```bash
-Xlog:gc*:file=logs/gc.log:time,uptime,level,tags:filecount=10,filesize=100M
```

Raft parameters are usually sufficient at their defaults, but they can be tuned for specific scenarios.
Increase election timeout for high-latency networks.
Default: 1000ms (1 second)
Tuning (requires code changes in `RaftEngine.java`):

```java
// In hg-pd-core/.../raft/RaftEngine.java
nodeOptions.setElectionTimeoutMs(3000); // 3 seconds
```

When to Increase:
- Network latency >10ms between PD nodes
- Frequent false leader elections
- Cross-datacenter deployments
Control how often Raft snapshots are created.
Default: 3600 seconds (1 hour)
Tuning (in `RaftEngine.java`):

```java
nodeOptions.setSnapshotIntervalSecs(7200); // 2 hours
```

Recommendations:
- Frequent snapshots (1800s): Faster recovery, more I/O overhead
- Infrequent snapshots (7200s): Less I/O, slower recovery
PD uses RocksDB for metadata storage. Optimize for your workload.
SSD Optimization (default, recommended):
- RocksDB uses default settings optimized for SSD
- No configuration changes needed
HDD Optimization (not recommended for production):

```java
// In MetadataRocksDBStore.java, customize RocksDB options
Options options = new Options()
    .setCompactionStyle(CompactionStyle.LEVEL)
    .setWriteBufferSize(64 * 1024 * 1024) // 64MB
    .setMaxWriteBufferNumber(3)
    .setLevelCompactionDynamicLevelBytes(true);
```

Key Metrics to Monitor:
- Disk I/O utilization
- RocksDB write stalls
- Compaction backlog
For high-throughput scenarios, tune gRPC connection pool size.
Client-Side (in `PDClient`):

```java
PDConfig config = PDConfig.builder()
    .pdServers("192.168.1.10:8686,192.168.1.11:8686,192.168.1.12:8686")
    .maxChannels(5) // Number of gRPC channels per PD node
    .build();
```

Recommendations:
- Low traffic: `maxChannels=1`
- Medium traffic: `maxChannels=3-5`
- High traffic: `maxChannels=10+`
Optimize OS-level TCP settings for low latency.
```bash
# Increase TCP buffer sizes
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"

# Reduce TIME_WAIT connections
sysctl -w net.ipv4.tcp_tw_reuse=1
sysctl -w net.ipv4.tcp_fin_timeout=30
```

| Metric | Threshold | Action |
|---|---|---|
| PD Leader Changes | >2 per hour | Investigate network stability, increase election timeout |
| Raft Log Lag | >1000 entries | Check follower disk I/O and network latency |
| Store Heartbeat Failures | >5% | Check Store node health and network connectivity |
| Partition Imbalance | >20% deviation | Reduce `patrol-interval`, check rebalancing logic |
| GC Pause Time | >500ms | Tune GC settings, increase heap size |
| Disk Usage (`pd.data-path`) | >80% | Clean up old snapshots, expand disk, reduce `monitor_data_retention` |
```yaml
scrape_configs:
  - job_name: 'hugegraph-pd'
    static_configs:
      - targets:
          - '192.168.1.10:8620'
          - '192.168.1.11:8620'
          - '192.168.1.12:8620'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 15s
```

Key panels to create:
- PD Cluster Status: Leader, follower count, Raft state
- Store Health: Online/offline stores, heartbeat success rate
- Partition Distribution: Partitions per store, leader distribution
- Performance: QPS, latency (p50, p95, p99)
- System Resources: CPU, memory, disk I/O, network
Located at conf/log4j2.xml.
```xml
<Loggers>
    <!-- PD application logs -->
    <Logger name="org.apache.hugegraph.pd" level="INFO"/>
    <!-- Raft consensus logs (verbose; set to WARN in production) -->
    <Logger name="com.alipay.sofa.jraft" level="WARN"/>
    <!-- RocksDB logs -->
    <Logger name="org.rocksdb" level="WARN"/>
    <!-- gRPC logs -->
    <Logger name="io.grpc" level="WARN"/>
    <!-- Root logger -->
    <Root level="INFO">
        <AppenderRef ref="RollingFile"/>
        <AppenderRef ref="Console"/>
    </Root>
</Loggers>
```

Recommendations:
- Development: Set the PD logger to `DEBUG` for detailed tracing
- Production: Use `INFO` (default) or `WARN` for lower overhead
- Troubleshooting: Temporarily set a specific package to `DEBUG`
```xml
<RollingFile name="RollingFile" fileName="logs/hugegraph-pd.log"
             filePattern="logs/hugegraph-pd-%d{yyyy-MM-dd}-%i.log.gz">
    <PatternLayout>
        <Pattern>%d{ISO8601} [%t] %-5level %logger{36} - %msg%n</Pattern>
    </PatternLayout>
    <Policies>
        <TimeBasedTriggeringPolicy interval="1" modulate="true"/>
        <SizeBasedTriggeringPolicy size="100 MB"/>
    </Policies>
    <DefaultRolloverStrategy max="30"/>
</RollingFile>
```

Configuration:
- Size: Rotate when log file reaches 100MB
- Time: Rotate daily
- Retention: Keep last 30 log files
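The rotation policy above implies a rough upper bound on log disk usage, which is worth checking against the disk hosting `logs/`:

```python
# Worst-case footprint: up to 30 rotated files of ~100 MB each,
# plus the active log file. Rotated files are gzipped, so actual
# usage is typically much lower than this bound.
max_files, file_mb = 30, 100
bound_gb = (max_files + 1) * file_mb / 1024
print(round(bound_gb, 2))  # 3.03
```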
```bash
curl http://localhost:8620/actuator/health
```

Response (healthy):

```json
{
  "status": "UP"
}
```

```bash
curl http://localhost:8620/actuator/metrics
```

Available Metrics:
- `pd.raft.state`: Raft state (0=Follower, 1=Candidate, 2=Leader)
- `pd.store.count`: Number of stores by state
- `pd.partition.count`: Total partitions
- `jvm.memory.used`: JVM memory usage
- `jvm.gc.pause`: GC pause times

```bash
curl http://localhost:8620/actuator/prometheus
```

Sample Output:

```
# HELP pd_raft_state Raft state
# TYPE pd_raft_state gauge
pd_raft_state 2.0

# HELP pd_store_count Store count by state
# TYPE pd_store_count gauge
pd_store_count{state="Up"} 3.0
pd_store_count{state="Offline"} 0.0

# HELP pd_partition_count Total partitions
# TYPE pd_partition_count gauge
pd_partition_count 36.0
```
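For ad-hoc scripting, output in this text format can be parsed with a few lines of stdlib Python. This is a minimal sketch for simple gauge lines like the sample above, not a substitute for a real Prometheus client library:

```python
# Minimal parser for simple Prometheus text-format gauge lines:
# skips "# HELP" / "# TYPE" comments, keeps "name{labels} value" pairs.
def parse_prometheus(text: str) -> dict:
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(None, 1)  # split off the trailing value
        metrics[name] = float(value)
    return metrics

sample = '''pd_raft_state 2.0
pd_store_count{state="Up"} 3.0
pd_partition_count 36.0'''
m = parse_prometheus(sample)
print(m["pd_raft_state"])  # 2.0
```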
- `grpc.host` set to an actual IP address or container hostname (not `127.0.0.1`); for Docker bridge networking, use the container hostname via the `HG_PD_GRPC_HOST` env var
- `raft.address` unique for each PD node
- `raft.peers-list` identical on all PD nodes
- `raft.peers-list` contains all PD node addresses
- `pd.data-path` has sufficient disk space (>50GB)
- `pd.initial-store-count` matches the expected store count
- `partition.default-shard-count` = 3 (for production HA)
- Ports accessible from Store/Server nodes (8620, 8686, 8610)
- NTP synchronized across all nodes
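The Raft items in this checklist lend themselves to an automated pre-flight check. A sketch, assuming the relevant values have already been read from each node's application.yml (YAML parsing itself is omitted, and the helper name is my own):

```python
# Checks: identical peers-list everywhere, unique raft.address, every
# address present in peers-list, and an odd cluster size.
def check_raft_config(nodes):
    """nodes: list of {"address": ..., "peers-list": ...}, one dict per PD."""
    problems = []
    if len({n["peers-list"] for n in nodes}) != 1:
        problems.append("peers-list differs between nodes")
    addresses = [n["address"] for n in nodes]
    if len(set(addresses)) != len(addresses):
        problems.append("raft.address is not unique")
    peers = nodes[0]["peers-list"].split(",")
    problems += [f"{a} missing from peers-list" for a in addresses if a not in peers]
    if len(peers) % 2 == 0:
        problems.append("even cluster size; prefer 3, 5, or 7")
    return problems

peers = "192.168.1.10:8610,192.168.1.11:8610,192.168.1.12:8610"
good = [{"address": f"192.168.1.{i}:8610", "peers-list": peers} for i in (10, 11, 12)]
print(check_raft_config(good))  # []
```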
```bash
# Check Raft configuration
grep -A2 "^raft:" conf/application.yml

# Verify peers list on all nodes
for node in 192.168.1.{10,11,12}; do
  echo "Node $node:"
  ssh $node "grep peers-list /path/to/conf/application.yml"
done

# Check port accessibility
nc -zv 192.168.1.10 8620 8686 8610
```

Key configuration guidelines:
- Single-node: Use defaults with `127.0.0.1` addresses
- 3-node cluster: Standard production setup with triple replication
- 5-node cluster: Maximum HA with increased fault tolerance
- JVM tuning: Allocate 4-8GB heap for typical production deployments
- Monitoring: Enable Prometheus metrics and create Grafana dashboards
For architecture details, see Architecture Documentation.
For API usage, see API Reference.
For development, see Development Guide.