DroidNode is a lightweight runtime for Android. It exposes low-level mobile control capabilities as an API over wireless debugging (ADB over Wi-Fi), turning a phone into an automation node that can be controlled remotely through standard HTTP requests.
Unlike traditional automation frameworks, DroidNode focuses on the infrastructure layer. It is not tied to any specific AI model or workflow, and provides stable, standardized primitive operation APIs for LLM/VLM agents, automation testing, and remote device management.
This project is currently in an AI PoC stage and has not been production-hardened or security-hardened.
- No authentication: the current version does not enforce token verification, so any host on the same LAN that can reach the device can control it.
- Public network risk: never expose the API port directly to the public internet.
- Legal responsibility: users must ensure usage complies with local laws and platform policies. The authors are not liable for any loss caused by abuse.
- Fully embedded ADB communication: inspired by Shizuku, uses `mDNS` for automatic endpoint discovery and supports completing wireless pairing and connection directly on-device, without relying on an external PC-side runtime during operation.
- Embedded API Server: built on Ktor, listening on port `17171` by default.
- Primitive controls: standardized APIs for click, swipe, text input, UI tree capture (XML), and screenshots.
- Native input enhancement: the built-in `ActlImeService` IME supports UTF-8 injection and improves remote input reliability for focus/text scenarios.
- Modular architecture: APIs are auto-registered by a KSP-generated registry, and route paths are derived from package paths (e.g. `com.actl.mvp.api.v1.control.click` -> `/v1/control/click`).
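
The package-to-route convention above can be illustrated with a short sketch. This is not the project's KSP processor (which is written for the Kotlin build); it is a Python illustration of the mapping only. The function name `route_from_package` and the assumption that `com.actl.mvp.api` is the root prefix being stripped are inferred from the single example above and are hypothetical.

```python
def route_from_package(package: str, api_root: str = "com.actl.mvp.api") -> str:
    """Map a package path to an HTTP route by dropping the API root prefix.

    `api_root` is an assumption inferred from the README example.
    """
    prefix = api_root + "."
    if not package.startswith(prefix):
        raise ValueError(f"{package!r} is not under {api_root!r}")
    # Each remaining package segment becomes one path segment.
    return "/" + package[len(prefix):].replace(".", "/")


print(route_from_package("com.actl.mvp.api.v1.control.click"))  # /v1/control/click
```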
- Android 11+ (wireless debugging required).
- Phone and controller must be on the same Wi-Fi network.
```bash
# Clone repository
git clone https://github.com/your-username/droidnode.git
cd droidnode

# Build with local Gradle
./gradlew :app:assembleDebug

# Install to device
adb install app/build/outputs/apk/debug/app-debug.apk
```
- Launch DroidNode App on the phone.
- Go to Developer Options -> Wireless debugging -> Pair device with pairing code.
- Enter the pairing code in the DroidNode notification action to complete local ADB authorization.
- Tap Start API Server.
Default service endpoint: http://<device-ip>:17171
| Path | Method | Description |
|---|---|---|
| `/v1/health` | GET | Check node liveness |
| `/v1/system/info` | GET | Get device hardware and Android version info |
| `/v1/control/click` | POST | Tap, payload: `{"x": int, "y": int}` |
| `/v1/control/swipe` | POST | Swipe, payload: `{"startX": int, "startY": int, "endX": int, "endY": int, "durationMs": int}` |
| `/v1/control/input` | POST | Text input, payload: `{"text": "...", "pressEnter": bool, "enterAction": "auto/search/send/done/go/next/enter/none"}` |
| `/v1/ui/xml` | POST | Get current page UI hierarchy (XML) |
| `/v1/ui/screenshot` | POST | Get screenshot (PNG binary stream) |
Tip: for detailed API invocation examples, see tests/api_tester.sh.
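
As a complement to the shell tester, the following is a minimal Python sketch of how a controller might build requests for the endpoints in the table. The device address is a placeholder, the helper names `build_request` and `send` are hypothetical, and the `Content-Type: application/json` header is an assumption (the table only specifies the payload fields). Response body formats other than the PNG screenshot are not documented here, so the sketch returns raw bytes.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://192.168.0.105:17171"  # placeholder device IP, default port


def build_request(path: str, payload: Optional[dict] = None,
                  method: str = "POST") -> urllib.request.Request:
    """Build (but do not send) a request for a DroidNode endpoint."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"  # assumed content type
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def send(request: urllib.request.Request, timeout: float = 30.0) -> bytes:
    """Send a prepared request and return the raw response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


health = build_request("/v1/health", method="GET")
click = build_request("/v1/control/click", {"x": 540, "y": 1200})
swipe = build_request("/v1/control/swipe", {
    "startX": 540, "startY": 1800, "endX": 540, "endY": 600, "durationMs": 300,
})
text = build_request("/v1/control/input", {
    "text": "hello", "pressEnter": True, "enterAction": "auto",
})
screenshot = build_request("/v1/ui/screenshot")

# With a reachable device, a call would look like:
# png_bytes = send(screenshot)
```

Separating request construction from sending keeps the payload shapes easy to inspect without a device attached.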
Latest benchmark run (2026-02-13):
- Target: `http://192.168.0.105:17175`
- Config: warmup=2, samples=20, timeout=30s
- Result: 140/140 successful requests (100%)
| API | Method | Mean (ms) | P95 (ms) |
|---|---|---|---|
| `/v1/health` | GET | 48.35 | 65.26 |
| `/v1/system/info` | GET | 113.95 | 125.19 |
| `/v1/control/click` | POST | 101.06 | 185.11 |
| `/v1/control/swipe` | POST | 211.42 | 224.06 |
| `/v1/control/input` | POST | 807.88 | 908.69 |
| `/v1/ui/xml` | POST | 2234.79 | 2272.72 |
| `/v1/ui/screenshot` | POST | 414.51 | 450.69 |
Detailed report: docs/API_PERFORMANCE_REPORT.md
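
For readers who want to collect comparable numbers, here is a hedged sketch of a measurement loop using the same warmup and sample counts as the configuration above. It is not the project's benchmark script: the function names are hypothetical, and the nearest-rank definition of P95 is an assumption, since the percentile method used for the reported figures is not stated. Note that 7 endpoints at 20 samples each would be consistent with the 140 requests reported, if warmup calls are excluded from the count.

```python
import statistics
import time
from typing import Callable, List, Tuple


def measure(call: Callable[[], object], warmup: int = 2,
            samples: int = 20) -> List[float]:
    """Return per-call latencies in milliseconds, discarding warmup calls."""
    for _ in range(warmup):
        call()
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        call()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies_ms: List[float]) -> Tuple[float, float]:
    """Return (mean, p95) using the nearest-rank percentile (an assumption)."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n, giving a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return statistics.mean(ordered), ordered[rank - 1]
```

In practice `call` would wrap one HTTP request to a single endpoint, for example a function that sends a `GET` to `/v1/health` with a 30 s timeout.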
```
.
├── api-registry-ksp/ # KSP processor that generates the API registry
├── app/src/main/java/com/actl/mvp/
│ ├── api/
│ │ ├── framework/ # ApiDefinition, server wiring, KSP registry bridge, path resolver
│ │ └── v1/ # Versioned API implementations by route path
│ ├── adb/
│ │ ├── core/ # Pure Kotlin ADB protocol, pairing, transport internals
│ │ ├── session/ # Shared DirectAdbManager runtime and command result models
│ │ └── discovery/ # mDNS discovery and endpoint state models
│ ├── startup/
│ │ └── service/ # Foreground services for pairing and API server lifecycle
│ └── ime/ # Custom input method service
├── clients/ # External tools / addons, not core app runtime
│ └── llm-controller/ # Demo controller that calls DroidNode APIs
├── tests/ # API/performance/build test scripts (Python/Shell)
├── tools/ # Local utility scripts (agent/assets), not core runtime
└── LICENSE # Open-source license
```
clients/ is the workspace for external tooling built on top of DroidNode APIs.
These tools are optional clients, not part of the Android node runtime itself.
Current demo: clients/llm-controller
Controller principles:
- API-driven closed loop: repeatedly captures the screen via `/v1/ui/screenshot`, decides the next action, then executes control APIs.
- Adaptive grid planning: each screenshot is partitioned into an `m*n` grid (`m,n >= 3`) with near-square cells and minimal total cells.
- Two-stage localization:
  - Stage-1 model outputs action + target + region(s).
  - Stage-2 model predicts normalized offsets inside the selected crop, then maps back to absolute screen coordinates.
- Multi-region merge: if a target spans multiple regions, regions are expanded to a minimal covering rectangle and cropped once for Stage-2.
- Persistent memory: step events are stored in SQLite for short-term task context and replay/debug analysis.
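
The geometry behind the grid, merge, and localization principles can be sketched as follows. This is one possible reading, not the controller's actual code: it assumes the shorter screen side always gets 3 divisions (which keeps the cell count low), that regions are numbered row-major from 0, and that Stage-2 offsets are fractions in `[0, 1]` measured from the crop's top-left corner. All function names are hypothetical.

```python
from typing import Iterable, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def plan_grid(width: int, height: int, min_cells: int = 3) -> Tuple[int, int]:
    """Return (rows, cols): the shorter side gets `min_cells` divisions and
    the longer side gets the count that keeps cells closest to square."""
    short, long_side = sorted((width, height))
    cell = short / min_cells
    long_count = max(min_cells, round(long_side / cell))
    return (long_count, min_cells) if height >= width else (min_cells, long_count)


def cell_rect(index: int, rows: int, cols: int, width: int, height: int) -> Rect:
    """Pixel bounds of a row-major, 0-based grid cell (assumed numbering)."""
    row, col = divmod(index, cols)
    return (col * width // cols, row * height // rows,
            (col + 1) * width // cols, (row + 1) * height // rows)


def merge_rects(rects: Iterable[Rect]) -> Rect:
    """Minimal rectangle covering every given region."""
    lefts, tops, rights, bottoms = zip(*rects)
    return (min(lefts), min(tops), max(rights), max(bottoms))


def to_screen(crop: Rect, u: float, v: float) -> Tuple[int, int]:
    """Map normalized offsets (u, v) inside a crop to absolute pixels."""
    left, top, right, bottom = crop
    return (round(left + u * (right - left)), round(top + v * (bottom - top)))


# Example: a 1080x2400 portrait screenshot, target spanning regions 0 and 4.
rows, cols = plan_grid(1080, 2400)
crop = merge_rects(cell_rect(i, rows, cols, 1080, 2400) for i in (0, 4))
x, y = to_screen(crop, 0.25, 0.2)
click_payload = {"x": x, "y": y}  # body for POST /v1/control/click
```

Cropping once over the merged rectangle means Stage-2 sees the whole target in a single image, at the cost of a larger crop when the selected regions are far apart.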
The screenshots below are captured from a real running build. They show the current MVP UI, including the main control page and the debug page (mDNS discovery, API port configuration, and logs).
- Standardized design
- Code style and infrastructure setup
- App-adaptive semantic click APIs
- Performance optimizations for screenshot and related APIs
- Token-based request authentication
- Human-like (anthropomorphic) action behavior
- Native virtual LAN integration based on ZeroTier
Issues and Pull Requests are welcome!

