Skip to content

Leshyan/DroidNode

Repository files navigation

DroidNode

中文文档 (Chinese)

DroidNode Logo

DroidNode is a lightweight runtime on Android. It API-enables low-level mobile control capabilities via wireless debugging (ADB over Wi-Fi), turning a phone into an automation node that can be remotely controlled through standard HTTP requests.

Unlike traditional automation frameworks, DroidNode focuses on the infrastructure layer. It is not tied to any specific AI model or workflow, and provides stable, standardized primitive operation APIs for LLM/VLM agents, automation testing, and remote device management.

⚠️ Disclaimer & Security Warning

This project is currently in an AI PoC stage and has not been production-hardened or security-hardened.

  • No authentication: the current version does not enforce token verification, and any reachable host in the same LAN may control the device.
  • Public network risk: never expose the API port directly to the public internet.
  • Legal responsibility: users must ensure usage complies with local laws and platform policies. The authors are not liable for any loss caused by abuse.

🚀 Core Features

  • Fully embedded ADB communication: inspired by Shizuku, uses mDNS for automatic endpoint discovery and supports completing wireless pairing and connection directly on-device, without relying on external PC-side runtime during operation.
  • Embedded API Server: built on Ktor, listening on port 17171 by default.
  • Primitive controls: standardized APIs for click, swipe, text input, UI tree capture (XML), and screenshots.
  • Native input enhancement: built-in ActlImeService IME supports UTF-8 injection and improves remote input reliability for focus/text scenarios.
  • Modular architecture: APIs are auto-registered by a KSP-generated registry, and route paths are derived from package paths (e.g. com.actl.mvp.api.v1.control.click -> /v1/control/click).

🛠 Quick Start

1. Environment

  • Android 11+ (wireless debugging required).
  • Phone and controller must be on the same Wi-Fi network.

2. Build and Install

# Clone repository
git clone https://github.com/your-username/droidnode.git
cd droidnode

# Build with local Gradle
./gradlew :app:assembleDebug

# Install to device
adb install app/build/outputs/apk/debug/app-debug.apk

3. Activate Node

  1. Launch DroidNode App on the phone.
  2. Go to Developer Options -> Wireless debugging -> Pair device with pairing code.
  3. Enter the pairing code in the DroidNode notification action to complete local ADB authorization.
  4. Tap Start API Server.

📖 API Overview

Default service endpoint: http://<device-ip>:17171

Path Method Description
/v1/health GET Check node liveness
/v1/system/info GET Get device hardware and Android version info
/v1/control/click POST Tap, payload: {"x": int, "y": int}
/v1/control/swipe POST Swipe, payload: {"startX": int, "startY": int, "endX": int, "endY": int, "durationMs": int}
/v1/control/input POST Text input, payload: {"text": "...", "pressEnter": bool, "enterAction": "auto/search/send/done/go/next/enter/none"}
/v1/ui/xml POST Get current page UI hierarchy (XML)
/v1/ui/screenshot POST Get screenshot (PNG binary stream)

Tip: for detailed API invocation examples, see tests/api_tester.sh.


📊 API Performance Snapshot

Latest benchmark run (2026-02-13):

  • Target: http://192.168.0.105:17175
  • Config: warmup=2, samples=20, timeout=30s
  • Result: 140/140 successful requests (100%)
API Method Mean (ms) P95 (ms)
/v1/health GET 48.35 65.26
/v1/system/info GET 113.95 125.19
/v1/control/click POST 101.06 185.11
/v1/control/swipe POST 211.42 224.06
/v1/control/input POST 807.88 908.69
/v1/ui/xml POST 2234.79 2272.72
/v1/ui/screenshot POST 414.51 450.69

Detailed report: docs/API_PERFORMANCE_REPORT.md


📂 Project Layout

.
├── api-registry-ksp/  # KSP processor that generates the API registry
├── app/src/main/java/com/actl/mvp/
│   ├── api/
│   │   ├── framework/  # ApiDefinition, server wiring, KSP registry bridge, path resolver
│   │   └── v1/         # Versioned API implementations by route path
│   ├── adb/
│   │   ├── core/       # Pure Kotlin ADB protocol, pairing, transport internals
│   │   ├── session/    # Shared DirectAdbManager runtime and command result models
│   │   └── discovery/  # mDNS discovery and endpoint state models
│   ├── startup/
│   │   └── service/    # Foreground services for pairing and API server lifecycle
│   └── ime/            # Custom input method service
├── clients/            # External tools / addons , not core app runtime
│   └── llm-controller/ # Demo controller that calls DroidNode APIs
├── tests/              # API/performance/build test scripts (Python/Shell)
├── tools/              # Local utility scripts (agent/assets), not core runtime
└── LICENSE             # Open-source license


🔌 Clients (External Addons)

clients/ is the workspace for external tooling built on top of DroidNode APIs.
These tools are optional clients, not part of the Android node runtime itself.

Current demo: clients/llm-controller

Controller principles:

  • API-driven closed loop: repeatedly captures screen via /v1/ui/screenshot, decides next action, then executes control APIs.
  • Adaptive grid planning: each screenshot is partitioned into an m*n grid (m,n >= 3) with near-square cells and minimal total cells.
  • Two-stage localization:
    • Stage-1 model outputs action + target + region(s).
    • Stage-2 model predicts normalized offsets inside the selected crop, then maps back to absolute screen coordinates.
  • Multi-region merge: if a target spans multiple regions, regions are expanded to a minimal covering rectangle and cropped once for Stage-2.
  • Persistent memory: step events are stored in SQLite for short-term task context and replay/debug analysis.

🖼 UI Preview

The screenshots below are captured from a real running build. They show the current MVP UI, including the main control page and the debug page (mDNS discovery, API port configuration, and logs).

DroidNode Main Page DroidNode Debug Page


🤝 Roadmap

  • Standardized design
  • Code style and infrastructure setup
  • App-adaptive semantic click APIs
  • Performance optimizations for screenshot and related APIs
  • Token-based request authentication
  • Action Anthropomorphization
  • Native virtual LAN integration based on ZeroTier

Issues and Pull Requests are welcome!

About

DroidNode is a lightweight runtime that runs on the Android device side. It leverages wireless debugging capabilities (ADB over Wi-Fi) to expose low-level device control capabilities as APIs, allowing the device to be remotely invoked over HTTP within a local network.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors