Urban Ground Full Stack Engineer Assessment
Built by Anurag Dubey
A production-ready voice-controlled task manager where every action (create, read, update, delete) happens through natural conversation. No buttons. No typing. Just speak.
Frontend: https://taskora-dusky.vercel.app
Backend API: https://taskora-0kde.onrender.com
⚠️ Note for reviewers: The live demo uses a shared Gemini API key whose quota may be exhausted. For guaranteed functionality, clone the repo and add your own GEMINI_API_KEY to backend/.env. Setup takes under five minutes; instructions are below.
Taskora is a full-stack AI voice agent. You tap the mic and speak naturally; the assistant understands, acts, and confirms, all by voice.
- "Create a task for gym at 7 AM tomorrow"
- "What are my evening tasks today?"
- "Move the LinkedIn post to 6 PM"
- "Delete the gym task" (asks for confirmation before deleting)
- "Actually, change the previous one to 8 PM" (uses conversation memory)
- Speech-to-Text via Web Speech API (Chrome/Edge)
- Text-to-Speech via Google Cloud Neural TTS (deep male voice) with browser TTS fallback
- Continuous listening loop: listening resumes automatically after the AI finishes speaking
- Interruption handling: tap the mic while the AI is speaking to stop it instantly and give a new command
- Gemini 2.5 Flash with function calling for intent detection and tool execution
- Full conversation history maintained across turns
- Context-aware references: "the previous one", "the second task", and "that" all resolve correctly
- Semantic search: "evening workout" matches "gym session"
- Multi-task creation in a single utterance
- Create tasks with name, time, date, and details
- Read tasks, summarized conversationally and filtered by time of day
- Update any field: time, date, name, status
- Delete with mandatory confirmation: never deletes without an explicit "yes"
- Time-range filtering: morning / afternoon / evening / night / today / tomorrow
- Task cards animate on create (green glow), update (blue pulse), delete (red fade-out)
- Live voice panel with animated wave bars while listening
- Animated thinking dots while the AI processes a request
- Confirm card overlay for voice-triggered actions
- Fully responsive on mobile and desktop
- JWT-based signup and login
- Session persisted in localStorage via Zustand's persist middleware
- All task and chat endpoints protected by auth middleware
- Per-user task isolation โ users only see their own tasks
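Time-range filtering needs fixed hour boundaries for "morning", "evening", and so on, and the README does not state the ones taskTools.js uses. The sketch below assumes morning = 05:00 to 12:00, afternoon = 12:00 to 17:00, evening = 17:00 to 21:00, and night = 21:00 to 05:00 (wrapping past midnight), with tasks storing `date` as "YYYY-MM-DD" and `time` as a 24-hour "HH:MM" string. The boundaries and the names `RANGES`, `inRange`, and `filterByTimeOfDay` are hypothetical, not taken from the repository.

```javascript
// Hypothetical time-of-day boundaries in 24-hour hours: [start inclusive, end exclusive).
// "night" wraps past midnight, so its end is smaller than its start.
const RANGES = {
  morning: [5, 12],
  afternoon: [12, 17],
  evening: [17, 21],
  night: [21, 5],
};

// Converts an "HH:MM" string to minutes since midnight.
function toMinutes(hhmm) {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

// True if an "HH:MM" time falls inside the named range.
function inRange(time, range) {
  const bounds = RANGES[range];
  if (!bounds) throw new Error(`Unknown range: ${range}`);
  const t = toMinutes(time);
  const start = bounds[0] * 60;
  const end = bounds[1] * 60;
  // Normal range: start <= t < end. Wrapping range: t >= start or t < end.
  return start < end ? t >= start && t < end : t >= start || t < end;
}

// Keeps tasks on the given date whose time falls in the range, earliest first.
// Tasks without a time are skipped rather than guessed into a range.
function filterByTimeOfDay(tasks, date, range) {
  return tasks
    .filter((task) => task.date === date && Boolean(task.time) && inRange(task.time, range))
    .sort((a, b) => toMinutes(a.time) - toMinutes(b.time));
}

const tasks = [
  { name: "LinkedIn post", date: "2026-06-10", time: "20:00" },
  { name: "Gym", date: "2026-06-10", time: "07:00" },
  { name: "Product sync", date: "2026-06-10", time: "18:00" },
  { name: "Team sync", date: "2026-06-11", time: "19:00" },
];
console.log(filterByTimeOfDay(tasks, "2026-06-10", "evening").map((t) => t.name));
// → [ 'Product sync', 'LinkedIn post' ]
```

"Today" and "tomorrow" then reduce to choosing the `date` argument, which keeps the time-of-day logic independent of date resolution.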
| Layer | Technology |
|---|---|
| Frontend | React 18, Vite, Tailwind CSS |
| State | Zustand |
| Routing | React Router v6 |
| Voice STT | Web Speech API |
| Voice TTS | Google Cloud Neural TTS + browser fallback |
| AI | Google Gemini 2.5 Flash (function calling) |
| Backend | Node.js, Express |
| Database | MongoDB Atlas (Mongoose) |
| Auth | JWT + bcryptjs |
| Deployment | Vercel (frontend) + Render (backend) |
taskora/
├── backend/
│   ├── config/
│   │   └── db.js # MongoDB connection
│   ├── controllers/
│   │   └── authController.js # Signup, login, me
│   ├── middleware/
│   │   └── authMiddleware.js # JWT verification
│   ├── models/
│   │   ├── taskModel.js # Task schema (user, name, time, date, status)
│   │   └── User.js # User schema with bcrypt hashing
│   ├── routes/
│   │   └── authRoutes.js # /api/auth/signup, /login, /me
│   ├── services/
│   │   └── gemini.js # Gemini AI, system prompt, tool calling
│   ├── taskTools.js # createTask, getAllTasks, updateTask, deleteTask, findTasksByName, getTasksByTimeRange
│   └── server.js # Express app, /api/chat, /api/tasks, /api/tts
│
└── frontend/
    └── src/
        ├── components/
        │   ├── ConfirmCard.jsx # Action confirmation overlay
        │   ├── Hero.jsx # Landing header
        │   ├── Navbar.jsx # Logo + logout
        │   ├── ProtectedRoute.jsx # Auth guard
        │   ├── TaskCard.jsx # Task with glow animations
        │   ├── TaskGrid.jsx # Responsive task grid
        │   └── VoicePanel.jsx # Mic button + transcript panel
        ├── config/
        │   └── api.js # API_URL from env
        ├── hooks/
        │   └── useVoiceAgent.js # Core voice loop, STT, TTS, chat logic
        ├── pages/
        │   ├── Home.jsx # Main app page
        │   ├── Login.jsx # Login form
        │   └── Signup.jsx # Signup form
        ├── store/
        │   ├── authStore.js # Zustand auth (user, token, persist)
        │   └── taskStore.js # Zustand tasks (CRUD, animations, confirmCard)
        ├── App.jsx # Routes
        └── main.jsx # Entry point
- Node.js 18+
- MongoDB Atlas URI
- Google Gemini API key
- Google Cloud TTS API key (optional; the browser TTS fallback works without it)
git clone https://github.com/AnuragDubey007/Taskora.git
cd Taskora
cd backend
npm install
Create backend/.env:
PORT=3001
MONGO_URI=your_mongodb_atlas_uri
GEMINI_API_KEY=your_gemini_api_key
JWT_SECRET=your_jwt_secret_string
Start the backend:
npm run dev
In a second terminal, from the repository root:
cd frontend
npm install
Create frontend/.env (point it at your local backend so your own Gemini key is used, or at the deployed API URL):
VITE_API_URL=http://localhost:3001
Start the frontend:
npm run dev
Open http://localhost:5173
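A missing or unchanged placeholder in backend/.env (for example a leftover `GEMINI_API_KEY=your_gemini_api_key`) otherwise only shows up later as a failed chat request. The following is a hypothetical startup check, not part of the repository. `findMissingEnv` and `assertEnv` are illustrative names, and the check assumes the .env file has already been loaded into `process.env` (for example with the dotenv package).

```javascript
// Variables the setup steps put in backend/.env.
const REQUIRED_BACKEND_VARS = ["PORT", "MONGO_URI", "GEMINI_API_KEY", "JWT_SECRET"];

// Returns the names that are unset, blank, or still hold a "your_..." placeholder.
function findMissingEnv(env, required = REQUIRED_BACKEND_VARS) {
  return required.filter((name) => {
    const value = (env[name] || "").trim();
    return value === "" || value.startsWith("your_");
  });
}

// Hypothetical fail-fast check to run once at startup, before app.listen().
function assertEnv(env) {
  const missing = findMissingEnv(env);
  if (missing.length > 0) {
    throw new Error(`Missing or placeholder values in backend/.env: ${missing.join(", ")}`);
  }
}
```

Calling `assertEnv(process.env)` near the top of server.js would make the backend stop with a readable message instead of starting with an unusable key.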
User taps the mic
↓
Web Speech API (STT) captures speech
↓
Text is sent to /api/chat with the conversation history
↓
Gemini 2.5 Flash decides which tool to call
↓
Tool executes against MongoDB (create / read / update / delete)
↓
Gemini generates a natural spoken reply
↓
Google Cloud TTS (or the browser fallback) plays the audio
↓
UI animates the result (card glow, confirm overlay)
↓
Listening resumes automatically
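The core of this flow is the step where Gemini picks a tool and the backend executes it before a reply is spoken. A minimal sketch of that loop follows, assuming the model call is injected as a function. `runChatTurn`, `callModel`, and the history entry shape are hypothetical and deliberately simpler than the Gemini SDK's request and response types, so this shows the control flow rather than the actual /api/chat code.

```javascript
// One chat turn with tool calling. `callModel(history)` is an injected stand-in
// for the Gemini request; it resolves to { functionCalls: [{ name, args }] } or { text }.
async function runChatTurn(userText, history, callModel, tools, maxSteps = 5) {
  history.push({ role: "user", text: userText });
  for (let step = 0; step < maxSteps; step++) {
    const reply = await callModel(history);
    const calls = reply.functionCalls || [];
    if (calls.length === 0) {
      // No tool requested: this is the reply to speak.
      history.push({ role: "model", text: reply.text });
      return reply.text;
    }
    // One utterance can produce several calls (e.g. three create_task calls).
    for (const call of calls) {
      const tool = tools[call.name];
      const result = tool
        ? await tool(call.args)
        : { error: `Unknown tool: ${call.name}` };
      history.push({ role: "tool", name: call.name, result });
    }
  }
  throw new Error("No final reply within maxSteps");
}

// Usage with an in-memory task list and a scripted fake model.
const tasks = [];
const tools = {
  create_task: async ({ name, time }) => {
    tasks.push({ name, time });
    return { ok: true, name, time };
  },
};

// Requests two tasks on the user's turn, then summarizes once tool results exist.
async function fakeModel(history) {
  const last = history[history.length - 1];
  if (last.role === "user") {
    return {
      functionCalls: [
        { name: "create_task", args: { name: "Gym", time: "07:00" } },
        { name: "create_task", args: { name: "Standup", time: "09:00" } },
      ],
    };
  }
  return { text: `Done. I created ${tasks.length} tasks.` };
}

const history = [];
runChatTurn("Create gym at 7 and standup at 9", history, fakeModel, tools).then(
  (reply) => console.log(reply) // prints: Done. I created 2 tasks.
);
```

Because tool results are appended to the same history that later turns send back, a follow-up such as "move that to 10 AM" can be resolved against the task the previous turn created.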
The useVoiceAgent hook uses a speakSessionRef counter. Every speakText call increments it and captures the new value as its session ID. All audio callbacks (onended, onerror, and the play() rejection handler) check this ID against the current counter before acting, so if the user interrupts mid-speech, a stale callback cannot trigger a second listening cycle. A separate fallbackCalled guard ensures that audio.onerror and audio.play().catch cannot both trigger the browser TTS fallback for the same utterance.
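The session-counter pattern can be sketched outside React so it runs in Node. Here `player.play(text, onEnded, onError)` stands in for creating an Audio element, wiring `audio.onended` and `audio.onerror`, and returning `audio.play()` (which returns a Promise in browsers), while `browserSpeak` stands in for the speechSynthesis fallback. `createSpeaker`, `player`, `browserSpeak`, and `interrupt` are hypothetical names, and the assumption that an interrupt also bumps the counter is ours, not the README's.

```javascript
// Sketch of the speakSessionRef guard; `speakSession` plays the role of the ref.
function createSpeaker(player, browserSpeak, startListening) {
  let speakSession = 0;

  function speakText(text) {
    const session = ++speakSession; // this call's ID; invalidates earlier calls
    const isCurrent = () => session === speakSession;
    let fallbackCalled = false;

    // Reached from both onerror and the play() rejection; runs at most once.
    const fallback = () => {
      if (!isCurrent() || fallbackCalled) return;
      fallbackCalled = true;
      browserSpeak(text, () => {
        if (isCurrent()) startListening();
      });
    };

    const onEnded = () => {
      if (isCurrent()) startListening(); // stale sessions never restart the mic
    };

    player.play(text, onEnded, fallback).catch(fallback);
  }

  // Tap-to-interrupt (assumed): bumping the counter makes every pending callback
  // stale. A browser version would also pause the current audio here.
  function interrupt() {
    speakSession++;
  }

  return { speakText, interrupt };
}
```

In the React hook, the counter presumably lives in a `useRef` so it survives re-renders without triggering them, which fits the `speakSessionRef` name.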
Create
"Schedule a team sync at 9 AM tomorrow"
Taskora: "Done. Team sync added for tomorrow at 9 AM."
Context reference
"Actually move that to 10 AM"
Taskora: "Updated. Team sync is now at 10 AM."
Delete with confirmation
"Delete the gym task"
Taskora: "I found Gym Session. Should I delete it?"
"Yes"
Taskora: "Done. Gym Session has been removed."
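The delete example follows the rule that nothing is deleted without an explicit "yes". The README does not say how this is enforced (it may rely on the Gemini system prompt and conversation history). The following sketch shows one alternative: hold the pending deletion as explicit server-side state so the rule does not depend on the model's behaviour. `createDeleteGate`, `request`, and `answer` are hypothetical names; the spoken strings mirror the dialog above.

```javascript
// Hypothetical confirm-before-delete gate. A delete request only records the
// target; the task is removed when the user's next answer starts with "yes".
// Any other answer cancels, so nothing is deleted without an explicit yes.
function createDeleteGate(deleteTask) {
  let pending = null; // { id, name } awaiting confirmation

  return {
    // Records the target and returns the question to speak.
    request(task) {
      pending = { id: task.id, name: task.name };
      return `I found ${task.name}. Should I delete it?`;
    },

    // Settles the pending request with the user's next utterance.
    async answer(utterance) {
      if (!pending) return null; // nothing awaiting confirmation
      const target = pending;
      pending = null; // one answer settles the request either way
      if (/^\s*yes\b/i.test(utterance)) {
        await deleteTask(target.id);
        return `Done. ${target.name} has been removed.`;
      }
      return `Okay, I kept ${target.name}.`;
    },
  };
}
```

Clearing `pending` on every answer means a later, unrelated "yes" cannot delete a task the user already declined.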
Time-range query
"What are my evening tasks today?"
Taskora: "You have a product sync at 6 PM and a LinkedIn post at 8 PM."
Multi-task
"Create three tasks: gym at 7, standup at 9, and lunch with client at 1 PM"
Taskora: "All three tasks have been created."
| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | /api/health | No | Server health check |
| POST | /api/auth/signup | No | Register new user |
| POST | /api/auth/login | No | Login, returns JWT |
| GET | /api/auth/me | Yes | Get current user |
| GET | /api/tasks | Yes | Get all tasks for user |
| POST | /api/chat | Yes | Send voice message, get AI reply |
| POST | /api/tts | No | Convert text to speech audio |
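Every endpoint marked Auth: Yes sits behind authMiddleware.js. The following is a minimal sketch of such a guard. It assumes the client sends `Authorization: Bearer <token>` and that the JWT payload carries the user ID as `id`; the README does not show the token format, so both are assumptions. The verifier is injected so the code runs without dependencies. In the backend it would typically wrap jsonwebtoken's `jwt.verify(token, process.env.JWT_SECRET)`, which throws on an invalid or expired token. `makeAuthMiddleware` is a hypothetical name.

```javascript
// Bearer-token guard in Express middleware shape (req, res, next).
function makeAuthMiddleware(verifyToken) {
  return function requireAuth(req, res, next) {
    const header = req.headers.authorization || "";
    const [scheme, token] = header.split(" ");
    if (scheme !== "Bearer" || !token) {
      return res.status(401).json({ message: "Missing or malformed token" });
    }
    try {
      const payload = verifyToken(token); // throws on bad or expired tokens
      req.user = { id: payload.id }; // assumed payload field
      return next();
    } catch (err) {
      return res.status(401).json({ message: "Invalid or expired token" });
    }
  };
}

// Per-user isolation then comes from scoping every task query to req.user.id,
// e.g. Task.find({ user: req.user.id }) with the Mongoose model in taskModel.js.
```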
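- Push the backend to GitHub
- Create a new Web Service on Render (set Root Directory to backend if the service is created from the full repository)
- Set the build command: npm install
- Set the start command: node server.js
- Add all environment variables from backend/.env
- Push the frontend to GitHub
- Import the project on Vercel (set Root Directory to frontend if importing the full repository)
- Set the environment variable: VITE_API_URL=https://your-render-url.onrender.com (Vite inlines it at build time, so redeploy after changing it)
- Deploy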
Why Gemini 2.5 Flash? It combines function calling with low latency, and it reliably maps natural language to structured tool calls (create_task, update_task, etc.) without hallucinating extra actions.
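Function calling works by giving the model a list of tool declarations (name, description, and a JSON-schema-style parameters object) and letting it return calls against them. The declarations below are illustrative. Only the names create_task and update_task come from this README; the parameter names follow the task schema fields (name, time, date, status), while `taskId` and the status values are assumptions. Schema conventions such as the casing of `type` values vary across Gemini SDK versions, so check the API reference before reusing them. `validateCall` is a hypothetical server-side backstop that rejects calls the declarations do not permit; it complements the model's reliability rather than replacing it.

```javascript
// Illustrative tool declarations; all properties here are string-typed.
const TOOL_DECLARATIONS = [
  {
    name: "create_task",
    description: "Create a task for the current user.",
    parameters: {
      type: "object",
      properties: {
        name: { type: "string", description: "Short task title" },
        time: { type: "string", description: "24-hour time, HH:MM" },
        date: { type: "string", description: "Date, YYYY-MM-DD" },
      },
      required: ["name"],
    },
  },
  {
    name: "update_task",
    description: "Update fields of an existing task.",
    parameters: {
      type: "object",
      properties: {
        taskId: { type: "string" },
        name: { type: "string" },
        time: { type: "string" },
        date: { type: "string" },
        status: { type: "string", enum: ["pending", "completed"] },
      },
      required: ["taskId"],
    },
  },
];

// Returns null for an allowed call, or a message describing why it is rejected.
function validateCall(call, declarations = TOOL_DECLARATIONS) {
  const decl = declarations.find((d) => d.name === call.name);
  if (!decl) return `Unknown tool: ${call.name}`;
  const { properties, required = [] } = decl.parameters;
  const args = call.args || {};
  for (const key of required) {
    if (!(key in args)) return `Missing required argument: ${key}`;
  }
  for (const [key, value] of Object.entries(args)) {
    const prop = properties[key];
    if (!prop) return `Unexpected argument: ${key}`;
    // Works because every declared type here is "string", matching typeof.
    if (typeof value !== prop.type) return `Argument ${key} must be a ${prop.type}`;
    if (prop.enum && !prop.enum.includes(value)) {
      return `Argument ${key} must be one of ${prop.enum.join(", ")}`;
    }
  }
  return null;
}
```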
Why a browser TTS fallback? Google Cloud TTS returns high-quality MP3 audio but requires a billing-enabled API key. The browser fallback keeps voice output working for evaluators who have not configured a Google Cloud TTS key.
Why Zustand over Redux? It needs minimal boilerplate at this scope. The task store, auth store, and animation state (deletingId, newTaskId, confirmCard) stay small and co-located, with no reducers or action files.
Why speakSessionRef instead of a simple boolean? A boolean can't distinguish between "the previous session stopped" and "this session was never started": once a new call sets it back to true, callbacks left over from an interrupted call would see it as true too. An incrementing integer gives each call its own ID, so it invalidates callbacks from every prior call, even when several are in flight at once.
Built for the Urban Ground Full Stack Engineer Assessment · June 2026