๐Ÿ†Gemini Live Agent Challenge

Your AI Copilot for
Any ERP System.

Just speak. EFRION sees your screen, reads the interface, and navigates for you, powered by Gemini 2.5 Live. No mouse. No training. No friction.

Real-time voice & vision · Chrome Extension (MV3) · Gemini 2.5 multimodal
EFRION ERP: Invoices
Dashboard
Invoices
Reports

Create New Invoice

Vendor
Amazon Web Services
Amount ($)
1,500.00
Notes
Submit Invoice
AI Online
🤖 Filling amount field…
How It Works

Three steps. Zero friction.

EFRION bridges the gap between human intent and ERP execution through a continuous loop of hearing, seeing, and acting.

01

Speak Naturally

No commands to memorize. Just say what you want, such as "Create an invoice for AWS for $1,500", and EFRION listens in real time.

16kHz PCM audio stream via AudioWorklet
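
To make the "16kHz PCM" label concrete, here is a minimal sketch of the sample-format arithmetic, written in Python for brevity. In the extension this conversion would presumably run inside the AudioWorklet; the function names, the 16-bit little-endian encoding, and the scaling factor are assumptions for illustration, not EFRION's actual code.

```python
import struct

SAMPLE_RATE_HZ = 16_000  # sample rate stated on this page


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes.

    Hypothetical helper: the 16-bit width and the 32767 scale are assumptions.
    """
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clamp out-of-range samples
        ints.append(int(round(s * 32767)))  # scale to the int16 range
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_duration_ms(pcm_bytes):
    """Duration of a mono 16-bit chunk in milliseconds (2 bytes per sample)."""
    return len(pcm_bytes) * 1000 / (2 * SAMPLE_RATE_HZ)
```

Under these assumptions, a 640-byte chunk holds 320 samples, or 20 ms of audio.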
02

AI Sees & Reasons

Gemini 2.5 Live simultaneously processes your voice, live screenshots, and a real-time accessibility tree to understand context with precision.

Screenshot + DOM accessibility tree diff
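
One way to picture an accessibility tree diff is sketched below. It assumes the simplified tree is flattened into a mapping from element id to a (role, name, value) tuple; that shape and the `diff_trees` name are hypothetical, chosen only to show the idea of sending changes rather than the whole tree.

```python
def diff_trees(before, after):
    """Compare two flattened accessibility snapshots.

    Each snapshot maps a hypothetical element id to a (role, name, value) tuple.
    """
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = sorted(before.keys() - after.keys())
    changed = {
        k: after[k]
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


before = {"amount": ("textbox", "Amount ($)", "")}
after = {"amount": ("textbox", "Amount ($)", "1500")}
delta = diff_trees(before, after)  # only the "amount" field shows up as changed
```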
03

Action Taken

A ghost cursor flies to the target, clicks, types, scrolls, and completes your workflow. The animated HUD keeps you informed at every step.

Ghost cursor · click · type · scroll · verify
Features

Built for the real world.

Every feature was designed around the actual challenges of navigating enterprise ERP interfaces, not just demos.

🎙️ Core

Voice-First Interface

Continuous 16kHz PCM audio stream with push-to-talk. Natural language, no rigid commands. Supports mid-sentence barge-in for instant redirection.

👁️ Core

Visual + DOM Fusion

Combines live screenshots with a real-time simplified accessibility tree. Gemini sees both pixels and semantics for pixel-perfect element targeting.

🖱️ UX

Ghost Cursor Animation

An animated cursor visually flies to the target element before clicking, so users always know exactly what the AI is about to do. Builds instant trust.

📋 UX

Multi-Step Planning HUD

Before acting, EFRION announces its plan. A floating HUD displays each step with real-time progress tracking: full transparency, no black-box behavior.

🕵️ Smart

Dead-End Detection

After every click, EFRION verifies the DOM actually changed. If it didn't, it automatically reports the failure and self-corrects with a new strategy.
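
The check described above can be sketched as a before/after fingerprint comparison. This is an illustration under assumptions: the snapshot is taken to be a JSON-serializable structure, the callables `do_click` and `take_snapshot` are hypothetical stand-ins for the extension's real hooks, and a real page would likely need a short wait for asynchronous updates before the second snapshot.

```python
import hashlib
import json


def fingerprint(tree):
    """Stable hash of a JSON-serializable page snapshot."""
    payload = json.dumps(tree, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def click_and_verify(do_click, take_snapshot):
    """Run a click and report a dead end if the snapshot did not change."""
    before = fingerprint(take_snapshot())
    do_click()
    after = fingerprint(take_snapshot())
    if before == after:
        # Nothing changed: surface the failure so a new strategy can be tried.
        return {"ok": False, "reason": "dom_unchanged"}
    return {"ok": True}
```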

↩️ Smart

AI Undo System

Say "undo" and EFRION reverses the last action by restoring typed values or re-clicking toggles. Keeps an in-memory stack of the last 10 actions.
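
A bounded undo stack like this could be sketched as follows. The class name, the record fields, and the shape of the returned inverse action are hypothetical; only the limit of 10 and the two undo behaviors (restore a typed value, re-click a toggle) come from the description above.

```python
from collections import deque


class UndoStack:
    """Keeps the most recent actions; older ones fall off automatically."""

    def __init__(self, limit=10):
        self._actions = deque(maxlen=limit)

    def __len__(self):
        return len(self._actions)

    def record(self, kind, target, previous_value=None):
        self._actions.append(
            {"kind": kind, "target": target, "previous_value": previous_value}
        )

    def undo(self):
        """Pop the last action and return the tool call that reverses it."""
        if not self._actions:
            return None
        action = self._actions.pop()
        if action["kind"] == "type_text":
            # Restore whatever the field held before typing.
            return {"tool": "type_text", "target": action["target"],
                    "text": action["previous_value"]}
        if action["kind"] == "click_element":
            # Re-click to flip a toggle back.
            return {"tool": "click_element", "target": action["target"]}
        return None  # assumed: other action kinds have no automatic inverse
```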

🔒 Safety

Safety Lock

Optional confirmation mode for high-stakes actions. EFRION pauses and asks before submitting, deleting, or navigating away. Confirm by voice or button.
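
As a rough sketch, the gate for high-stakes actions might look like the check below. The keyword list and the `needs_confirmation` name are assumptions made for illustration; the page only states that submitting, deleting, and navigating away trigger a pause when the lock is on.

```python
# Hypothetical keywords used to spot risky clicks by their visible label.
HIGH_STAKES_KEYWORDS = ("submit", "delete", "remove")


def needs_confirmation(tool, label, safety_lock_enabled):
    """Return True when an action should pause for voice or button confirmation."""
    if not safety_lock_enabled:
        return False  # the lock is optional
    if tool == "navigate_to":
        return True   # navigating away is treated as high-stakes
    if tool == "click_element":
        return any(word in label.lower() for word in HIGH_STAKES_KEYWORDS)
    return False
```

With the lock enabled, a click on "Submit Invoice" would pause for confirmation, while typing into the Amount field would not.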

♻️ Reliability

Cross-Page Persistence

Session state, action plan, and history survive page reloads and navigation. Multi-step workflows across different ERP pages work seamlessly.
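
Surviving a reload comes down to serializing the state before the page goes away and restoring it afterwards. The sketch below shows that round trip with a hypothetical `SessionState` shape; where the serialized string is actually stored (for example, extension storage or the backend) is not specified on this page and is left out.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SessionState:
    """Hypothetical shape: the plan, the current position in it, and past actions."""
    plan: list = field(default_factory=list)
    current_step: int = 0
    history: list = field(default_factory=list)


def save(state):
    """Serialize the state to a JSON string before navigation."""
    return json.dumps(asdict(state))


def restore(raw):
    """Rebuild the state from a saved JSON string after the new page loads."""
    return SessionState(**json.loads(raw))
```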

Architecture

Real-time. Event-driven.

A clean separation of concerns across four layers, connected by a bidirectional WebSocket for sub-second latency.

User
🎙️ Voice Input: 16kHz PCM
🖥️ ERP Screen: Any web ERP

Chrome Extension
📸 Screen Capture: 1.5s interval
🌳 Accessibility Tree: DOM diff
🎚️ AudioWorklet: PCM streaming

FastAPI Backend
🔌 WebSocket Server: Bidirectional
🧠 Tool Orchestration: 6 actions
🔒 Safety Lock: Optional

Gemini 2.5 Live
👁️ Vision: Screenshots
🗣️ Audio I/O: Native TTS
⚡ Function Calling: Real-time

AI Actions: tool calls executed on the page
🖱️ click_element
⌨️ type_text
📜 scroll_page
🔗 navigate_to
📖 read_text
✨ highlight_element

Bidirectional WebSocket at ws://localhost:8000/ws
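
To illustrate how the six tool calls might travel over the WebSocket, here is a minimal sketch that validates a call and encodes it as JSON. The six tool names come from the list above; the parameter names, the message envelope, and the `build_tool_call` helper are assumptions, not the project's actual wire format.

```python
import json

# Tool names from this page; the required parameters are hypothetical.
TOOL_PARAMS = {
    "click_element": {"element_id"},
    "type_text": {"element_id", "text"},
    "scroll_page": {"direction"},
    "navigate_to": {"url"},
    "read_text": {"element_id"},
    "highlight_element": {"element_id"},
}


def build_tool_call(name, **args):
    """Validate a tool call and encode it as a JSON message for the extension."""
    if name not in TOOL_PARAMS:
        raise ValueError(f"unknown tool: {name}")
    missing = TOOL_PARAMS[name] - set(args)
    if missing:
        raise ValueError(f"missing arguments for {name}: {sorted(missing)}")
    return json.dumps({"type": "tool_call", "name": name, "args": args},
                      sort_keys=True)


message = build_tool_call("type_text", element_id="amount", text="1500")
```

Rejecting unknown tool names and missing arguments before sending keeps malformed calls from ever reaching the page.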
Live Demo

See it in action.

These are real commands you can speak to EFRION. Watch it decompose intent into precise UI actions.

👤

“Create an invoice for Amazon Web Services for $1,500 due next Friday.”

🤖

Executing Plan (5 steps)

  1. Navigate to Invoices tab
  2. Select 'Amazon Web Services' in vendor dropdown
  3. Type '1500' in the Amount field
  4. Set due date to next Friday
  5. Click Submit Invoice
Invoice #INV-994 created and submitted ✓

Built with

Gemini 2.5 Live: Multimodal AI Engine
Chrome Extension: Manifest V3
FastAPI: Python Backend
WebSocket: Bidirectional Real-time
AudioWorklet: 16kHz PCM Streaming
TypeScript: Extension + Website
Next.js 16: Promo Website
Tailwind CSS v4: Styling