Automation tools

Browser automation and desktop control let tota interact with websites and your local system.

#Browser Automation

tota supports three browser engines via Playwright: Chromium (default), Firefox, and WebKit (Safari-compatible). By default, the browser opens as a visible window on your desktop.

| Tool | Description |
| --- | --- |
| browser_open | Open a URL in the browser; returns page title, URL, and visible text |
| browser_click | Click an element by CSS selector or visible text |
| browser_type | Type text into an input field (click-to-focus + fill; SPA-safe for Google, Gmail, etc.) |
| browser_key | Press a keyboard key: Enter, Tab, Escape, ArrowDown, or combos like Control+a |
| browser_wait | Wait for a CSS selector to appear or for page navigation to complete |
| browser_screenshot | Take a full-page or element screenshot and send it to the user |
| browser_extract | Extract text or attribute values from elements matching a CSS selector |
| browser_scroll | Scroll the page up, down, to the top, or to the bottom |
| browser_close | Close the browser session and free resources |
| browser_engine | Switch the active browser engine (chromium / firefox / webkit) |

Install browser binaries once:

npx playwright install chromium firefox webkit
# or via the wizard:
tota setup browser

Set BROWSER_ENGINE=firefox (or webkit) in ~/.tota/.env to change the default engine. Set PLAYWRIGHT_HEADLESS=true to run without a visible window (CI/server environments).
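
As an illustration of how these two settings could be interpreted, the sketch below parses .env-style text and resolves an engine and a headless flag. It is not tota's actual code: `parseDotEnv`, `resolveBrowserSettings`, and the fallback to Chromium for unrecognized values are assumptions made for this example, and quoted values are not handled.

```typescript
// Hypothetical helpers for illustration only; not part of tota.
type BrowserEngine = "chromium" | "firefox" | "webkit";

interface BrowserSettings {
  engine: BrowserEngine;
  headless: boolean;
}

const ENGINES: readonly BrowserEngine[] = ["chromium", "firefox", "webkit"];

// Parse simple KEY=value lines, skipping blank lines and # comments.
function parseDotEnv(text: string): Map<string, string> {
  const vars = new Map<string, string>();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=", or no "=" at all
    vars.set(line.slice(0, eq).trim(), line.slice(eq + 1).trim());
  }
  return vars;
}

// Assumption: unknown or missing values fall back to the documented defaults
// (Chromium, visible window).
function resolveBrowserSettings(envText: string): BrowserSettings {
  const vars = parseDotEnv(envText);
  const requested = (vars.get("BROWSER_ENGINE") ?? "").toLowerCase();
  const engine: BrowserEngine = ENGINES.find((e) => e === requested) ?? "chromium";
  const headless = (vars.get("PLAYWRIGHT_HEADLESS") ?? "").toLowerCase() === "true";
  return { engine, headless };
}

const settings = resolveBrowserSettings(
  "BROWSER_ENGINE=firefox\nPLAYWRIGHT_HEADLESS=true\n"
);
console.log(settings.engine, settings.headless);
```

The resolved values correspond to Playwright's per-engine launchers (`chromium`, `firefox`, `webkit`) and its `headless` launch option, which is presumably where tota passes them on.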

#Computer-Use (Desktop Control)

Enable with COMPUTER_USE_ENABLED=true in ~/.tota/.env (or capabilities.computer.enabled: true in tota.yaml).

#Desktop

| Tool | Description |
| --- | --- |
| computer_screenshot | Capture the full screen or a region; save to a temp file |
| computer_see | Screenshot plus immediate vision AI analysis, to understand what's on screen before acting |
| computer_click | Left, right, or double-click at pixel coordinates |
| computer_move | Move the cursor to coordinates |
| computer_type | Type text at the current keyboard focus |
| computer_key | Press a key or key combo (cmd+c, ctrl+z, enter, tab, escape, arrow keys, etc.) |
| computer_scroll | Scroll up, down, left, or right at a position |
| computer_drag | Click and drag between two screen coordinates |
| computer_screen_size | Get the primary display resolution |
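
Key combos such as cmd+c or ctrl+z have to be split into modifiers and a main key before they can be sent to the OS. The sketch below shows one way that parsing might look. `parseKeyCombo`, the `KeyCombo` shape, and the alias table are hypothetical and not taken from tota; a combo whose main key is a literal "+" is not handled.

```typescript
// Hypothetical helper for illustration only; not part of tota.
interface KeyCombo {
  modifiers: string[]; // normalized modifier names, in the order given
  key: string;         // the main (last) key, lower-cased
}

// Assumed aliases; the names tota actually accepts may differ.
const MODIFIER_ALIASES = new Map<string, string>([
  ["cmd", "cmd"], ["command", "cmd"], ["meta", "cmd"],
  ["ctrl", "ctrl"], ["control", "ctrl"],
  ["alt", "alt"], ["option", "alt"],
  ["shift", "shift"],
]);

function parseKeyCombo(combo: string): KeyCombo {
  const parts = combo
    .split("+")
    .map((p) => p.trim().toLowerCase())
    .filter((p) => p !== "");
  // The last part is the main key; everything before it must be a modifier.
  const key = parts.pop();
  if (key === undefined) {
    throw new Error('Empty key combo: "' + combo + '"');
  }
  const modifiers: string[] = [];
  for (const part of parts) {
    const mod = MODIFIER_ALIASES.get(part);
    if (mod === undefined) {
      throw new Error('Unknown modifier "' + part + '" in "' + combo + '"');
    }
    if (modifiers.indexOf(mod) === -1) modifiers.push(mod); // drop duplicates
  }
  return { modifiers, key };
}

const copy = parseKeyCombo("cmd+c");
console.log(copy.modifiers.join("+"), copy.key); // prints "cmd c"
```

A real implementation would then map these names onto the key constants of the input library and press the modifiers before the main key, releasing them in reverse order.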

Mouse/keyboard control uses @nut-tree-fork/nut-js (cross-platform native module). On Linux, install libxtst-dev first:

sudo apt install libxtst-dev

#Android (ADB)

Android tools work via adb in your PATH — no additional Node.js packages needed.

| Tool | Description |
| --- | --- |
| adb_devices | List connected Android devices |
| adb_screenshot | Take a screenshot of an Android device |
| adb_see | Android screenshot + vision AI analysis |
| adb_tap | Tap at pixel coordinates on the device |
| adb_swipe | Swipe between two coordinates (with optional duration) |
| adb_type | Type text into the focused field on the device |
| adb_key | Send an Android key event (3=HOME, 4=BACK, 26=POWER, 66=ENTER…) |
| adb_shell | Run any adb shell command on the device |
| adb_pull | Pull a file from the device to the local machine |
| adb_push | Push a local file to the device |
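
Several of these tools line up with standard `adb shell input` subcommands. The sketch below builds the argument lists for tap, swipe, and key events. The function names and the `withDevice` helper are hypothetical, and whether tota issues exactly these commands is an assumption; only the adb subcommands themselves (`input tap`, `input swipe`, `input keyevent`, and the `-s` serial flag) are standard.

```typescript
// Hypothetical helpers for illustration only; not part of tota.
// Each function returns the arguments that follow "adb" on the command line.

// Key codes listed in the table above.
const KEYCODES = { HOME: 3, BACK: 4, POWER: 26, ENTER: 66 } as const;

function adbTap(x: number, y: number): string[] {
  return ["shell", "input", "tap", String(x), String(y)];
}

function adbSwipe(
  x1: number, y1: number, x2: number, y2: number, durationMs?: number
): string[] {
  const args = ["shell", "input", "swipe",
    String(x1), String(y1), String(x2), String(y2)];
  // The duration is optional; adb uses its own default when it is omitted.
  if (durationMs !== undefined) args.push(String(durationMs));
  return args;
}

function adbKey(keyCode: number): string[] {
  return ["shell", "input", "keyevent", String(keyCode)];
}

// Target one device by serial (as listed by `adb devices`) using -s.
function withDevice(serial: string | undefined, args: string[]): string[] {
  return serial === undefined ? args : ["-s", serial].concat(args);
}

const tapArgs = withDevice("emulator-5554", adbTap(540, 1200));
console.log(["adb"].concat(tapArgs).join(" "));
// prints "adb -s emulator-5554 shell input tap 540 1200"
```

Keeping the arguments as an array means they can be handed to Node's `child_process.execFile("adb", args)` without going through a shell, which avoids quoting problems.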