Automation tools
Browser automation and desktop control let tota interact with websites and your local system.
# Browser Automation
tota supports three browser engines via Playwright — Chromium (default), Firefox, and WebKit (Safari-compatible). The browser opens as a visible window on your desktop by default.
| Tool | Description |
|---|---|
| browser_open | Open a URL in the browser; returns the page title, URL, and visible text |
| browser_click | Click an element by CSS selector or visible text |
| browser_type | Type text into an input field (clicks to focus, then fills; safe for single-page apps such as Google and Gmail) |
| browser_key | Press a keyboard key (Enter, Tab, Escape, ArrowDown) or a combo such as Control+a |
| browser_wait | Wait for a CSS selector to appear or for page navigation to complete |
| browser_screenshot | Take a full-page or element screenshot and send it to the user |
| browser_extract | Extract text or attribute values from elements matching a CSS selector |
| browser_scroll | Scroll the page up, down, to the top, or to the bottom |
| browser_close | Close the browser session and free resources |
| browser_engine | Switch the active browser engine (chromium / firefox / webkit) |
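
As a rough illustration of how these tools compose, the sketch below models a search flow as a list of tool calls. Only the tool names come from the table above; the `ToolCall` shape, the argument names (`url`, `selector`, `text`, `key`), and the helpers `isBrowserTool` and `releasesBrowser` are hypothetical and may not match tota's actual call format.

```typescript
// Tool names taken from the table above.
const BROWSER_TOOLS = [
  "browser_open", "browser_click", "browser_type", "browser_key", "browser_wait",
  "browser_screenshot", "browser_extract", "browser_scroll", "browser_close",
  "browser_engine",
] as const;
type BrowserTool = (typeof BROWSER_TOOLS)[number];

// Hypothetical call shape: the argument names are illustrative assumptions.
interface ToolCall {
  tool: BrowserTool;
  args: Record<string, string>;
}

// Open a page, submit a query, wait for results, read them, then clean up.
const searchFlow: ToolCall[] = [
  { tool: "browser_open", args: { url: "https://example.com" } },
  { tool: "browser_type", args: { selector: "input[name=q]", text: "playwright" } },
  { tool: "browser_key", args: { key: "Enter" } },
  { tool: "browser_wait", args: { selector: "#results" } },
  { tool: "browser_extract", args: { selector: "#results a" } },
  { tool: "browser_close", args: {} },
];

// Guard against a tool name that is not in the table.
function isBrowserTool(name: string): name is BrowserTool {
  return (BROWSER_TOOLS as readonly string[]).indexOf(name) !== -1;
}

// True when the flow ends with browser_close, so the session is freed.
function releasesBrowser(calls: ToolCall[]): boolean {
  const last = calls[calls.length - 1];
  return last !== undefined && last.tool === "browser_close";
}

console.log(searchFlow.map((call) => call.tool).join(" -> "));
console.log("frees the browser:", releasesBrowser(searchFlow));
```

Ending a flow with browser_close matters most for long-running sessions, since the table notes that it is the call that frees resources.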
Install browser binaries once:

    npx playwright install chromium firefox webkit
    # or via the wizard:
    tota setup browser

Set BROWSER_ENGINE=firefox (or webkit) in ~/.tota/.env to change the default engine. Set PLAYWRIGHT_HEADLESS=true to run without a visible window (CI/server environments).
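
The sketch below shows one way these two variables could be read from a `.env`-style file. It is not tota's actual loader: `parseEnv` and `resolveBrowserSettings` are hypothetical helpers, and treating values case-insensitively and falling back to Chromium on an unrecognised engine name are assumptions.

```typescript
type BrowserEngine = "chromium" | "firefox" | "webkit";

interface BrowserSettings {
  engine: BrowserEngine;
  headless: boolean;
}

const ENGINES: readonly BrowserEngine[] = ["chromium", "firefox", "webkit"];

// Parse KEY=value lines, skipping blank lines and # comments.
function parseEnv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=", or no "=" at all
    out[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return out;
}

// Defaults follow the text above: Chromium, with a visible window.
function resolveBrowserSettings(env: Record<string, string | undefined>): BrowserSettings {
  const requested = (env["BROWSER_ENGINE"] ?? "chromium").toLowerCase();
  let engine: BrowserEngine = "chromium"; // assumption: unknown names fall back to the default
  for (const candidate of ENGINES) {
    if (candidate === requested) engine = candidate;
  }
  const headless = (env["PLAYWRIGHT_HEADLESS"] ?? "false").toLowerCase() === "true";
  return { engine, headless };
}

// Example contents of ~/.tota/.env for a CI machine.
const sampleEnv = [
  "# ~/.tota/.env",
  "BROWSER_ENGINE=firefox",
  "PLAYWRIGHT_HEADLESS=true",
].join("\n");

console.log(resolveBrowserSettings(parseEnv(sampleEnv)));
```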
# Computer-Use (Desktop Control)
Enable with COMPUTER_USE_ENABLED=true in ~/.tota/.env (or capabilities.computer.enabled: true in tota.yaml).
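
The dotted path capabilities.computer.enabled implies a nested structure in tota.yaml. The sketch below spells out that shape as a TypeScript type and checks both switches. `TotaConfig` and `isComputerUseEnabled` are hypothetical names, and the rule that either source alone enables the feature is an assumption based on the "or" in the sentence above; tota's real precedence rules are not documented here.

```typescript
// Shape implied by the dotted path capabilities.computer.enabled.
interface TotaConfig {
  capabilities?: {
    computer?: {
      enabled?: boolean;
    };
  };
}

// Assumption: either the environment variable or the YAML flag is sufficient.
function isComputerUseEnabled(
  env: Record<string, string | undefined>,
  config: TotaConfig,
): boolean {
  const fromEnv = env["COMPUTER_USE_ENABLED"] === "true";
  const fromYaml = config.capabilities?.computer?.enabled === true;
  return fromEnv || fromYaml;
}

// Enabled through ~/.tota/.env only.
console.log(isComputerUseEnabled({ COMPUTER_USE_ENABLED: "true" }, {}));
// Enabled through tota.yaml only (object form of the parsed file).
console.log(isComputerUseEnabled({}, { capabilities: { computer: { enabled: true } } }));
// Neither source set.
console.log(isComputerUseEnabled({}, {}));
```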
# Desktop
| Tool | Description |
|---|---|
| computer_screenshot | Capture the full screen or a region; saves to a temp file |
| computer_see | Screenshot plus immediate vision AI analysis, to understand what's on screen before acting |
| computer_click | Left, right, or double-click at pixel coordinates |
| computer_move | Move the cursor to coordinates |
| computer_type | Type text at the current keyboard focus |
| computer_key | Press a key or key combo (cmd+c, ctrl+z, enter, tab, escape, arrow keys, etc.) |
| computer_scroll | Scroll up, down, left, or right at a position |
| computer_drag | Click and drag between two screen coordinates |
| computer_screen_size | Get the primary display resolution |
Mouse/keyboard control uses @nut-tree-fork/nut-js (a cross-platform native module). On Linux, install libxtst-dev first:

    sudo apt install libxtst-dev

# Android (ADB)
Android tools work via adb in your PATH — no additional Node.js packages needed.
| Tool | Description |
|---|---|
| adb_devices | List connected Android devices |
| adb_screenshot | Take a screenshot of an Android device |
| adb_see | Android screenshot + vision AI analysis |
| adb_tap | Tap at pixel coordinates on the device |
| adb_swipe | Swipe between two coordinates (with optional duration) |
| adb_type | Type text into the focused field on the device |
| adb_key | Send an Android key event (3=HOME, 4=BACK, 26=POWER, 66=ENTER…) |
| adb_shell | Run any adb shell command on the device |
| adb_pull | Pull a file from the device to the local machine |
| adb_push | Push a local file to the device |
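
For orientation, the sketch below builds the kind of `adb` argument lists that tools like adb_key, adb_tap, and adb_swipe could translate to. The `adb shell input keyevent`, `input tap`, and `input swipe` commands and the `-s <serial>` flag are standard adb usage, but whether tota issues exactly these commands is an assumption. The helper names are hypothetical, and nothing here executes adb.

```typescript
// Key codes listed in the table above; they match Android's KeyEvent constants.
const KEYCODES = { HOME: 3, BACK: 4, POWER: 26, ENTER: 66 } as const;
type KeyName = keyof typeof KEYCODES;

// Prefix the arguments with "-s <serial>" to target one device when several are attached.
function withSerial(args: string[], serial?: string): string[] {
  return serial ? ["-s", serial].concat(args) : args;
}

// adb shell input keyevent <code>
function keyArgs(key: KeyName, serial?: string): string[] {
  return withSerial(["shell", "input", "keyevent", String(KEYCODES[key])], serial);
}

// adb shell input tap <x> <y>
function tapArgs(x: number, y: number, serial?: string): string[] {
  return withSerial(["shell", "input", "tap", String(x), String(y)], serial);
}

// adb shell input swipe <x1> <y1> <x2> <y2> [durationMs]
function swipeArgs(
  x1: number, y1: number, x2: number, y2: number,
  durationMs?: number, serial?: string,
): string[] {
  const args = ["shell", "input", "swipe", String(x1), String(y1), String(x2), String(y2)];
  if (durationMs !== undefined) args.push(String(durationMs));
  return withSerial(args, serial);
}

// Print the commands instead of running them.
console.log("adb " + keyArgs("HOME").join(" "));
console.log("adb " + tapArgs(540, 960).join(" "));
console.log("adb " + swipeArgs(100, 800, 100, 200, 300, "emulator-5554").join(" "));
```

Returning argument arrays rather than a single command string keeps the helpers easy to pass to a process-spawning API without shell quoting concerns.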
