Playwright for browser automation
Playwright is the top pick for browser automation, and the reason is structural: Microsoft's official server drives Chromium, Firefox, and WebKit and exposes each page as an accessibility snapshot rather than a screenshot. That makes interactions more deterministic, because an agent acts on named elements instead of guessing at pixel coordinates, and it keeps token cost far lower than image-based control.
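To make the "named elements instead of pixel coordinates" point concrete, here is a minimal sketch of looking up an element by role and accessible name in a tree. The `Node` shape, the `ref` handles, and the sample page are all hypothetical and simplified for illustration; they are not the server's actual snapshot format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    """One entry in a simplified, hypothetical accessibility tree."""
    role: str
    name: str = ""
    ref: str = ""  # hypothetical handle an agent would pass back to act on the element
    children: list[Node] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def find(root: Node, role: str, name: str) -> Optional[Node]:
    """Return the first node matching both role and accessible name, if any."""
    for node in walk(root):
        if node.role == role and node.name == name:
            return node
    return None


# A made-up login page, described by roles and names rather than pixels.
page = Node("document", "Sign in", children=[
    Node("textbox", "Email", ref="e1"),
    Node("textbox", "Password", ref="e2"),
    Node("button", "Sign in", ref="e3"),
])

target = find(page, "button", "Sign in")
print(target.ref if target else "not found")
```

The lookup either finds the named button or returns nothing, with no coordinates to estimate, and the tree itself is a few short lines of text rather than an image.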
It runs locally, so the browser is free and self-hosted, with no hosted service to provision. For the common automation jobs (log into a site, click through a flow, scrape a page that only renders after JavaScript) this is the default, and the two alternatives mostly matter when local execution is the wrong constraint.
How Playwright fits
The working set is broad: browser_navigate and browser_navigate_back move through pages, browser_snapshot reads the current page as an accessibility tree, and browser_click, browser_hover, browser_type, browser_press_key, browser_fill_form, and browser_select_option cover the full range of interactions a flow needs. browser_file_upload handles file inputs, while browser_drag and browser_drop cover drag-and-drop, including dropping external files onto the page.
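As a rough illustration of how these tools chain into a flow, the sketch below builds the request payloads for a login sequence. It assumes the server is reached over the Model Context Protocol, whose JSON-RPC `tools/call` method carries a tool name and an arguments object. The argument names (`url`, `element`, `ref`, `text`), the `ref` values, and the example URL are assumptions for illustration; check the server's published tool schema before relying on them.

```python
import json
from itertools import count

_ids = count(1)  # simple incrementing JSON-RPC request ids


def tool_call(name: str, **arguments) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for one tool invocation."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names and ref values below are assumed, not taken from the server.
login_flow = [
    tool_call("browser_navigate", url="https://example.com/login"),
    tool_call("browser_snapshot"),  # read the page before acting on it
    tool_call("browser_type", element="Email field", ref="e1",
              text="user@example.com"),
    tool_call("browser_click", element="Sign in button", ref="e3"),
]

for request in login_flow:
    print(json.dumps(request))
```

The snapshot call sits between navigation and interaction because, in this sketch, the element handles used by the later calls would come from its result.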
The honest limit is the flip side of running locally: you supply the machine, the browsers, and any scaling, so high-concurrency or geographically distributed runs become your problem. That is where Browserbase fits better: it is a managed cloud browser with nothing to run, billed per session, which suits headless work at scale or in CI without local infrastructure. Exa is not a browser at all; for scraping jobs that are really content retrieval, its search-and-extract approach is simpler than driving a page. Choose Playwright when the automation is interactive and you want it local and free; reach for the others when scale or pure extraction is the real need.
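The selection rule in the paragraph above can be written down as a small function. This is only a sketch of the reasoning given here: `pick_tool` and its two boolean parameters are hypothetical names, and real decisions will weigh more than two factors.

```python
def pick_tool(interactive: bool, needs_scale: bool) -> str:
    """Encode the rule of thumb: extraction, then scale, then the local default."""
    if not interactive:
        # The job is really content retrieval, so no browser needs driving.
        return "Exa"
    if needs_scale:
        # Interactive, but with concurrency or CI needs beyond one local machine.
        return "Browserbase"
    # Interactive, local, and free.
    return "Playwright"


print(pick_tool(interactive=True, needs_scale=False))
```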
Tools you would use
| Tool | What it does |
|---|---|
| browser_navigate | Navigates the browser to a URL. |
| browser_navigate_back | Goes back to the previous page in the history. |
| browser_snapshot | Captures an accessibility snapshot of the current page, which is better than a screenshot for taking actions. |
| browser_click | Performs a click on a web page element. |
| browser_hover | Hovers over an element on the page. |
| browser_type | Types text into an editable element. |
| browser_press_key | Presses a key on the keyboard. |
| browser_fill_form | Fills multiple form fields in one call. |
| browser_select_option | Selects an option in a dropdown. |
| browser_file_upload | Uploads one or multiple files. |
FAQ
- Why does Playwright use accessibility snapshots instead of screenshots?
- browser_snapshot returns the page as an accessibility tree, so the agent targets named elements rather than pixel coordinates. Actions are more deterministic and far cheaper in tokens than reasoning over an image, which is the main reason it leads this list.
- When should I use Browserbase or Exa instead?
- Browserbase fits when you need a managed cloud browser at scale with nothing to run, billed per session. Exa fits when the job is really content retrieval rather than interaction, where search-and-extract beats driving a full browser. Playwright is the default for local, interactive automation.