
Add a Screenshot Tool to Claude Desktop, Cursor, and Windsurf with @openkova/mcp

One config block. No server to deploy. No API key. Your AI assistant gets eyes.

What this does: After you add one short config block, Claude Desktop, Cursor, or Windsurf can call screenshot_url, screenshot_snippet, and crawl_url as tools. Screenshots come back as inline images that the model sees directly, and the capture runs entirely on your own machine via headless Chromium.

What is MCP?

The Model Context Protocol (MCP) is an open standard that lets AI clients — editors, chat apps, agent frameworks — connect to local or remote tool servers over a simple JSON-RPC interface. Instead of hard-coding integrations, MCP-compatible clients discover tools at runtime and expose them to the model without any additional code.

@openkova/mcp is one such tool server. It exposes Chromium-backed screenshot and crawl capabilities as MCP tools.
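To show what travels over that interface, here is a minimal TypeScript sketch of the JSON-RPC 2.0 requests an MCP client sends to discover and invoke a tool. The `tools/list` method and the `tools/call` method with `name` and `arguments` params come from the MCP specification. The `url` argument for screenshot_url mirrors the Cursor example later in this post, and the helper name `buildToolCall` is hypothetical.

```typescript
// Shape of a JSON-RPC 2.0 request as used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Hypothetical helper: builds the request a client sends to run one tool.
// MCP invokes tools with method "tools/call" and params { name, arguments }.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Clients discover available tools at runtime with "tools/list".
const listRequest: JsonRpcRequest = { jsonrpc: "2.0", id: 1, method: "tools/list" };

// Asking @openkova/mcp for a screenshot (argument shape assumed from the Cursor example).
const callRequest = buildToolCall(2, "screenshot_url", { url: "http://localhost:3000" });

console.log(JSON.stringify(listRequest));
console.log(JSON.stringify(callRequest));
```

In a real session, the client first completes MCP's `initialize` handshake before sending either request. Editors like Claude Desktop and Cursor handle all of this for you.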

How @openkova/mcp works

Unlike the Openkova web app, the MCP server does not require a running Openkova instance. It uses @openkova/core directly and spawns its own headless Chromium process on demand. There is nothing to deploy or keep alive: the MCP server is just a local process that your AI client launches from its config and talks to over stdin/stdout.

The architecture is:

AI client (Claude Desktop / Cursor / Windsurf)
  └─ starts kova-mcp (via npx -y @openkova/mcp)
       └─ @openkova/core  ←  uses puppeteer-core
            └─ headless Chromium  (your local Chrome or system Chromium)

Because it runs locally on your machine, it can screenshot anything your machine can reach: localhost, staging environments, VPN-only tools, and authenticated pages (with cookies).
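Because the client launches kova-mcp as a child process (the npx line in the diagram), the two sides exchange messages over that process's stdin and stdout. The MCP stdio transport frames each JSON-RPC message as a single line of JSON terminated by a newline. The sketch below shows only that framing. `encodeMessage` and `LineDecoder` are hypothetical names, not part of @openkova/mcp or any SDK.

```typescript
// Encode one JSON-RPC message for the MCP stdio transport:
// a single line of JSON (JSON.stringify emits no raw newlines) plus "\n".
function encodeMessage(message: unknown): string {
  return JSON.stringify(message) + "\n";
}

// Reassembles messages from stdout chunks, which may split a line anywhere.
class LineDecoder {
  private buffer = "";

  // Feed one chunk; returns every message completed by it.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const messages: unknown[] = [];
    let newline = this.buffer.indexOf("\n");
    while (newline !== -1) {
      const line = this.buffer.slice(0, newline).trim(); // trim also drops a stray "\r"
      this.buffer = this.buffer.slice(newline + 1);
      if (line.length > 0) messages.push(JSON.parse(line));
      newline = this.buffer.indexOf("\n");
    }
    return messages;
  }
}

// Simulate a response arriving in two arbitrary chunks.
const wire = encodeMessage({ jsonrpc: "2.0", id: 2, result: { content: [] } });
const decoder = new LineDecoder();
const first = decoder.push(wire.slice(0, 10)); // partial line: no message yet
const second = decoder.push(wire.slice(10));   // rest of the line: one message
console.log(first.length, second.length);
```

Buffering until a newline matters because pipes deliver arbitrary byte chunks, not whole messages.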

Chrome prerequisites

@openkova/mcp ships with puppeteer-core, which does not bundle a Chrome binary. It finds a usable browser by checking, in order:

  1. The CHROMIUM_PATH environment variable
  2. A globally or locally installed puppeteer npm package (which does bundle Chrome)
  3. System paths: google-chrome, chromium-browser, /usr/bin/chromium, and on macOS, /Applications/Google Chrome.app/...

This means most developers need zero configuration:

Environment | Works out of the box? | Notes
--- | --- | ---
macOS with Google Chrome | ✅ Yes | Chrome.app found automatically
Linux with chromium-browser | ✅ Yes | Found on $PATH
npm install -g puppeteer | ✅ Yes | Bundled Chrome detected
Fresh Linux / Docker (no Chrome) | ❌ Needs setup | Set CHROMIUM_PATH or install chromium-browser

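On a fresh Debian or Ubuntu machine, the fastest fix is to install Chromium with apt: apt-get install -y chromium-browser on Ubuntu, or apt-get install -y chromium on Debian (which provides /usr/bin/chromium). No configuration is needed after that. One caveat: on recent Ubuntu releases, chromium-browser installs a snap, which usually does not work inside Docker containers. There, it is more reliable to install a browser another way and point CHROMIUM_PATH at it.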
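The lookup order described above amounts to a first-match search. Here is an illustrative TypeScript sketch of that logic, not the actual @openkova/core code. `resolveChrome`, its options, and the `exists` predicate are hypothetical, and /opt/chrome/chrome is a made-up override path. The macOS Chrome.app location is left out of the candidate list here.

```typescript
interface ResolveOptions {
  env: Record<string, string | undefined>; // e.g. process.env
  puppeteerChrome?: string;                // Chrome path from an installed puppeteer package, if any
  systemCandidates: string[];              // commands or paths to probe, in order
  exists: (candidate: string) => boolean;  // caller-supplied check (file exists or command on $PATH)
}

// Hypothetical resolver mirroring the documented order:
// 1. CHROMIUM_PATH, 2. a puppeteer-bundled Chrome, 3. system locations.
function resolveChrome(opts: ResolveOptions): string | undefined {
  const fromEnv = opts.env["CHROMIUM_PATH"];
  if (fromEnv) return fromEnv; // an explicit override wins without probing
  if (opts.puppeteerChrome && opts.exists(opts.puppeteerChrome)) return opts.puppeteerChrome;
  return opts.systemCandidates.find((candidate) => opts.exists(candidate));
}

// Linux candidates named in this post; on a Mac you would append the Chrome.app path.
const linuxCandidates = ["google-chrome", "chromium-browser", "/usr/bin/chromium"];
const installed = new Set(["/usr/bin/chromium"]);

const picked = resolveChrome({
  env: {},
  systemCandidates: linuxCandidates,
  exists: (c) => installed.has(c),
});
const overridden = resolveChrome({
  env: { CHROMIUM_PATH: "/opt/chrome/chrome" },
  systemCandidates: linuxCandidates,
  exists: (c) => installed.has(c),
});
console.log(picked, overridden); // /usr/bin/chromium /opt/chrome/chrome
```

Putting the environment variable first gives you a deterministic escape hatch. Setting CHROMIUM_PATH bypasses every guess that follows it.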

Setup: Claude Desktop

Open ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows) and add:

{
  "mcpServers": {
    "openkova": {
      "command": "npx",
      "args": ["-y", "@openkova/mcp"]
    }
  }
}

Restart Claude Desktop. In a new conversation, you should see the Openkova tools listed in the tool picker. Claude calls them when you ask it to look at a page, though Claude Desktop may first ask you to approve each tool call.
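If your claude_desktop_config.json already defines other servers, merge the openkova entry into the existing mcpServers object instead of pasting over the file. Claude Desktop also lets each server entry carry an env object, which is a natural place for CHROMIUM_PATH when Chrome sits outside the search locations listed earlier. Below is a small TypeScript sketch of that merge. `addOpenkova` is a hypothetical helper, and /usr/bin/chromium and other-server.js are placeholder values.

```typescript
// One stdio server entry in claude_desktop_config.json.
interface ServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface DesktopConfig {
  mcpServers?: Record<string, ServerEntry>;
  [key: string]: unknown; // other top-level settings are carried over untouched
}

// Hypothetical helper: adds (or replaces) only the "openkova" entry and keeps
// every other server. chromiumPath is optional and only needed when Chrome is
// outside the default search locations.
function addOpenkova(config: DesktopConfig, chromiumPath?: string): DesktopConfig {
  const entry: ServerEntry = { command: "npx", args: ["-y", "@openkova/mcp"] };
  if (chromiumPath) entry.env = { CHROMIUM_PATH: chromiumPath };
  return { ...config, mcpServers: { ...config.mcpServers, openkova: entry } };
}

const existing: DesktopConfig = {
  mcpServers: { other: { command: "node", args: ["other-server.js"] } },
};
const updated = addOpenkova(existing, "/usr/bin/chromium");
console.log(JSON.stringify(updated, null, 2)); // write this back to the config file
```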

Setup: Cursor

Add the same block to .cursor/mcp.json in your project root (or the global ~/.cursor/mcp.json):

{
  "mcpServers": {
    "openkova": {
      "command": "npx",
      "args": ["-y", "@openkova/mcp"]
    }
  }
}

Cursor picks up the new MCP server on the next reload. In Agent mode, it calls the screenshot tools automatically when it needs visual context.

Setup: Windsurf

Add the same config to .windsurf/mcp.json:

{
  "mcpServers": {
    "openkova": {
      "command": "npx",
      "args": ["-y", "@openkova/mcp"]
    }
  }
}

The three tools

screenshot_url

Navigates to a URL and returns a screenshot as an inline image. It also supports crawling to a depth of 1 or 2, so a single call can capture multiple pages.

Example prompts:

screenshot_snippet

Renders a raw HTML string and returns a screenshot. Useful for previewing generated HTML, email templates, or OG image designs before deploying them.

Example prompts:

crawl_url

Crawls a site up to the specified depth and returns screenshots of all discovered pages. Same as screenshot_url with depth > 1, surfaced as a distinct tool for clarity.
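To make the depth setting concrete, here is a sketch of a depth-limited breadth-first crawl over an in-memory link map. It assumes depth 1 means only the starting page, and depth 2 adds the same-origin pages that page links to. The post does not spell out Openkova's exact depth semantics or link filtering, and `crawlOrder` is a hypothetical name.

```typescript
// Resolve a link against its page and drop any #fragment.
function normalize(href: string, base?: string): URL {
  const url = new URL(href, base);
  url.hash = "";
  return url;
}

// Hypothetical depth-limited, same-origin BFS. `links` maps a page URL to the
// hrefs found on it; a real crawler would read them from the rendered page.
function crawlOrder(start: string, depth: number, links: Map<string, string[]>): string[] {
  const startUrl = normalize(start);
  const seen = new Set<string>([startUrl.href]);
  const order: string[] = [];
  let frontier = [startUrl.href];
  for (let level = 1; level <= depth && frontier.length > 0; level++) {
    const next: string[] = [];
    for (const page of frontier) {
      order.push(page); // a real tool would take the screenshot here
      if (level === depth) continue; // last level: capture but don't follow links
      for (const href of links.get(page) ?? []) {
        const url = normalize(href, page);
        if (url.origin !== startUrl.origin || seen.has(url.href)) continue;
        seen.add(url.href);
        next.push(url.href);
      }
    }
    frontier = next;
  }
  return order;
}

const site = new Map<string, string[]>([
  ["https://example.com/", ["/about", "/blog#top", "https://other.site/x"]],
  ["https://example.com/about", ["/team"]],
  ["https://example.com/blog", ["/"]],
]);

console.log(crawlOrder("https://example.com/", 1, site)); // start page only
console.log(crawlOrder("https://example.com/", 2, site)); // start, /about, /blog
```

Breadth-first order finishes every page at one depth before touching the next, so a depth cap is cheap to enforce. The seen set also keeps the crawl from looping back to pages it has already captured.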

What the model sees

Screenshots are returned as inline image/png MCP content: the model receives the image data itself (base64-encoded), not a URL. This means Claude (or Cursor's agent) can analyze the screenshot immediately, without a second fetch step, using its multimodal vision to answer questions, verify layout, or extract information directly from the rendered page.
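In MCP terms, an inline image is a content item of type `image` that carries base64-encoded data and a MIME type. The sketch below wraps raw PNG bytes in that shape. The base64 encoder is written out so the example needs no runtime-specific APIs, and `toImageContent` is a hypothetical name rather than a function exported by @openkova/mcp.

```typescript
// MCP content item for an inline image.
interface ImageContent {
  type: "image";
  data: string;     // base64-encoded bytes
  mimeType: string; // e.g. "image/png"
}

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 (RFC 4648) with "=" padding.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0; // missing bytes are treated as zero, then padded
    const b2 = bytes[i + 2] ?? 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += B64.charAt((triple >> 18) & 63) + B64.charAt((triple >> 12) & 63);
    out += i + 1 < bytes.length ? B64.charAt((triple >> 6) & 63) : "=";
    out += i + 2 < bytes.length ? B64.charAt(triple & 63) : "=";
  }
  return out;
}

// Every PNG file starts with these 8 bytes.
const PNG_SIGNATURE = [137, 80, 78, 71, 13, 10, 26, 10];

// Hypothetical helper: wraps PNG bytes as a tool result with one image item.
function toImageContent(png: Uint8Array): { content: ImageContent[] } {
  const isPng = PNG_SIGNATURE.every((b, i) => png[i] === b);
  if (!isPng) throw new Error("not a PNG");
  return { content: [{ type: "image", data: toBase64(png), mimeType: "image/png" }] };
}

// Only the PNG signature, for illustration; a real screenshot is far larger.
const result = toImageContent(new Uint8Array(PNG_SIGNATURE));
console.log(result.content[0]?.data); // iVBORw0KGgo=
```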

Example: visual QA in Cursor

In Cursor's Agent mode, you can ask it to visually verify your work as part of a coding session:

// Prompt to Cursor Agent:
// "After applying the CSS fix, screenshot http://localhost:3000
//  and confirm the button is now centered."

// Cursor will call screenshot_url({ url: "http://localhost:3000" })
// and receive the PNG inline. It then analyzes the image and
// confirms (or flags) the visual result.

No browser extension, no manual screenshots, no context-switching. The agent closes the loop between code change and visual verification.

Privacy

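All screenshot processing happens locally. The kova-mcp process runs on your machine, Chromium runs on your machine, and the resulting PNG is passed directly to your AI client over the local stdio connection (the server's stdin and stdout). No third-party screenshot service ever sees the page or the image. Keep in mind that the client then includes the image in the conversation, so it reaches your model provider just like any other message content.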

Frequently asked questions

Does @openkova/mcp require a running Openkova server?

No. The MCP server uses @openkova/core directly and launches its own Chromium process. There is nothing to deploy.

Which AI clients support @openkova/mcp?

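Any MCP-compatible client: Claude Desktop, Cursor, Windsurf, and any custom agent that implements the Model Context Protocol. The three editors all use the same config block. A custom agent needs to launch npx -y @openkova/mcp as a stdio server in whatever way its own configuration expects.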

Does @openkova/mcp need an API key?

No. It is MIT-licensed, runs locally, and has no usage fees or rate limits. The only cost is the compute on your own machine.

What Chrome binary does @openkova/mcp use?

It checks CHROMIUM_PATH first, then a puppeteer global/local install, then system paths (google-chrome, chromium-browser, /usr/bin/chromium, Chrome.app on macOS). On most developer machines this works with zero configuration.

Want to call Openkova over HTTP instead? Screenshot API for AI Agents covers the REST approach — or deploy Openkova with Docker to get the full web UI and API.