Screenshot API for AI Agents: Self-Hosted, No API Keys

The pattern: Agent calls POST /api/convert/url → reads the image URL from the final SSE event → fetches the PNG → passes the image to a vision model → acts on what it sees. Self-hosted Openkova keeps this loop fast, private, and free of usage caps.

Why AI agents need screenshots

Modern agent frameworks — LangChain, AutoGen, CrewAI, Magentic-One — increasingly use multimodal models. An agent that can see a web page can verify that a form submitted correctly, detect UI regressions, extract structured data from a rendered table, or complete browser-based tasks without relying purely on HTML parsing.

Screenshots are more reliable than HTML for agents because:

Rendered output reflects JavaScript execution, CSS layout, and lazy-loaded content
Vision models are trained on visual content, not raw DOM strings
A PNG is a stable, language-agnostic interchange format

Why self-hosted matters for agent workflows

SaaS screenshot APIs have per-request pricing. An agent that takes 50 screenshots per task run, running 100 times a day, generates 5,000 requests daily, or roughly 150,000 a month. Under per-request pricing, the bill grows linearly with that volume.
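
The arithmetic above can be parameterized to estimate volume for your own workload. This is a minimal sketch: the function names are illustrative, and the price in the usage lines is a placeholder chosen for the example, not a quote from any real provider.

```python
def monthly_requests(shots_per_run: int, runs_per_day: int, days: int = 30) -> int:
    """Total screenshot requests over a billing period."""
    return shots_per_run * runs_per_day * days


def monthly_cost(requests: int, price_per_request: float) -> float:
    """Linear per-request pricing; price_per_request is an assumed input."""
    return requests * price_per_request


requests = monthly_requests(shots_per_run=50, runs_per_day=100)
print(requests)  # 150000

# Hypothetical price of $0.002 per screenshot, for illustration only.
print(round(monthly_cost(requests, 0.002), 2))
```

Substitute the per-request price from whichever provider you are comparing against to see where self-hosting breaks even with your server costs.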

More importantly, agents often need to screenshot pages that a SaaS API cannot reach:

Internal tools on a private VPN
Staging environments not exposed to the public internet
Localhost during development
Authenticated dashboards where you'd need to pass session cookies

A self-hosted API running on the same network has access to all of these.

MCP server: the easiest path for Claude, Cursor, and Windsurf

If your agent runs inside Claude Desktop, Cursor, or Windsurf, the @openkova/mcp package is the simplest integration: no HTTP client code, no SSE parsing. Add it to your client's MCP config and your AI assistant gets three tools automatically: screenshot_url, screenshot_snippet, and crawl_url. Screenshots are returned as inline images the model can see directly.

// claude_desktop_config.json / .cursor/mcp.json / .windsurf/mcp.json
{
  "mcpServers": {
    "openkova": {
      "command": "npx",
      "args": ["-y", "@openkova/mcp"]
    }
  }
}

Everything runs on your machine — no API key, no external service, no data leaving your network.

Calling Openkova from a Python agent

import json
from typing import Optional

import httpx

async def screenshot_url(url: str) -> Optional[str]:
    """Capture a page and return its image path, or None if no "done" event arrives."""
    async with httpx.AsyncClient(timeout=60) as client:
        async with client.stream(
            "POST",
            "http://localhost:3000/api/convert/url",
            json={"url": url, "depth": 1},
        ) as response:
            response.raise_for_status()  # fail early on 4xx/5xx instead of parsing an error body
            async for line in response.aiter_lines():
                if not line.startswith("data: "):
                    continue  # skip blank separator lines between events
                event = json.loads(line[6:])
                if event.get("type") == "done":
                    # results[0].url = "/api/image/{sessionId}/{imageId}"
                    return event["data"]["results"][0]["url"]
    return None
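
The value in the "done" event is a path relative to the Openkova instance, so it has to be resolved against the instance's base URL before the PNG can be fetched. A minimal sketch using only the standard library follows; the helper names and the `OPENKOVA_BASE` constant are illustrative and not part of Openkova.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumed local instance, matching the address used in the example above.
OPENKOVA_BASE = "http://localhost:3000"


def absolute_image_url(image_path: str, base: str = OPENKOVA_BASE) -> str:
    """Turn "/api/image/{sessionId}/{imageId}" into a full URL."""
    return urljoin(base, image_path)


def download_png(image_path: str, base: str = OPENKOVA_BASE, timeout: float = 30.0) -> bytes:
    """Fetch the image bytes. This call blocks; in async code, run it via asyncio.to_thread."""
    with urlopen(absolute_image_url(image_path, base), timeout=timeout) as resp:
        return resp.read()
```

Keeping the base URL as a parameter means the same helper works unchanged when the agent talks to a remote instance or to one of several instances behind a load balancer.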

Calling from a Node.js / TypeScript agent

async function screenshotUrl(url: string): Promise<string> {
  const res = await fetch('http://localhost:3000/api/convert/url', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url, depth: 1 }),
  });
  if (!res.ok || !res.body) {
    throw new Error(`Openkova request failed with status ${res.status}`);
  }

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // A chunk can end mid-line, so keep the trailing partial line in the buffer.
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const event = JSON.parse(line.slice(6));
      // event.data.results[0].url = "/api/image/{sessionId}/{imageId}"
      if (event.type === 'done') {
        await reader.cancel();
        return event.data.results[0].url;
      }
    }
  }
  throw new Error('Stream ended without a "done" event');
}

Using screenshots with a vision model

Once you have the PNG, pass it to a multimodal model. Example with the Anthropic Claude API:

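import asyncio
import base64

import anthropic
import httpx

# The async client avoids blocking the event loop; it reads ANTHROPIC_API_KEY from the environment.
client = anthropic.AsyncAnthropic()

async def describe_page(url: str) -> str:
    image_url = await screenshot_url(url)
    if image_url is None:
        raise RuntimeError(f"No screenshot returned for {url}")

    # Fetch the PNG from the same Openkova instance and close the client afterwards.
    async with httpx.AsyncClient() as http:
        img_resp = await http.get(f"http://localhost:3000{image_url}")
        img_resp.raise_for_status()
    image_data = base64.standard_b64encode(img_resp.content).decode("utf-8")

    message = await client.messages.create(
        model="claude-opus-4-8",  # substitute a vision-capable model ID available to you
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": "What is shown on this web page? Summarize the main content."
                }
            ],
        }],
    )
    return message.content[0].text

# Run with: asyncio.run(describe_page("https://example.com"))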
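
If the image fetch returns an error body instead of a PNG, base64-encoding it blindly sends unusable data to the model. One way to guard against that is to check the PNG file signature before building the image block. The sketch below is an illustration; `png_image_block` is a hypothetical helper, and the dictionary it returns simply mirrors the image block in the example above.

```python
import base64

# Every PNG file begins with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_image_block(png_bytes: bytes) -> dict:
    """Build a base64 image content block, rejecting anything that is not a PNG."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("response body is not a PNG")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": "image/png",
            "data": base64.standard_b64encode(png_bytes).decode("ascii"),
        },
    }
```

The returned dictionary can be placed directly in the `content` list of a user message, next to the text prompt.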

Architecture for high-frequency agents

For agents that take many screenshots per minute, a single Openkova instance may become the bottleneck. Options for scaling:

Horizontal scaling: run 2–4 Openkova instances behind a round-robin load balancer (Nginx, Traefik, or your cloud provider's ALB)
Queue-based dispatch: put screenshot requests in a queue (Redis, BullMQ, or a simple Postgres table) and have workers pull from it
Dedicated containers per agent: in Kubernetes, give each agent pod its own Openkova sidecar container for full isolation
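
The first two options can be combined in-process for a quick prototype: an in-memory queue of page URLs drained by one worker per Openkova instance. This is a sketch under stated assumptions, not a production dispatcher. The `capture` callable is supplied by the caller (for example, a variant of the screenshot function that accepts the instance base URL), and `asyncio.Queue` stands in for Redis or BullMQ.

```python
import asyncio
from typing import Awaitable, Callable

# capture(base_url, page_url) -> image path. Hypothetical signature, provided by the caller.
Capture = Callable[[str, str], Awaitable[str]]


async def run_pool(urls: list[str], instances: list[str], capture: Capture) -> dict[str, str]:
    """Drain a queue of page URLs with one worker per Openkova instance."""
    queue: asyncio.Queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    results: dict[str, str] = {}

    async def worker(base_url: str) -> None:
        while True:
            try:
                url = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # nothing left to do
            results[url] = await capture(base_url, url)

    # One worker per instance, so each instance handles one capture at a time.
    await asyncio.gather(*(worker(base) for base in instances))
    return results
```

Because each worker is bound to a single instance, concurrency per instance is capped at one request. Starting several workers per base URL raises that cap if your containers have capacity to spare.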

SSE streaming and agent feedback loops

Openkova returns Server-Sent Events during conversion. For crawled URLs (depth 1–2), your agent receives progress events as each page is captured. This is useful for agents that need to know which sub-pages were processed before acting on the results.

# SSE event types
{"type": "progress", "message": "Capturing https://example.com"}
{"type": "progress", "message": "Capturing https://example.com/about"}
{"type": "done", "message": "Done — 2 pages captured", "data": {"sessionId": "...", "results": [{"imageId": "abc123.png", "url": "/api/image/..."}], "total": 2}}
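
An agent that wants to know which sub-pages were processed can collect the progress messages alongside the final payload. The sketch below assumes events arrive as `data: ` lines with the shapes listed above; `collect_events` is an illustrative helper, not part of Openkova.

```python
import json
from typing import Iterable, Optional


def collect_events(lines: Iterable[str]) -> tuple[list[str], Optional[dict]]:
    """Split an SSE line stream into progress messages and the final "done" payload."""
    progress: list[str] = []
    done_data: Optional[dict] = None
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separators and non-data fields
        event = json.loads(line[len("data: "):])
        if event.get("type") == "progress":
            progress.append(event.get("message", ""))
        elif event.get("type") == "done":
            done_data = event.get("data")
            break  # the "done" event is treated as the end of the stream
    return progress, done_data
```

Since the helper takes any iterable of lines, it can be unit-tested with a list of strings and then fed the lines read from a live response.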

Frequently asked questions

Why do AI agents need a screenshot API?

Multimodal agents see pages as images. A screenshot API gives any agent — regardless of language — a consistent HTTP interface to capture any URL or HTML as a PNG for vision model input.

What is the difference between a screenshot API and Playwright for agents?

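Playwright is a browser automation library with bindings for Node.js, Python, Java, and .NET. Using it means each agent process installs and manages its own Chromium. A screenshot API moves that work into a separate service with a language-agnostic interface: any agent, including one written in Go or bash, can call it over HTTP. This matters in multi-agent systems where agents run in different runtimes.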

Does Openkova have rate limits for high-frequency agent workflows?

No. Self-hosted means no usage caps. The only ceiling is your server capacity — CPU and memory for concurrent Chromium instances. Scale horizontally with multiple containers behind a load balancer.

Can Openkova screenshot internal or authenticated pages?

Yes. It runs on your own infrastructure and can reach any URL your server can reach — including localhost, VPN-only services, and staging environments. SaaS screenshot APIs cannot access private networks.

Get started: Deploy Openkova with Docker — or see the API reference for the full endpoint and SSE event spec.