Adding Screenshot Capture to Your Crawlee Crawler with Openkova

Two approaches: inline page.screenshot() and a dedicated REST API via Openkova — when to use each, and how to set up both.

The two approaches

Crawlee's PlaywrightCrawler and PuppeteerCrawler both give you access to the browser page object inside each request handler. You can call page.screenshot() directly — but for a production screenshot pipeline, this mixes two concerns: crawling (following links, extracting data) and rendering (producing clean, consistent image output).

Openkova is a dedicated REST API. Your crawler does what it does — visiting pages, extracting data — and when it needs a screenshot, it fires a POST to Openkova and gets back raw image bytes. Openkova handles its own browser instance, format options, and rendering quality.

Approach 1: page.screenshot() inside the crawler

The simplest option — capture a screenshot inside your existing Crawlee request handler and store it in the key-value store:

import { PlaywrightCrawler, KeyValueStore } from 'crawlee';

const crawler = new PlaywrightCrawler({
  async requestHandler({ page, request }) {
    // Do your data extraction
    const title = await page.title();

    // Capture screenshot of current page state
    const screenshot = await page.screenshot({ type: 'png', fullPage: false });

    // Store in Crawlee's key-value store
    const slug = new URL(request.url).hostname.replace(/\./g, '-');
    await KeyValueStore.setValue(`screenshot-${slug}`, screenshot, {
      contentType: 'image/png',
    });

    console.log(`Captured: ${title}`);
  },
});

await crawler.run(['https://example.com']);

This works well for:

Error snapshots — capturing what the page looks like when something goes wrong
Visual debugging — checking what the crawler actually sees at a given point
Low-volume archiving — a handful of pages where you want a quick snapshot
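
One thing to watch when archiving more than one page per site: the handler above builds its key from the hostname alone, so every page on the same host writes to the same key and only the last screenshot survives. A minimal sketch of a key helper follows. The name screenshotKey is hypothetical (it is not part of Crawlee), and the character and length limits are assumptions based on the key rules commonly documented for Crawlee and Apify key-value stores, so check them against the version you run.

```typescript
// Hypothetical helper, not a Crawlee API: turn a URL into a key-value store key.
// Assumption: keys may be at most 256 characters, drawn from a-z, A-Z, 0-9 and !-_.'()
function screenshotKey(rawUrl: string, maxLength = 256): string {
  const { hostname, pathname, search } = new URL(rawUrl);
  const slug = `${hostname}${pathname}${search}`
    .replace(/[^a-zA-Z0-9]+/g, '-') // collapse each run of other characters into one dash
    .replace(/^-+|-+$/g, '');       // trim leading and trailing dashes
  return `screenshot-${slug}`.slice(0, maxLength);
}
```

In the handler, passing screenshotKey(request.url) as the first argument to KeyValueStore.setValue() would replace the hostname-based key. Very long URLs are truncated, so two of them could still collide; appending a short hash is one way around that.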

The limitation: page.screenshot() captures the page in its current browser session state, including cookies, the existing scroll position, and any JavaScript side effects from earlier in your handler. For a consistent, clean render, you need a fresh browser context. That's what Openkova provides.

Approach 2: Openkova REST API alongside Crawlee

Run Openkova alongside your crawler. When a page needs a screenshot, your request handler calls Openkova with the URL — Openkova opens a fresh browser session, renders the page cleanly, and returns the image bytes.

Step 1: Run Openkova

docker run -d -p 3001:3000 \
  -e CHROMIUM_PATH=/usr/bin/chromium \
  ghcr.io/scnix-git/openkova:latest

The -p 3001:3000 mapping exposes Openkova on host port 3001, which avoids a conflict if something else on your machine already uses port 3000.

Step 2: Call Openkova from your request handler

import { PlaywrightCrawler } from 'crawlee';
import { writeFile, mkdir } from 'fs/promises';
import { join } from 'path';

const OPENKOVA_URL = process.env.OPENKOVA_URL ?? 'http://localhost:3001';

async function screenshotUrl(url: string, format: 'png' | 'jpeg' = 'jpeg') {
  const res = await fetch(`${OPENKOVA_URL}/api/convert/url`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      url,
      format,
      viewport: { width: 1280, height: 900 },
    }),
  });

  if (!res.ok) throw new Error(`Openkova error: ${res.status}`);
  return Buffer.from(await res.arrayBuffer());
}

const crawler = new PlaywrightCrawler({
  async requestHandler({ request, enqueueLinks }) {
    // Screenshot via Openkova — clean, fresh browser session
    const img = await screenshotUrl(request.url);

    await mkdir('screenshots', { recursive: true });
    const slug = new URL(request.url).pathname.replace(/\//g, '-').replace(/^-/, '');
    await writeFile(join('screenshots', `${slug || 'home'}.jpg`), img);

    // Continue crawling
    await enqueueLinks();
  },
});

await crawler.run(['https://example.com']);
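
Because the screenshot is now a network call to a separate service, a transient failure (a container restart, a slow render) throws inside the handler, and Crawlee then retries the whole request, page load included. One option is to retry just the Openkova call. The sketch below is a generic helper, not something Openkova or Crawlee provides; the name withRetry and its default values are assumptions chosen for illustration.

```typescript
// Hypothetical helper: retry an async operation with exponential backoff.
async function withRetry<T>(
  operation: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  const maxAttempts = Math.max(1, attempts);
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        // Delay doubles each time: baseDelayMs, 2x, 4x, ...
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller
  throw lastError;
}
```

In the handler above, the call would become withRetry(() => screenshotUrl(request.url)). If the retries are exhausted the error still propagates, so Crawlee's own retry handling remains the fallback.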

When to use each approach

Scenario	page.screenshot()	Openkova API
Error snapshots / debugging	✓ Captures current page state	Overkill
Production thumbnail pipeline	✗ Session state leaks in	✓ Fresh render each time
HTML template rendering	✗ URL-only	✓ /api/convert/snippet
PDF generation	✗ page.pdf() — complex setup	✓ format: "pdf"
WebP output	Limited	✓ format: "webp"
Consistent viewport control	✗ Inherits crawler viewport	✓ Per-request viewport
Private / localhost URLs	✓ Same network as crawler	✓ Same network if co-located

Bulk screenshot of crawled URLs

If you have a list of URLs from a Crawlee run and want to screenshot all of them efficiently, use Openkova's SSE streaming endpoint to get progress events:

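import { writeFile } from 'fs/promises';

// Assumes OPENKOVA_URL is defined as in the previous example
async function screenshotWithProgress(url: string, outputPath: string) {
  const res = await fetch(`${OPENKOVA_URL}/api/convert/url`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Accept: 'text/event-stream' },
    body: JSON.stringify({ url, format: 'jpeg' }),
  });
  if (!res.ok || !res.body) throw new Error(`Openkova error: ${res.status}`);

  // SSE stream: read events until "done"
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  let imageData: Buffer | null = null;

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(value, { stream: true });
    // A chunk can end mid-line, so parse only complete lines and keep the remainder
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const event = JSON.parse(line.slice(6));
      if (event.type === 'done') {
        // Fetch the image from the returned URL
        const imgRes = await fetch(`${OPENKOVA_URL}${event.data.url}`);
        if (!imgRes.ok) throw new Error(`Openkova error: ${imgRes.status}`);
        imageData = Buffer.from(await imgRes.arrayBuffer());
      }
    }
  }

  if (!imageData) throw new Error(`No "done" event received for ${url}`);
  await writeFile(outputPath, imageData);
}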
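
The function above handles one URL at a time. To work through a whole list without launching every render at once, a small concurrency limiter is one approach. This is a generic sketch: mapWithConcurrency is a hypothetical name, and a sensible limit depends on how much CPU and memory your Openkova instance has, which is not something the source specifies.

```typescript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until the list is exhausted
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results; // same order as `items`
}
```

For example, mapWithConcurrency(urls, 4, (url, i) => screenshotWithProgress(url, `screenshots/${i}.jpg`)) would keep four renders running until the list is done. The first rejected call rejects the whole batch, so wrap the worker in a try/catch if you would rather collect failures and continue.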

Using @openkova/cli for simpler batch screenshots

For shell-based batch screenshotting after a Crawlee run, the CLI is the fastest option. After your crawler writes extracted URLs to a file:

# Install the CLI
npm install -g @openkova/cli

# Screenshot every URL from your crawl output
mkdir -p screenshots
while IFS= read -r url; do
  slug=$(printf '%s' "$url" | sed -E 's|^https?://||; s|/|-|g')
  kova screenshot "$url" --output "screenshots/$slug.jpg" --format jpeg
done < crawled-urls.txt
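
The loop above expects crawled-urls.txt to hold one URL per line. How you collect the URLs depends on your crawler, so the sketch below covers only the last step: turning an array of strings into that file. The function names are hypothetical, and the filtering rules (absolute http/https URLs only, duplicates dropped) are assumptions you may want to change.

```typescript
import { writeFile } from 'node:fs/promises';

// Hypothetical helper: normalise, de-duplicate, and serialise URLs one per line.
function toUrlList(urls: readonly string[]): string {
  const seen = new Set<string>();
  for (const raw of urls) {
    let parsed: URL;
    try {
      parsed = new URL(raw.trim());
    } catch {
      continue; // skip anything that is not an absolute URL
    }
    if (parsed.protocol === 'http:' || parsed.protocol === 'https:') {
      seen.add(parsed.href); // Set preserves insertion order and drops repeats
    }
  }
  const lines = Array.from(seen);
  // The trailing newline matters: a shell `read` loop skips an unterminated last line
  return lines.length > 0 ? lines.join('\n') + '\n' : '';
}

// Hypothetical wrapper: write the list where the shell loop expects it.
async function writeUrlList(urls: readonly string[], path = 'crawled-urls.txt'): Promise<void> {
  await writeFile(path, toUrlList(urls), 'utf8');
}
```

Calling writeUrlList(collectedUrls) at the end of a crawl, where collectedUrls is whatever array your handler has been filling, produces a file the loop can read directly.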

Crawlee + Openkova: a practical architecture

For a production screenshot pipeline built on Crawlee:

Crawlee discovers URLs — use PlaywrightCrawler or HttpCrawler to crawl a site and build a URL list. Store results in a dataset.
Openkova renders screenshots — a separate worker reads the URL dataset and calls POST /api/convert/url for each entry. Clean browser state, consistent output format, controllable viewport.
Store in S3 / object storage — save rendered images to S3 or R2 keyed by URL slug. Serve from CDN with a long cache TTL.

This separation keeps your crawler fast (it doesn't wait for screenshot renders) and your screenshots clean (Openkova gets a fresh browser context per request).
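
For the storage step, each image needs a stable key and the right upload headers. The sketch below derives both from the page URL. Everything in it is an illustrative assumption rather than part of Openkova or any storage SDK: the objectMetadata name, the screenshots/ prefix, the 12-character hash suffix, and the one-year cache lifetime.

```typescript
import { createHash } from 'node:crypto';

type ImageFormat = 'png' | 'jpeg' | 'webp';

const CONTENT_TYPES: Record<ImageFormat, string> = {
  png: 'image/png',
  jpeg: 'image/jpeg',
  webp: 'image/webp',
};

// Hypothetical helper: derive a storage key and upload headers for one screenshot.
function objectMetadata(rawUrl: string, format: ImageFormat) {
  const url = new URL(rawUrl);
  // Hash of the full URL keeps keys distinct when slugs collide or are truncated
  const hash = createHash('sha256').update(url.href).digest('hex').slice(0, 12);
  const slug = `${url.hostname}${url.pathname}`
    .replace(/[^a-zA-Z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '')
    .slice(0, 80);
  const extension = format === 'jpeg' ? 'jpg' : format;
  return {
    key: `screenshots/${slug}-${hash}.${extension}`,
    contentType: CONTENT_TYPES[format],
    // One year, as an example of a long TTL; a re-render under the same key needs a CDN purge
    cacheControl: 'public, max-age=31536000',
  };
}
```

The returned fields map onto the key, content type, and cache-control settings that S3-compatible upload calls typically accept. Because the key depends only on the URL and format, re-running the worker overwrites the previous render instead of accumulating copies.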

Frequently asked questions

How do I take screenshots in a Crawlee crawler?

Inside a PlaywrightCrawler handler, call await page.screenshot({ type: 'png' }) and store the result with KeyValueStore.setValue(). For a production screenshot pipeline with consistent rendering and format control, call Openkova's REST API from the handler instead.

Can Crawlee take screenshots?

Yes. PlaywrightCrawler and PuppeteerCrawler both expose a page object with page.screenshot(). For a dedicated screenshot service alongside Crawlee, Openkova handles format options (PNG, JPEG, WebP, PDF), viewport control, and clean rendering independently of your crawler.

What is the difference between page.screenshot() and Openkova?

page.screenshot() captures the current page in your crawler's existing browser session, including session cookies, scroll state, and any JavaScript side effects. Openkova opens a fresh browser context for every request, giving consistent output regardless of what the crawler has done on the page before.