Adding Screenshot Capture to Your Crawlee Crawler
Two approaches: inline page.screenshot() and a dedicated REST API via Openkova — when to use each, and how to set up both.
For error snapshots and debugging, page.screenshot() is fine. For a production screenshot pipeline (consistent rendering, format control, storage, and clean separation from your crawler), call Openkova's REST API instead.
The two approaches
Crawlee's PlaywrightCrawler and PuppeteerCrawler both give you access to the browser page object inside each request handler. You can call page.screenshot() directly — but for a production screenshot pipeline, this mixes two concerns: crawling (following links, extracting data) and rendering (producing clean, consistent image output).
Openkova is a dedicated REST API. Your crawler does what it does — visiting pages, extracting data — and when it needs a screenshot, it fires a POST to Openkova and gets back raw image bytes. Openkova handles its own browser instance, format options, and rendering quality.
Approach 1: page.screenshot() inside the crawler
The simplest option — capture a screenshot inside your existing Crawlee request handler and store it in the key-value store:
import { PlaywrightCrawler, KeyValueStore } from 'crawlee';

const crawler = new PlaywrightCrawler({
  async requestHandler({ page, request }) {
    // Do your data extraction
    const title = await page.title();
    // Capture screenshot of current page state
    const screenshot = await page.screenshot({ type: 'png', fullPage: false });
    // Store in Crawlee's key-value store.
    // The dot is escaped: an unescaped /./g would replace every character with '-'.
    const slug = new URL(request.url).hostname.replace(/\./g, '-');
    await KeyValueStore.setValue(`screenshot-${slug}`, screenshot, {
      contentType: 'image/png',
    });
    console.log(`Captured: ${title}`);
  },
});

await crawler.run(['https://example.com']);
This works well for:
- Error snapshots — capturing what the page looks like when something goes wrong
- Visual debugging — checking what the crawler actually sees at a given point
- Low-volume archiving — a handful of pages where you want a quick snapshot
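The error-snapshot case in the list above can be sketched as a small wrapper. This is a minimal illustration, not part of Crawlee: `withErrorSnapshot`, `ScreenshotCapable`, and `SaveSnapshot` are hypothetical names, and the `save` callback stands in for whatever storage you use (for example `KeyValueStore.setValue`). The only real API assumed is a Playwright-style `page.screenshot()`.

```typescript
// Minimal structural type: anything with a Playwright-style screenshot() method.
interface ScreenshotCapable {
  screenshot(options?: { type?: 'png' | 'jpeg'; fullPage?: boolean }): Promise<Buffer>;
}

// Hypothetical callback that persists the image, e.g. by wrapping KeyValueStore.setValue.
type SaveSnapshot = (key: string, image: Buffer) => Promise<void>;

/**
 * Runs `work`. If it throws, captures a full-page PNG of the page as it is
 * at that moment, saves it under `key`, then rethrows the original error.
 */
async function withErrorSnapshot<T>(
  page: ScreenshotCapable,
  key: string,
  save: SaveSnapshot,
  work: () => Promise<T>,
): Promise<T> {
  try {
    return await work();
  } catch (err) {
    try {
      const image = await page.screenshot({ type: 'png', fullPage: true });
      await save(key, image);
    } catch {
      // Snapshot failures are swallowed so they never mask the original error.
    }
    throw err;
  }
}
```

Inside a request handler, the extraction logic would go in the `work` callback. Because the original error is rethrown, Crawlee's normal retry handling still sees the failure.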
The limitation: page.screenshot() captures the page in its current browser session state: cookies, existing scroll position, and any JavaScript side effects from earlier in your handler. For a consistent, clean render, you need a fresh browser context. That's what Openkova provides.
Approach 2: Openkova REST API alongside Crawlee
Run Openkova alongside your crawler. When a page needs a screenshot, your request handler calls Openkova with the URL — Openkova opens a fresh browser session, renders the page cleanly, and returns the image bytes.
Step 1: Run Openkova
docker run -d -p 3001:3000 \
  -e CHROMIUM_PATH=/usr/bin/chromium \
  ghcr.io/scnix-git/openkova:latest
Running on port 3001 to avoid conflict if your Crawlee dev server is on 3000.
Step 2: Call Openkova from your request handler
import { PlaywrightCrawler } from 'crawlee';
import { writeFile, mkdir } from 'fs/promises';
import { join } from 'path';

const OPENKOVA_URL = process.env.OPENKOVA_URL ?? 'http://localhost:3001';

async function screenshotUrl(url: string, format: 'png' | 'jpeg' = 'jpeg') {
  const res = await fetch(`${OPENKOVA_URL}/api/convert/url`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      url,
      format,
      viewport: { width: 1280, height: 900 },
    }),
  });
  if (!res.ok) throw new Error(`Openkova error: ${res.status}`);
  return Buffer.from(await res.arrayBuffer());
}

const crawler = new PlaywrightCrawler({
  async requestHandler({ request, enqueueLinks }) {
    // Screenshot via Openkova: clean, fresh browser session
    const img = await screenshotUrl(request.url);
    await mkdir('screenshots', { recursive: true });
    const slug = new URL(request.url).pathname.replace(/\//g, '-').replace(/^-/, '');
    await writeFile(join('screenshots', `${slug || 'home'}.jpg`), img);
    // Continue crawling
    await enqueueLinks();
  },
});

await crawler.run(['https://example.com']);
When to use each approach
| Scenario | page.screenshot() | Openkova API |
|---|---|---|
| Error snapshots / debugging | ✓ Captures current page state | Overkill |
| Production thumbnail pipeline | ✗ Session state leaks in | ✓ Fresh render each time |
| HTML template rendering | ✗ URL-only | ✓ /api/convert/snippet |
| PDF generation | ✗ page.pdf() — complex setup | ✓ format: "pdf" |
| WebP output | Limited | ✓ format: "webp" |
| Consistent viewport control | ✗ Inherits crawler viewport | ✓ Per-request viewport |
| Private / localhost URLs | ✓ Same network as crawler | ✓ Same network if co-located |
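To make the "per-request viewport" row concrete, here is a minimal sketch that builds one request body per viewport for the same URL. The body shape (`url`, `format`, `viewport`) mirrors the `screenshotUrl` example earlier in this article. The preset names and sizes, along with `buildConvertBodies` itself, are assumptions chosen for illustration and are not part of Openkova.

```typescript
type ImageFormat = 'png' | 'jpeg' | 'webp';

interface Viewport {
  width: number;
  height: number;
}

// Same fields the screenshotUrl example sends to POST /api/convert/url.
interface ConvertUrlBody {
  url: string;
  format: ImageFormat;
  viewport: Viewport;
}

// Hypothetical presets: names and dimensions are illustrative only.
const VIEWPORTS: Record<string, Viewport> = {
  desktop: { width: 1280, height: 900 },
  tablet: { width: 768, height: 1024 },
  mobile: { width: 390, height: 844 },
};

/** Builds one request body per preset so each render gets its own viewport. */
function buildConvertBodies(
  url: string,
  format: ImageFormat = 'jpeg',
): Array<{ name: string; body: ConvertUrlBody }> {
  return Object.entries(VIEWPORTS).map(([name, viewport]) => ({
    name,
    body: { url, format, viewport },
  }));
}
```

Each `body` can be passed through `JSON.stringify` and sent as the POST payload, one request per preset. With page.screenshot() you would instead have to resize the crawler's own page between captures.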
Bulk screenshot of crawled URLs
If you have a list of URLs from a Crawlee run and want to screenshot all of them efficiently, use Openkova's SSE streaming endpoint to get progress events:
import { writeFile } from 'fs/promises';

// Reuses OPENKOVA_URL from the previous example
async function screenshotWithProgress(url: string, outputPath: string) {
  const res = await fetch(`${OPENKOVA_URL}/api/convert/url`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Accept: 'text/event-stream' },
    body: JSON.stringify({ url, format: 'jpeg' }),
  });
  if (!res.ok || !res.body) throw new Error(`Openkova error: ${res.status}`);
  // SSE stream: read events until "done"
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffered = '';
  let imageData: Buffer | null = null;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // A chunk can end mid-line, so keep the trailing partial line for the next read
    buffered += decoder.decode(value, { stream: true });
    const lines = buffered.split('\n');
    buffered = lines.pop() ?? '';
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const event = JSON.parse(line.slice(6));
      if (event.type === 'done') {
        // Fetch the image from the returned URL
        const imgRes = await fetch(`${OPENKOVA_URL}${event.data.url}`);
        imageData = Buffer.from(await imgRes.arrayBuffer());
      }
    }
  }
  if (imageData) await writeFile(outputPath, imageData);
}
Using @openkova/cli for simpler batch screenshots
For shell-based batch screenshotting after a Crawlee run, the CLI is the fastest option. After your crawler writes extracted URLs to a file:
# Install the CLI
npm install -g @openkova/cli

# Screenshot every URL from your crawl output
while IFS= read -r url; do
  slug=$(echo "$url" | sed 's|https\?://||;s|/|-|g')
  kova screenshot "$url" --output "screenshots/$slug.jpg" --format jpeg
done < crawled-urls.txt
Crawlee + Openkova: a practical architecture
For a production screenshot pipeline built on Crawlee:
- Crawlee discovers URLs: use PlaywrightCrawler or HttpCrawler to crawl a site and build a URL list. Store results in a dataset.
- Openkova renders screenshots: a separate worker reads the URL dataset and calls POST /api/convert/url for each entry. Clean browser state, consistent output format, controllable viewport.
- Store in S3 / object storage: save rendered images to S3 or R2 keyed by URL slug. Serve from CDN with a long cache TTL.
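The "keyed by URL slug" step can be sketched as a pure function. This is one possible scheme, not something prescribed by Openkova or Crawlee: `objectKeyForUrl` is a hypothetical helper, and the short hash suffix is an assumption added so that URLs differing only in their query string do not overwrite each other.

```typescript
import { createHash } from 'node:crypto';

/**
 * Derives a storage key such as "example.com/blog-post-1a2b3c4d.jpg" from a URL.
 * The path is reduced to letters, digits, and dashes; an 8-character SHA-256
 * prefix of the full URL keeps distinct URLs from colliding.
 */
function objectKeyForUrl(
  rawUrl: string,
  extension: 'jpg' | 'png' | 'webp' = 'jpg',
): string {
  const url = new URL(rawUrl);
  const path =
    url.pathname
      .replace(/[^a-zA-Z0-9]+/g, '-') // collapse slashes and other symbols
      .replace(/^-+|-+$/g, '') || 'home'; // trim dashes; root path becomes "home"
  const hash = createHash('sha256').update(url.href).digest('hex').slice(0, 8);
  return `${url.hostname}/${path}-${hash}.${extension}`;
}
```

The same URL always maps to the same key, so re-rendering a page overwrites its previous image instead of accumulating duplicates.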
This separation keeps your crawler fast (it doesn't wait for screenshot renders) and your screenshots clean (Openkova gets a fresh browser context per request).
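The separate worker described above might look roughly like this. It is a sketch under stated assumptions: `renderAll` and `RenderResult` are hypothetical names, the default concurrency of 4 is arbitrary, and the `render` argument is any function that turns a URL into image bytes, such as the `screenshotUrl` helper from Approach 2.

```typescript
type Render = (url: string) => Promise<Buffer>;

interface RenderResult {
  url: string;
  ok: boolean;
  bytes?: number; // image size when the render succeeded
  error?: string; // error message when it failed
}

/**
 * Renders every URL with at most `concurrency` requests in flight.
 * A failed render is recorded in the result list instead of aborting the batch.
 */
async function renderAll(
  urls: string[],
  render: Render,
  concurrency = 4,
): Promise<RenderResult[]> {
  const results: RenderResult[] = new Array(urls.length);
  let next = 0; // index of the next URL to claim

  async function worker(): Promise<void> {
    while (next < urls.length) {
      const index = next++; // claimed synchronously, so no two workers share a URL
      const url = urls[index];
      try {
        const image = await render(url);
        results[index] = { url, ok: true, bytes: image.length };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        results[index] = { url, ok: false, error: message };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, urls.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

A call such as `renderAll(urlsFromDataset, (url) => screenshotUrl(url))` would then process the list, where `urlsFromDataset` is whatever your crawl produced. Capping concurrency keeps a single Openkova instance from being asked to open too many browser sessions at once.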
Frequently asked questions
How do I take screenshots in a Crawlee crawler?
Inside a PlaywrightCrawler handler, call await page.screenshot({ type: 'png' }) and store the result with KeyValueStore.setValue(). For a production screenshot pipeline with consistent rendering and format control, call Openkova's REST API from the handler instead.
Can Crawlee take screenshots?
Yes. PlaywrightCrawler and PuppeteerCrawler both expose a page object with page.screenshot(). For a dedicated screenshot service alongside Crawlee, Openkova handles format options (PNG, JPEG, WebP, PDF), viewport control, and clean rendering independently of your crawler.
What is the difference between page.screenshot() and Openkova?
page.screenshot() captures the current page in your crawler's existing browser session, including session cookies, scroll state, and any JavaScript side effects. Openkova opens a fresh browser context for every request, giving consistent output regardless of what the crawler has done on the page before.
See also: Openkova screenshot API reference, screenshot any webpage in CI/CD, and Browserless vs Openkova.