Options
createScraper() accepts a ScraperConfig object to configure plugins, resolve rules, fetch behavior, and post-processing.
import { createScraper, metaTags, openGraph, twitter, jsonLd, oembed, DEFAULT_RULES } from 'web-meta-scraper';
const scraper = createScraper({
plugins: [metaTags, openGraph, twitter, jsonLd, oembed],
rules: DEFAULT_RULES,
fetch: { /* ... */ },
postProcess: { /* ... */ },
});ScraperConfig
| Field | Type | Description |
|---|---|---|
plugins | Plugin[] | Array of plugins to use (default: [] for createScraper, all built-ins for scrape()) |
rules | ResolveRule[] | Priority resolve rules (default: DEFAULT_RULES) |
fetch | FetchOptions | HTTP fetch options (for scrapeUrl) |
postProcess | PostProcessOptions | Post-processing behavior |
Fetch Options
Options for scrapeUrl() HTTP requests:
| Option | Type | Default | Description |
|---|---|---|---|
timeout | number | 30000 | Request timeout in milliseconds |
userAgent | string | Chrome 131 UA | User-Agent header |
followRedirects | boolean | true | Follow HTTP redirects |
maxContentLength | number | 5242880 | Max response size in bytes (5MB) |
stealth | boolean | false | Use HTTP/2 with browser-like TLS fingerprint for improved compatibility |
headers | Record<string, string> | undefined | Additional headers to send with the request |
Stealth Mode
When stealth: true, the fetcher uses HTTP/2 with a browser-like TLS fingerprint (cipher suite, elliptic curves, signature algorithms). This improves compatibility with sites that restrict non-browser HTTP clients.
const scraper = createScraper({
plugins: [metaTags, openGraph],
fetch: {
stealth: true,
},
});
const result = await scraper.scrapeUrl('https://example.com/product/123');Note: Stealth mode mimics a real browser's TLS handshake and HTTP/2 settings. Use this option responsibly and ensure that your scraping activity complies with the target site's Terms of Service and
robots.txtdirectives. This feature is intended for legitimate metadata extraction (similar to how social platforms generate link previews), not for circumventing access controls or performing large-scale data collection.Some sites may serve a JavaScript challenge page instead of actual content when they detect rapid successive requests from the same IP. Since this library does not execute JavaScript, such challenges cannot be solved automatically. If you encounter empty or missing metadata, consider adding a delay between requests.
Post-Processing Options
| Option | Type | Default | Description |
|---|---|---|---|
maxDescriptionLength | number | 200 | Maximum character length for descriptions. Truncated with .... |
secureImages | boolean | true | Convert http:// and // image/favicon URLs to https://. |
omitEmpty | boolean | true | Remove null, undefined, empty strings, and empty arrays from the result. |
fallbacks | boolean | true | Apply fallback logic for missing fields. |
const scraper = createScraper({
plugins: [metaTags, openGraph],
postProcess: {
maxDescriptionLength: 100,
secureImages: true,
omitEmpty: true,
fallbacks: true,
},
});Fallback Behavior
When fallbacks: true (default):
- Title fallback — If no
titleis found,siteNameis used instead. - Description fallback — If no
descriptionis found, it's extracted from JSON-LD structured data. - URL resolution — Relative image and favicon paths (
/image.png) are converted to absolute URLs using the page URL.
Secure Images
When secureImages: true (default):
http://example.com/img.png→https://example.com/img.png//example.com/img.png→https://example.com/img.png
Custom Resolve Rules
The default rules determine how metadata from different plugins is merged. You can override them entirely:
import { createScraper, metaTags, openGraph, twitter, type ResolveRule } from 'web-meta-scraper';
const myRules: ResolveRule[] = [
{
field: 'title',
sources: [
{ plugin: 'twitter', key: 'title', priority: 3 }, // Twitter wins
{ plugin: 'open-graph', key: 'title', priority: 2 },
{ plugin: 'meta-tags', key: 'title', priority: 1 },
],
},
{
field: 'description',
sources: [
{ plugin: 'open-graph', key: 'description', priority: 2 },
{ plugin: 'meta-tags', key: 'description', priority: 1 },
],
},
// Add your own fields...
];
const scraper = createScraper({
plugins: [metaTags, openGraph, twitter],
rules: myRules,
});Each ResolveRule maps a final field name to an array of sources. Each source specifies:
plugin— The plugin name (matchesPluginResult.name)key— The key in the plugin'sdataobjectpriority— Higher number wins
Default Rules
| Resolved Field | Sources (by priority) |
|---|---|
title | open-graph (3) → meta-tags (2) → twitter (1) |
description | open-graph (3) → meta-tags (2) → twitter (1) |
image | open-graph (2) → twitter (1) |
url | open-graph (2) → meta-tags/canonicalUrl (1) |
favicon | meta-tags (1) |
author | meta-tags (1) |
keywords | meta-tags (1) |
type | open-graph (1) |
siteName | open-graph (1) |
locale | open-graph (1) |
twitterCard | twitter/card (1) |
twitterSite | twitter/site (1) |
twitterCreator | twitter/creator (1) |
jsonLd | json-ld (1) |
oembed | oembed (1) |
favicons | favicons (1) |
feeds | feeds (1) |
robots | robots (1) |