Docs
Options

Options

createScraper() accepts a ScraperConfig object to configure plugins, resolve rules, fetch behavior, and post-processing.

import { createScraper, metaTags, openGraph, twitter, jsonLd, oembed, DEFAULT_RULES } from 'web-meta-scraper';
 
const scraper = createScraper({
  plugins: [metaTags, openGraph, twitter, jsonLd, oembed],
  rules: DEFAULT_RULES,
  fetch: { /* ... */ },
  postProcess: { /* ... */ },
});

ScraperConfig

FieldTypeDescription
pluginsPlugin[]Array of plugins to use (default: [] for createScraper, all built-ins for scrape())
rulesResolveRule[]Priority resolve rules (default: DEFAULT_RULES)
fetchFetchOptionsHTTP fetch options (for scrapeUrl)
postProcessPostProcessOptionsPost-processing behavior

Fetch Options

Options for scrapeUrl() HTTP requests:

OptionTypeDefaultDescription
timeoutnumber30000Request timeout in milliseconds
userAgentstringChrome 131 UAUser-Agent header
followRedirectsbooleantrueFollow HTTP redirects
maxContentLengthnumber5242880Max response size in bytes (5MB)
stealthbooleanfalseUse HTTP/2 with browser-like TLS fingerprint for improved compatibility
headersRecord<string, string>undefinedAdditional headers to send with the request

Stealth Mode

When stealth: true, the fetcher uses HTTP/2 with a browser-like TLS fingerprint (cipher suite, elliptic curves, signature algorithms). This improves compatibility with sites that restrict non-browser HTTP clients.

const scraper = createScraper({
  plugins: [metaTags, openGraph],
  fetch: {
    stealth: true,
  },
});
 
const result = await scraper.scrapeUrl('https://example.com/product/123');

Note: Stealth mode mimics a real browser's TLS handshake and HTTP/2 settings. Use this option responsibly and ensure that your scraping activity complies with the target site's Terms of Service and robots.txt directives. This feature is intended for legitimate metadata extraction (similar to how social platforms generate link previews), not for circumventing access controls or performing large-scale data collection.

Some sites may serve a JavaScript challenge page instead of actual content when they detect rapid successive requests from the same IP. Since this library does not execute JavaScript, such challenges cannot be solved automatically. If you encounter empty or missing metadata, consider adding a delay between requests.

Post-Processing Options

OptionTypeDefaultDescription
maxDescriptionLengthnumber200Maximum character length for descriptions. Truncated with ....
secureImagesbooleantrueConvert http:// and // image/favicon URLs to https://.
omitEmptybooleantrueRemove null, undefined, empty strings, and empty arrays from the result.
fallbacksbooleantrueApply fallback logic for missing fields.
const scraper = createScraper({
  plugins: [metaTags, openGraph],
  postProcess: {
    maxDescriptionLength: 100,
    secureImages: true,
    omitEmpty: true,
    fallbacks: true,
  },
});

Fallback Behavior

When fallbacks: true (default):

  1. Title fallback — If no title is found, siteName is used instead.
  2. Description fallback — If no description is found, it's extracted from JSON-LD structured data.
  3. URL resolution — Relative image and favicon paths (/image.png) are converted to absolute URLs using the page URL.

Secure Images

When secureImages: true (default):

  • http://example.com/img.pnghttps://example.com/img.png
  • //example.com/img.pnghttps://example.com/img.png

Custom Resolve Rules

The default rules determine how metadata from different plugins is merged. You can override them entirely:

import { createScraper, metaTags, openGraph, twitter, type ResolveRule } from 'web-meta-scraper';
 
const myRules: ResolveRule[] = [
  {
    field: 'title',
    sources: [
      { plugin: 'twitter', key: 'title', priority: 3 },   // Twitter wins
      { plugin: 'open-graph', key: 'title', priority: 2 },
      { plugin: 'meta-tags', key: 'title', priority: 1 },
    ],
  },
  {
    field: 'description',
    sources: [
      { plugin: 'open-graph', key: 'description', priority: 2 },
      { plugin: 'meta-tags', key: 'description', priority: 1 },
    ],
  },
  // Add your own fields...
];
 
const scraper = createScraper({
  plugins: [metaTags, openGraph, twitter],
  rules: myRules,
});

Each ResolveRule maps a final field name to an array of sources. Each source specifies:

  • plugin — The plugin name (matches PluginResult.name)
  • key — The key in the plugin's data object
  • priority — Higher number wins

Default Rules

Resolved FieldSources (by priority)
titleopen-graph (3) → meta-tags (2) → twitter (1)
descriptionopen-graph (3) → meta-tags (2) → twitter (1)
imageopen-graph (2) → twitter (1)
urlopen-graph (2) → meta-tags/canonicalUrl (1)
faviconmeta-tags (1)
authormeta-tags (1)
keywordsmeta-tags (1)
typeopen-graph (1)
siteNameopen-graph (1)
localeopen-graph (1)
twitterCardtwitter/card (1)
twitterSitetwitter/site (1)
twitterCreatortwitter/creator (1)
jsonLdjson-ld (1)
oembedoembed (1)
faviconsfavicons (1)
feedsfeeds (1)
robotsrobots (1)