Custom Plugins

Create your own plugins to extract any data from the HTML.

Plugin Interface

type Plugin = (ctx: ScrapeContext) => PluginResult | Promise<PluginResult>;
 
interface ScrapeContext {
  $: CheerioAPI;        // Cheerio instance for DOM queries
  url?: string;         // Page URL (if available)
  options: ScraperOptions;
}
 
interface PluginResult {
  name: string;                    // Unique plugin name
  data: Record<string, unknown>;   // Extracted data
}

Basic Example

import { createScraper, openGraph, DEFAULT_RULES, type Plugin } from 'web-meta-scraper';
 
const pricePlugin: Plugin = (ctx) => {
  const { $ } = ctx;
 
  const price = $('[itemprop="price"]').attr('content');
  const currency = $('[itemprop="priceCurrency"]').attr('content');
 
  return {
    name: 'price',
    data: { price, currency },
  };
};
 
const scraper = createScraper({
  plugins: [openGraph, pricePlugin],
  rules: [
    ...DEFAULT_RULES,
    { field: 'price', sources: [{ plugin: 'price', key: 'price', priority: 1 }] },
    { field: 'currency', sources: [{ plugin: 'price', key: 'currency', priority: 1 }] },
  ],
});
 
const result = await scraper.scrapeUrl('https://shop.example.com');
console.log(result.metadata); // { title: "...", price: "$99.99", currency: "USD", ... }
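
Cheerio's `.attr()` returns `undefined` when no element matches, so the `data` object in the plugin above can carry `undefined` values on pages without price markup. One way to keep the result clean is to drop those keys before returning. The `compact` helper below is a hypothetical name used for illustration, not part of web-meta-scraper:

```typescript
// Hypothetical helper (not part of web-meta-scraper): removes keys whose
// value is undefined so a plugin only reports what it actually found.
function compact(data: Record<string, unknown>): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(data).filter(([, value]) => value !== undefined),
  );
}

// Simulates a page that has a price but no priceCurrency element.
const cleaned = compact({ price: '29.99', currency: undefined });
console.log(cleaned); // { price: '29.99' }
```

Inside a plugin this would be used as `data: compact({ price, currency })`.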

Using the Context

The ScrapeContext gives you:

  • $: A Cheerio instance preloaded with the page HTML. Use it like jQuery.
  • url: The page URL (set automatically by scrapeUrl, or passed as an option to scrape). Useful for resolving relative URLs.
  • options: The scraper options, including timeout.

import { resolveUrl, type Plugin } from 'web-meta-scraper'; // URL utility
 
const imagePlugin: Plugin = (ctx) => {
  const { $, url } = ctx;
 
  const src = $('img.hero').attr('src');
  const absoluteSrc = src ? resolveUrl(src, url) : undefined;
 
  return {
    name: 'hero-image',
    data: { heroImage: absoluteSrc },
  };
};
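
Because `url` is optional in `ScrapeContext`, a plugin may be called without a base URL. This page does not say how `resolveUrl` behaves in that case, so the sketch below uses the standard `URL` constructor instead to show one way of handling it. `toAbsolute` is a hypothetical helper name, not a library export:

```typescript
// Hypothetical helper built on the standard URL constructor.
// Returns an absolute URL, or undefined when src is relative and
// no usable base URL is available.
function toAbsolute(src: string, base?: string): string | undefined {
  try {
    return new URL(src, base).href;
  } catch {
    // new URL throws a TypeError when the input cannot be resolved
    return undefined;
  }
}

console.log(toAbsolute('/img/hero.png', 'https://example.com/products/1'));
// https://example.com/img/hero.png
console.log(toAbsolute('hero.png')); // undefined (relative, no base)
```

Returning `undefined` rather than the raw relative path keeps unusable values out of the plugin's `data`.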

Async Plugins

Plugins can be async. All plugins run in parallel via Promise.all:

const apiPlugin: Plugin = async (ctx) => {
  const { $, options } = ctx;
  const pageId = $('meta[name="page-id"]').attr('content');
 
  if (!pageId) return { name: 'api', data: {} };
 
  const res = await fetch(`https://api.example.com/pages/${pageId}`, {
    signal: AbortSignal.timeout(options.timeout ?? 5000),
  });
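  // Treat HTTP errors as "no data" instead of parsing an error body
  if (!res.ok) return { name: 'api', data: {} };
  const json = await res.json();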
 
  return {
    name: 'api',
    data: { pageViews: json.views, lastUpdated: json.updatedAt },
  };
};
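
With `Promise.all`, a single rejected promise rejects the whole batch. This page does not state how `createScraper` treats a plugin that throws, so a defensive option is to catch errors inside the plugin itself. The sketch below uses local stand-in types (the real `Plugin` also receives a `ScrapeContext`), and `safe` and `runAll` are hypothetical names:

```typescript
// Local stand-in types so the sketch runs on its own.
type PluginResult = { name: string; data: Record<string, unknown> };
type SimplePlugin = () => PluginResult | Promise<PluginResult>;

// Hypothetical wrapper: converts a thrown error or rejected promise into
// an empty result so one failing plugin cannot reject the whole batch.
const safe = (name: string, plugin: SimplePlugin): SimplePlugin => async () => {
  try {
    return await plugin();
  } catch {
    return { name, data: {} };
  }
};

const fast: SimplePlugin = () => ({ name: 'fast', data: { ok: true } });
const failing: SimplePlugin = async () => {
  throw new Error('network down');
};

// Promise.all accepts a mix of plain values and promises and keeps
// the results in input order.
function runAll(plugins: SimplePlugin[]): Promise<PluginResult[]> {
  return Promise.all(plugins.map((plugin) => plugin()));
}

runAll([fast, safe('failing', failing)]).then((results) => {
  console.log(results.map((r) => r.name)); // [ 'fast', 'failing' ]
});
```

Passing `failing` without the wrapper would make `runAll` reject with "network down" and discard the result from `fast`.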

Registering Custom Fields

For your plugin's data to appear in result.metadata, you need to add corresponding resolve rules:

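import { createScraper, metaTags, openGraph, DEFAULT_RULES } from 'web-meta-scraper';
// pricePlugin is the plugin defined in the Basic Example above

const scraper = createScraper({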
  plugins: [metaTags, openGraph, pricePlugin],
  rules: [
    ...DEFAULT_RULES,
    // Map plugin data to result fields
    { field: 'price', sources: [{ plugin: 'price', key: 'price', priority: 1 }] },
    { field: 'currency', sources: [{ plugin: 'price', key: 'currency', priority: 1 }] },
  ],
});

Without rules, your plugin's data is still available in result.sources:

const result = await scraper.scrapeUrl('https://shop.example.com');
result.sources['price']; // { price: "$99.99", currency: "USD" }
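
To make the relationship between `result.sources` and `result.metadata` concrete, here is a rough sketch of how rules like the ones above could be applied. It is not the library's implementation. It assumes that the source with the highest `priority` number wins and that `undefined` values are skipped; neither behavior is confirmed on this page. `resolveFields` and the types are local stand-ins:

```typescript
// Local stand-in types mirroring the rule shape used on this page.
type Source = { plugin: string; key: string; priority: number };
type Rule = { field: string; sources: Source[] };
type Sources = Record<string, Record<string, unknown>>;

// ASSUMPTION: the highest priority number wins and undefined values
// are skipped. The real library may behave differently.
function resolveFields(rules: Rule[], sources: Sources): Record<string, unknown> {
  const metadata: Record<string, unknown> = {};
  for (const rule of rules) {
    const ordered = [...rule.sources].sort((a, b) => b.priority - a.priority);
    for (const { plugin, key } of ordered) {
      const value = sources[plugin]?.[key];
      if (value !== undefined) {
        metadata[rule.field] = value;
        break; // first defined value in priority order wins
      }
    }
  }
  return metadata;
}

const metadata = resolveFields(
  [
    { field: 'price', sources: [{ plugin: 'price', key: 'price', priority: 1 }] },
    { field: 'currency', sources: [{ plugin: 'price', key: 'currency', priority: 1 }] },
  ],
  { price: { price: '29.99' } }, // currency was not found on the page
);
console.log(metadata); // { price: '29.99' }
```

The point of the sketch is the mapping: each rule names a plugin (by its `name`) and a key inside that plugin's `data`, which is why a plugin with no matching rule appears only under `result.sources`.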

Testing Plugins

Use createContext to test your plugins without making HTTP requests:

import { createContext } from 'web-meta-scraper';
 
const html = `
<html>
  <head>
    <meta itemprop="price" content="29.99" />
    <meta itemprop="priceCurrency" content="USD" />
  </head>
</html>
`;
 
const ctx = createContext(html, 'https://example.com', {});
// A Plugin may return a Promise, so await works for sync and async plugins alike
const result = await pricePlugin(ctx);
 
console.log(result);
// { name: 'price', data: { price: '29.99', currency: 'USD' } }