Custom Plugins
Create your own plugins to extract any data from the HTML.
Plugin Interface
type Plugin = (ctx: ScrapeContext) => PluginResult | Promise<PluginResult>;
interface ScrapeContext {
  $: CheerioAPI; // Cheerio instance for DOM queries
  url?: string; // Page URL (if available)
  options: ScraperOptions;
}
interface PluginResult {
  name: string; // Unique plugin name
  data: Record<string, unknown>; // Extracted data
}
Basic Example
import { createScraper, openGraph, DEFAULT_RULES, type Plugin } from 'web-meta-scraper';
const pricePlugin: Plugin = (ctx) => {
  const { $ } = ctx;
  const price = $('[itemprop="price"]').attr('content');
  const currency = $('[itemprop="priceCurrency"]').attr('content');
  return {
    name: 'price',
    data: { price, currency },
  };
};
const scraper = createScraper({
  plugins: [openGraph, pricePlugin],
  rules: [
    ...DEFAULT_RULES,
    { field: 'price', sources: [{ plugin: 'price', key: 'price', priority: 1 }] },
    { field: 'currency', sources: [{ plugin: 'price', key: 'currency', priority: 1 }] },
  ],
});
const result = await scraper.scrapeUrl('https://shop.example.com');
console.log(result.metadata); // { title: "...", price: "$99.99", currency: "USD", ... }
Using the Context
The ScrapeContext gives you:
- $: A Cheerio instance preloaded with the page HTML. Use it like jQuery.
- url: The page URL (set by scrapeUrl, or passed as an option to scrape). Useful for resolving relative URLs.
- options: Scraper options, including timeout.
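Because url is optional, a plugin that resolves relative links has to cope with a missing base. The following is a minimal sketch of that guard using the standard URL constructor rather than the library's own utility. The helper name toAbsolute is hypothetical and is not part of web-meta-scraper.

```typescript
// Hypothetical helper (not part of web-meta-scraper):
// resolve `src` against an optional base URL.
function toAbsolute(src: string | undefined, base?: string): string | undefined {
  if (!src) return undefined;
  try {
    // With a base, relative paths are resolved against it;
    // without one, only already-absolute URLs parse successfully.
    return (base ? new URL(src, base) : new URL(src)).href;
  } catch {
    // Relative path with no base, or malformed input
    return undefined;
  }
}

console.log(toAbsolute('/img/hero.png', 'https://example.com/products/1'));
// https://example.com/img/hero.png
console.log(toAbsolute('/img/hero.png'));
// undefined
```

Inside a plugin, the base would come from ctx.url. The example that follows uses the library's resolveUrl utility for this purpose.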
import { resolveUrl } from 'web-meta-scraper'; // URL utility
const imagePlugin: Plugin = (ctx) => {
  const { $, url } = ctx;
  const src = $('img.hero').attr('src');
  const absoluteSrc = src ? resolveUrl(src, url) : undefined;
  return {
    name: 'hero-image',
    data: { heroImage: absoluteSrc },
  };
};
Async Plugins
Plugins can be async. All plugins run in parallel via Promise.all:
const apiPlugin: Plugin = async (ctx) => {
  const { $, options } = ctx;
  const pageId = $('meta[name="page-id"]').attr('content');
  // No page ID on the page: return an empty result instead of calling the API
  if (!pageId) return { name: 'api', data: {} };
  const res = await fetch(`https://api.example.com/pages/${pageId}`, {
    signal: AbortSignal.timeout(options.timeout ?? 5000),
  });
  // Treat HTTP errors as "no data" rather than parsing an error body
  if (!res.ok) return { name: 'api', data: {} };
  const json = await res.json();
  return {
    name: 'api',
    data: { pageViews: json.views, lastUpdated: json.updatedAt },
  };
};
Registering Custom Fields
For your plugin's data to appear in result.metadata, you need to add corresponding resolve rules:
const scraper = createScraper({
  plugins: [metaTags, openGraph, pricePlugin],
  rules: [
    ...DEFAULT_RULES,
    // Map plugin data to result fields
    { field: 'price', sources: [{ plugin: 'price', key: 'price', priority: 1 }] },
    { field: 'currency', sources: [{ plugin: 'price', key: 'currency', priority: 1 }] },
  ],
});
Without rules, your plugin's data is still available in result.sources:
const result = await scraper.scrapeUrl('https://shop.example.com');
result.sources['price']; // { price: "$99.99", currency: "USD" }
Testing Plugins
Use createContext to test your plugins without making HTTP requests:
import { createContext } from 'web-meta-scraper';
const html = `
  <html>
    <head>
      <meta itemprop="price" content="29.99" />
      <meta itemprop="priceCurrency" content="USD" />
    </head>
  </html>
`;
const ctx = createContext(html, 'https://example.com', {});
const result = pricePlugin(ctx);
console.log(result);
// { name: 'price', data: { price: '29.99', currency: 'USD' } }
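The test above calls a synchronous plugin directly. An async plugin returns a Promise, so its result has to be awaited, and the same applies when several plugins are exercised together. The following is a rough sketch of a test helper for that case. It assumes only the plugin shape described earlier. The types are local stand-ins so the snippet runs on its own, and collectSources, PluginLike and FakeContext are hypothetical names, not part of web-meta-scraper.

```typescript
// Local stand-ins so the sketch runs without installing anything.
// In real tests, use Plugin and createContext from 'web-meta-scraper' instead.
interface PluginResult {
  name: string;
  data: Record<string, unknown>;
}
type PluginLike<Ctx> = (ctx: Ctx) => PluginResult | Promise<PluginResult>;

// Hypothetical helper: run every plugin against one context in parallel
// and index the extracted data by plugin name.
async function collectSources<Ctx>(
  plugins: PluginLike<Ctx>[],
  ctx: Ctx,
): Promise<Record<string, Record<string, unknown>>> {
  // Promise.all accepts a mix of plain values and promises
  const results = await Promise.all(plugins.map((plugin) => plugin(ctx)));
  const sources: Record<string, Record<string, unknown>> = {};
  for (const { name, data } of results) {
    sources[name] = data;
  }
  return sources;
}

// Stand-in context and plugins, for illustration only
type FakeContext = { html: string };

const syncPlugin: PluginLike<FakeContext> = (ctx) => ({
  name: 'length',
  data: { chars: ctx.html.length },
});

const asyncPlugin: PluginLike<FakeContext> = async (ctx) => ({
  name: 'has-head',
  data: { hasHead: ctx.html.includes('<head>') },
});

collectSources([syncPlugin, asyncPlugin], { html: '<html><head></head></html>' })
  .then((sources) => console.log(sources));
// { length: { chars: 26 }, 'has-head': { hasHead: true } }
```

With the real library, the context would come from createContext, and the plugin list would hold the plugins under test.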