Quick Start
The Simple Way — scrape()
The scrape() function auto-detects whether its input is a URL or an HTML string and runs all built-in plugins:
import { scrape } from 'web-meta-scraper';
// From a URL: fetches and parses automatically
const result = await scrape('https://example.com');
// From an HTML string (named differently so both calls can share one scope)
const htmlResult = await scrape('<html><head><title>Hello</title></head></html>');
console.log(result.metadata);
// {
//   title: "Example Domain",
//   description: "This domain is for use in illustrative examples.",
//   image: "https://example.com/og-image.png",
//   url: "https://example.com",
//   type: "website",
//   siteName: "Example",
//   favicon: "https://example.com/favicon.ico",
//   ...
// }
The Full Way — createScraper()
Use createScraper() when you need control over which plugins to use, fetch behavior, or post-processing:
import { createScraper, metaTags, openGraph, twitter } from 'web-meta-scraper';
const scraper = createScraper({
  plugins: [metaTags, openGraph, twitter],
  fetch: {
    timeout: 10000,
    userAgent: 'MyBot/1.0',
  },
  postProcess: {
    maxDescriptionLength: 100,
    secureImages: true,
  },
});
The scraper object has two methods:
scraper.scrapeUrl(url)
Fetches the page and extracts metadata:
const result = await scraper.scrapeUrl('https://example.com');
scraper.scrape(html, options?)
Parses raw HTML. Optionally pass a url for resolving relative paths:
const html = `
<html>
<head>
<title>My Page</title>
<meta property="og:title" content="My Page - OG" />
<meta property="og:description" content="A description from Open Graph" />
<link rel="icon" href="/favicon.ico" />
</head>
</html>
`;
const result = await scraper.scrape(html, { url: 'https://example.com' });
console.log(result.metadata.title); // "My Page - OG" (OG has higher priority)
console.log(result.metadata.favicon); // "https://example.com/favicon.ico" (resolved)
Result Structure
Both methods return a ScraperResult:
interface ScraperResult {
  metadata: ResolvedMetadata; // Merged metadata
  sources: Record<string, Record<string, unknown>>; // Raw plugin outputs
}
The sources field gives you access to each plugin's raw output for debugging or custom logic:
const result = await scrape('https://example.com');
// Merged result
result.metadata.title; // "Example" (from OG, highest priority)
// Raw per-plugin data
result.sources['open-graph'].title; // "Example"
result.sources['meta-tags'].title; // "Example Domain"
result.sources['twitter'].title; // "Example | Twitter"
See Options for the full configuration reference.
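As a rough illustration of how a merged title and a resolved favicon like the ones in the examples above could come about, here is a minimal, self-contained sketch. It is not the library's implementation: the names mergeByPriority, resolveUrl, and PRIORITY are hypothetical, and the priority order is an assumption (the text only states that Open Graph ranks highest; where twitter sits relative to meta-tags is a guess).

```typescript
// Hypothetical types for illustration; they mirror the shape of `sources` only loosely.
type PluginOutput = Record<string, unknown>;
type Sources = Record<string, PluginOutput>;

// Assumed priority order, highest first. Only "Open Graph first" comes from
// the text above; the rest of the ordering is a guess.
const PRIORITY: string[] = ['open-graph', 'twitter', 'meta-tags'];

// For each field, keep the value from the highest-priority source that provides one.
function mergeByPriority(sources: Sources, priority: string[] = PRIORITY): PluginOutput {
  const merged: PluginOutput = {};
  for (const name of priority) {
    const output = sources[name];
    if (!output) continue; // this plugin produced nothing
    for (const [key, value] of Object.entries(output)) {
      const isEmpty = value === undefined || value === null || value === '';
      // Earlier (higher-priority) sources win; later ones only fill gaps.
      if (!isEmpty && !(key in merged)) merged[key] = value;
    }
  }
  return merged;
}

// Resolve a possibly relative path against the page URL with the standard URL API.
function resolveUrl(path: string, baseUrl: string): string {
  return new URL(path, baseUrl).href;
}

const sources: Sources = {
  'open-graph': { title: 'Example' },
  'meta-tags': { title: 'Example Domain', description: 'Plain description' },
  twitter: { title: 'Example | Twitter' },
};

const merged = mergeByPriority(sources);
const favicon = resolveUrl('/favicon.ico', 'https://example.com');

console.log(merged.title);       // "Example"
console.log(merged.description); // "Plain description"
console.log(favicon);            // "https://example.com/favicon.ico"
```

Under these assumptions, a field such as description that only a lower-priority source provides still reaches the merged result, which is one reason to inspect sources when a merged value looks unexpected.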