Quick Start

The Simple Way — scrape()

The scrape() function detects whether its argument is a URL or an HTML string and runs all built-in plugins:

import { scrape } from 'web-meta-scraper';
 
// From a URL: fetches the page, then parses it
const fromUrl = await scrape('https://example.com');

// From an HTML string (a separate variable, since `const` cannot be redeclared)
const fromHtml = await scrape('<html><head><title>Hello</title></head></html>');

console.log(fromUrl.metadata);
// {
//   title: "Example Domain",
//   description: "This domain is for use in illustrative examples.",
//   image: "https://example.com/og-image.png",
//   url: "https://example.com",
//   type: "website",
//   siteName: "Example",
//   favicon: "https://example.com/favicon.ico",
//   ...
// }
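
To make the URL-versus-HTML detection concrete, here is a minimal sketch of how such a check could work. The helper name looksLikeUrl and its rules are assumptions made for illustration; the library's actual detection logic is not documented on this page and may differ.

```typescript
// Hypothetical helper, NOT part of web-meta-scraper: guesses whether a
// string is an http(s) URL or an HTML document.
function looksLikeUrl(input: string): boolean {
  const trimmed = input.trim();

  // Markup starts with "<"; an http(s) URL never does.
  if (trimmed.startsWith('<')) return false;

  try {
    const { protocol } = new URL(trimmed);
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    // new URL() throws on strings it cannot parse as an absolute URL.
    return false;
  }
}

console.log(looksLikeUrl('https://example.com')); // true
console.log(looksLikeUrl('<html><head><title>Hello</title></head></html>')); // false
```

If an input could plausibly be read either way, calling scrapeUrl() or scrape() on a scraper created with createScraper() states the intent explicitly.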

The Full Way — createScraper()

Use createScraper() when you need control over which plugins to use, fetch behavior, or post-processing:

import { createScraper, metaTags, openGraph, twitter } from 'web-meta-scraper';
 
const scraper = createScraper({
  plugins: [metaTags, openGraph, twitter],
  fetch: {
    timeout: 10000,
    userAgent: 'MyBot/1.0',
  },
  postProcess: {
    maxDescriptionLength: 100,
    secureImages: true,
  },
});
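
The option names under postProcess suggest what they do, but this page does not spell out their exact behavior. The sketch below shows one plausible reading, assuming that maxDescriptionLength truncates the description and that secureImages rewrites http:// image URLs to https://. The function applyPostProcess and both interfaces are hypothetical and only mirror the option names above; whether the library appends an ellipsis or trims at a word boundary is not covered here.

```typescript
// Hypothetical types that mirror the option names used above.
interface PostProcessOptions {
  maxDescriptionLength?: number;
  secureImages?: boolean;
}

interface BasicMetadata {
  description?: string;
  image?: string;
}

// Assumed behavior, NOT the library's implementation.
function applyPostProcess(meta: BasicMetadata, opts: PostProcessOptions): BasicMetadata {
  const out: BasicMetadata = { ...meta };

  // Assumption: descriptions longer than the limit are cut to the limit.
  if (
    out.description !== undefined &&
    opts.maxDescriptionLength !== undefined &&
    out.description.length > opts.maxDescriptionLength
  ) {
    out.description = out.description.slice(0, opts.maxDescriptionLength);
  }

  // Assumption: insecure image URLs are upgraded to https.
  if (opts.secureImages && out.image !== undefined && out.image.startsWith('http://')) {
    out.image = 'https://' + out.image.slice('http://'.length);
  }

  return out;
}

const processed = applyPostProcess(
  { description: 'x'.repeat(150), image: 'http://example.com/og-image.png' },
  { maxDescriptionLength: 100, secureImages: true },
);

console.log(processed.description?.length); // 100
console.log(processed.image); // "https://example.com/og-image.png"
```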

The scraper object has two methods:

scraper.scrapeUrl(url)

Fetches the page and extracts metadata:

const result = await scraper.scrapeUrl('https://example.com');

scraper.scrape(html, options?)

Parses a raw HTML string without fetching anything. Pass the optional url so that relative paths in the markup (such as /favicon.ico) can be resolved to absolute URLs:

const html = `
<html>
  <head>
    <title>My Page</title>
    <meta property="og:title" content="My Page - OG" />
    <meta property="og:description" content="A description from Open Graph" />
    <link rel="icon" href="/favicon.ico" />
  </head>
</html>
`;
 
const result = await scraper.scrape(html, { url: 'https://example.com' });
 
console.log(result.metadata.title);    // "My Page - OG" (OG has higher priority)
console.log(result.metadata.favicon);  // "https://example.com/favicon.ico" (resolved)
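
The resolution of /favicon.ico above follows ordinary URL resolution rules. The sketch below reproduces it with the standard URL constructor, which is available in browsers and Node.js. Whether the library uses this exact call internally is an assumption, and the base path /blog/post is made up for illustration.

```typescript
// Illustrative base URL; the /blog/post path is hypothetical.
const base = 'https://example.com/blog/post';

// A root-relative path replaces the whole path of the base.
const rootRelative = new URL('/favicon.ico', base).href;

// A path-relative reference resolves against the base's directory.
const pathRelative = new URL('images/cover.png', base).href;

// An absolute URL ignores the base entirely.
const absolute = new URL('https://cdn.example.com/a.png', base).href;

console.log(rootRelative); // "https://example.com/favicon.ico"
console.log(pathRelative); // "https://example.com/blog/images/cover.png"
console.log(absolute);     // "https://cdn.example.com/a.png"
```

Without a base URL, a relative reference such as /favicon.ico cannot be turned into an absolute URL, which is why passing url matters when scraping raw HTML.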

Result Structure

Both methods, like the top-level scrape() function, resolve to a ScraperResult:

interface ScraperResult {
  metadata: ResolvedMetadata;                    // Merged metadata
  sources: Record<string, Record<string, unknown>>; // Raw plugin outputs
}

The sources field gives you access to each plugin's raw output for debugging or custom logic:

const result = await scrape('https://example.com');
 
// Merged result
result.metadata.title; // "Example" (from OG, highest priority)
 
// Raw per-plugin data
result.sources['open-graph'].title;  // "Example"
result.sources['meta-tags'].title;   // "Example Domain"
result.sources['twitter'].title;     // "Example | Twitter"
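
As an example of custom logic built on sources, the sketch below picks a field using a caller-defined priority order. The helper pickFirst and the sample data are hypothetical. The page states only that Open Graph has the highest priority, so the relative order of twitter and meta-tags in the default list is an assumption.

```typescript
type Sources = Record<string, Record<string, unknown>>;

// Assumed default order, highest priority first. Only the position of
// 'open-graph' is taken from the docs above; the rest is a guess.
const DEFAULT_ORDER = ['open-graph', 'twitter', 'meta-tags'];

// Hypothetical helper: returns the first non-empty value for `field`,
// checking plugins in the given order.
function pickFirst(sources: Sources, field: string, order: string[] = DEFAULT_ORDER): unknown {
  for (const plugin of order) {
    const value = sources[plugin]?.[field];
    if (value !== undefined && value !== null && value !== '') return value;
  }
  return undefined;
}

// Sample data shaped like result.sources in the example above.
const sampleSources: Sources = {
  'open-graph': { title: 'Example' },
  'meta-tags': { title: 'Example Domain', description: 'This domain is for use in illustrative examples.' },
  twitter: { title: 'Example | Twitter' },
};

console.log(pickFirst(sampleSources, 'title')); // "Example"
console.log(pickFirst(sampleSources, 'description')); // "This domain is for use in illustrative examples."
console.log(pickFirst(sampleSources, 'title', ['meta-tags', 'open-graph'])); // "Example Domain"
```

Reordering the list is a simple way to prefer, for instance, the plain title tag over the Open Graph title without changing the scraper configuration.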

See Options for the full configuration reference.