Introduction

web-meta-scraper is a TypeScript library that extracts metadata from web pages using a plugin-based architecture.

Why web-meta-scraper?

| Feature | web-meta-scraper | metascraper | open-graph-scraper |
| --- | --- | --- | --- |
| Dependencies | 1 (cheerio) | 10+ | 4+ |
| Bundle size | ~5KB min+gzip | ~50KB+ | ~15KB+ |
| Plugin system | Composable plugins | Rule-based | Monolithic |
| Custom plugins | Simple function | Complex rules | Not supported |
| TypeScript | First-class | Partial | Partial |
| oEmbed | Built-in plugin | Separate package | Not supported |
| Custom rules | Configurable | Fixed | Fixed |
| HTTP client | Native fetch() | got | undici |

How It Works

  1. Create a scraper with createScraper() (or use the scrape() shorthand).
  2. The scraper fetches HTML from a URL or accepts raw HTML.
  3. Each plugin extracts metadata from the HTML via the ScrapeContext.
  4. The resolver merges results using priority rules; for example, og:title wins over <title>.
  5. You get a ScraperResult with merged metadata and raw sources.

```typescript
import { scrape } from 'web-meta-scraper';

const result = await scrape('https://example.com');

console.log(result.metadata.title);        // Best title from available sources
console.log(result.metadata.description);  // Best description
console.log(result.metadata.image);        // Best image URL

// See what each plugin extracted
console.log(result.sources['open-graph']); // { title: "...", image: "..." }
console.log(result.sources['meta-tags']);  // { title: "...", keywords: [...] }
```
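
To make steps 3 and 5 more concrete, here is a minimal, self-contained sketch of how a plugin pipeline of this kind could be modelled. It does not use web-meta-scraper itself. The names `ScrapeContextSketch`, `PluginSketch`, `metaContent`, and `runPlugins`, the regex-based extraction, and the sample HTML are all assumptions made for illustration, and the library's real types and plugin signature may differ.

```typescript
// Illustrative model only: these types are assumptions,
// not web-meta-scraper's real definitions.
type Metadata = Record<string, string>;

interface ScrapeContextSketch {
  html: string;
  url?: string;
}

// In this sketch a plugin is a named function that pulls fields out of the context.
interface PluginSketch {
  name: string;
  extract(ctx: ScrapeContextSketch): Metadata;
}

// Finds <meta {attr}="{key}" ... content="..."> with a deliberately naive regex.
// It only matches when content comes after the identifying attribute;
// a real scraper would use an HTML parser instead.
function metaContent(html: string, attr: string, key: string): string | undefined {
  const pattern = new RegExp(
    `<meta[^>]*${attr}=["']${key}["'][^>]*content=(["'])(.*?)\\1`,
    'i',
  );
  return pattern.exec(html)?.[2];
}

const openGraphSketch: PluginSketch = {
  name: 'open-graph',
  extract(ctx) {
    const out: Metadata = {};
    for (const field of ['title', 'description', 'image']) {
      const value = metaContent(ctx.html, 'property', `og:${field}`);
      if (value !== undefined) out[field] = value;
    }
    return out;
  },
};

const metaTagsSketch: PluginSketch = {
  name: 'meta-tags',
  extract(ctx) {
    const out: Metadata = {};
    const title = /<title[^>]*>([^<]*)<\/title>/i.exec(ctx.html)?.[1];
    if (title !== undefined) out['title'] = title.trim();
    const description = metaContent(ctx.html, 'name', 'description');
    if (description !== undefined) out['description'] = description;
    return out;
  },
};

// Runs every plugin against the same context and keeps each result
// under the plugin's name, mirroring the "raw sources" idea above.
function runPlugins(
  ctx: ScrapeContextSketch,
  plugins: PluginSketch[],
): Record<string, Metadata> {
  const sources: Record<string, Metadata> = {};
  for (const plugin of plugins) {
    sources[plugin.name] = plugin.extract(ctx);
  }
  return sources;
}

const sampleHtml = `
<html><head>
  <title>Plain title</title>
  <meta name="description" content="Plain description">
  <meta property="og:title" content="OG title">
</head></html>`;

const sources = runPlugins({ html: sampleHtml }, [openGraphSketch, metaTagsSketch]);
```

In this sketch, `sources['open-graph']` holds only the Open Graph title, while `sources['meta-tags']` holds the `<title>` text and the description. Keeping each plugin's output separate is what lets a later merge step decide which value wins per field.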

Priority Merging

When the same field exists in multiple sources, the highest-priority value wins:

| Field | Priority (high → low) |
| --- | --- |
| title | Open Graph → Meta Tags → Twitter |
| description | Open Graph → Meta Tags → Twitter |
| image | Open Graph → Twitter |
| url | Open Graph → Meta Tags (canonical) |

Source-specific fields (twitterCard, siteName, locale, jsonLd, oembed) are always included directly.

You can customize these rules with the rules option in createScraper().
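
As a rough illustration of priority merging, the sketch below picks, for each field, the first non-empty value in priority order, and then shows how swapping in a different rule set changes the winner. It is a standalone model, not the library's resolver. `Sources`, `PriorityRules`, `defaultRules`, and `mergeByPriority` are hypothetical names, and the actual shape of the `rules` option in `createScraper()` is not shown here and may differ. Source-specific fields, which are passed through directly, are left out.

```typescript
// Illustrative model only: names and shapes are assumptions,
// not web-meta-scraper's actual API.
type SourceName = 'open-graph' | 'meta-tags' | 'twitter';
type Sources = Partial<Record<SourceName, Record<string, string>>>;
type PriorityRules = Record<string, SourceName[]>;

// Default order, copied from the priority table above (high → low).
const defaultRules: PriorityRules = {
  title: ['open-graph', 'meta-tags', 'twitter'],
  description: ['open-graph', 'meta-tags', 'twitter'],
  image: ['open-graph', 'twitter'],
  url: ['open-graph', 'meta-tags'],
};

// For each field, walk the sources in priority order and keep the
// first value that is present and non-empty.
function mergeByPriority(
  sources: Sources,
  rules: PriorityRules = defaultRules,
): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const [field, order] of Object.entries(rules)) {
    for (const name of order) {
      const value = sources[name]?.[field];
      if (value !== undefined && value !== '') {
        merged[field] = value;
        break;
      }
    }
  }
  return merged;
}

const extracted: Sources = {
  'open-graph': { title: 'OG title', image: 'https://example.com/og.png' },
  'meta-tags': { title: 'Plain title', description: 'Plain description' },
  twitter: { title: 'Twitter title', image: 'https://example.com/tw.png' },
};

// With the default rules, Open Graph wins for title and image, and
// description falls through to the meta-tags value.
const merged = mergeByPriority(extracted);

// A hypothetical custom rule set that prefers Twitter for the title only.
const twitterFirst = mergeByPriority(extracted, {
  ...defaultRules,
  title: ['twitter', 'open-graph', 'meta-tags'],
});
```

Fields with no value in any listed source (here, `url`) are simply absent from the merged result in this sketch.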