Plugins

web-meta-scraper uses a plugin architecture. Each plugin extracts metadata from a specific source in the HTML.

import {
  createScraper, metaTags, openGraph, twitter, jsonLd, oembed,
  favicons, feeds, robots, date, logo, lang, video, audio, iframe,
} from 'web-meta-scraper';
 
const scraper = createScraper({
  plugins: [metaTags, openGraph, twitter, jsonLd, oembed, favicons, feeds, robots, date, logo, lang, video, audio, iframe],
});

Meta Tags

Extracts metadata from standard HTML <meta> tags and other head elements.

import { createScraper, metaTags } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [metaTags] });

Extracted fields:

| Field | Source |
| --- | --- |
| title | <title> tag |
| description | <meta name="description"> |
| keywords | <meta name="keywords"> (parsed as array) |
| author | <meta name="author"> |
| favicon | <link rel="icon"> or <link rel="shortcut icon"> |
| canonicalUrl | <link rel="canonical"> |
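
The keywords field is described as being parsed into an array. As a rough illustration only, the sketch below assumes a comma-separated value with surrounding whitespace trimmed and empty entries dropped. The helper name parseKeywords is hypothetical and not part of web-meta-scraper; the library's actual rules may differ.

```typescript
// Hypothetical helper (not a library export): split the content of
// <meta name="keywords"> into a clean array of strings.
// Assumption: keywords are comma-separated.
function parseKeywords(content: string): string[] {
  return content
    .split(",")
    .map((keyword) => keyword.trim())
    .filter((keyword) => keyword.length > 0);
}

console.log(parseKeywords("scraper, metadata, , open graph "));
// logs the three entries "scraper", "metadata", "open graph"
```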

Open Graph

Extracts Open Graph protocol metadata from og: meta tags.

import { createScraper, openGraph } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [openGraph] });

Extracted fields:

| Field | Source |
| --- | --- |
| title | og:title |
| description | og:description |
| image | og:image |
| url | og:url |
| type | og:type (website, article, profile, book, music, video) |
| siteName | og:site_name |
| locale | og:locale |

Twitter Cards

Extracts Twitter Cards metadata from twitter: meta tags.

import { createScraper, twitter } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [twitter] });

Extracted fields:

| Field | Source |
| --- | --- |
| title | twitter:title |
| description | twitter:description |
| image | twitter:image |
| card | twitter:card (summary, summary_large_image, app, player) |
| site | twitter:site |
| creator | twitter:creator |

JSON-LD

Extracts JSON-LD structured data from <script type="application/ld+json"> tags.

import { createScraper, jsonLd } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [jsonLd] });

Parsed data is available as jsonLd in result.metadata. Supports Schema.org types including:

  • Article, Product, Organization, Person
  • WebSite, FAQPage, BreadcrumbList

Handles @graph notation and arrays. Invalid JSON is silently skipped.

const result = await scraper.scrapeUrl('https://example.com');
 
for (const item of result.metadata.jsonLd ?? []) {
  console.log(item['@type']); // "Article", "Organization", etc.
}
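
To make the handling of @graph, arrays, and invalid JSON more concrete, here is a minimal sketch of how raw script contents could be flattened into a single list of nodes. It is an illustration under assumptions, not the plugin's implementation: collectJsonLd and JsonLdNode are hypothetical names, and the input is assumed to be the text content of each ld+json script tag.

```typescript
type JsonLdNode = Record<string, unknown>;

// Hypothetical helper: flatten the text of each ld+json script into one list.
function collectJsonLd(scriptContents: string[]): JsonLdNode[] {
  const nodes: JsonLdNode[] = [];

  const visit = (value: unknown): void => {
    if (Array.isArray(value)) {
      // Top-level arrays: visit every element.
      value.forEach((item) => visit(item));
      return;
    }
    if (typeof value !== "object" || value === null) return;

    const node = value as JsonLdNode;
    const graph = node["@graph"];
    if (Array.isArray(graph)) {
      // @graph container: unwrap it and visit the contained nodes.
      graph.forEach((item) => visit(item));
    } else {
      nodes.push(node);
    }
  };

  for (const text of scriptContents) {
    try {
      visit(JSON.parse(text));
    } catch {
      // Invalid JSON: skip this block silently and continue with the next.
    }
  }
  return nodes;
}

const sample = collectJsonLd([
  '{"@type":"Article"}',
  '{"@graph":[{"@type":"Organization"},{"@type":"WebSite"}]}',
  "{not valid json",
]);
console.log(sample.map((node) => node["@type"]));
// logs "Article", "Organization", "WebSite"
```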

oEmbed

Discovers and fetches oEmbed data from <link type="application/json+oembed"> tags.

import { createScraper, oembed } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [oembed] });

This plugin is async: it makes an additional HTTP request to the oEmbed endpoint. The returned data includes fields such as:

| Field | Description |
| --- | --- |
| type | Resource type (photo, video, link, rich) |
| title | Resource title |
| author_name | Author name |
| author_url | Author URL |
| provider_name | Provider name |
| provider_url | Provider URL |
| thumbnail_url | Thumbnail image URL |
| html | Embed HTML (for video/rich types) |

const result = await scraper.scrapeUrl('https://www.youtube.com/watch?v=dQw4w9WgXcQ');
console.log(result.metadata.oembed);
// { type: "video", title: "...", author_name: "...", html: "<iframe ...>" }

Favicons

Discovers all favicon and icon references in an HTML document, including standard icons, Apple touch icons, mask icons, and web app manifest links.

import { createScraper, favicons } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [favicons] });

Extracted fields:

Returns a favicons array of FaviconEntry objects:

| Field | Description |
| --- | --- |
| url | Fully resolved URL of the icon resource |
| sizes | Icon dimensions from the sizes attribute (e.g. "180x180") |
| type | MIME type (e.g. "image/png") or "manifest" for web app manifest links |

Scanned selectors:

  • link[rel="icon"]
  • link[rel="shortcut icon"]
  • link[rel="apple-touch-icon"]
  • link[rel="apple-touch-icon-precomposed"]
  • link[rel="mask-icon"]
  • link[rel="manifest"] (recorded with type: "manifest")

Relative URLs are resolved against the page URL. Duplicate URLs are removed automatically.

const result = await scraper.scrapeUrl('https://example.com');
console.log(result.metadata.favicons);
// [
//   { url: "https://example.com/favicon.ico" },
//   { url: "https://example.com/apple-touch-icon.png", sizes: "180x180", type: "image/png" },
//   { url: "https://example.com/manifest.json", type: "manifest" },
// ]
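
The URL resolution and deduplication described above can be sketched with the standard URL constructor. This is only an illustration: resolveAndDedupe is a hypothetical helper, the optional sizes and type fields are inferred from the sample output, and keeping the first occurrence of a duplicate is an assumption.

```typescript
interface FaviconEntry {
  url: string;
  sizes?: string;
  type?: string;
}

// Hypothetical helper: resolve each href against the page URL and drop
// entries whose resolved URL has already been seen (first one wins).
function resolveAndDedupe(entries: FaviconEntry[], pageUrl: string): FaviconEntry[] {
  const seen = new Set<string>();
  const result: FaviconEntry[] = [];

  for (const entry of entries) {
    let absolute: string;
    try {
      absolute = new URL(entry.url, pageUrl).href;
    } catch {
      continue; // href could not be parsed as a URL: skip it
    }
    if (seen.has(absolute)) continue;
    seen.add(absolute);
    result.push({ ...entry, url: absolute });
  }
  return result;
}

const icons = resolveAndDedupe(
  [
    { url: "/favicon.ico" },
    { url: "https://example.com/favicon.ico" }, // same resource, absolute form
    { url: "/apple-touch-icon.png", sizes: "180x180", type: "image/png" },
  ],
  "https://example.com/blog/post",
);
console.log(icons.length); // 2
```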

Feeds

Detects RSS and Atom feed links declared in the HTML <head>.

import { createScraper, feeds } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [feeds] });

Extracted fields:

Returns a feeds array of FeedEntry objects:

| Field | Description |
| --- | --- |
| url | Fully resolved URL of the feed |
| title | Human-readable title from the title attribute |
| type | "rss" for RSS 2.0 feeds, "atom" for Atom feeds |

Scanned selectors:

  • link[rel="alternate"][type="application/rss+xml"] → type "rss"
  • link[rel="alternate"][type="application/atom+xml"] → type "atom"

const result = await scraper.scrapeUrl('https://example.com/blog');
console.log(result.metadata.feeds);
// [
//   { url: "https://example.com/feed.xml", title: "Blog RSS", type: "rss" },
//   { url: "https://example.com/atom.xml", title: "Blog Atom", type: "atom" },
// ]
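
The selector-to-type mapping listed above amounts to a small lookup on the link's MIME type. The sketch below works on plain objects standing in for parsed link tags; LinkTag and toFeedEntries are hypothetical names used for illustration, not library exports.

```typescript
type FeedType = "rss" | "atom";

interface FeedEntry {
  url: string;
  title?: string;
  type: FeedType;
}

// Hypothetical stand-in for the attributes of a parsed <link> tag.
interface LinkTag {
  rel: string;
  type?: string;
  href?: string;
  title?: string;
}

const FEED_TYPES: Record<string, FeedType> = {
  "application/rss+xml": "rss",
  "application/atom+xml": "atom",
};

function toFeedEntries(links: LinkTag[], pageUrl: string): FeedEntry[] {
  const feeds: FeedEntry[] = [];
  for (const link of links) {
    if (link.rel.toLowerCase() !== "alternate" || !link.href) continue;
    const feedType = FEED_TYPES[(link.type ?? "").toLowerCase()];
    if (!feedType) continue; // not an RSS or Atom link
    feeds.push({
      url: new URL(link.href, pageUrl).href, // resolve relative hrefs
      title: link.title,
      type: feedType,
    });
  }
  return feeds;
}

console.log(
  toFeedEntries(
    [{ rel: "alternate", type: "application/rss+xml", href: "/feed.xml", title: "Blog RSS" }],
    "https://example.com/blog",
  ),
);
```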

Robots

Extracts and interprets robots meta tag directives to determine indexing and crawl permissions.

import { createScraper, robots } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [robots] });

Extracted fields:

Returns a robots object of type RobotsInfo:

| Field | Type | Description |
| --- | --- | --- |
| directives | Array | Raw directive entries from all robots-related meta tags |
| isIndexable | boolean | true if the page allows indexing (no noindex or none) |
| isFollowable | boolean | true if the page allows link following (no nofollow or none) |
| noarchive | boolean | Whether caching/archiving is prohibited |
| nosnippet | boolean | Whether search result snippets are prohibited |
| noimageindex | boolean | Whether image indexing is prohibited |
| notranslate | boolean | Whether automatic translation is prohibited |

Scans <meta name="robots"> and bot-specific variants like <meta name="googlebot">. The boolean flags are derived from the generic robots tag only. Bot-specific directives are preserved in the directives array.

const result = await scraper.scrapeUrl('https://example.com');
console.log(result.metadata.robots);
// {
//   directives: [{ content: "noindex, nofollow", botName: "robots" }],
//   isIndexable: false,
//   isFollowable: false,
//   noarchive: false,
//   nosnippet: false,
//   noimageindex: false,
//   notranslate: false,
// }
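
The way the boolean flags follow from the directives can be sketched as below. The directive shape ({ content, botName }) is taken from the sample output above; interpretRobots itself is a hypothetical helper, and the comma splitting and case-insensitive matching are assumptions rather than documented behavior.

```typescript
interface RobotsDirective {
  content: string;
  botName: string;
}

interface RobotsInfo {
  directives: RobotsDirective[];
  isIndexable: boolean;
  isFollowable: boolean;
  noarchive: boolean;
  nosnippet: boolean;
  noimageindex: boolean;
  notranslate: boolean;
}

// Hypothetical helper: derive the flags from the generic "robots" tag only,
// while keeping every directive (including bot-specific ones) in the output.
function interpretRobots(directives: RobotsDirective[]): RobotsInfo {
  const tokens = new Set<string>();
  for (const directive of directives) {
    if (directive.botName.toLowerCase() !== "robots") continue;
    for (const raw of directive.content.split(",")) {
      const token = raw.trim().toLowerCase();
      if (token) tokens.add(token);
    }
  }

  const none = tokens.has("none"); // "none" is shorthand for noindex, nofollow
  return {
    directives,
    isIndexable: !(none || tokens.has("noindex")),
    isFollowable: !(none || tokens.has("nofollow")),
    noarchive: tokens.has("noarchive"),
    nosnippet: tokens.has("nosnippet"),
    noimageindex: tokens.has("noimageindex"),
    notranslate: tokens.has("notranslate"),
  };
}

const info = interpretRobots([
  { content: "noarchive", botName: "robots" },
  { content: "noindex", botName: "googlebot" }, // preserved, but does not affect the flags
]);
console.log(info.isIndexable, info.noarchive); // true true
```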

Date

Extracts publication and modification dates from multiple sources including Open Graph article tags, Dublin Core metadata, JSON-LD structured data, and HTML <time> elements.

import { createScraper, date } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [date] });

Extracted fields:

| Field | Description |
| --- | --- |
| date | Publication date in ISO 8601 format |
| dateModified | Modification date in ISO 8601 format |

Sources (in priority order):

  1. article:published_time / article:modified_time (Open Graph)
  2. meta[name="date"], meta[name="DC.date"], meta[name="DC.date.issued"] (Dublin Core)
  3. JSON-LD datePublished / dateModified
  4. <time datetime> element (fallback)

const result = await scraper.scrapeUrl('https://example.com/article');
console.log(result.metadata.date);         // "2024-01-15T09:00:00.000Z"
console.log(result.metadata.dateModified); // "2024-02-01T12:30:00.000Z"
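
The priority order above can be read as "take the first source that yields a usable date". A minimal sketch, assuming each source has already been reduced to a raw string and that an unparseable value falls through to the next source (the plugin may behave differently); DateCandidates, toIso, and pickDate are hypothetical names.

```typescript
// Raw values per source, listed in the documented priority order.
interface DateCandidates {
  openGraph?: string;   // article:published_time
  dublinCore?: string;  // meta[name="date"], DC.date, DC.date.issued
  jsonLd?: string;      // datePublished
  timeElement?: string; // <time datetime>
}

// Normalize to ISO 8601, or return undefined if the value cannot be parsed.
function toIso(value: string | undefined): string | undefined {
  if (!value) return undefined;
  const parsed = new Date(value);
  return Number.isNaN(parsed.getTime()) ? undefined : parsed.toISOString();
}

function pickDate(candidates: DateCandidates): string | undefined {
  const ordered = [
    candidates.openGraph,
    candidates.dublinCore,
    candidates.jsonLd,
    candidates.timeElement,
  ];
  for (const candidate of ordered) {
    const iso = toIso(candidate);
    if (iso) return iso;
  }
  return undefined;
}

console.log(pickDate({ jsonLd: "2024-02-01T12:30:00Z", openGraph: "2024-01-15T09:00:00Z" }));
// "2024-01-15T09:00:00.000Z" (Open Graph outranks JSON-LD)
```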

Logo

Extracts the site logo URL from Open Graph, Schema.org microdata, and JSON-LD structured data.

import { createScraper, logo } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [logo] });

Extracted fields:

| Field | Description |
| --- | --- |
| logo | Fully resolved site logo URL |

Sources (in priority order):

  1. meta[property="og:logo"]
  2. meta[itemprop="logo"] / img[itemprop="logo"]
  3. JSON-LD logo field from Organization / Brand / Publisher

const result = await scraper.scrapeUrl('https://example.com');
console.log(result.metadata.logo); // "https://example.com/logo.png"

Lang

Detects the primary language of the document as a BCP 47 language tag.

import { createScraper, lang } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [lang] });

Extracted fields:

| Field | Description |
| --- | --- |
| lang | BCP 47 primary language tag (e.g. "en", "ko", "ja") |

Sources (in priority order):

  1. <html lang> attribute
  2. og:locale (normalized, e.g. en_US → en)
  3. meta[http-equiv="content-language"]
  4. meta[itemprop="inLanguage"]
  5. JSON-LD inLanguage field

const result = await scraper.scrapeUrl('https://example.com');
console.log(result.metadata.lang); // "en"
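
Since og:locale uses an underscore form such as en_US, it has to be reduced to a primary language tag. One way this normalization could look, as a sketch under assumptions (normalizeLocale is a hypothetical helper, and accepting only two- or three-letter primary subtags is a choice made here, not documented behavior):

```typescript
// Hypothetical helper: reduce a locale such as "en_US" or "ko-KR" to its
// primary language subtag, or return undefined if it does not look like one.
function normalizeLocale(locale: string): string | undefined {
  const [primary = ""] = locale.trim().toLowerCase().split(/[_-]/);
  return /^[a-z]{2,3}$/.test(primary) ? primary : undefined;
}

console.log(normalizeLocale("en_US")); // "en"
console.log(normalizeLocale("ko-KR")); // "ko"
```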

Video

Discovers video resources from Open Graph video tags, Twitter player tags, HTML <video> elements, and JSON-LD VideoObject entries.

import { createScraper, video } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [video] });

Extracted fields:

Returns a videos array of VideoEntry objects:

| Field | Description |
| --- | --- |
| url | Fully resolved video URL |
| type | MIME type (e.g. "video/mp4") |
| width | Width in pixels |
| height | Height in pixels |

Sources:

  • og:video, og:video:secure_url, og:video:type, og:video:width, og:video:height
  • twitter:player, twitter:player:width, twitter:player:height
  • <video> and <source> elements
  • JSON-LD VideoObject with contentUrl

const result = await scraper.scrapeUrl('https://example.com/video');
console.log(result.metadata.videos);
// [
//   { url: "https://example.com/video.mp4", type: "video/mp4", width: 1920, height: 1080 },
// ]
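
In the Open Graph protocol, structured properties such as og:video:type belong to the og:video tag that precedes them. The sketch below groups an ordered list of property/content pairs into entries on that basis. It is illustrative only: collectOgVideos and MetaPair are hypothetical names, og:video:secure_url is ignored for brevity, and the plugin's own merging of the other sources is not shown.

```typescript
interface VideoEntry {
  url: string;
  type?: string;
  width?: number;
  height?: number;
}

// [property, content] pairs in document order, e.g. ["og:video:type", "video/mp4"].
type MetaPair = [string, string];

function collectOgVideos(pairs: MetaPair[]): VideoEntry[] {
  const videos: VideoEntry[] = [];
  let current: VideoEntry | undefined;

  for (const [property, content] of pairs) {
    if (property === "og:video") {
      // Each og:video tag starts a new entry.
      current = { url: content };
      videos.push(current);
    } else if (current && property === "og:video:type") {
      current.type = content;
    } else if (current && property === "og:video:width") {
      const width = Number.parseInt(content, 10);
      if (Number.isFinite(width)) current.width = width;
    } else if (current && property === "og:video:height") {
      const height = Number.parseInt(content, 10);
      if (Number.isFinite(height)) current.height = height;
    }
  }
  return videos;
}

console.log(
  collectOgVideos([
    ["og:video", "https://example.com/video.mp4"],
    ["og:video:type", "video/mp4"],
    ["og:video:width", "1920"],
    ["og:video:height", "1080"],
  ]),
);
```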

Audio

Discovers audio resources from Open Graph audio tags, HTML <audio> elements, and JSON-LD AudioObject entries.

import { createScraper, audio } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [audio] });

Extracted fields:

Returns an audio array of AudioEntry objects:

| Field | Description |
| --- | --- |
| url | Fully resolved audio URL |
| type | MIME type (e.g. "audio/mpeg") |

Sources:

  • og:audio, og:audio:secure_url, og:audio:type
  • <audio> and <source> elements
  • JSON-LD AudioObject with contentUrl

const result = await scraper.scrapeUrl('https://example.com/podcast');
console.log(result.metadata.audio);
// [
//   { url: "https://example.com/episode.mp3", type: "audio/mpeg" },
// ]

iFrame

Generates an embeddable iframe HTML snippet from Twitter player tags, with fallback to oEmbed embed HTML.

import { createScraper, iframe } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [iframe] });

Extracted fields:

| Field | Description |
| --- | --- |
| iframe | Complete HTML <iframe> element string |

Sources:

  1. twitter:player with twitter:player:width and twitter:player:height
  2. oEmbed html field (fallback, applied in core)

const result = await scraper.scrapeUrl('https://www.youtube.com/watch?v=dQw4w9WgXcQ');
console.log(result.metadata.iframe);
// '<iframe src="https://..." width="640" height="360" frameborder="0" allowfullscreen></iframe>'
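
Building the snippet from the Twitter player values is essentially string assembly. The sketch below mirrors the attribute layout of the sample output above; buildIframe and escapeAttr are hypothetical helpers, and the attribute escaping is an added precaution assumed here rather than documented behavior.

```typescript
// Escape characters that would break out of a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: assemble an iframe string from twitter:player,
// twitter:player:width and twitter:player:height.
function buildIframe(playerUrl: string, width: number, height: number): string {
  return (
    `<iframe src="${escapeAttr(playerUrl)}" width="${width}" height="${height}" ` +
    `frameborder="0" allowfullscreen></iframe>`
  );
}

console.log(buildIframe("https://example.com/embed/abc", 640, 360));
// <iframe src="https://example.com/embed/abc" width="640" height="360" frameborder="0" allowfullscreen></iframe>
```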

Combining Plugins

Plugins run in parallel via Promise.all. The order in the array does not affect priority; the resolve rules determine which plugin's value wins for each field.

// All produce the same merged result:
createScraper({ plugins: [metaTags, openGraph, twitter] });
createScraper({ plugins: [twitter, openGraph, metaTags] });
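
To see why array order does not matter, consider a toy model: every plugin runs concurrently, and a separate per-field rule decides which source wins. Everything in this sketch is hypothetical (the SketchPlugin type, the fake plugins, and the fieldPriority table); it does not reflect the library's real resolve rules, only the general pattern.

```typescript
type FieldMap = Record<string, string | undefined>;

interface SketchPlugin {
  name: string;
  run: (html: string) => Promise<FieldMap>;
}

// Hypothetical resolve rule: sources for each field, highest priority first.
const fieldPriority: Record<string, string[]> = {
  title: ["openGraph", "twitter", "metaTags"],
};

async function runPlugins(plugins: SketchPlugin[], html: string): Promise<FieldMap> {
  // All plugins start at once; Promise.all waits for every result.
  const outputs = await Promise.all(
    plugins.map(async (plugin) => [plugin.name, await plugin.run(html)] as const),
  );
  const bySource = new Map(outputs);

  // Merging consults the rule table, never the order of the plugin array.
  const merged: FieldMap = {};
  for (const [field, sources] of Object.entries(fieldPriority)) {
    for (const source of sources) {
      const value = bySource.get(source)?.[field];
      if (value !== undefined) {
        merged[field] = value;
        break;
      }
    }
  }
  return merged;
}

// Fake plugins standing in for real extractors.
const fakeMetaTags: SketchPlugin = { name: "metaTags", run: async () => ({ title: "Plain title" }) };
const fakeOpenGraph: SketchPlugin = { name: "openGraph", run: async () => ({ title: "OG title" }) };

runPlugins([fakeMetaTags, fakeOpenGraph], "").then((merged) => console.log(merged.title));
runPlugins([fakeOpenGraph, fakeMetaTags], "").then((merged) => console.log(merged.title));
// both log "OG title"
```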

To create your own plugins, see Custom Plugins.