Plugins

web-meta-scraper uses a plugin architecture. Each plugin extracts metadata from a specific source in the HTML.

import {
  createScraper, metaTags, openGraph, twitter, jsonLd, oembed,
  favicons, feeds, robots, date, logo, lang, video, audio, iframe,
} from 'web-meta-scraper';
 
const scraper = createScraper({
  plugins: [metaTags, openGraph, twitter, jsonLd, oembed, favicons, feeds, robots, date, logo, lang, video, audio, iframe],
});

Meta Tags

Extracts metadata from standard HTML <meta> tags and other head elements.

import { createScraper, metaTags } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [metaTags] });

Extracted fields:

Field	Source
`title`	`<title>` tag
`description`	`<meta name="description">`
`keywords`	`<meta name="keywords">` (parsed as array)
`author`	`<meta name="author">`
`favicon`	`<link rel="icon">` or `<link rel="shortcut icon">`
`canonicalUrl`	`<link rel="canonical">`
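
The table notes that `keywords` is parsed as an array. A minimal sketch of that kind of parsing, assuming a comma-separated `content` value. `parseKeywords` is a hypothetical helper for illustration, not part of the web-meta-scraper API, and the library's own parsing may differ:

```typescript
// Hypothetical helper (not a library export): splits a comma-separated
// keywords string into a trimmed array without empty entries.
function parseKeywords(content: string): string[] {
  return content
    .split(',')
    .map((keyword) => keyword.trim())
    .filter((keyword) => keyword.length > 0);
}

console.log(parseKeywords('web, scraping, , metadata '));
// ["web", "scraping", "metadata"]
```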

Open Graph

Extracts Open Graph protocol metadata from og: meta tags.

import { createScraper, openGraph } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [openGraph] });

Extracted fields:

Field	Source
`title`	`og:title`
`description`	`og:description`
`image`	`og:image`
`url`	`og:url`
`type`	`og:type` (`website`, `article`, `profile`, `book`, `music`, `video`)
`siteName`	`og:site_name`
`locale`	`og:locale`

Twitter Cards

Extracts Twitter Cards metadata from twitter: meta tags.

import { createScraper, twitter } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [twitter] });

Extracted fields:

Field	Source
`title`	`twitter:title`
`description`	`twitter:description`
`image`	`twitter:image`
`card`	`twitter:card` (`summary`, `summary_large_image`, `app`, `player`)
`site`	`twitter:site`
`creator`	`twitter:creator`

JSON-LD

Extracts JSON-LD structured data from <script type="application/ld+json"> tags.

import { createScraper, jsonLd } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [jsonLd] });

Parsed data is available as jsonLd in result.metadata. Supports Schema.org types including:

Article, Product, Organization, Person
WebSite, FAQPage, BreadcrumbList

Handles @graph notation and arrays. Invalid JSON is silently skipped.

const result = await scraper.scrapeUrl('https://example.com');
 
for (const item of result.metadata.jsonLd ?? []) {
  console.log(item['@type']); // "Article", "Organization", etc.
}
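
The handling of @graph, arrays, and invalid JSON described above can be sketched as follows. This is an illustration under stated assumptions, not the plugin's actual code: `collectJsonLd` and `JsonLdNode` are hypothetical names, and the input is assumed to be the raw text of each ld+json script tag.

```typescript
type JsonLdNode = Record<string, unknown>;

// Hypothetical helper: flattens the parsed content of several ld+json
// script tags into one list of nodes.
function collectJsonLd(scriptContents: string[]): JsonLdNode[] {
  const nodes: JsonLdNode[] = [];

  const visit = (value: unknown): void => {
    if (Array.isArray(value)) {
      value.forEach(visit); // top-level arrays: visit each item
    } else if (typeof value === 'object' && value !== null) {
      const node = value as JsonLdNode;
      const graph = node['@graph'];
      if (Array.isArray(graph)) {
        graph.forEach(visit); // unwrap @graph containers
      } else {
        nodes.push(node);
      }
    }
  };

  for (const raw of scriptContents) {
    try {
      visit(JSON.parse(raw));
    } catch {
      // Invalid JSON: skip this script tag and continue with the next.
    }
  }
  return nodes;
}

const nodes = collectJsonLd([
  '{"@type":"Article","headline":"Hello"}',
  '{"@graph":[{"@type":"Organization"},{"@type":"WebSite"}]}',
  '{not valid json',
]);
console.log(nodes.map((node) => node['@type']));
// ["Article", "Organization", "WebSite"]
```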

oEmbed

Discovers and fetches oEmbed data from <link type="application/json+oembed"> tags.

import { createScraper, oembed } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [oembed] });

This plugin is asynchronous: it makes an additional HTTP request to the oEmbed endpoint. The response includes fields such as:

Field	Description
`type`	Resource type (`photo`, `video`, `link`, `rich`)
`title`	Resource title
`author_name`	Author name
`author_url`	Author URL
`provider_name`	Provider name
`provider_url`	Provider URL
`thumbnail_url`	Thumbnail image URL
`html`	Embed HTML (for `video`/`rich` types)

const result = await scraper.scrapeUrl('https://www.youtube.com/watch?v=dQw4w9WgXcQ');
console.log(result.metadata.oembed);
// { type: "video", title: "...", author_name: "...", html: "<iframe ...>" }

Favicons

Discovers all favicon and icon references in an HTML document, including standard icons, Apple touch icons, mask icons, and web app manifest links.

import { createScraper, favicons } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [favicons] });

Extracted fields:

Returns a favicons array of FaviconEntry objects:

Field	Description
`url`	Fully resolved URL of the icon resource
`sizes`	Icon dimensions from the `sizes` attribute (e.g. `"180x180"`)
`type`	MIME type (e.g. `"image/png"`) or `"manifest"` for web app manifest links

Scanned selectors:

link[rel="icon"]
link[rel="shortcut icon"]
link[rel="apple-touch-icon"]
link[rel="apple-touch-icon-precomposed"]
link[rel="mask-icon"]
link[rel="manifest"] (recorded with type: "manifest")

Relative URLs are resolved against the page URL, and duplicate URLs are removed.

const result = await scraper.scrapeUrl('https://example.com');
console.log(result.metadata.favicons);
// [
//   { url: "https://example.com/favicon.ico" },
//   { url: "https://example.com/apple-touch-icon.png", sizes: "180x180", type: "image/png" },
//   { url: "https://example.com/manifest.json", type: "manifest" },
// ]
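
URL resolution and deduplication of this kind can be sketched with the standard `URL` constructor and a `Set`. The names below (`RawIconLink`, `resolveIcons`, `toAbsolute`) are hypothetical, the `FaviconEntry` interface is a local copy based on the table above, and the plugin's real implementation may differ:

```typescript
interface FaviconEntry {
  url: string;
  sizes?: string;
  type?: string;
}

// Hypothetical input shape: raw attribute values read from <link> tags.
interface RawIconLink {
  href: string;
  sizes?: string;
  type?: string;
}

// Returns an absolute URL, or null when the href cannot be parsed.
function toAbsolute(href: string, base: string): string | null {
  try {
    return new URL(href, base).href;
  } catch {
    return null;
  }
}

function resolveIcons(links: RawIconLink[], pageUrl: string): FaviconEntry[] {
  const seen = new Set<string>();
  const entries: FaviconEntry[] = [];
  for (const link of links) {
    const url = toAbsolute(link.href, pageUrl);
    if (url === null || seen.has(url)) continue; // skip bad or repeated URLs
    seen.add(url);
    const entry: FaviconEntry = { url };
    if (link.sizes) entry.sizes = link.sizes;
    if (link.type) entry.type = link.type;
    entries.push(entry);
  }
  return entries;
}

console.log(
  resolveIcons(
    [
      { href: '/favicon.ico' },
      { href: 'https://example.com/favicon.ico' }, // same resource, dropped
      { href: '/apple-touch-icon.png', sizes: '180x180', type: 'image/png' },
    ],
    'https://example.com/blog/post',
  ),
);
```

In this sketch the first occurrence of a URL wins, so attributes on later duplicates are ignored.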

Feeds

Detects RSS and Atom feed links declared in the HTML <head>.

import { createScraper, feeds } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [feeds] });

Extracted fields:

Returns a feeds array of FeedEntry objects:

Field	Description
`url`	Fully resolved URL of the feed
`title`	Human-readable title from the `title` attribute
`type`	`"rss"` for RSS 2.0 feeds, `"atom"` for Atom feeds

Scanned selectors:

link[rel="alternate"][type="application/rss+xml"] → type "rss"
link[rel="alternate"][type="application/atom+xml"] → type "atom"

const result = await scraper.scrapeUrl('https://example.com/blog');
console.log(result.metadata.feeds);
// [
//   { url: "https://example.com/feed.xml", title: "Blog RSS", type: "rss" },
//   { url: "https://example.com/atom.xml", title: "Blog Atom", type: "atom" },
// ]
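
The selector-to-type mapping above amounts to a lookup on the link's `type` attribute. A small sketch, where `feedTypeFromMime` is a hypothetical name and the case-insensitive comparison is an assumption:

```typescript
type FeedType = 'rss' | 'atom';

// Hypothetical helper: maps a <link type="..."> value to a feed type.
function feedTypeFromMime(mime: string): FeedType | undefined {
  switch (mime.trim().toLowerCase()) {
    case 'application/rss+xml':
      return 'rss';
    case 'application/atom+xml':
      return 'atom';
    default:
      return undefined; // not a feed link
  }
}

console.log(feedTypeFromMime('application/rss+xml')); // "rss"
console.log(feedTypeFromMime('application/atom+xml')); // "atom"
console.log(feedTypeFromMime('text/html')); // undefined
```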

Robots

Extracts and interprets robots meta tag directives to determine indexing and crawl permissions.

import { createScraper, robots } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [robots] });

Extracted fields:

Returns a robots object of type RobotsInfo:

Field	Type	Description
`directives`	`Array`	Raw directive entries from all robots-related meta tags
`isIndexable`	`boolean`	`true` if the page allows indexing (no `noindex` or `none`)
`isFollowable`	`boolean`	`true` if the page allows link following (no `nofollow` or `none`)
`noarchive`	`boolean`	Whether caching/archiving is prohibited
`nosnippet`	`boolean`	Whether search result snippets are prohibited
`noimageindex`	`boolean`	Whether image indexing is prohibited
`notranslate`	`boolean`	Whether automatic translation is prohibited

Scans <meta name="robots"> and bot-specific variants like <meta name="googlebot">. The boolean flags are derived from the generic robots tag only. Bot-specific directives are preserved in the directives array.

const result = await scraper.scrapeUrl('https://example.com');
console.log(result.metadata.robots);
// {
//   directives: [{ content: "noindex, nofollow", botName: "robots" }],
//   isIndexable: false,
//   isFollowable: false,
//   noarchive: false,
//   nosnippet: false,
//   noimageindex: false,
//   notranslate: false,
// }
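
The way the boolean flags follow from a directive string can be sketched as below. `deriveRobotsFlags` and `RobotsFlags` are hypothetical names for illustration, and the sketch assumes the content of the generic robots tag is a comma-separated, case-insensitive list:

```typescript
interface RobotsFlags {
  isIndexable: boolean;
  isFollowable: boolean;
  noarchive: boolean;
  nosnippet: boolean;
  noimageindex: boolean;
  notranslate: boolean;
}

// Hypothetical helper: derives flags from the content of <meta name="robots">.
function deriveRobotsFlags(content: string): RobotsFlags {
  const tokens = new Set(
    content
      .split(',')
      .map((token) => token.trim().toLowerCase())
      .filter((token) => token.length > 0),
  );
  const none = tokens.has('none'); // "none" implies noindex and nofollow
  return {
    isIndexable: !(none || tokens.has('noindex')),
    isFollowable: !(none || tokens.has('nofollow')),
    noarchive: tokens.has('noarchive'),
    nosnippet: tokens.has('nosnippet'),
    noimageindex: tokens.has('noimageindex'),
    notranslate: tokens.has('notranslate'),
  };
}

console.log(deriveRobotsFlags('noindex, nofollow'));
// isIndexable and isFollowable are false; the other flags are false as well
```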

Date

Extracts publication and modification dates from multiple sources including Open Graph article tags, Dublin Core metadata, JSON-LD structured data, and HTML <time> elements.

import { createScraper, date } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [date] });

Extracted fields:

Field	Description
`date`	Publication date in ISO 8601 format
`dateModified`	Modification date in ISO 8601 format

Sources (in priority order):

article:published_time / article:modified_time (Open Graph)
meta[name="date"], meta[name="DC.date"], meta[name="DC.date.issued"] (Dublin Core)
JSON-LD datePublished / dateModified
<time datetime> element (fallback)

const result = await scraper.scrapeUrl('https://example.com/article');
console.log(result.metadata.date);         // "2024-01-15T09:00:00.000Z"
console.log(result.metadata.dateModified); // "2024-02-01T12:30:00.000Z"
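
Resolving a date from several sources in priority order can be sketched as "first valid candidate wins". `pickDate` is a hypothetical helper, and the sketch assumes each source has already been read into a string (or is `undefined` when absent). Note that `Date` parsing of non-ISO strings is implementation-dependent, so real input may need stricter handling:

```typescript
// Hypothetical helper: candidates are listed in priority order; the first
// one that parses as a valid date is returned in ISO 8601 form.
function pickDate(candidates: Array<string | undefined>): string | undefined {
  for (const candidate of candidates) {
    if (!candidate) continue;
    const parsed = new Date(candidate);
    if (!Number.isNaN(parsed.getTime())) return parsed.toISOString();
  }
  return undefined;
}

const published = pickDate([
  undefined, // e.g. article:published_time missing
  'not a date', // e.g. an unparseable meta[name="date"] value
  '2024-01-15T09:00:00Z', // e.g. JSON-LD datePublished
]);
console.log(published); // "2024-01-15T09:00:00.000Z"
```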

Logo

Extracts the site logo URL from Open Graph, Schema.org microdata, and JSON-LD structured data.

import { createScraper, logo } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [logo] });

Extracted fields:

Field	Description
`logo`	Fully resolved site logo URL

Sources (in priority order):

meta[property="og:logo"]
meta[itemprop="logo"] / img[itemprop="logo"]
JSON-LD logo field from Organization / Brand / Publisher

const result = await scraper.scrapeUrl('https://example.com');
console.log(result.metadata.logo); // "https://example.com/logo.png"

Lang

Detects the primary language of the document as a BCP 47 language tag.

import { createScraper, lang } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [lang] });

Extracted fields:

Field	Description
`lang`	BCP 47 primary language tag (e.g. `"en"`, `"ko"`, `"ja"`)

Sources (in priority order):

<html lang> attribute
og:locale (normalized, e.g. en_US → en)
meta[http-equiv="content-language"]
meta[itemprop="inLanguage"]
JSON-LD inLanguage field

const result = await scraper.scrapeUrl('https://example.com');
console.log(result.metadata.lang); // "en"
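
The og:locale normalization (en_US → en) can be sketched as taking the primary subtag. `primaryLanguage` is a hypothetical helper; whether the plugin reduces other sources, such as the <html lang> attribute, in the same way is not stated here, so treat this as an illustration only:

```typescript
// Hypothetical helper: reduces a locale such as "en_US" or "ko-KR" to its
// primary language subtag, or returns undefined if that subtag looks invalid.
function primaryLanguage(locale: string): string | undefined {
  const [first = ''] = locale.trim().split(/[-_]/);
  const primary = first.toLowerCase();
  return /^[a-z]{2,3}$/.test(primary) ? primary : undefined;
}

console.log(primaryLanguage('en_US')); // "en"
console.log(primaryLanguage('ko-KR')); // "ko"
console.log(primaryLanguage('')); // undefined
```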

Video

Discovers video resources from Open Graph video tags, Twitter player tags, HTML <video> elements, and JSON-LD VideoObject entries.

import { createScraper, video } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [video] });

Extracted fields:

Returns a videos array of VideoEntry objects:

Field	Description
`url`	Fully resolved video URL
`type`	MIME type (e.g. `"video/mp4"`)
`width`	Width in pixels
`height`	Height in pixels

Sources:

og:video, og:video:secure_url, og:video:type, og:video:width, og:video:height
twitter:player, twitter:player:width, twitter:player:height
<video> and <source> elements
JSON-LD VideoObject with contentUrl

const result = await scraper.scrapeUrl('https://example.com/video');
console.log(result.metadata.videos);
// [
//   { url: "https://example.com/video.mp4", type: "video/mp4", width: 1920, height: 1080 },
// ]
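
How a group of og:video tags could be folded into one entry is sketched below. `videoFromOpenGraph` is a hypothetical helper, the `VideoEntry` interface is a local copy based on the table above, and preferring og:video:secure_url over og:video is an assumption rather than documented behavior:

```typescript
interface VideoEntry {
  url: string;
  type?: string;
  width?: number;
  height?: number;
}

// Hypothetical helper: `tags` maps og: property names to content values.
function videoFromOpenGraph(tags: Map<string, string>): VideoEntry | undefined {
  // Assumption: prefer the HTTPS variant when both are present.
  const url = tags.get('og:video:secure_url') || tags.get('og:video');
  if (!url) return undefined;

  const entry: VideoEntry = { url };
  const type = tags.get('og:video:type');
  if (type) entry.type = type;

  // Dimensions arrive as strings; keep them only when they parse as numbers.
  const width = Number.parseInt(tags.get('og:video:width') ?? '', 10);
  if (Number.isFinite(width)) entry.width = width;
  const height = Number.parseInt(tags.get('og:video:height') ?? '', 10);
  if (Number.isFinite(height)) entry.height = height;

  return entry;
}

console.log(
  videoFromOpenGraph(
    new Map([
      ['og:video', 'https://example.com/video.mp4'],
      ['og:video:type', 'video/mp4'],
      ['og:video:width', '1920'],
      ['og:video:height', '1080'],
    ]),
  ),
);
// { url: "https://example.com/video.mp4", type: "video/mp4", width: 1920, height: 1080 }
```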

Audio

Discovers audio resources from Open Graph audio tags, HTML <audio> elements, and JSON-LD AudioObject entries.

import { createScraper, audio } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [audio] });

Extracted fields:

Returns an audio array of AudioEntry objects:

Field	Description
`url`	Fully resolved audio URL
`type`	MIME type (e.g. `"audio/mpeg"`)

Sources:

og:audio, og:audio:secure_url, og:audio:type
<audio> and <source> elements
JSON-LD AudioObject with contentUrl

const result = await scraper.scrapeUrl('https://example.com/podcast');
console.log(result.metadata.audio);
// [
//   { url: "https://example.com/episode.mp3", type: "audio/mpeg" },
// ]

iFrame

Generates an embeddable iframe HTML snippet from Twitter player tags, with fallback to oEmbed embed HTML.

import { createScraper, iframe } from 'web-meta-scraper';
 
const scraper = createScraper({ plugins: [iframe] });

Extracted fields:

Field	Description
`iframe`	Complete HTML `<iframe>` element string

Sources:

twitter:player with twitter:player:width and twitter:player:height
oEmbed html field (fallback, applied in core)

const result = await scraper.scrapeUrl('https://www.youtube.com/watch?v=dQw4w9WgXcQ');
console.log(result.metadata.iframe);
// '<iframe src="https://..." width="640" height="360" frameborder="0" allowfullscreen></iframe>'
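
Building the snippet from twitter:player values can be sketched as simple string assembly. `buildIframe` and `escapeAttr` are hypothetical helpers; the attribute set mirrors the example output above, but the plugin's exact markup is an assumption:

```typescript
// Escapes characters that would break out of a double-quoted attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: builds an <iframe> string from a player URL and
// optional dimensions (twitter:player:width / twitter:player:height).
function buildIframe(src: string, width?: number, height?: number): string {
  const attrs = [`src="${escapeAttr(src)}"`];
  if (width) attrs.push(`width="${width}"`);
  if (height) attrs.push(`height="${height}"`);
  attrs.push('frameborder="0"', 'allowfullscreen');
  return `<iframe ${attrs.join(' ')}></iframe>`;
}

console.log(buildIframe('https://example.com/embed/1', 640, 360));
// <iframe src="https://example.com/embed/1" width="640" height="360" frameborder="0" allowfullscreen></iframe>
```

Escaping the URL matters because the value comes from a scraped page and ends up inside HTML.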

Combining Plugins

Plugins run concurrently via Promise.all. The order in the array does not affect priority; priority is determined by the resolve rules.

// Both produce the same merged result:
createScraper({ plugins: [metaTags, openGraph, twitter] });
createScraper({ plugins: [twitter, openGraph, metaTags] });
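
Why array order does not matter can be illustrated with a small model: run every plugin with Promise.all, then merge by a fixed priority table instead of by position. Everything here is hypothetical (`SketchPlugin`, `runPlugins`, and especially the `PRIORITY` order, which is assumed for illustration and is not the library's actual resolve rules):

```typescript
type Fields = Record<string, string>;

interface SketchPlugin {
  name: string;
  run: (html: string) => Promise<Fields>;
}

// Assumed priority, highest first. The real resolve rules may differ.
const PRIORITY = ['openGraph', 'twitter', 'metaTags'];

async function runPlugins(plugins: SketchPlugin[], html: string): Promise<Fields> {
  // All plugins start at once; results come back in input order.
  const outputs = await Promise.all(
    plugins.map(async (plugin) => ({ name: plugin.name, fields: await plugin.run(html) })),
  );

  const byName = new Map<string, Fields>();
  for (const output of outputs) byName.set(output.name, output.fields);

  // Merge from lowest to highest priority so higher-priority values overwrite.
  const merged: Fields = {};
  for (const name of [...PRIORITY].reverse()) {
    Object.assign(merged, byName.get(name) ?? {});
  }
  return merged;
}

const meta: SketchPlugin = {
  name: 'metaTags',
  run: async () => ({ title: 'Meta title', description: 'Meta description' }),
};
const og: SketchPlugin = {
  name: 'openGraph',
  run: async () => ({ title: 'OG title' }),
};

Promise.all([runPlugins([meta, og], ''), runPlugins([og, meta], '')]).then(([a, b]) => {
  console.log(a.title, '|', b.title); // OG title | OG title
});
```

Because the merge walks the priority table rather than the plugin array, swapping the plugins changes nothing in the result.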

To create your own plugins, see Custom Plugins.
