Plugins
web-meta-scraper uses a plugin architecture. Each plugin extracts metadata from a specific source in the HTML.
import {
  createScraper, metaTags, openGraph, twitter, jsonLd, oembed,
  favicons, feeds, robots, date, logo, lang, video, audio, iframe,
} from 'web-meta-scraper';

const scraper = createScraper({
  plugins: [metaTags, openGraph, twitter, jsonLd, oembed, favicons, feeds, robots, date, logo, lang, video, audio, iframe],
});

Meta Tags
Extracts metadata from standard HTML <meta> tags and other head elements.
import { createScraper, metaTags } from 'web-meta-scraper';
const scraper = createScraper({ plugins: [metaTags] });

Extracted fields:
| Field | Source |
|---|---|
| title | <title> tag |
| description | <meta name="description"> |
| keywords | <meta name="keywords"> (parsed as array) |
| author | <meta name="author"> |
| favicon | <link rel="icon"> or <link rel="shortcut icon"> |
| canonicalUrl | <link rel="canonical"> |
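
The table notes that keywords is parsed as an array. As a rough illustration only, the sketch below assumes a comma-separated value with surrounding whitespace trimmed and empty entries dropped. The helper name splitKeywords is hypothetical and not part of web-meta-scraper; the library's exact parsing rules may differ.

```typescript
// Hypothetical helper, not part of web-meta-scraper's API.
// Assumes the conventional comma-separated form of <meta name="keywords">.
function splitKeywords(content: string): string[] {
  return content
    .split(',')
    .map((keyword) => keyword.trim())
    .filter((keyword) => keyword.length > 0);
}
```

Under these assumptions, a value such as " web, scraper ,, metadata " would yield three entries.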
Open Graph
Extracts Open Graph protocol metadata from og: meta tags.
import { createScraper, openGraph } from 'web-meta-scraper';
const scraper = createScraper({ plugins: [openGraph] });

Extracted fields:
| Field | Source |
|---|---|
| title | og:title |
| description | og:description |
| image | og:image |
| url | og:url |
| type | og:type (website, article, profile, book, music, video) |
| siteName | og:site_name |
| locale | og:locale |
Twitter Cards
Extracts Twitter Cards metadata from twitter: meta tags.
import { createScraper, twitter } from 'web-meta-scraper';
const scraper = createScraper({ plugins: [twitter] });

Extracted fields:
| Field | Source |
|---|---|
| title | twitter:title |
| description | twitter:description |
| image | twitter:image |
| card | twitter:card (summary, summary_large_image, app, player) |
| site | twitter:site |
| creator | twitter:creator |
JSON-LD
Extracts JSON-LD structured data from <script type="application/ld+json"> tags.
import { createScraper, jsonLd } from 'web-meta-scraper';
const scraper = createScraper({ plugins: [jsonLd] });

Parsed data is available as jsonLd in result.metadata. Supports Schema.org types including:
Article, Product, Organization, Person, WebSite, FAQPage, BreadcrumbList
Handles @graph notation and arrays. Invalid JSON is silently skipped.
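
To make the handling of arrays, @graph, and invalid JSON more concrete, here is a minimal sketch of how the text of several JSON-LD script tags could be flattened into one list. It is not the plugin's actual implementation: flattenJsonLd, isJsonLdNode, and JsonLdNode are hypothetical names, and lifting @graph members to the top level is an assumption about the output shape.

```typescript
type JsonLdNode = Record<string, unknown>;

// Narrowing helper: true for plain (non-array, non-null) objects.
function isJsonLdNode(value: unknown): value is JsonLdNode {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Hypothetical sketch: flatten the raw text of several JSON-LD script tags.
function flattenJsonLd(rawBlocks: string[]): JsonLdNode[] {
  const nodes: JsonLdNode[] = [];
  for (const raw of rawBlocks) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      continue; // invalid JSON is skipped without raising
    }
    // A block may hold a single object or an array of objects.
    const candidates = Array.isArray(parsed) ? parsed : [parsed];
    for (const candidate of candidates) {
      if (!isJsonLdNode(candidate)) continue;
      const graph = candidate['@graph'];
      if (Array.isArray(graph)) {
        // Assumption: nodes wrapped in @graph are lifted to the top level.
        nodes.push(...graph.filter(isJsonLdNode));
      } else {
        nodes.push(candidate);
      }
    }
  }
  return nodes;
}
```

With this sketch, a malformed block contributes nothing, while a block containing an array or an @graph wrapper contributes one entry per node.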
const result = await scraper.scrapeUrl('https://example.com');
for (const item of result.metadata.jsonLd ?? []) {
  console.log(item['@type']); // "Article", "Organization", etc.
}

oEmbed
Discovers and fetches oEmbed data from <link type="application/json+oembed"> tags.
import { createScraper, oembed } from 'web-meta-scraper';
const scraper = createScraper({ plugins: [oembed] });

This plugin is async: it makes an additional HTTP request to the oEmbed endpoint. It returns data such as:
| Field | Description |
|---|---|
| type | Resource type (photo, video, link, rich) |
| title | Resource title |
| author_name | Author name |
| author_url | Author URL |
| provider_name | Provider name |
| provider_url | Provider URL |
| thumbnail_url | Thumbnail image URL |
| html | Embed HTML (for video/rich types) |
const result = await scraper.scrapeUrl('https://www.youtube.com/watch?v=dQw4w9WgXcQ');
console.log(result.metadata.oembed);
// { type: "video", title: "...", author_name: "...", html: "<iframe ...>" }

Favicons
Discovers all favicon and icon references in an HTML document, including standard icons, Apple touch icons, mask icons, and web app manifest links.
import { createScraper, favicons } from 'web-meta-scraper';
const scraper = createScraper({ plugins: [favicons] });

Extracted fields:
Returns a favicons array of FaviconEntry objects:
| Field | Description |
|---|---|
| url | Fully resolved URL of the icon resource |
| sizes | Icon dimensions from the sizes attribute (e.g. "180x180") |
| type | MIME type (e.g. "image/png") or "manifest" for web app manifest links |
Scanned selectors:
- link[rel="icon"]
- link[rel="shortcut icon"]
- link[rel="apple-touch-icon"]
- link[rel="apple-touch-icon-precomposed"]
- link[rel="mask-icon"]
- link[rel="manifest"] (recorded with type: "manifest")
Relative URLs are resolved against the page URL. Duplicate URLs are removed.
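
The resolution and deduplication steps can be pictured with the standard URL constructor. The sketch below is an illustration under stated assumptions, not the plugin's code: IconLink, ResolvedIcon, and resolveIcons are hypothetical names, and keeping the first entry seen for each URL is an assumed tie-break.

```typescript
interface IconLink {
  href: string;
  sizes?: string;
  type?: string;
}

interface ResolvedIcon {
  url: string;
  sizes?: string;
  type?: string;
}

// Hypothetical sketch: resolve each href against the page URL,
// then keep only the first entry for any given resolved URL.
function resolveIcons(links: IconLink[], pageUrl: string): ResolvedIcon[] {
  const icons: ResolvedIcon[] = [];
  for (const link of links) {
    let url: string;
    try {
      url = new URL(link.href, pageUrl).href;
    } catch {
      continue; // skip hrefs that cannot be resolved
    }
    if (icons.some((icon) => icon.url === url)) continue; // duplicate
    icons.push({ url, sizes: link.sizes, type: link.type });
  }
  return icons;
}
```

For a page at https://example.com/blog/post, the hrefs "/favicon.ico" and "https://example.com/favicon.ico" resolve to the same URL, so only one entry would be kept.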
const result = await scraper.scrapeUrl('https://example.com');
console.log(result.metadata.favicons);
// [
//   { url: "https://example.com/favicon.ico" },
//   { url: "https://example.com/apple-touch-icon.png", sizes: "180x180", type: "image/png" },
//   { url: "https://example.com/manifest.json", type: "manifest" },
// ]

Feeds
Detects RSS and Atom feed links declared in the HTML <head>.
import { createScraper, feeds } from 'web-meta-scraper';
const scraper = createScraper({ plugins: [feeds] });

Extracted fields:
Returns a feeds array of FeedEntry objects:
| Field | Description |
|---|---|
| url | Fully resolved URL of the feed |
| title | Human-readable title from the title attribute |
| type | "rss" for RSS 2.0 feeds, "atom" for Atom feeds |
Scanned selectors:
- link[rel="alternate"][type="application/rss+xml"] → type "rss"
- link[rel="alternate"][type="application/atom+xml"] → type "atom"
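
The mapping from the link's MIME type to the type field amounts to a two-way switch. The helper below is a hypothetical sketch, not a library function; matching case-insensitively and returning undefined for other MIME types are assumptions.

```typescript
type FeedKind = 'rss' | 'atom';

// Hypothetical helper mirroring the two selectors listed above.
// Assumption: matching ignores case and surrounding whitespace.
function feedKindFromMime(mimeType: string): FeedKind | undefined {
  switch (mimeType.trim().toLowerCase()) {
    case 'application/rss+xml':
      return 'rss';
    case 'application/atom+xml':
      return 'atom';
    default:
      return undefined; // not a feed type covered by this sketch
  }
}
```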
const result = await scraper.scrapeUrl('https://example.com/blog');
console.log(result.metadata.feeds);
// [
//   { url: "https://example.com/feed.xml", title: "Blog RSS", type: "rss" },
//   { url: "https://example.com/atom.xml", title: "Blog Atom", type: "atom" },
// ]

Robots
Extracts and interprets robots meta tag directives to determine indexing and crawl permissions.
import { createScraper, robots } from 'web-meta-scraper';
const scraper = createScraper({ plugins: [robots] });

Extracted fields:
Returns a robots object of type RobotsInfo:
| Field | Type | Description |
|---|---|---|
| directives | Array | Raw directive entries from all robots-related meta tags |
| isIndexable | boolean | true if the page allows indexing (no noindex or none) |
| isFollowable | boolean | true if the page allows link following (no nofollow or none) |
| noarchive | boolean | Whether caching/archiving is prohibited |
| nosnippet | boolean | Whether search result snippets are prohibited |
| noimageindex | boolean | Whether image indexing is prohibited |
| notranslate | boolean | Whether automatic translation is prohibited |
Scans <meta name="robots"> and bot-specific variants like <meta name="googlebot">. The boolean flags are derived from the generic robots tag only. Bot-specific directives are preserved in the directives array.
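
The rule that flags come only from the generic robots tag can be sketched as follows. This is an illustrative model rather than the plugin's source: the type and function names are hypothetical, and splitting on commas with case-insensitive matching is an assumption. The directive shape (content, botName) follows the example output in this section.

```typescript
interface RobotsDirectiveSketch {
  botName: string;
  content: string;
}

interface RobotsFlagsSketch {
  isIndexable: boolean;
  isFollowable: boolean;
  noarchive: boolean;
  nosnippet: boolean;
  noimageindex: boolean;
  notranslate: boolean;
}

// Hypothetical sketch: derive the flags from generic "robots" entries only.
function deriveRobotsFlags(directives: RobotsDirectiveSketch[]): RobotsFlagsSketch {
  const tokens: string[] = [];
  for (const directive of directives) {
    // Bot-specific tags (e.g. googlebot) do not influence the flags.
    if (directive.botName.toLowerCase() !== 'robots') continue;
    for (const token of directive.content.split(',')) {
      tokens.push(token.trim().toLowerCase());
    }
  }
  const has = (token: string): boolean => tokens.indexOf(token) !== -1;
  return {
    isIndexable: !has('noindex') && !has('none'),
    isFollowable: !has('nofollow') && !has('none'),
    noarchive: has('noarchive'),
    nosnippet: has('nosnippet'),
    noimageindex: has('noimageindex'),
    notranslate: has('notranslate'),
  };
}
```

In this model, a googlebot-only noarchive directive leaves noarchive false, because only the generic tag is consulted.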
const result = await scraper.scrapeUrl('https://example.com');
console.log(result.metadata.robots);
// {
//   directives: [{ content: "noindex, nofollow", botName: "robots" }],
//   isIndexable: false,
//   isFollowable: false,
//   noarchive: false,
//   nosnippet: false,
//   noimageindex: false,
//   notranslate: false,
// }

Date
Extracts publication and modification dates from multiple sources including Open Graph article tags, Dublin Core metadata, JSON-LD structured data, and HTML <time> elements.
import { createScraper, date } from 'web-meta-scraper';
const scraper = createScraper({ plugins: [date] });

Extracted fields:
| Field | Description |
|---|---|
| date | Publication date in ISO 8601 format |
| dateModified | Modification date in ISO 8601 format |
Sources (in priority order):
- article:published_time / article:modified_time (Open Graph)
- meta[name="date"], meta[name="DC.date"], meta[name="DC.date.issued"] (Dublin Core)
- JSON-LD datePublished / dateModified
- <time datetime> element (fallback)
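
A priority-ordered lookup like the one listed above can be modelled as "take the first candidate that parses". The sketch below is only an approximation: firstValidIsoDate is a hypothetical helper, and skipping values that fail to parse (rather than stopping at them) is an assumption about the plugin's behaviour.

```typescript
// Hypothetical sketch: candidates are passed in priority order.
function firstValidIsoDate(candidates: Array<string | undefined>): string | undefined {
  for (const candidate of candidates) {
    if (!candidate) continue; // source not present on the page
    const parsed = new Date(candidate);
    if (!isNaN(parsed.getTime())) {
      return parsed.toISOString(); // normalise to ISO 8601
    }
  }
  return undefined;
}

const published = firstValidIsoDate([
  undefined,              // article:published_time missing
  'not a date',           // Dublin Core value that fails to parse
  '2024-01-15T09:00:00Z', // JSON-LD datePublished
  '2023-12-31',           // <time datetime> fallback, never reached
]);
// published === "2024-01-15T09:00:00.000Z"
```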
const result = await scraper.scrapeUrl('https://example.com/article');
console.log(result.metadata.date); // "2024-01-15T09:00:00.000Z"
console.log(result.metadata.dateModified); // "2024-02-01T12:30:00.000Z"

Logo
Extracts the site logo URL from Open Graph, Schema.org microdata, and JSON-LD structured data.
import { createScraper, logo } from 'web-meta-scraper';
const scraper = createScraper({ plugins: [logo] });

Extracted fields:
| Field | Description |
|---|---|
| logo | Fully resolved site logo URL |
Sources (in priority order):
- meta[property="og:logo"]
- meta[itemprop="logo"] / img[itemprop="logo"]
- JSON-LD logo field from Organization / Brand / Publisher
const result = await scraper.scrapeUrl('https://example.com');
console.log(result.metadata.logo); // "https://example.com/logo.png"

Lang
Detects the primary language of the document as a BCP 47 language tag.
import { createScraper, lang } from 'web-meta-scraper';
const scraper = createScraper({ plugins: [lang] });

Extracted fields:
| Field | Description |
|---|---|
| lang | BCP 47 primary language tag (e.g. "en", "ko", "ja") |
Sources (in priority order):
- <html lang> attribute
- og:locale (normalized: en_US → en)
- meta[http-equiv="content-language"]
- meta[itemprop="inLanguage"]
- JSON-LD inLanguage field
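
The og:locale normalization (en_US → en) can be approximated by keeping only the primary subtag. The helper below is a hypothetical sketch; accepting both "_" and "-" as separators and lowercasing the result are assumptions, not documented behaviour.

```typescript
// Hypothetical helper: reduce a locale such as "en_US" or "pt-BR"
// to its primary language subtag.
function toPrimaryLanguage(locale: string): string | undefined {
  const [primary = ''] = locale.trim().split(/[-_]/);
  return primary.length > 0 ? primary.toLowerCase() : undefined;
}
```

Under these assumptions, "en_US" becomes "en", "pt-BR" becomes "pt", and an empty value yields undefined so the next source can be tried.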
const result = await scraper.scrapeUrl('https://example.com');
console.log(result.metadata.lang); // "en"

Video
Discovers video resources from Open Graph video tags, Twitter player tags, HTML <video> elements, and JSON-LD VideoObject entries.
import { createScraper, video } from 'web-meta-scraper';
const scraper = createScraper({ plugins: [video] });

Extracted fields:
Returns a videos array of VideoEntry objects:
| Field | Description |
|---|---|
| url | Fully resolved video URL |
| type | MIME type (e.g. "video/mp4") |
| width | Width in pixels |
| height | Height in pixels |
Sources:
- og:video, og:video:secure_url, og:video:type, og:video:width, og:video:height
- twitter:player, twitter:player:width, twitter:player:height
- <video> and <source> elements
- JSON-LD VideoObject with contentUrl
const result = await scraper.scrapeUrl('https://example.com/video');
console.log(result.metadata.videos);
// [
//   { url: "https://example.com/video.mp4", type: "video/mp4", width: 1920, height: 1080 },
// ]

Audio
Discovers audio resources from Open Graph audio tags, HTML <audio> elements, and JSON-LD AudioObject entries.
import { createScraper, audio } from 'web-meta-scraper';
const scraper = createScraper({ plugins: [audio] });

Extracted fields:
Returns an audio array of AudioEntry objects:
| Field | Description |
|---|---|
| url | Fully resolved audio URL |
| type | MIME type (e.g. "audio/mpeg") |
Sources:
- og:audio, og:audio:secure_url, og:audio:type
- <audio> and <source> elements
- JSON-LD AudioObject with contentUrl
const result = await scraper.scrapeUrl('https://example.com/podcast');
console.log(result.metadata.audio);
// [
//   { url: "https://example.com/episode.mp3", type: "audio/mpeg" },
// ]

iFrame
Generates an embeddable iframe HTML snippet from Twitter player tags, with fallback to oEmbed embed HTML.
import { createScraper, iframe } from 'web-meta-scraper';
const scraper = createScraper({ plugins: [iframe] });

Extracted fields:
| Field | Description |
|---|---|
| iframe | Complete HTML <iframe> element string |
Sources:
- twitter:player with twitter:player:width and twitter:player:height
- oEmbed html field (fallback, applied in core)
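
Building the snippet from the twitter:player values is essentially string assembly. The sketch below is an assumption-laden illustration: buildPlayerIframe and escapeAttribute are hypothetical helpers, the attribute set simply mirrors the example output shown in this section, and whether the plugin escapes attribute values this way is not documented here.

```typescript
// Escape characters that would break out of a double-quoted HTML attribute.
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical sketch: build an <iframe> string from twitter:player,
// twitter:player:width and twitter:player:height.
function buildPlayerIframe(src: string, width: number, height: number): string {
  return (
    `<iframe src="${escapeAttribute(src)}" width="${width}" height="${height}"` +
    ' frameborder="0" allowfullscreen></iframe>'
  );
}
```

Escaping the src value keeps a player URL containing "&" or quotes from producing malformed HTML when the string is embedded in a page.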
const result = await scraper.scrapeUrl('https://www.youtube.com/watch?v=dQw4w9WgXcQ');
console.log(result.metadata.iframe);
// '<iframe src="https://..." width="640" height="360" frameborder="0" allowfullscreen></iframe>'

Combining Plugins
Plugins run in parallel via Promise.all. The order of plugins in the array does not affect priority, which is determined by the resolve rules.
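
Why array order does not matter can be seen in a simplified model: run everything with Promise.all, index the outputs by plugin name, then merge according to a fixed rule. Everything in the sketch is hypothetical (SketchPlugin, MetaFields, runAndMerge, mergeByPriority), and the sourcePriority list is an invented stand-in for the library's resolve rules, which are not described in this section.

```typescript
type MetaFields = Record<string, string>;

interface SketchPlugin {
  name: string;
  run: (html: string) => Promise<MetaFields>;
}

// Assumed rule for illustration only: earlier sources win on conflicts.
const sourcePriority = ['openGraph', 'twitter', 'metaTags'];

// Merging consults sourcePriority, never a plugin's position in the array.
function mergeByPriority(byName: Record<string, MetaFields>): MetaFields {
  const merged: MetaFields = {};
  for (const source of sourcePriority) {
    const fields = byName[source];
    if (!fields) continue; // this plugin was not registered
    for (const key of Object.keys(fields)) {
      const value = fields[key];
      if (value !== undefined && !(key in merged)) merged[key] = value;
    }
  }
  return merged;
}

async function runAndMerge(plugins: SketchPlugin[], html: string): Promise<MetaFields> {
  // All plugins start at once; results are then indexed by plugin name.
  const outputs = await Promise.all(
    plugins.map(async (plugin) => ({ name: plugin.name, fields: await plugin.run(html) })),
  );
  const byName: Record<string, MetaFields> = {};
  for (const output of outputs) {
    byName[output.name] = output.fields;
  }
  return mergeByPriority(byName);
}
```

Because the merge step looks plugins up by name, passing the same plugins in a different order yields the same merged object in this model.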
// All produce the same merged result:
createScraper({ plugins: [metaTags, openGraph, twitter] });
createScraper({ plugins: [twitter, openGraph, metaTags] });

To create your own plugins, see Custom Plugins.