# MCP Server
`web-meta-scraper-mcp` wraps `web-meta-scraper` as an MCP (Model Context Protocol) server, allowing AI assistants such as Claude Code and Claude Desktop to extract web page metadata as tools.

No installation is required: run it directly with `npx web-meta-scraper-mcp`.
## Client Setup
```bash
claude mcp add web-meta-scraper -- npx -y web-meta-scraper-mcp
```

## Tools
### scrape_url
Extract metadata from a URL. Uses all built-in plugins including favicons, feeds, and robots.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to scrape metadata from |
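A minimal tool-call request for `scrape_url`, following the same shape as the `batch_scrape` example further down (the URL is illustrative):

```json
{
  "name": "scrape_url",
  "arguments": {
    "url": "https://github.com"
  }
}
```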
### scrape_html
Extract metadata from a raw HTML string. Useful when you already have the HTML content.
| Parameter | Type | Required | Description |
|---|---|---|---|
| html | string | Yes | The raw HTML string to extract metadata from |
| url | string | No | Base URL for resolving relative paths (e.g. images, favicons) |
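A sketch of a `scrape_html` call; the HTML snippet is illustrative. With the optional `url` set, a relative path such as `/favicon.ico` resolves against `https://example.com`:

```json
{
  "name": "scrape_html",
  "arguments": {
    "html": "<html><head><title>Example</title><link rel=\"icon\" href=\"/favicon.ico\"></head></html>",
    "url": "https://example.com"
  }
}
```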
### batch_scrape
Scrape metadata from multiple URLs concurrently. Each URL is processed independently — one failure won't stop the rest.
| Parameter | Type | Required | Description |
|---|---|---|---|
| urls | string[] | Yes | List of URLs to scrape |
| concurrency | number | No | Number of concurrent requests (default: 5, max: 20) |
Example request:

```json
{
  "name": "batch_scrape",
  "arguments": {
    "urls": ["https://github.com", "https://nodejs.org"],
    "concurrency": 3
  }
}
```

### detect_feeds
Detect RSS and Atom feed links from a web page.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | No | URL to detect feeds from |
| html | string | No | Raw HTML string to detect feeds from |
One of `url` or `html` is required.
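For example, a `detect_feeds` call using the `url` variant (the URL is illustrative):

```json
{
  "name": "detect_feeds",
  "arguments": {
    "url": "https://nodejs.org"
  }
}
```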
### check_robots
Check robots meta tag directives and indexing status of a web page.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | No | URL to check robots directives from |
| html | string | No | Raw HTML string to check robots directives from |
One of `url` or `html` is required.
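A sketch of a `check_robots` call using the `html` variant; this hypothetical snippet carries a robots meta tag with `noindex, nofollow` directives:

```json
{
  "name": "check_robots",
  "arguments": {
    "html": "<html><head><meta name=\"robots\" content=\"noindex, nofollow\"></head><body></body></html>"
  }
}
```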
### validate_metadata
Validate metadata quality and generate an SEO score report (100-point scale) with categorized issues.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | No | URL to scrape and validate metadata from |
| html | string | No | Raw HTML string to validate metadata from |
### extract_content
Extract main text content from a web page (removes navigation, ads, sidebars).
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | No | URL to extract content from |
| html | string | No | Raw HTML string to extract content from |
## Extracted Metadata
The MCP server enables all built-in plugins by default:
| Plugin | Fields |
|---|---|
| Meta Tags | title, description, keywords, author, favicon, canonicalUrl |
| Open Graph | og:title, og:description, og:image, og:url, og:type, og:site_name, og:locale |
| Twitter Cards | twitter:title, twitter:description, twitter:image, twitter:card, twitter:site, twitter:creator |
| JSON-LD | Structured data (Article, Product, Organization, etc.) |
| Favicons | All icon links with sizes and type |
| Feeds | RSS and Atom feed links with title |
| Robots | Robots meta directives and indexability flags |
Fields are automatically merged using priority-based rules.
## Local Development
```bash
# Install dependencies
pnpm install

# Build
pnpm --filter web-meta-scraper-mcp build

# Test with MCP Inspector
npx @modelcontextprotocol/inspector node mcp/dist/index.js
```