# MCP Server
`web-meta-scraper-mcp` wraps `web-meta-scraper` as an MCP (Model Context Protocol) server, allowing AI assistants such as Claude Code and Claude Desktop to extract web page metadata as tools.

No installation is required: run it directly with `npx web-meta-scraper-mcp`.
## Client Setup
```bash
claude mcp add web-meta-scraper -- npx -y web-meta-scraper-mcp
```

## Tools
### scrape_url
Extract metadata from a URL. Uses all built-in plugins including favicons, feeds, and robots.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to scrape metadata from |
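A minimal tool-call request for `scrape_url`, following the same shape as the `batch_scrape` example further down (the URL is illustrative):

```json
{
  "name": "scrape_url",
  "arguments": {
    "url": "https://github.com"
  }
}
```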
### scrape_html
Extract metadata from a raw HTML string. Useful when you already have the HTML content.
| Parameter | Type | Required | Description |
|---|---|---|---|
| html | string | Yes | The raw HTML string to extract metadata from |
| url | string | No | Base URL for resolving relative paths (e.g. images, favicons) |
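A sketch of a `scrape_html` call; the HTML snippet is illustrative. With the optional `url` set, a relative path such as `/favicon.ico` resolves against `https://example.com`:

```json
{
  "name": "scrape_html",
  "arguments": {
    "html": "<html><head><title>Example</title><link rel=\"icon\" href=\"/favicon.ico\"></head></html>",
    "url": "https://example.com"
  }
}
```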
### batch_scrape
Scrape metadata from multiple URLs concurrently. Each URL is processed independently — one failure won't stop the rest.
| Parameter | Type | Required | Description |
|---|---|---|---|
| urls | string[] | Yes | List of URLs to scrape |
| concurrency | number | No | Number of concurrent requests (default: 5, max: 20) |
Example request:

```json
{
  "name": "batch_scrape",
  "arguments": {
    "urls": ["https://github.com", "https://nodejs.org"],
    "concurrency": 3
  }
}
```

### detect_feeds
Detect RSS and Atom feed links from a web page.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | No | URL to detect feeds from |
| html | string | No | Raw HTML string to detect feeds from |
One of `url` or `html` is required.
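For example, a `detect_feeds` call using the `url` variant (the URL is illustrative):

```json
{
  "name": "detect_feeds",
  "arguments": {
    "url": "https://nodejs.org"
  }
}
```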
### check_robots
Check robots meta tag directives and indexing status of a web page.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | No | URL to check robots directives from |
| html | string | No | Raw HTML string to check robots directives from |
One of `url` or `html` is required.
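A sketch of a `check_robots` call using the `html` variant; this hypothetical snippet carries a robots meta tag with `noindex, nofollow` directives:

```json
{
  "name": "check_robots",
  "arguments": {
    "html": "<html><head><meta name=\"robots\" content=\"noindex, nofollow\"></head><body></body></html>"
  }
}
```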
### validate_metadata
Validate metadata quality and generate an SEO score report (100-point scale) with categorized issues.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | No | URL to scrape and validate metadata from |
| html | string | No | Raw HTML string to validate metadata from |
### extract_content
Extract main text content from a web page (removes navigation, ads, sidebars).
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | No | URL to extract content from |
| html | string | No | Raw HTML string to extract content from |
## Extracted Metadata
The MCP server enables all built-in plugins by default:
| Plugin | Fields |
|---|---|
| Meta Tags | title, description, keywords, author, favicon, canonicalUrl |
| Open Graph | og:title, og:description, og:image, og:url, og:type, og:site_name, og:locale |
| Twitter Cards | twitter:title, twitter:description, twitter:image, twitter:card, twitter:site, twitter:creator |
| JSON-LD | Structured data (Article, Product, Organization, etc.) |
| Favicons | All icon links with sizes and type |
| Feeds | RSS and Atom feed links with title |
| Robots | Robots meta directives and indexability flags |
Fields are automatically merged using priority-based rules.
## Local Development
```bash
# Install dependencies
pnpm install

# Build
pnpm --filter web-meta-scraper-mcp build

# Test with MCP Inspector
npx @modelcontextprotocol/inspector node mcp/dist/index.js
```