MCP Server

web-meta-scraper-mcp wraps web-meta-scraper as an MCP (Model Context Protocol) server, allowing AI assistants such as Claude Code and Claude Desktop to extract web page metadata as a tool.

No installation required — run directly with npx web-meta-scraper-mcp.

Client Setup

Register the server with Claude Code:

claude mcp add web-meta-scraper -- npx -y web-meta-scraper-mcp

Tools

scrape_url

Extract metadata from a URL. Uses all built-in plugins, including favicons, feeds, and robots.

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to scrape metadata from |
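An MCP client calls a tool by sending a JSON-RPC 2.0 tools/call request whose params hold the tool name and its arguments. The sketch below builds that envelope for scrape_url. buildToolCall is a hypothetical helper used only for illustration; clients such as Claude Code construct this request for you.

```typescript
// Sketch: the JSON-RPC 2.0 "tools/call" envelope an MCP client sends.
// buildToolCall is a hypothetical helper, not part of web-meta-scraper-mcp.

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// scrape_url takes a single required string parameter: url.
const request = buildToolCall(1, "scrape_url", { url: "https://github.com" });
console.log(JSON.stringify(request));
```

The params object has the same name/arguments shape as the batch_scrape example request further down this page.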

scrape_html

Extract metadata from a raw HTML string. Useful when you already have the HTML content.

| Parameter | Type | Required | Description |
|---|---|---|---|
| html | string | Yes | The raw HTML string to extract metadata from |
| url | string | No | Base URL for resolving relative paths (e.g. images, favicons) |
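The optional url matters because raw HTML often contains relative paths such as /favicon.ico. Here is a minimal sketch of standard resolution using the WHATWG URL API built into Node.js. It assumes the server resolves paths the same way, and resolveAgainst is a hypothetical name.

```typescript
// Sketch: resolving relative paths from raw HTML against a base URL.
// resolveAgainst is hypothetical; the server's own edge-case handling may differ.

function resolveAgainst(path: string, baseUrl?: string): string {
  if (baseUrl === undefined) {
    // Without a base URL, a relative path cannot be resolved, so return it unchanged.
    return path;
  }
  // new URL(relative, base) applies standard URL resolution rules.
  return new URL(path, baseUrl).href;
}

const base = "https://example.com/blog/post.html";
console.log(resolveAgainst("/favicon.ico", base)); // https://example.com/favicon.ico
console.log(resolveAgainst("images/cover.png", base)); // https://example.com/blog/images/cover.png
console.log(resolveAgainst("/favicon.ico")); // /favicon.ico (no base URL given)
```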

batch_scrape

Scrape metadata from multiple URLs concurrently. Each URL is processed independently — one failure won't stop the rest.

| Parameter | Type | Required | Description |
|---|---|---|---|
| urls | string[] | Yes | List of URLs to scrape |
| concurrency | number | No | Number of concurrent requests (default: 5, max: 20) |

Example request:

{
  "name": "batch_scrape",
  "arguments": {
    "urls": ["https://github.com", "https://nodejs.org"],
    "concurrency": 3
  }
}
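The batch behavior can be sketched as a small worker pool. At most concurrency URLs are in flight at once, and every URL produces its own success or error entry. This is an illustration under assumptions, not the server's code: scrapeOne stands in for a single-URL scrape, and out-of-range concurrency values are clamped to 1..20 here, whereas the real tool may reject them instead.

```typescript
// Sketch only: bounded-concurrency batch scraping with per-URL results.
// Assumptions: scrapeOne stands in for a single-URL scrape, and concurrency is
// clamped to 1..20 (default 5); the real server may reject out-of-range values.

type BatchResult<T> =
  | { url: string; ok: true; value: T }
  | { url: string; ok: false; error: string };

const DEFAULT_CONCURRENCY = 5;
const MAX_CONCURRENCY = 20;

function resolveConcurrency(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested)) return DEFAULT_CONCURRENCY;
  return Math.min(MAX_CONCURRENCY, Math.max(1, Math.floor(requested)));
}

async function batchScrape<T>(
  urls: string[],
  scrapeOne: (url: string) => Promise<T>,
  concurrency?: number,
): Promise<BatchResult<T>[]> {
  const results: BatchResult<T>[] = new Array(urls.length);
  let next = 0;

  // Each worker claims the next unprocessed index until none remain. A failure
  // is recorded for that URL only, and the worker moves on to the next one.
  async function worker(): Promise<void> {
    while (next < urls.length) {
      const index = next++;
      const url = urls[index];
      try {
        results[index] = { url, ok: true, value: await scrapeOne(url) };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        results[index] = { url, ok: false, error: message };
      }
    }
  }

  const workerCount = Math.min(resolveConcurrency(concurrency), urls.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as the input URLs
}

// Usage with a fake scraper that fails for one URL.
const fakeScrape = async (url: string): Promise<string> => {
  if (url.includes("broken")) throw new Error(`failed: ${url}`);
  return `metadata for ${url}`;
};

batchScrape(["https://github.com", "https://broken.example", "https://nodejs.org"], fakeScrape, 3)
  .then((results) => console.log(results));
```

Returning a result object per URL, instead of rejecting the whole batch, is what lets one failure leave the other results intact.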

detect_feeds

Detect RSS and Atom feed links from a web page.

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | No | URL to detect feeds from |
| html | string | No | Raw HTML string to detect feeds from |

One of url or html is required.
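Feed detection conventionally relies on autodiscovery links of the form <link rel="alternate" type="application/rss+xml" href="...">. The following is a naive regex-based sketch for illustration only. detectFeedLinks is hypothetical, it ignores unquoted attribute values, and a real implementation should use an HTML parser.

```typescript
// Sketch of RSS/Atom autodiscovery: keep <link rel="alternate"> tags whose type
// is an RSS or Atom MIME type. detectFeedLinks is hypothetical, not the tool's code.

interface FeedLink {
  href: string;
  type: string;
  title?: string;
}

const FEED_TYPES = new Set(["application/rss+xml", "application/atom+xml"]);

// Parse name="value" and name='value' attributes from a single tag.
function parseAttributes(tag: string): Record<string, string> {
  const attrs: Record<string, string> = {};
  const attrPattern = /([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  for (const match of tag.matchAll(attrPattern)) {
    attrs[match[1].toLowerCase()] = match[2] ?? match[3] ?? "";
  }
  return attrs;
}

function detectFeedLinks(html: string): FeedLink[] {
  const feeds: FeedLink[] = [];
  for (const [tag] of html.matchAll(/<link\b[^>]*>/gi)) {
    const attrs = parseAttributes(tag);
    const rel = (attrs.rel ?? "").toLowerCase().split(/\s+/);
    const type = (attrs.type ?? "").toLowerCase();
    if (rel.includes("alternate") && FEED_TYPES.has(type) && attrs.href) {
      feeds.push({ href: attrs.href, type, title: attrs.title });
    }
  }
  return feeds;
}

const html = `
  <link rel="stylesheet" href="/style.css">
  <link rel="alternate" type="application/rss+xml" title="Blog RSS" href="/rss.xml">
  <link rel="alternate" type="application/atom+xml" title="Blog Atom" href="/atom.xml">
`;
console.log(detectFeedLinks(html)); // two feeds: /rss.xml and /atom.xml
```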

check_robots

Check robots meta tag directives and indexing status of a web page.

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | No | URL to check robots directives from |
| html | string | No | Raw HTML string to check robots directives from |

One of url or html is required.
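Robots directives live in the content attribute of <meta name="robots"> as a comma-separated list. The sketch below derives indexing flags from that list using the common search-engine conventions, in which "none" is shorthand for "noindex, nofollow". parseRobots and its output field names are assumptions, not the tool's actual response schema.

```typescript
// Sketch: interpreting a robots meta tag's content attribute.
// parseRobots and the RobotsStatus field names are hypothetical.

interface RobotsStatus {
  directives: string[];
  indexable: boolean;
  followable: boolean;
}

function parseRobots(content: string | undefined): RobotsStatus {
  // A missing robots meta tag means the defaults apply: index, follow.
  const directives = (content ?? "")
    .split(",")
    .map((d) => d.trim().toLowerCase())
    .filter((d) => d.length > 0);
  const has = (d: string) => directives.includes(d);
  return {
    directives,
    indexable: !has("noindex") && !has("none"),
    followable: !has("nofollow") && !has("none"),
  };
}

console.log(parseRobots("noindex, follow")); // indexable: false, followable: true
console.log(parseRobots(undefined)); // indexable: true, followable: true
```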

validate_metadata

Validate metadata quality and generate an SEO score report (100-point scale) with categorized issues.

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | No | URL to scrape and validate metadata from |
| html | string | No | Raw HTML string to validate metadata from |
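To show how a 100-point score with categorized issues can work, here is a sketch that starts at 100 and subtracts a penalty per issue found. The checks, penalties, and categories are hypothetical examples chosen for illustration, not the rules validate_metadata applies.

```typescript
// Sketch of a 100-point metadata score with categorized issues.
// All checks, penalties, and category names here are hypothetical.

type Severity = "error" | "warning";

interface Issue {
  category: string;
  severity: Severity;
  message: string;
  penalty: number;
}

interface PageMetadata {
  title?: string;
  description?: string;
  ogImage?: string;
  canonicalUrl?: string;
}

function scoreMetadata(meta: PageMetadata): { score: number; issues: Issue[] } {
  const issues: Issue[] = [];
  if (!meta.title) {
    issues.push({ category: "basic", severity: "error", message: "Missing title", penalty: 30 });
  } else if (meta.title.length > 60) {
    issues.push({ category: "basic", severity: "warning", message: "Title longer than 60 characters", penalty: 5 });
  }
  if (!meta.description) {
    issues.push({ category: "basic", severity: "error", message: "Missing description", penalty: 20 });
  }
  if (!meta.ogImage) {
    issues.push({ category: "social", severity: "warning", message: "Missing og:image", penalty: 10 });
  }
  if (!meta.canonicalUrl) {
    issues.push({ category: "technical", severity: "warning", message: "Missing canonical URL", penalty: 5 });
  }
  // Start from 100 and subtract penalties, never going below 0.
  const penalty = issues.reduce((sum, issue) => sum + issue.penalty, 0);
  return { score: Math.max(0, 100 - penalty), issues };
}

const report = scoreMetadata({ title: "Example Domain", description: "An example page." });
console.log(report.score); // 85 (missing og:image and canonical URL)
```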

extract_content

Extract main text content from a web page (removes navigation, ads, sidebars).

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | No | URL to extract content from |
| html | string | No | Raw HTML string to extract content from |
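A naive sketch of the idea: remove elements that typically hold page chrome, then strip the remaining tags. extractMainText is hypothetical. It only targets structural elements (not ads), does not handle nested elements of the same tag, and real extractors use an HTML parser plus content-scoring heuristics.

```typescript
// Naive sketch of main-content extraction; extractMainText is hypothetical.
// It removes boilerplate-prone elements with regexes, so nested elements of the
// same tag are not handled correctly.

const BOILERPLATE_TAGS = ["script", "style", "nav", "aside", "header", "footer"];

function extractMainText(html: string): string {
  let text = html;
  for (const tag of BOILERPLATE_TAGS) {
    // Remove <tag ...>...</tag> blocks, case-insensitively and across lines.
    text = text.replace(new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}>`, "gi"), " ");
  }
  return text
    .replace(/<[^>]+>/g, " ") // strip the remaining tags
    .replace(/\s+/g, " ") // collapse whitespace
    .trim();
}

const page = `
  <nav><a href="/">Home</a></nav>
  <article><h1>Hello</h1><p>Main text.</p></article>
  <aside>Related links</aside>
`;
console.log(extractMainText(page)); // Hello Main text.
```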

Extracted Metadata

The MCP server enables all built-in plugins by default:

| Plugin | Fields |
|---|---|
| Meta Tags | title, description, keywords, author, favicon, canonicalUrl |
| Open Graph | og:title, og:description, og:image, og:url, og:type, og:site_name, og:locale |
| Twitter Cards | twitter:title, twitter:description, twitter:image, twitter:card, twitter:site, twitter:creator |
| JSON-LD | Structured data (Article, Product, Organization, etc.) |
| Favicons | All icon links with sizes and type |
| Feeds | RSS and Atom feed links with title |
| Robots | Robots meta directives and indexability flags |

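When more than one plugin provides the same field (for example, a title from both Meta Tags and Open Graph), the values are merged automatically using priority-based rules.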
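Priority-based merging can be sketched as an ordered list of sources per output field, where the first non-empty value wins. The field names and priority orders below are hypothetical, chosen only to illustrate the idea; the library's actual rules may differ.

```typescript
// Sketch of priority-based merging: for each output field, sources are listed
// from highest to lowest priority and the first non-empty value wins.
// The PRIORITY table is a hypothetical example, not the library's rules.

type Sources = Record<string, string | undefined>;

const PRIORITY: Record<string, string[]> = {
  title: ["og:title", "twitter:title", "title"],
  description: ["og:description", "twitter:description", "description"],
  image: ["og:image", "twitter:image"],
};

function mergeFields(sources: Sources): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const [field, keys] of Object.entries(PRIORITY)) {
    for (const key of keys) {
      const value = sources[key]?.trim();
      if (value) {
        merged[field] = value;
        break; // first non-empty source wins
      }
    }
  }
  return merged;
}

const merged = mergeFields({
  title: "Example",
  "og:title": "",
  "twitter:title": "Example on Twitter",
  description: "Plain meta description",
});
// title comes from twitter:title because og:title is empty;
// description falls back to the plain meta tag; image stays unset.
console.log(merged);
```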

Local Development

# Install dependencies
pnpm install
 
# Build
pnpm --filter web-meta-scraper-mcp build
 
# Test with MCP Inspector
npx @modelcontextprotocol/inspector node mcp/dist/index.js