MCP-Mark

MCP-Mark evaluates LLMs on their ability to use Model Context Protocol (MCP) tools effectively, testing tool discovery, selection, invocation, and result interpretation across diverse MCP server scenarios.
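
The four stages named above (discovery, selection, invocation, result interpretation) can be illustrated with a toy loop. This is a minimal sketch only: it does not use the MCP SDK and is not MCP-Mark's harness. The `ToyServer` class, the two example tools, and the keyword-overlap selector are all hypothetical stand-ins; in a real agent the model itself would choose the tool and its arguments.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str
    handler: Callable[..., Any]


@dataclass
class ToyServer:
    """Hypothetical in-memory stand-in for an MCP server (not the real protocol)."""
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def list_tools(self) -> List[Tool]:
        # Stage 1, discovery: expose the tools the server offers.
        return list(self.tools.values())

    def call_tool(self, name: str, arguments: Dict[str, Any]) -> Any:
        # Stage 3, invocation: run the chosen tool with keyword arguments.
        return self.tools[name].handler(**arguments)


def select_tool(task: str, tools: List[Tool]) -> Tool:
    # Stage 2, selection: naive keyword overlap between task and description.
    # (Assumption for illustration; a real agent delegates this to the LLM.)
    task_words = set(task.lower().split())
    return max(
        tools,
        key=lambda t: len(task_words & set(t.description.lower().split())),
    )


def run_task(server: ToyServer, task: str, arguments: Dict[str, Any]) -> str:
    tools = server.list_tools()
    tool = select_tool(task, tools)
    result = server.call_tool(tool.name, arguments)
    # Stage 4, result interpretation: turn the raw result into an answer.
    return f"{tool.name} returned {result}"


server = ToyServer()
server.register(Tool("add", "add two numbers together", lambda a, b: a + b))
server.register(Tool("upper", "convert text to upper case", lambda text: text.upper()))

answer = run_task(server, "add two numbers", {"a": 2, "b": 3})
print(answer)
```

A benchmark of this kind would score each stage's outcome against a reference; the sketch only shows where the stages sit relative to one another.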

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: agents, tool_calling. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Qwen3.7 Max: 60.8% (self-reported; source: llm-stats)
  2. Kimi K2.6: 55.9% (self-reported; source: llm-stats)
  3. Qwen3.6 Plus: 48.2% (self-reported; source: llm-stats)
  4. Qwen3.5-397B-A17B: 46.1% (self-reported; source: llm-stats)
  5. DeepSeek-V3.2: 38.0% (self-reported; source: llm-stats)
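
The metadata lists a maximum score of 1, while the leaderboard shows percentages. Assuming the displayed percentage is simply the 0-to-1 score multiplied by 100 (an assumption; the source does not state the conversion), the two scales can be related as in the sketch below. The `LEADERBOARD` dictionary just copies the entries above, and `to_unit_score` is a hypothetical helper name.

```python
# Leaderboard percentages copied from the listing above.
LEADERBOARD = {
    "Qwen3.7 Max": 60.8,
    "Kimi K2.6": 55.9,
    "Qwen3.6 Plus": 48.2,
    "Qwen3.5-397B-A17B": 46.1,
    "DeepSeek-V3.2": 38.0,
}


def to_unit_score(percent: float) -> float:
    """Map a percentage onto the 0-to-1 scale (assumed: percent = score * 100)."""
    return round(percent / 100.0, 4)


unit_scores = {model: to_unit_score(pct) for model, pct in LEADERBOARD.items()}

for model, score in unit_scores.items():
    print(f"{model}: {score}")
```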