Claude Sonnet 5 released: performance close to Opus 4.8 at 60% of the price

Tech · Huxiu · 夕小瑶科技说 © · 9 hours ago · AI-generated

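Anthropic has released Claude Sonnet 5, positioned as its new-generation workhorse agentic model. On benchmarks including SWE-bench and Humanity's Last Exam it clearly outperforms its predecessor, Sonnet 4.6, and on some metrics it approaches the flagship Opus 4.8, though a gap remains. Its API list price is only 60% of Opus 4.8's, with an introductory promotional price as low as 40% of the list price. Drawing on benchmark data and third-party evaluations, the article points out that although Sonnet 5 has a lower unit price, its actual per-run inference cost on certain tasks can exceed that of Opus 4.8, and it cautions developers against looking only at the API list price. The article suits AI developers and technical teams responsible for model selection and cost control, as well as analysts following competition among large models.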

Key points

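▍Claude Sonnet 5 is Anthropic's new-generation workhorse agentic model. Its performance approaches that of the flagship Opus 4.8 while its list price is only 60% of the latter's; the positioning is to bring agentic capability down from high-end models to mid-tier ones.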

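01 On SWE-bench Pro, Sonnet 5 scores 63.2%, above Sonnet 4.6's 58.1% but below Opus 4.8's 69.2%.
02 On Humanity's Last Exam (no tools), Sonnet 5 scores 43.2%, against 34.6% for Sonnet 4.6 and 49.8% for Opus 4.8; with tools enabled, Sonnet 5 rises to 57.4%, close to Opus 4.8.
03 Sonnet 5's standard API price is $2 per million input tokens and $10 per million output tokens; until August 31, 2026, the promotional price is 40% of that.
04 Cursor's CursorBench 3.1 shows Sonnet 5 high default scoring 57%, against 49% for Sonnet 4.6, with an average per-task cost below that of Opus 4.8 high.
05 On OSWorld-Verified, Sonnet 5 scores 81.2%, against 78.5% for Sonnet 4.6 and 83.4% for Opus 4.8.
06 On the Artificial Analysis Intelligence leaderboard, Sonnet 5 max scores 53, in the same tier as GPT-5.5 high and below Opus 4.8 high.
07 One user's hands-on comparison: Sonnet 5 completed the task for $3.36 in 2 min 11 s, while Opus 4.8 Ultracode took $20.66 and 20 min 15 s.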
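
The pricing in point 03 and the user comparison in point 07 can be turned into concrete numbers. The following is a minimal sketch, not an official pricing calculator: the `Price` class and the example token counts are hypothetical, and only the $2 / $10 list price, the 40% promotional factor, and the $3.36 / $20.66 and 2 min 11 s / 20 min 15 s figures come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per one million tokens (hypothetical helper, not an SDK type)."""
    input_per_mtok: float
    output_per_mtok: float

    def scaled(self, factor: float) -> "Price":
        # Apply a flat discount factor to both input and output prices.
        return Price(self.input_per_mtok * factor, self.output_per_mtok * factor)

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        # Cost in USD for a single request with the given token counts.
        total = input_tokens * self.input_per_mtok + output_tokens * self.output_per_mtok
        return total / 1_000_000


# Figures stated in the article.
SONNET5_STANDARD = Price(input_per_mtok=2.0, output_per_mtok=10.0)
SONNET5_PROMO = SONNET5_STANDARD.scaled(0.40)  # promotional price: 40% of list

# Hypothetical workload, chosen only for illustration.
example_cost = SONNET5_PROMO.cost(input_tokens=200_000, output_tokens=30_000)

# User-reported run from point 07: cost in USD and wall-clock time in seconds.
sonnet_usd, sonnet_s = 3.36, 2 * 60 + 11
opus_usd, opus_s = 20.66, 20 * 60 + 15
cost_ratio = opus_usd / sonnet_usd   # how many times more the Opus run cost
time_ratio = opus_s / sonnet_s       # how many times longer the Opus run took

if __name__ == "__main__":
    print(f"promo price: ${SONNET5_PROMO.input_per_mtok:.2f} in / "
          f"${SONNET5_PROMO.output_per_mtok:.2f} out per million tokens")
    print(f"example request at promo price: ${example_cost:.2f}")
    print(f"Opus run vs Sonnet run: {cost_ratio:.2f}x cost, {time_ratio:.2f}x time")
```

Under these numbers the promotional price works out to $0.80 input and $4.00 output per million tokens. The user comparison is a single anecdotal run, so the ratios describe that one task, not a general rule.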

Counterpoints / Limitations

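- On the Cost per Intelligence Index Task metric, Sonnet 5 max costs $2.29 per task, which is higher than Opus 4.8 max at $1.80. This indicates that real cost depends on factors such as the amount of reasoning and output, so the API unit price alone is not enough.
- Sonnet 5 uses a new tokenizer that splits the same text into more tokens, which means actual cost may rise as token counts increase.
- Anthropic's automated behavioral audit shows that Sonnet 5's rate of misaligned behavior is lower than Sonnet 4.6's but slightly higher than that of the stronger Opus 4.8 and Claude Mythos Preview.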
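
The first two limitations come down to one piece of arithmetic: per-task cost is roughly unit price times tokens consumed, so a cheaper model stops being cheaper once it uses enough extra tokens. The sketch below is a back-of-the-envelope illustration under loud assumptions: it treats the 60% price ratio as applying uniformly to input and output, assumes both models were billed at list price, and ignores caching and differences in input/output mix. The function names are hypothetical; only the 0.60 ratio and the $2.29 / $1.80 figures come from the article.

```python
def break_even_token_ratio(price_ratio: float) -> float:
    """Token multiple at which the cheaper model's per-task cost matches the pricier one.

    price_ratio is cheaper-model price / pricier-model price (0.60 per the article).
    """
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return 1.0 / price_ratio


def implied_token_ratio(cheap_task_cost: float, pricey_task_cost: float,
                        price_ratio: float) -> float:
    """Token multiple implied by observed per-task costs, under the stated assumptions."""
    return (cheap_task_cost / pricey_task_cost) / price_ratio


PRICE_RATIO = 0.60        # Sonnet 5 list price relative to Opus 4.8 (from the article)
SONNET5_MAX_COST = 2.29   # USD per Intelligence Index task (from the article)
OPUS48_MAX_COST = 1.80    # USD per Intelligence Index task (from the article)

break_even = break_even_token_ratio(PRICE_RATIO)
implied = implied_token_ratio(SONNET5_MAX_COST, OPUS48_MAX_COST, PRICE_RATIO)

if __name__ == "__main__":
    print(f"break-even token multiple: {break_even:.2f}x")
    print(f"token multiple implied by reported costs: {implied:.2f}x")
```

Read this way, Sonnet 5 costs more per task once it consumes more than about 1.67 times the billed tokens of Opus 4.8, and the reported per-task costs would correspond to roughly 2.12 times the tokens. That second figure is an estimate derived from the simplifying assumptions above, not a measurement; it combines extra reasoning and output with any inflation from the new tokenizer.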

Claude Sonnet 5 · Claude Opus 4.8 · Claude Sonnet 4.6 · Anthropic · Cursor · SWE-bench Pro · Humanity's Last Exam · OSWorld-Verified · CursorBench 3.1 · Artificial Analysis Intelligence
