Independent Tracker Monitors Claude Code Opus 4.6 Performance for Degradation
Marginlab has launched an independent service that continuously monitors the performance of Anthropic's Claude Code, specifically when running the Opus 4.6 model, on software engineering tasks. The initiative aims to detect statistically significant performance degradation over time, addressing concerns raised after Anthropic's September 2025 postmortem. The daily evaluations run on a contamination-resistant subset of SWE-Bench-Pro, so the reported metrics reflect what users can expect in real-world use.
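
To illustrate what "statistically significant degradation" can mean for a daily pass rate, here is a minimal sketch using a one-sided two-proportion z-test. It is an assumption made for illustration, not Marginlab's published method: the function names, the 0.05 threshold, and the task counts are all hypothetical.

```python
import math


def two_proportion_z(pass_base, n_base, pass_new, n_new):
    """One-sided two-proportion z-test (hypothetical helper).

    Tests whether the new pass rate is lower than the baseline pass rate.
    Returns (z, p_value), where p_value = P(Z <= z) under the null
    hypothesis that both runs share the same underlying pass rate.
    """
    p_base = pass_base / n_base
    p_new = pass_new / n_new
    # Pooled pass rate under the null hypothesis of no change.
    pooled = (pass_base + pass_new) / (n_base + n_new)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_base + 1 / n_new))
    if se == 0:
        # All tasks passed or all failed in both runs: no evidence either way.
        return 0.0, 0.5
    z = (p_new - p_base) / se
    # Standard normal CDF expressed through the error function.
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return z, p_value


def is_degraded(pass_base, n_base, pass_new, n_new, alpha=0.05):
    """Flag a run as degraded when the one-sided p-value falls below alpha."""
    _, p_value = two_proportion_z(pass_base, n_base, pass_new, n_new)
    return p_value < alpha


if __name__ == "__main__":
    # Made-up counts: 290/500 passes over a baseline window, 20/50 today.
    z, p = two_proportion_z(290, 500, 20, 50)
    print(f"z={z:.2f}, p={p:.4f}, degraded={is_degraded(290, 500, 20, 50)}")
```

A tracker that tests every day would also need to account for repeated testing (for example by using a stricter threshold or aggregating over a rolling window), since a fixed 0.05 cutoff applied daily will produce occasional false alarms.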