SWE-rebench Leaderboard is a newer coding benchmark whose dataset is continuously updated with recent tasks that LLMs presumably have not seen before. Claude Opus 4.6 ranks first, which is probably not surprising, but GPT 5.2 (not 5.4) and GLM 5 taking 2nd and 3rd may be.
https://swe-rebench.com/