Additionally, they exhibit a counter-intuitive scaling limit: their reasoning work increases with challenge complexity up to a degree, then declines In spite of having an ample token budget. By comparing LRMs with their normal LLM counterparts less than equivalent inference compute, we detect three general performance regimes: (1) low-complexity https://www.youtube.com/watch?v=snr3is5MTiU