Opus 4.5 and 4.6 were the first models that i could talk to and get a sense that they really "understood" WHY i'm saying the things i am
Opus 4.7 kinda took that away; it's a definite regression. it doesn't extrapolate.
———————————————
refactor this thing? sure, will do! wait, what do you mean "obviously do not refactor the unrelated thing that's colocated in the same file"? i'm sorry, you're absolutely right, conceptually these two things have nothing to do with each other. i see it now. i shouldn't have thought they're the same just because they're in the same file.
———————————————
whereas GPT 5.5, much like Opus 4.6, gets it.
i wanted to build a MIDI listener for a macOS app i'm making, and translate every message into a new enum. that enum was to be opinionated and not to reflect MIDI message data. moreover, i explicitly said not to do bit shifting or pointer arithmetic as part of the transport.
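to make the requirement concrete, here's a minimal sketch of what i mean (all names are hypothetical, not from my actual app): the transport/parsing is assumed to be handled by a crate, so my code never touches pointers or bit shifts, and the app-level enum is deliberately lossy and opinionated rather than mirroring raw MIDI fields.

```rust
/// What a parsing crate would typically hand back after decoding the
/// transport for us -- no pointer arithmetic or bit shifting on our side.
/// (Hypothetical type for illustration.)
#[derive(Debug)]
enum ParsedMidi {
    NoteOn { key: u8, velocity: u8 },
    NoteOff { key: u8 },
    ControlChange { controller: u8, value: u8 },
}

/// The app-level enum: opinionated and lossy on purpose. It models what
/// the app cares about, not the wire format.
#[derive(Debug, PartialEq)]
enum ControlEvent {
    KeyPressed { key: u8, soft: bool }, // "soft" is an app concept, not a MIDI field
    KeyReleased { key: u8 },
    DialTurned { delta: i8 },           // a relative CC interpreted as a wheel tick
    Ignored,                            // everything the app doesn't care about
}

fn translate(msg: ParsedMidi) -> ControlEvent {
    match msg {
        // NoteOn with velocity 0 is conventionally a release
        ParsedMidi::NoteOn { key, velocity: 0 } => ControlEvent::KeyReleased { key },
        ParsedMidi::NoteOn { key, velocity } => {
            ControlEvent::KeyPressed { key, soft: velocity < 64 }
        }
        ParsedMidi::NoteOff { key } => ControlEvent::KeyReleased { key },
        // relative CC on controller 1 (assumed mapping): 1..=63 is up,
        // 65..=127 is down, expressed as a signed offset from 128
        ParsedMidi::ControlChange { controller: 1, value } if (1..=63).contains(&value) => {
            ControlEvent::DialTurned { delta: value as i8 }
        }
        ParsedMidi::ControlChange { controller: 1, value } if value >= 65 => {
            ControlEvent::DialTurned { delta: (value as i16 - 128) as i8 }
        }
        _ => ControlEvent::Ignored,
    }
}
```

the point of the translation layer is that everything downstream matches on `ControlEvent` and never sees a MIDI byte.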
what did Opus 4.7 do? it still used pointer arithmetic for the parsing! should i have to be this explicit? it also seemingly didn't care that i wanted the enum to be opinionated and not reflect the raw MIDI values. Opus 4.6 got it right (although with ugly, questionable implementation).
GPT 5.5 immediately understood both that i didn't want pointer arithmetic because of the risk of UB and that shuffling bits around is cumbersome and out of place. it started searching for alternatives, looking up crates to handle MIDI transport and parsing independently.
then it built out a very lean implementation that was immediately understandable. even when i told Opus 4.7 to use packages, and even how to use them, it still added a ton of math weirdness, matching against raw MIDI packet bytes, indirection after indirection, etc. even worse, it still did so after i gave it the public API i wanted it to implement.
GPT 5.5 nailed it first try. incredibly impressed with this model and feel much safer delegating some harder tasks to it
Are you in the UK? I've not had this happen to me (I'm not in the UK) so I'm wondering if the Online Safety Act has affected this, as it has with other products.
about once a month since my childhood i get an awful, slowly ramping headache that makes me unable to think or function.
it seems to happen more when i'm overweight, making me think it's blood pressure (BP) related, but then doing the Valsalva maneuver, which spikes BP, doesn't cause any problems at all.
i've tried acetaminophen, even 1.2g of it, to no avail.
i've also tried every other remedy, such as curcumin, fire/ice locally, hot and cold showers, neck massages, working out muscles that may be involved in it, everything. nothing helps.
except for ibuprofen. 400-600mg kills it every time.
at least for me, there seems to be a definite difference, as ibuprofen can anecdotally help in some situations that acetaminophen can't. i wonder what exactly it can / can't treat and why.
i've noticed this too. i suspect they either did weight surgery on the model and distilled it from Mythos Preview, or they are currently not saving it correctly / having another adaptive thinking bug.
i asked it to look at writing research (especially from NN/g) and come up with some alternatives for a heading that is roughly supposed to convey:
"this app lets you create custom shortcuts for all mac apps, even sophisticated ones, mouse wheel ones, ..."
it came up with the following headlines:
1. ONE APP, MANY MAC APPS
2. ONE INSTALL, MANY APPS ONE APP.
3. ONE PACK PER MAC APP.
4. ALL YOUR APP SHORTCUTS IN ONE PLACE
5. [app name] IS ONE APP. PACKS COVER THE REST.
whereas GPT-5.4 came up with
1. The Shortcuts Your Apps Are Missing
2. What Your Apps Still Don't Let You
3. Do The Parts You Still Do by Hand
4. When Built-In Shortcuts Run Out
5. Where Your Apps Stop Short
now, neither set is amazing, but please tell me how in the world "ONE APP MANY MAC APPS" makes sense as a headline for fucking anything lol
that's not something even GPT-3.5 would come up with.
They swapped the tokenizer, which means either a new pretrain or token/weights surgery. The latter seems more likely, both because:
- economics: i'd wager that Opus 4.7 is just distilled Mythos Preview
- performance: surgery like this would explain the spiky performance and weird issues
one thing to keep in mind is that you have to use GPT-5.4 differently from codex; they "work" in different ways. i was aghast at how terrible Codex seemed compared to Claude Code, only to conclude a couple days later that it was me who wasn't using it right.
Opus 4.6 and 4.7 are better than GPT-5.4 xhigh, but only marginally. I can't give proper pointers on what to change because it's incredibly hard to quantify.
In essence, though, GPT-5.4 needs explicit instructions not to take liberties - this is included in the default system prompt of Claude Code, which leads me to think Opus is just as overzealous as GPT-5.4 unless explicitly told not to be.
And it takes EVERYTHING you say at face value. Questions like "don't you see why this is bad?" will be answered with "yeah, i do." which is also kind of cool...
because with Opus in Claude Code i constantly have to reassure the model i'm not insinuating anything, lest it take my question and run with it into a frenzy of "oh shit my bad let me fix it im so sorry" type changes.
but how true is this? it's almost impossible to measure, and those who do try to measure it[1] find no significant difference
i personally haven't noticed any downgrade at all.
it's entirely possible there's a mass delusion going on where everyone gets wowed by 4.6 initially, then accepts the new baseline and gets used to it, then thinks that baseline is no longer impressive and thus degraded
it doesn't help that anthropic suddenly changed the defaults of its claude code harness for all users
the best (and only) evidence i've seen for actual degradation is that the web version of opus 4.6 failed the car wash test; since you cannot simply disable adaptive thinking and other parameters in the web version, you truly may have gotten a worse product