ArielTM's comments

ArielTM · 2026-04-28T01:33:24 1777340004

A different axis that holds up: a tiny always-loaded index pointing at per-fact files that load on demand. Claude Code's auto-memory uses this. The part that does the work isn't the lookup, it's the index itself acting as a filter. Every new fact has to summarize into one line that earns its keep, or it doesn't get saved. Stuff that misses the bar either gets rediscovered during work or wasn't load-bearing. 52% Recall@5 from forgetting is a real research win, but the production lever has been bounding the always-loaded set, not scoring what to forget

ArielTM · 2026-04-25T19:06:02 1777143962

Orchestration harnesses get eaten. Markdown ones don't. A CLAUDE.md, an evals script, a lessons-learned.md per repo. Smarter models just get more out of them. The bet that hurts is the hand-rolled AutoGen graph.

ArielTM · 2026-04-25T17:35:21 1777138521

The browser analogy holds because publishers wanted browsers. Sites lived with User-Agent and robots.txt because the click paid for it.

AI agents are the destination. No return click to bargain with. That's why Cloudflare just went default-block + 402 Payment Required instead of waiting on a standards body.

Open standards on the agent side are the easy half. Getting sites to show up is the part W3C can't fix alone.

ArielTM · 2026-04-23T12:58:12 1776949092

The architecturally distinct bit is that you're validating at the service-action layer (send-email, merge-PR, transfer-funds) instead of at the tool-call layer inside whichever agent's running. A permission hook in Claude Code is only as trustworthy as the Claude Code process itself, and it doesn't carry over if you swap in a different agent next week. PS sits one layer up with stable, cross-agent semantics, and it's the thing that actually holds the OAuth tokens, so the agent can't leak them even if it wanted to.

Push-to-approve on a separate device is also the right channel, since the whole point is that you don't trust whatever just asked.

Curious: are the per-service schemas hand-written or generated from each provider's OpenAPI?

chiedo · 2026-04-23T13:16:36 1776950196

"Hand" written for now! Didn't even think about using each provider's OpenAPI.

But yep, you get the nuance. The point is that the process eg Claude Code doesn't need to be "trusted" to behave.

ArielTM · 2026-04-23T12:43:01 1776948181

kqueue/FSEvents is tempting here, but Darwin drops same-process notifications. If you've got a publisher and listener in the same process the listener just never fires. Nasty thing to chase. stat polling looks gross but it's the only thing that actually works everywhere.

What happens on WAL checkpoint? When the file shrinks back, does that trigger a wakeup, or does the poller filter size drops?

xenadu02 · 2026-04-24T18:56:59 1777057019

This comment is completely incorrect.

kqueue VNODE events are delivered so long as your process has access to the file. There is no "same-process" notification filter.

russellthehippo · 2026-04-23T20:39:12 1776976752

Actually need to test this. Will report back

ArielTM · 2026-04-22T23:34:31 1776900871

The worktree part is the easy half. Running parallel Claude Code subagents, the bottleneck is never "can they run without stomping each other's files." That's solved the moment each one has its own checkout.

The hard problem is architectural consistency. Agent A renames a type to X. Agent B, in a different worktree, independently renames the same type to Y because neither saw the other's decision. When you merge, neither worktree is "wrong" but the code is incoherent. You need either a shared decision log that every agent reads before starting, or an orchestrator that hands out scope narrow enough that no two agents can collide.

Zed's post is solving filesystem-level parallelism. The harder coordination problem is semantic, and that's where time savings from parallelization go to die.

ArielTM · 2026-04-22T03:38:57 1776829137

The debate here is missing a practical question: is the judge from the same model family as the agent it's judging?

If both are Claude, you have shared-vulnerability risk. Prompt-injection patterns that work against one often work against the other. Basic defense in depth says they should at least be different providers, ideally different architectures.

Secondary issue: the judge only sees what's in the HTTP body. Someone who can shape the request (via agent input) can shape the judge's context window too. That's a different failure mode than "judge gets tricked by clever prompting." It's "judge is starved of the signals it would need to spot the trick."

ArielTM · 2026-04-22T03:29:47 1776828587

The hvc1 and 10-bit failures a few comments up aren't a FFmpeg-wasm fallback thing, they're a WebCodecs browser-gap. Firefox's HEVC path is partial and 10-bit paths are worse. Chrome mostly works and Firefox fails on the exact files iPhones and modern Androids record by default.

A "your browser can't decode this codec, try Chrome" nudge would probably spare people the bounce, especially on test imports from their phone.

kolx · 2026-04-22T11:23:38 1776857018

Good idea. Once I am through the licensing ordeal I have a giant stack of errors to tackle which should increase usability.

ArielTM · 2026-04-22T23:29:00 1776900540

Thanks!