
zzleeper · 2026-05-02T11:05:40 1777719940

Looks cool, congrats!

I've also worked with this data, but only for research purposes:

https://www.finhist.com/bank-runs/episodes/13895.html https://www.finhist.com/bank-runs/index.html

Surprisingly, I found out that layout was the trickiest thing, as newspaper articles often had multiple layers of headers, spanned multiple columns, etc.

Do you have a preferred solution on that?

brettnbutter · 2026-05-02T11:24:52 1777721092

Nice collection you have there.

Just asked the Sleuth for some examples of that, and here's one to add to your Union National one: https://www.finhist.com/bank-runs/episodes/19827.html

https://snewpapers.com/components/0b22f0ca-60d2-4d63-be99-74...

Yes, I agree the layouts are the trickiest part. I tried a few approaches and ended up using some of the PaddlePaddle models for document layout analysis, orientation, and such. They give bounding boxes and a predicted reading order, but the reading orders aren't great even with the most recent SOTA models on complex layouts, or even on simple layouts when there are mastheads, images, or other artifacts to work around. It's still valuable information, though, and it can be combined with heuristics to stitch together a more accurate reading order as the starting point of a pipeline.
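For illustration only, here is a minimal sketch of the kind of heuristic that can be layered on top of layout-model bounding boxes. It is not the pipeline described above: the `Box` type, the `reading_order` function, and the `span_frac` threshold are all hypothetical names and assumptions. The idea is to treat very wide boxes (mastheads, banner headlines) as separators that split the page into horizontal bands, then read each band column by column, top to bottom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A layout region in page coordinates (hypothetical type for this sketch)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def _columns_then_rows(boxes):
    """Group boxes into columns by horizontal position, then read each column top-down."""
    columns = []  # each column is a list of boxes; columns are created left to right
    for b in sorted(boxes, key=lambda b: b.x0):
        cx = (b.x0 + b.x1) / 2
        for col in columns:
            # Join an existing column if the box's horizontal center falls inside it.
            if min(c.x0 for c in col) <= cx <= max(c.x1 for c in col):
                col.append(b)
                break
        else:
            columns.append([b])
    ordered = []
    for col in columns:
        ordered += sorted(col, key=lambda b: b.y0)
    return ordered


def reading_order(boxes, page_width, span_frac=0.6):
    """Order boxes band by band; span_frac is an assumed threshold for 'page-wide' boxes."""
    is_wide = lambda b: (b.x1 - b.x0) >= span_frac * page_width
    spanning = sorted((b for b in boxes if is_wide(b)), key=lambda b: b.y0)
    remaining = [b for b in boxes if not is_wide(b)]
    ordered = []
    for s in spanning:
        # Everything that starts above the wide box is read before it.
        above = [b for b in remaining if b.y0 < s.y0]
        remaining = [b for b in remaining if b.y0 >= s.y0]
        ordered += _columns_then_rows(above)
        ordered.append(s)
    ordered += _columns_then_rows(remaining)
    return ordered
```

Calling `reading_order(boxes, page_width)` on the boxes for one page returns them in the stitched order. A real newspaper page would need more than this (articles that jump columns, images embedded in text, skewed scans), so the model's predicted order is probably best kept as a tiebreaker rather than discarded.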

zzleeper · 2026-05-02T11:57:47 1777723067

Great! Was thinking about PP but because I only ran an order of magnitude fewer articles (under 1mm pages; by piggybacking on Dell's OCR) I relied on Arcanum ( https://www.arcanum.com/en/newspaper-segmentation/about/ ) which was cheap enough (but I think not cheap enough at your scale).

Cheers!

brettnbutter · 2026-05-02T12:32:09 1777725129

Hmm, I just tried to upload the JPGs of some of today's samples to Arcanum via https://www.arcanum.com/en/newspaper-segmentation/try-it/ and it didn't work. I'll try again later, but based on a cursory look it seems that it wouldn't return the info I would need to correct the output if I didn't like it, and that I'd still have to stitch the individual pages back together myself?

Probably much cheaper than my process though...

zzleeper · 2026-04-26T02:56:40 1777172200

Hard to have any idea of what happened due to the FOW. There's some grainy footage (posted by Trump) of someone running past the SS checkpoint, where you can see some shots fired by the SS. Then there's Blitzer, who says this person might have had multiple large guns, etc.

zzleeper · 2026-04-25T23:34:36 1777160076

So, Thiel, Musk, and?

zzleeper · 2026-04-21T22:43:37 1776811417

I'm sworn off from Musk-related products, and this will prob make cursor worse (switch to X's LLM for instance). So, any suggestions for switching? Codex; Claude Code? (I like my IDE and I like the freedom to choose a model, which is why I stuck with Cursor even when it felt more expensive)

boc · 2026-04-22T04:01:11 1776830471

Zed is snappy as an IDE, and ghostty for your CLI. I've done like 99% of my work in the past month just in ghostty + CC.

MintPaw · 2026-04-22T17:54:21 1776880461

Everyone talks badly about Cursor and it is kinda a piece of junk, but no, there's nothing that has the features of: being able to see agent diffs in an editor, seeing diffs inline in chat, be able to click them to jump to the code, and being able to click old chat messages to edit/fork them.

Those are basically my only requirements, and it feels like I've tried everything and everything only has one of those features. Zed is the closest; it technically has those features, they're just buggy and have provider-specific quirks.

So I'm stuck on Cursor until Anthropic invents IDE technology, or at least VS Code wrapper technology.

perrylaj · 2026-04-22T22:51:20 1776898280

Jetbrains IDEs have AI support with all the things you've described, and in a more polished experience that requires significantly less maintenance and tuning. It does that while affording an actual IDE experience that works well for supported languages/projects out of the box, without the need to constantly tune plugins and put up with the janky, misaligned UX that seems to be the norm for VSCode and derivatives.

No association with Jetbrains, and despite having a license, don't even use their AI support much myself (mostly using CC, with IDE integration for diff viewing). But if you haven't tried it recently, probably worth a revisit if you're open to Jetbrains products.

RevEng · 2026-04-25T23:04:48 1777158288

I hope their models improve. I used Junie when it first came out and it was okay but unreliable. I use Cursor with composer right now and I never have any issues. I sure do miss using PyCharm though.

lemonish97 · 2026-04-21T22:46:23 1776811583

OpenCode and Github copilot are still options if you want the freedom to choose different models.

YmiYugy · 2026-04-22T04:44:00 1776833040

I really doubt they'll swap in Grok. Grok seems pretty dead. Probably more likely they'll reuse the hardware for composer.

If value is a concern, Codex. It's pretty hard to beat those subsidies. If you really want model freedom, Copilot is surprisingly decent value and as of right now lets you use your sub in other harnesses like OpenCode.

Sammi · 2026-04-22T09:02:51 1776848571

Codex is not a replacement for an IDE. Yes I still need an IDE.

When coding agents work they're great. When they don't I still need the IDE. They usually don't work that great when I'm working on something novel or brownfield. Which happens quite regularly.

But I definitely still want ai autocomplete. I'm not a Vim user. Coding isn't about typing for me, it's about solving problems. So a tool that does lots of the typing for me is a godsend.

So do I go for VS Code + Copilot? Because it was bad when I tried it again for a few days in November. Slow to respond and gave poor results. Cursor is snappy and gives useful results most of the time.

merlindru · 2026-04-22T17:51:08 1776880268

I like VSCode and can't really switch to Zed, but Zed has two very good autocomplete models

their own, called Zeta 2

and then Mercury by Inception. May also be available in VSCode through some third party extension like Kilo, not sure

limelight · 2026-04-22T23:43:42 1776901422

I switched to Windsurf recently and it's been pretty good for providing a Cursor-like experience; pricing is pretty similar.

daheza · 2026-04-22T17:27:28 1776878848

Same I'll be switching off Cursor today to either claude code or kiro. Luckily my company lets us choose which agentic software we want to use. I won't touch anything Musk is related to, he is toxic and anything he touches turns toxic and supports him.

skzo · 2026-04-22T22:40:13 1776897613

Kilo code; vscode extension as well but open source and based on OpenCode.

solenoid0937 · 2026-04-22T03:58:10 1776830290

If you are very cost constrained, Codex. Otherwise, Claude Code.

bonzini · 2026-04-22T07:04:54 1776841494

If you only use AI casually, the $20/month subscription to Claude can be enough.

acjohnson55 · 2026-04-22T01:37:30 1776821850

I use VSCode and Conductor right now.

zzleeper · 2026-04-09T19:47:29 1775764049

Wow, this is amazing. Did you write all those MD files by hand, or did you use an LLM for the simple stuff like extracting abstracts?

ctoth · 2026-04-09T19:53:28 1775764408

I used https://github.com/ctoth/research-papers-plugin to produce the annotations. The thing that's really cool is how they surface the cross-links in the collection, for instance look at https://github.com/ctoth/Qlatt/blob/master/papers/Fant_1988_...

Claude is much faster and better at reading papers than Codex (some of this is nested skill dispatch) but they both work quite incredibly for this. Compile your set of papers, queue it up and hit /ingest-collection and go sleep, and come back to a remarkable knowledge base :)

zzleeper · 2026-03-25T05:29:35 1774416575

Found it interesting but would have been easier to see an example of how the html looks in the github page!

robtoscani · 2026-03-26T02:13:54 1774491234

I've just added a simple example on the github page of how the resulting difference-file looks in an html-browser, I hope this helps to get a better idea.

robtoscani · 2026-03-25T13:06:11 1774443971

Good point. I'll add something on the github page showing the html result, thanks for the suggestion!

zzleeper · 2026-03-21T17:55:45 1774115745

LMK if you finish it, sounds like something my daughter would enjoy!

zzleeper · 2026-03-20T00:26:31 1773966391

That's a refreshing article. Easy to read and I learned a few things!

zzleeper · 2026-03-15T21:46:02 1773611162

Another +1, it would be incredibly useful to play with this approach! (and fun)

zzleeper · 2026-03-15T06:22:54 1773555774

I'm sure he didn't buy the WaPo to make a profit. More like to have an influence.

mrwh · 2026-03-15T06:26:37 1773555997

It's noblesse oblige, or rather an example of the end of noblesse oblige, that the super rich don't even have to pretend to do things for others any more. Which, I would suggest, is a short-sighted and ultimately hubristically stupid change...

ithkuil · 2026-03-15T07:12:57 1773558777

And influence he got. Gutting it was an act of influence and carried the message he wanted to carry across quite perfectly

GolfPopper · 2026-03-15T07:33:09 1773559989

His reason for buying it has been right there in front of us all along: Democracy Dies In Darkness

It's just like "To Serve Man".

thomassmith65 · 2026-03-15T08:09:37 1773562177

This is absurdly pedantic, but the fact that the Twilight Zone episode relies on a pun makes the two phrases somewhat different.

They would be alike if the book title had been "If Mankind isn't at the Table, Mankind is on the Menu"