Hacker News | zzleeper's comments

Looks cool, congrats!

I've also worked with this data, but only for research purposes:

https://www.finhist.com/bank-runs/episodes/13895.html
https://www.finhist.com/bank-runs/index.html

Surprisingly, I found that layout was the trickiest part, as newspaper articles often had multiple layers of headers, spanned multiple columns, etc.

Do you have a preferred solution on that?


Nice collection you have there.

Just asked the Sleuth for some examples of that, and here's one to add to your Unional National one: https://www.finhist.com/bank-runs/episodes/19827.html

https://snewpapers.com/components/0b22f0ca-60d2-4d63-be99-74...

Yes, I agree the layouts are the trickiest part. I tried a few options and ended up using some of the PaddlePaddle models for document layout analysis, orientation, and so on. They give bounding boxes and a predicted reading order, but the reading order isn't great even with the most recent SOTA models on complex layouts, or even on simple layouts where you have mastheads, images, or other artifacts to work around. It's still valuable information, though: combined with heuristics, it can be stitched into a more accurate reading order as the starting point of a pipeline.
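For concreteness, here's a minimal sketch of the kind of heuristic stitching described above: cluster text blocks into columns by horizontal overlap, then read each column top-to-bottom, left-to-right, skipping non-text artifacts like mastheads. The block schema here is an assumption for illustration, not PaddlePaddle's actual output format.

```python
# Hypothetical sketch of a column-first reading-order heuristic for layout
# blocks like those a document-layout model emits. Block format is assumed:
# {"type": ..., "bbox": (x0, y0, x1, y1)} -- NOT PaddlePaddle's real schema.

def reading_order(blocks, col_tol=0.5):
    """Order text blocks column-by-column, top-to-bottom.

    Non-text artifacts (mastheads, images) are filtered out first.
    col_tol: fraction of horizontal overlap needed to join a column.
    """
    text = [b for b in blocks if b["type"] == "text"]
    columns = []  # each column tracks its x-extent and member blocks
    for b in sorted(text, key=lambda b: b["bbox"][0]):
        x0, _, x1, _ = b["bbox"]
        placed = False
        for col in columns:
            overlap = min(x1, col["x1"]) - max(x0, col["x0"])
            if overlap > col_tol * min(x1 - x0, col["x1"] - col["x0"]):
                col["blocks"].append(b)
                col["x0"] = min(col["x0"], x0)
                col["x1"] = max(col["x1"], x1)
                placed = True
                break
        if not placed:
            columns.append({"x0": x0, "x1": x1, "blocks": [b]})
    columns.sort(key=lambda c: c["x0"])  # columns left-to-right
    ordered = []
    for col in columns:
        # within a column, read top-to-bottom by the block's top edge
        ordered.extend(sorted(col["blocks"], key=lambda b: b["bbox"][1]))
    return ordered
```

With a masthead image plus a two-column page, this yields the first column top-to-bottom before moving to the second column. Real newspaper pages need more than this (spanning headlines, jumps to other pages), which is why combining model-predicted order with heuristics like this works better than either alone.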


Great! I was thinking about PaddlePaddle, but since I processed an order of magnitude fewer articles (under 1mm pages, by piggybacking on Dell's OCR), I relied on Arcanum ( https://www.arcanum.com/en/newspaper-segmentation/about/ ), which was cheap enough for me (though I suspect not cheap enough at your scale).

Cheers!


Hmm, I just tried to upload the JPGs of some of today's samples to Arcanum via https://www.arcanum.com/en/newspaper-segmentation/try-it/ and it didn't work. I'll try again later, but from a cursory look it seems it wouldn't return the info I'd need to correct the output if I didn't like it, and that I'd still have to stitch the individual pages back together myself?

Probably much cheaper than my process though...


Hard to have any idea of what happened due to the fog of war. There's some grainy footage (posted by Trump) of someone running past the Secret Service checkpoint, where you can see some shots fired by the agents. Then there's Blitzer saying this person might have had multiple large guns, etc.

So, Thiel, Musk, and?

I've sworn off Musk-related products, and this will probably make Cursor worse (switching to X's LLM, for instance). So, any suggestions for switching? Codex? Claude Code? (I like my IDE and I like the freedom to choose a model, which is why I stuck with Cursor even when it felt more expensive.)

Zed is snappy as an IDE, and Ghostty for your CLI. I've done something like 99% of my work in the past month just in Ghostty + Claude Code.

Everyone talks badly about Cursor, and it is kind of a piece of junk, but no, nothing else has all of these features: seeing agent diffs in an editor, seeing diffs inline in chat, being able to click them to jump to the code, and being able to click old chat messages to edit/fork them.

Those are basically my only requirements, and it feels like I've tried everything, yet everything else has only one of those features. Zed is the closest; it technically has all of them, they're just buggy and have provider-specific quirks.

So I'm stuck on Cursor until Anthropic invents IDE technology, or at least VS Code wrapper technology.


JetBrains IDEs have AI support with all the things you've described, in a more polished experience that requires significantly less maintenance and tuning. They also afford an actual IDE experience that works well out of the box for supported languages/projects, without the need to constantly tune plugins and endure the janky, misaligned UX that seems to be the norm for VSCode and its derivatives.

No association with JetBrains, and despite having a license, I don't even use their AI support much myself (mostly using Claude Code, with IDE integration for diff viewing). But if you haven't tried it recently, it's probably worth a revisit if you're open to JetBrains products.


I hope their models improve. I used Junie when it first came out and it was okay but unreliable. I use Cursor with composer right now and I never have any issues. I sure do miss using PyCharm though.

OpenCode and GitHub Copilot are still options if you want the freedom to choose different models.

I really doubt they'll swap in Grok. Grok seems pretty dead. Probably more likely they'll reuse the hardware for composer.

If value is a concern, Codex; it's pretty hard to beat those subsidies. If you really want model freedom, Copilot is surprisingly decent value and, as of right now, lets you use your sub in other harnesses like OpenCode.


Codex is not a replacement for an IDE. Yes I still need an IDE.

When coding agents work, they're great. When they don't, I still need the IDE. They usually don't work that well when I'm working on something novel or brownfield, which happens quite regularly.

But I definitely still want AI autocomplete. I'm not a Vim user; coding isn't about typing for me, it's about solving problems. So a tool that does a lot of the typing for me is a godsend.

So do I go for VS Code + Copilot? Because it was bad when I tried it again for a few days in November. Slow to respond and gave poor results. Cursor is snappy and gives useful results most of the time.


I like VSCode and can't really switch to Zed, but Zed has two very good autocomplete models: their own, called Zeta 2, and Mercury by Inception. The latter may also be available in VSCode through some third-party extension like Kilo, though I'm not sure.


I switched to Windsurf recently and it's been pretty good for providing a Cursor-like experience; pricing is pretty similar.

Same, I'll be switching off Cursor today, to either Claude Code or Kiro. Luckily my company lets us choose which agentic software we want to use. I won't touch anything Musk-related; he is toxic, and anything he touches turns toxic and supports him.

Kilo Code: also a VSCode extension, but open source and based on OpenCode.

If you are very cost constrained, Codex. Otherwise, Claude Code.

If you only use AI casually, the $20/month subscription to Claude can be enough.

I use VSCode and Conductor right now.

Wow, this is amazing. Did you write all those MD files by hand, or did you use an LLM for the simple stuff like extracting abstracts?


I used https://github.com/ctoth/research-papers-plugin to produce the annotations. The thing that's really cool is how they surface the cross-links in the collection, for instance look at https://github.com/ctoth/Qlatt/blob/master/papers/Fant_1988_...

Claude is much faster and better at reading papers than Codex (some of this is nested skill dispatch), but they both work incredibly well for this. Compile your set of papers, queue it up, hit /ingest-collection, go to sleep, and come back to a remarkable knowledge base :)


Found it interesting, but it would have been easier with an example on the GitHub page of how the HTML looks!


I've just added a simple example to the GitHub page of how the resulting difference file looks in a browser; I hope this helps give a better idea.


Good point. I'll add something to the GitHub page showing the HTML result, thanks for the suggestion!


LMK if you finish it, sounds like something my daughter would enjoy!


That's a refreshing article. Easy to read and I learned a few things!


Another +1, it would be incredibly useful to play with this approach! (and fun)


I'm sure he didn't buy the WaPo to make a profit. More likely to have influence.


It's noblesse oblige, or rather an example of the end of noblesse oblige: the super rich don't even have to pretend to do things for others any more. Which, I would suggest, is a short-sighted and ultimately hubristically stupid change...


And influence he got. Gutting it was an act of influence, and it carried the message he wanted quite perfectly.


His reason for buying it has been right there in front of us all along: Democracy Dies in Darkness.

It's just like "To Serve Man".


This is absurdly pedantic, but the fact that the Twilight Zone episode relies on a pun makes the two phrases somewhat different.

They would be alike if the book title had been "If Mankind isn't at the Table, Mankind is on the Menu"

