More

fzysingularity · 2026-03-02T19:46:49 1772480809

VLM Run (https://vlm.run) | 1x Infrastructure Engineer + 2x AI/ML Engineer | Santa Clara, CA (HQ)

VLM Run is building infrastructure for production Vision-Language Model (VLM) systems — fast inference, tool-use + orchestration, reliable structured outputs, and the observability to iterate quickly. We’re a deeply technical team of veteran AI / computer-vision engineers (20+ years combined, MIT/CMU PhDs) who’ve shipped production ML infrastructure across autonomous driving and LLMs.

Open roles:

1. Infrastructure Engineer (Full-time, ONSITE): $150K–$220K + 0.5–3% equity https://app.dover.com/apply/VLM%20Run/8d4fa3b1-5b38-42e1-927...

2. AI/ML Engineer (Full-time, ONSITE): $150K–$220K + 0.5–3% equity https://app.dover.com/apply/VLM%20Run/1a490851-1ea1-4f12-a0f...

Email hiring "at" vlm.run with your GitHub + a couple recent projects.

P.S. We recently launched Orion, our visual agent that can reason and act over images, videos and documents. You can chat with Orion at https://chat.vlm.run and see capabilities at https://docs.vlm.run.

Apply: https://app.dover.com/jobs/vlm-run

fzysingularity · 2026-03-01T17:04:14 1772384654

AI allows you to accelerate the initial build process, but I think engineering is all about craftsmanship. Today most LLMs have poor taste and chipping away the cruft matters more than ever.

fzysingularity · 2026-02-24T19:36:13 1771961773

uvx probably is the way to go here (fully self-contained environment for each skill), and use stdout as the I/O bridge between skills.

fzysingularity · 2026-02-16T16:38:19 1771259899

The cold-boot time on this model can hardly be called “serverless”

fzysingularity · 2026-02-11T17:56:37 1770832597

ELO scores for OCR don't really make much sense - it's trying to reduce accuracy to a single voting score without any real quality-control on the reviewer/judge.

I think a more accurate reflection of the current state of comparisons would be a real-world benchmark with messy/complex docs across industries, languages.

fzysingularity · 2026-02-11T16:44:14 1770828254

Apple OCR even on the Mac is insanely good, in fact way better than AWS textract/GCP cloud vision OCR.

Any idea what model is being used?

AlphaSite · 2026-02-11T17:04:24 1770829464

Probably some custom model built for their hardware.

fzysingularity · 2026-02-02T20:39:17 1770064757

VLM Run (https://vlm.run) | Infrastructure Engineer + DevRel + AI/ML Engineer | Santa Clara, CA (HQ)

VLM Run is building infrastructure for production Vision-Language Model (VLM) systems — fast inference, tool-use + orchestration, reliable structured outputs, and the observability to iterate quickly. We’re a deeply technical team of veteran AI / computer-vision engineers (20+ years combined) who’ve shipped production ML infrastructure across autonomous driving and LLMs.

Open roles:

1. Infrastructure Engineer (Full-time, ONSITE): $150K–$220K + 0.5–3% equity https://app.dover.com/apply/VLM%20Run/8d4fa3b1-5b38-42e1-927...

2. Founding DevRel (Full-time, ONSITE/REMOTE): $90K–$140K + 0.5–3% equity https://app.dover.com/apply/VLM%20Run/de84c63e-fd0a-418b-929...

3. AI/ML Engineer (Full-time, ONSITE): $150K–$220K + 0.5–3% equity https://app.dover.com/apply/VLM%20Run/1a490851-1ea1-4f12-a0f...

Email hiring "at" vlm.run with your GitHub + a couple recent projects.

P.S. We recently launched *Orion*, our visual agent that can reason and act over images, videos and documents. You can chat with Orion at https://chat.vlm.run and see capabilities at https://docs.vlm.run.

Apply: https://app.dover.com/jobs/vlm-run

visioninmyblood · 2026-02-02T21:03:19 1770066199

The roles seems more oriented towards visual agents. Is this is similar to google vision agent but with more capabilities?

fzysingularity · 2026-01-07T23:13:13 1767827593

> It's like going to the grocery store and buying tabloids, pretending they're scientific journals.

This is pure gold. I've always found this approach of evals on a moving-target via consensus broken.

fzysingularity · 2026-01-07T21:25:52 1767821152

I'd love to see Claude Code remove more lines than it added TBH.

There's a ton of cruft in code that humans are less inclined to remove because it just works, but imagine having LLM doing the clean up work instead of the generation work.

fzysingularity · 2025-12-03T19:46:30 1764791190

Here's a short cookbook exploring an agentic approach to vision–language tasks: detection, segmentation, OCR, generation, and combining classical CV tools with VLM reasoning.

Happy to run examples if you leave a comment.

[1] IPython notebook: https://github.com/vlm-run/vlmrun-cookbook/blob/main/noteboo...

[2] Colab: https://colab.research.google.com/github/vlm-run/vlmrun-cook...