Hacker News | eugene3306's comments

This billing cycle I was billed $20 three times.

I contacted my bank and got a reply (from a human) that all three payments are valid.

Emails from Anthropic state that the first two payments failed, but the third went through. Fin (the support bot) says my question will be escalated to a human, but so far no one has contacted me.


This makes a good benchmark for LLMs:

```
look at this paper: https://arxiv.org/pdf/2603.21852

now please produce 2x+y as a composition on EMLs
```

Opus (paid) - claimed that "2" is circular. Once I told it that ChatGPT had already done this, it finished successfully.

ChatGPT (free) - did it on the first try.

Grok - produced an estimate of the depth of the formula.

Gemini - success

DeepSeek - assumed some pre-existing knowledge of what EML is. It was unable to fetch the PDF from the link and unable to read a PDF uploaded via "Attach file".

Kimi - produced a long output, then stopped and asked me to upgrade.

GLM - looks ok


> Once I told it that ChatGPT have already done this, finished successfully.

TIL you can taunt LLMs. I guess they exhibit more competitive spirit than I thought.


Opus currently seems to be wired to get you to spend more money. Once you tell it "Stop defrauding me, just get to the right solution," it often gets there.


I'm like "Yeah ok, use the Arcee Trinity models!" and it's like, you got it boss, 3 Opus agents in parallel, got it!


I always start the chat with "we have been going in circles" before giving any context.


I copied and pasted the abstract into DeepSeek and asked your question. It's a bit unfair to penalise it for not handling PDFs.

It got a result.


If you like creating such things, consider contributing to Terminal Bench Science, https://www.tbench.ai/news/tb-science-announcement.


I changed the prompt to this:

```
Consider a mathematical function EML defined as `eml(x,y)=exp(x)−ln(y)`

Please produce `sin(x)/x` as a composition on EMLs and constant number 1 (one).
```
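The definition in this prompt makes answers mechanically checkable: evaluate a candidate composition at sample points and compare it with the target. Below is a minimal Python sketch of such a check, applied to the original 2x+y task. The construction and all helper names (`exp_`, `ln_`, `sub`, `neg`, `add`, `two_x_plus_y`, `check`) are hypothetical, worked out for this illustration rather than taken from the paper. Because it uses real logarithms, it is only valid on a restricted domain (x > 0 and x + y < e); the paper may well use a different construction, for example one over complex numbers. It sidesteps the constant 2 by writing 2x + y as x + (x + y).

```python
import math

# EML operator as defined in the prompt: eml(x, y) = exp(x) - ln(y).
def eml(x: float, y: float) -> float:
    return math.exp(x) - math.log(y)

ONE = 1.0  # the only constant allowed besides eml itself

# Hypothetical building blocks, each written using only eml and 1.
def exp_(x: float) -> float:
    # exp(x) = eml(x, 1) because ln(1) = 0
    return eml(x, ONE)

def e_const() -> float:
    # e = eml(1, 1)
    return eml(ONE, ONE)

def ln_(z: float) -> float:
    # eml(1, z) = e - ln z; exp of that = e^e / z; e - ln(e^e / z) = ln z. Needs z > 0.
    return eml(ONE, eml(eml(ONE, z), ONE))

def sub(a: float, b: float) -> float:
    # eml(ln a, e^b) = a - b. Needs a > 0.
    return eml(ln_(a), exp_(b))

def neg(c: float) -> float:
    # eml(1, e^c) = e - c, then (e - c) - e = -c. Needs c < e.
    return sub(eml(ONE, exp_(c)), e_const())

def add(a: float, c: float) -> float:
    # a - (-c) = a + c. Needs a > 0 and c < e.
    return sub(a, neg(c))

def two_x_plus_y(x: float, y: float) -> float:
    # 2x + y = x + (x + y), so no explicit constant 2. Needs x > 0 and x + y < e.
    return add(x, add(x, y))

def check(candidate, target, points, tol: float = 1e-9) -> bool:
    """True if candidate agrees with target at every sample point."""
    return all(
        math.isclose(candidate(*p), target(*p), rel_tol=tol, abs_tol=tol)
        for p in points
    )

if __name__ == "__main__":
    # Sample points inside the valid domain: x in [0.1, 0.9], y in [-2.0, 0.9].
    pts = [(x / 10, y / 10) for x in range(1, 10) for y in range(-20, 10)]
    print(check(two_x_plus_y, lambda x, y: 2 * x + y, pts))
```

Passing on a grid of points does not prove an identity, but it quickly rejects wrong compositions, which is what a benchmark grader needs.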


meta.ai in instant mode gets it on the first try too (I think?)

```
2x + y = \operatorname{eml}\Big(1,\; \operatorname{eml}\big(\operatorname{eml}(1,\; \operatorname{eml}(\operatorname{eml}(1,\; \operatorname{eml}(\operatorname{eml}(L_2 + L_x, 1), 1) \cdot \operatorname{eml}(y,1)),1)\big),1\big)\Big)
```

For me, Gemini hallucinated EML to mean something else ("elementary mathematical layers") despite the paper link being provided.


This should be tangential proof for the dwindling number of people who still believe that LLMs are just parrots. EMLs are literally a new invention.


So what is the correct answer?


Why don't they publish ARC-AGI results? Too expensive?


ARC-AGI was never a good benchmark; it tested spatial understanding more than reasoning. I'm glad it's no longer popular.


What do you mean? It definitely tests reasoning as well, and if anything, I expect spatial and embodied reasoning to become more important in the coming years, as AI agents will be expected to take on more real world tasks.


Spatial or not, ARC-AGI is the only test that correlates with my impression of how models handle my coding requests.


I've created an Asterisk Codex Skill, but it turns out there is a ten-second timeout for scripts.


Just in case the author is here: what's the FPS?


On Reddit, the author says "The preview is shown at 20fps for a 3x scale image (90x90 pixels) and 50fps for a 1x scale image. This is due to the time it takes to read the image data from the sensor (~10ms) and the max write speed of the display." They add that the sensor's optical-mouse motion tracking runs at up to 6400 fps, but you can't actually transmit the image at that rate.

https://old.reddit.com/r/electronics/comments/1olyu7r/i_made...
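The author's numbers are enough to see why the 6400 fps tracking rate cannot be streamed as images. Below is a small Python sketch of the frame-time arithmetic using only the figures quoted above (~10 ms sensor read, 20 fps at 90x90, 50 fps at 30x30, 6400 fps tracking). The helper names are hypothetical, and treating the leftover per-frame time as the display-write time assumes the read and the write happen one after the other rather than pipelined.

```python
# Figures from the author's Reddit comment quoted above.
SENSOR_READ_MS = 10.0  # ~10 ms to read one frame from the sensor
TRACKING_FPS = 6400    # sensor's internal motion-tracking rate

def frame_period_ms(fps: float) -> float:
    """Time available per frame at a given frame rate."""
    return 1000.0 / fps

def leftover_ms(fps: float, read_ms: float = SENSOR_READ_MS) -> float:
    """Per-frame time left after the sensor read.

    Assumption: read and display write are sequential (not pipelined),
    so this is roughly the time spent writing to the display.
    """
    return frame_period_ms(fps) - read_ms

if __name__ == "__main__":
    print(frame_period_ms(TRACKING_FPS))                   # 0.15625 ms per tracking frame
    print(SENSOR_READ_MS / frame_period_ms(TRACKING_FPS))  # 64.0 tracking frames per image read
    print(leftover_ms(50))                                 # 10.0 ms left at 1x (30x30)
    print(leftover_ms(20))                                 # 40.0 ms left at 3x (90x90)
```

One image read spans about 64 tracking frames, so the read alone caps the preview near 100 fps before the display is even involved.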


Too late. Browser-use local LLMs are already a thing


What's the point of comparing token prices, especially for thinking models?

Just now I was testing the new Qwen3-thinking model. I ran the same prompt five times. The costs I got, sorted: 0.0143, 0.0288, 0.0321, 0.0389, 0.048. And that's for a single model.

Also, in my experience, sonnet-4 ends up cheaper than gemini-2.5-pro, despite its higher per-token prices.


I think the proper way to estimate cost is the cost of an entire test run, like in aider's leaderboard.
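A minimal Python sketch of the point above: the five reported costs for one prompt on one model already differ by more than 3x, so a fair comparison sums actual usage over a whole test suite instead of quoting per-token prices. `suite_cost` and its parameters are hypothetical; the currency of the reported costs is not stated; and counting thinking tokens as billed output tokens is an assumption that matches common provider practice but should be checked per provider.

```python
import statistics

# Per-run costs reported above for five runs of the same prompt on one model.
# The comment does not state the currency, so these are unitless here.
RUN_COSTS = [0.0143, 0.0288, 0.0321, 0.0389, 0.048]

def spread(costs):
    """Ratio between the most and the least expensive run."""
    return max(costs) / min(costs)

def suite_cost(usage, price_in_per_m, price_out_per_m):
    """Cost of one full benchmark run (hypothetical helper).

    usage: list of (input_tokens, output_tokens) pairs, one per task.
    Output tokens are assumed to include billed thinking tokens.
    Prices are per million tokens.
    """
    total = 0.0
    for input_tokens, output_tokens in usage:
        total += input_tokens * price_in_per_m + output_tokens * price_out_per_m
    return total / 1_000_000

if __name__ == "__main__":
    print(f"spread: {spread(RUN_COSTS):.2f}x")        # spread: 3.36x
    print(f"median: {statistics.median(RUN_COSTS)}")  # median: 0.0321
```

Because `suite_cost` works from measured token counts, a model with higher per-token prices but shorter outputs can still come out cheaper, which is the sonnet-4 versus gemini-2.5-pro effect described above.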


I use crypto often. I'm Russian; I left Russia when Putin started the war. For me, it's quite hard to open a bank account, so I use crypto. I work remotely for a Singaporean company. Now I'm in Vietnam, and I can pay for my groceries with crypto using a QR code. I can cash out USDT at a better rate than I'd get for paper bills.

I have two bank accounts in Kazakhstan. Both cards' credentials were stolen after I used a popular hotel booking website, which, according to Reddit, shares card details with hotels. Some money was stolen. It seems 3-D Secure only protects the payments I make myself, and thieves are free to pick a website that doesn't use it. Now I have to keep those cards locked at all times, unlocking them only for short moments when I need to make a card payment, like booking a hotel or buying an airline ticket.


It is wild how Vietnam really transformed from a complete cash society to a mostly digital one in just a few years. Covid did it.

There's nothing more annoying than having your largest bill be worth about $20 and having to carry stacks of them around just to pay rent.


>Now I'm in Vietnam, I can pay for my groceries with crypto using QR code.

That sounds great. What do you use?


Fizen


lol. That's too complex for long-range movement in VR. Just make a controller with the control scheme of an electric unicycle, or maybe a "hoverboard".


The unicycle idea is actually interesting, I could imagine that working well for a game where you play a little robot character, or a guy stuck on a unicycle. I guess with all the existing biological limitations that hinder immersion, it's smarter to explore ways to get the human mind/body to accept other standards, so that you lean into the limitations


Is this material safe to handle?

It feels like this thing will soon start appearing all over eBay and AliExpress.


I hope it is. It will be a perfect toy for little kids, right alongside the little ball magnets.


Those magnets can be deadly to children if swallowed:

https://health.ucdavis.edu/news/headlines/little-magnets-are...


It has lead in it.


Lead is safe to handle, you just can’t eat it.


Handling lead tends to result in lead being on your hands, which has a nasty tendency to result in lead being on your lunch if you are not careful.


Basic safety precautions are fine.

To put it in perspective: millions of people in the US regularly handle ammunition and shoot firearms at indoor ranges. Those contain lead in many forms: the projectile itself, lead styphnate in the primer compounds, lead suspended in the air after firing, etc.

You make sure to have adequate ventilation, don’t touch your face, and wash your hands when you’re done. It’s important, yes, but not really that big a deal.


Given that the effects of lead tend to be a subtle change in mental state, and show up months or years later, we really don’t know that it’s fine.


Would that be legal, considering they have a patent (pending) for it? I'm not denying it will appear anyway, just curious.


It is illegal in places that have rule of law.

The point is to reduce the risk of investing the capital needed to make this at scale, which will cost on the order of $100-500M of trial and error to scale up for worldwide use.

If IP is ignored, no business will invest in the initial experiments due to first mover disadvantage in game theory.


>If IP is ignored, no business will invest in the initial experiments due to first mover disadvantage in game theory.

Not if you believe that


This has been shown to be historically true, because startup costs can be tens to hundreds of millions in R&D.

The business that spends will have their workers immediately poached if they don’t have IP protecting their initial startup costs.


I'm not sure how it works, but this was the first result:

> Patents are territorial and must be filed in each country where protection is sought.

[0] https://www.stopfakes.gov/article?id=Is-My-US-Patent-Good-in...


Not a patent attorney, but as far as I know: The patent is currently pending (as in, being evaluated to see if it will be granted). Once it is granted in one country it can be expanded to multiple countries within a couple of months, given that the first country is part of the Patent Cooperation Treaty. So if an invention is marked “patent pending”, you know you will have the risk of being sued by said company at some time in the future if you copy the invention.


Dude, with something this important to humanity, give everyone involved a couple million and then let it run wild.

