
> perhaps humans also mostly reason using previous examples rather than thinking from scratch.

Although we would like AI to be better here, the worse problem is that, unlike humans, you can’t get the LLM to understand its mistake and then move forward with that newfound understanding. While the LLM tries to respond appropriately and indulge you when you point out the mistake, further dialog usually exhibits noncommittal behavior by the LLM, and the mistaken interpretation tends to sneak back in. You generally don’t get the feeling of “now it gets it”; instead it tends to feel more like someone with no real understanding (but a very good memory of relevant material) trying to bullshit-technobabble around the issue.



That is an excellent point! I feel like people have two modes of reasoning: a lazy mode where we assume we already know the problem, and an active mode where something prompts us to pay attention and actually reason about the problem. Perhaps LLMs only have the lazy mode?


I prompted o1 with "analyze this problem word-by-word to ensure that you fully understand it. Make no assumptions." and it solved the "riddle" correctly.

https://chatgpt.com/share/6709473b-b22c-8012-a30d-42c8482cc6...


My classifier is not very accurate:

    is_trick(question)  # 50% accurate
To make the client happy, I improved it:

    is_trick(question, label)  # 100% accurate
But the client still isn't happy because if they already knew the label they wouldn't need the classifier!
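To make the joke concrete, here is a minimal sketch of the two versions. The function names `is_trick_blind` and `is_trick_leaky`, the coin-flip stand-in for the 50% classifier, and the synthetic questions and labels are all assumptions for illustration, not anything from a real system.

```python
import random

def is_trick_blind(question, rng):
    # Stand-in for the 50% classifier: a coin flip that ignores the question.
    return rng.random() < 0.5

def is_trick_leaky(question, label):
    # The "improved" version: it simply returns the label it was handed.
    return label

def accuracy(predictions, labels):
    # Fraction of predictions that match the true labels.
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Synthetic data (hypothetical): 1000 questions with random true labels.
rng = random.Random(0)
questions = [f"question {i}" for i in range(1000)]
labels = [rng.random() < 0.5 for _ in questions]

blind = accuracy([is_trick_blind(q, rng) for q in questions], labels)
leaky = accuracy([is_trick_leaky(q, y) for q, y in zip(questions, labels)], labels)
print(f"blind: {blind:.2f}, leaky: {leaky:.2f}")
```

The leaky version scores perfectly only because the answer is part of its input, which is the analogy being drawn to telling the model in advance that the question is a trick.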

...

If ChatGPT had "sense", your extra prompt should do nothing. The fact that adding the prompt changes the output should be a clue that nobody should ever trust an LLM anywhere correctness matters.

[edit]

I also tried the original question but followed-up with "is it possible that the doctor is the boy's father?"

ChatGPT said:

Yes, it's possible for the doctor to be the boy's father if there's a scenario where the boy has two fathers, such as being raised by a same-sex couple or having a biological father and a stepfather. The riddle primarily highlights the assumption about gender roles, but there are certainly other family dynamics that could make the statement true.


It's not like GP gave task-specific advice in their example. They just said "think carefully about this".

If that's all it takes, then maybe the problem isn't a lack of capabilities but a tendency to not surface them.


The main point I was trying to make is that adding the prompt "think carefully" moves the model toward the "riddle" vector space, which means it will draw tokens from there instead of the original space.

And I doubt there are any such hidden capabilities, because if there were, it would be valuable to OpenAI to surface them (e.g. by adding "think carefully" to the default/system prompt). Since adding "think carefully" changes the output significantly, it's safe to assume this is not part of the default prompt, perhaps because adding it is not helpful for average queries.


I have found multiple definitions in the literature of what you describe.

1. Fast thinking vs. slow thinking.

2. Intuitive thinking vs. symbolic thinking.

3. Interpolated thinking (in terms of pattern matching or curve fitting) vs. generalization.

4. Level 1 thinking vs. level 2 thinking (in terms of OpenAI's definitions of levels of intelligence).

The definitions all describe the same thing.

Currently all of the LLMs are trained to use the "lazy" thinking approach. o1-preview is advertised as being the exception: it is trained or fine-tuned on countless reasoning patterns.



