This simply isn't true of humans for the kinds of examples used in the paper. If I read "Olaf Scholz was the ninth Chancellor of Germany" then I have no trouble reversing that and determining that the 9th chancellor of Germany was Olaf Scholz. This is not an inference that's 'cognitively challenging'.
Humans may sometimes fail to make simple inferences of this sort when building their factual databases, but they don't systematically fail to do so.
Also, you may have missed this part of the paper:
>The Reversal Curse shows a basic inability to generalize beyond the training data. Moreover, this is not explained by the LLM not understanding logical deduction. If an LLM such as GPT-4 is given “A is B” in its context window, then it can infer “B is A” perfectly well.
The paper does not claim that GPT-4 cannot perform logical deduction, but only that it does not appear to make use of it when generalizing its training data.
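If you want to check the in-context half of that claim yourself, here's a rough sketch of the kind of probe the paper describes, assuming the current openai Python client; the model name and prompt wording are my own choices, not the paper's exact setup:

```python
# Rough sketch of the in-context reversal probe, assuming the
# openai Python client (>=1.0). Model name and prompt wording
# are my own choices, not the paper's exact setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Forward-direction fact placed in the context window.
fact = "Olaf Scholz was the ninth Chancellor of Germany."
# Reversed question: answering it requires inferring "B is A" from "A is B".
question = "Who was the ninth Chancellor of Germany? Answer with a name only."

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": fact},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)  # expected: "Olaf Scholz"
```

With the fact sitting in the prompt, the reversal is trivial for the model; the paper's point is that the same fact seen only at training time does not support the reversed question.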
> This simply isn't true of humans for the kinds of examples used in the paper. If I read "Olaf Scholz was the ninth Chancellor of Germany" then I have no trouble reversing that and determining that the 9th chancellor of Germany was Olaf Scholz. This is not an inference that's 'cognitively challenging'.
For a better comparison, recall your school experience, say with history or geography lessons, where you would cram a hundred "A is B" relationships and then take a test that demanded you know the reversals. Not as easy.
The "reversal curse" failures I've seen with LLMs are very similar to asking a random person, out of a blue, some unusual reversal of some random fact they ought to know, and then being surprised they can't answer quickly.
> The paper does not claim that GPT-4 cannot perform logical deduction, but only that it does not appear to make use of it when generalizing its training data.
Well, neither can humans when cramming, if you don't give them time to pause and think about what they're learning. I believe the equivalent is happening here - LLMs can perform logical deductions, but at no point in the training process is this capability used.
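To make that concrete, here's a toy sketch of the standard causal-LM objective (GPT-2 via Hugging Face transformers, chosen purely for illustration) showing that the loss only ever scores the forward ordering of a fact:

```python
# Toy illustration (GPT-2 via Hugging Face transformers, purely
# for demonstration) of why next-token training on "A is B"
# never exercises "B is A".
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

forward_fact = "Olaf Scholz was the ninth Chancellor of Germany."
batch = tok(forward_fact, return_tensors="pt")

# The causal-LM loss is cross-entropy on P(token_i | tokens_<i),
# evaluated only in this left-to-right order.
out = model(**batch, labels=batch["input_ids"])
out.loss.backward()

# The gradient rewards predicting "...the ninth Chancellor of Germany"
# given "Olaf Scholz was...". Nothing in this objective ever scores
# P("Olaf Scholz" | "The ninth Chancellor of Germany was"), so the
# reversed direction of the fact is simply never trained.
```

No step in this loop pauses to deduce consequences of the sentence being fit, which is the sense in which the deduction capability goes unused during training.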
>For a better comparison, recall your school experience, say with history or geography lessons, where you would cram a hundred "A is B" relationships and then take a test that demanded you know the reversals. Not as easy.
This doesn't correspond to my school experience. In general I don't feel that I have to separately memorise "A is B" and "B is A". For example, if I learn that Elizabeth I was Henry VIII's daughter, I don't also have to learn that he was her father.
>LLMs can perform logical deductions, but at no point in the training process is this capability used.