Right at the opening, we have a fundamental (and common) misunderstanding of what LLMs are.
When humans read the statement "A is B", we semantically transform that into a logical association. LLMs do not perform any semantics or logic.
Here's a simple example to demonstrate:
If we trained an LLM on the text "A is B. C is D. D is C.", we might expect the continuation "B is A": the model has seen the reversal pattern in "C is D. D is C." and can reproduce it as a surface pattern. If we then gave that LLM the prompt "What is B?", we might just as well expect the continuation "B? is What", since the same surface pattern applies, with no regard for meaning.
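To make the "pattern completion, not logic" point concrete, here is a toy sketch: an order-2 Markov model, a crude stand-in for an LLM's next-token prediction. It only records which token followed each two-token context in the training text, so its continuations are pure surface statistics. Everything here (function names, the tiny corpus) is hypothetical illustration, not how any real LLM is implemented.

```python
from collections import defaultdict
import random

def train_trigram(tokens):
    """Record which token followed each (prev2, prev1) context."""
    model = defaultdict(list)
    for i in range(len(tokens) - 2):
        model[(tokens[i], tokens[i + 1])].append(tokens[i + 2])
    return model

def continue_text(model, prompt_tokens, n=4):
    """Extend the prompt by sampling from observed continuations only."""
    out = list(prompt_tokens)
    for _ in range(n):
        ctx = (out[-2], out[-1])
        if ctx not in model:
            break  # no statistics for this context; nothing "inferred"
        out.append(random.choice(model[ctx]))
    return out

corpus = "A is B C is D D is C".split()
model = train_trigram(corpus)

# The continuation is driven only by token co-occurrence,
# never by any logical reading of "is".
print(continue_text(model, ["A", "is"]))  # ['A', 'is', 'B', 'C', 'is', 'D']
```

The model happily continues "A is" with "B C is D" because that is the character of its training data; it has no notion that "is" expresses a relation that could be reversed or queried.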
Large models like GPT produce more interesting continuations because they are trained on larger, more diverse datasets. But more diversity also means more ambiguity, which yields continuations that are less predictable and no less illogical.