Different models, similar number representations. Different models for different languages, similar concept representations. They have to learn all of this from human text input, so they're not divining it themselves. It all makes a strong case for universal grammar, IMO.
Surely the "universal grammar" here is "every country adopting Western Arabic numerals, largely for commercial reasons, but also acknowledging that their indigenous systems kind of sucked in comparison." The fact that there are different languages means nothing here; Arabic numerals spread much further than the Latin alphabet did.
I really don't think this is evidence for "universal grammar" in any sense. It is evidence that we are all using the same very specific grammar for very specific cultural reasons.
This would be a sort of convergence? They were both right in part (Chomsky that there was structure there, Norvig that it could be sussed out using brute force statistics). As is often the case, when two smart people who have thought a lot about something complicated disagree, the truth comes out when their unstated assumptions are finally exposed to the light.
In this case, Chomsky's LAD (language acquisition device) almost certainly relies on Baldwin-effect structure to get around the poverty of the stimulus, and the LLMs are just getting to "the same place" through sheer masses of data.
That's incredible. Had no idea this existed. I'm listening to it right now and I can fully understand it as a Pole! I might actually want to learn this, as it'll help me communicate with all my different Slavic friends.