Left unsaid in this piece is that OpenAI likely would have to increase parameters and compute by an order of magnitude (~10x) to train a new model that offers noticeable improvements over GPT-4, due to the diminishing returns seen in "transformer scaling laws."
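To make the "diminishing returns" point concrete, here is a rough sketch using the published Chinchilla-style scaling law (Hoffmann et al., 2022). The fitted constants are the ones from that paper; the baseline parameter and token counts below are purely hypothetical, since OpenAI hasn't disclosed GPT-4's size or its own scaling fits.

```python
# Back-of-envelope illustration of diminishing returns under a
# Chinchilla-style scaling law: L(N, D) = E + A/N^alpha + B/D^beta.
# Constants are the published Chinchilla fits; the baseline N and D
# below are hypothetical stand-ins, not GPT-4's real numbers.

def loss(n_params: float, n_tokens: float) -> float:
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

def train_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens  # standard C ~ 6*N*D approximation

base = loss(300e9, 6e12)                      # hypothetical baseline model
scaled = loss(300e9 * 3.16, 6e12 * 3.16)      # ~3.2x params and data = ~10x compute
print(f"baseline loss:    {base:.3f}")
print(f"10x compute loss: {scaled:.3f}")      # only ~0.05 lower
```

Even an order of magnitude more compute only shaves a few hundredths off the predicted loss under those fits, which is the usual argument for why the next noticeable jump is so expensive.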
Also, it's possible that OpenAI is still training GPT-4, perhaps with additional modalities, and will make future snapshots available as public releases.
Also, who says that the "transformer scaling laws" are the ultimate arbiter of LLM scaling? They overturned previous scaling laws, and other scaling laws might overturn them. Furthermore, it's possible that the transformer model won't even be used in later models. I remember Ilya making the point that just because the transformer was the first model that looks like it can scale intelligence just by lighting up billions of dollars of GPUs, it doesn't mean it's the last one. Maybe it will even be like the vacuum tube of AI models, and other ones are being made in secret. A Hacker News rumor was that they are paying $5M-$20M per year to the top neural net experts, probably to make some exotic architectures to surpass the transformer.
> A Hacker News rumor was that they are paying $5M-$20M per year to the top neural net experts, probably to make some exotic architectures to surpass the transformer
This reminds me of a TV interview with the author Patrick Modiano, just after he won the Nobel Prize in Literature. The presenter asked him if the money would help. The author answered, essentially, that the next time he was in front of a white page, the money surely wouldn't help.
In the case of surpassing transformers, money could help to give access to more compute power. It could also help to prevent the research from being public.
Modiano is a rich man, born into a rich family. Wealth doesn't help in front of a white page, but it sure helps being able to stay in front of that white page instead of having to go take up a job because you're not sure what you're eating tonight.
As always, wealthy people and their "money doesn't make happiness" bullshit.
Since he already didn't need to work another job to pay the bills, the extra money from the Nobel prize does not make a difference in this case as he can already put all his time into writing.
If someone is already working on a problem full-time, money only helps to the extent that resources that can be bought with money are the limiting constraint. However, beyond the deep work needed for a single individual, when you need to explore potential opportunities in a broad space of possibilities, money can hugely affect the search of that space, because the work needed for major breakthroughs remains parallelizable. You can delegate subtasks to people if you can afford those people. You can hire more of the few specialized people who know about a niche to work on your problem instead of other problems. You can exploit synergies from cross-pollination of ideas by bringing brilliant minds into the same conversations. The influx of money is very, very likely to increase the pace of innovation in AI. The breadth of possible avenues for breakthroughs is largely yet to be explored.
Can't confirm OpenAI's position in particular, but $500k/yr/person is table stakes for a decent engineer directly connected to the company's bottom line. Double that for an actual expert, double it again if they're consulting, and put together a team of 3-10 of them. Those numbers aren't too far off.
I can see $5M per person being possible. $20M is the absurd part. 5 years with such a comp leads one to a net worth that is borderline filthy. Like elite, world-renowned athletes' and actors' level of wealth. Again, could all be true but just unexpected from my experience.
> I can see $5M per person being possible. $20M is the absurd part. 5 years with such a comp leads one to a net worth that is borderline filthy. Like elite, world-renowned athletes' and actors' level of wealth.
5 years at $20 million equals the highest-paid FIFA player's single-year pay (or Tom Cruise's pay for one movie, Top Gun: Maverick), or about two years of the highest-paid NFL players' pay, so it won't catch up to their wealth over that time (assuming a similar lifestyle), as it's losing ground to them every year.
(And the claim was total, not per person, salary, anyway.)
Those are the highest salaries; they don't include capital returns. Unsurprisingly, people who make 7-figure salaries in sports, and who have been doing so for more than a few years, often have significant capital returns as well.
There are CEOs that get paid a lot more than $5M/year in stock. Arguably that's also ridiculous, but it's certainly possible that paying a team of 20 highly skilled engineers $5M/year each brings more value to the company than paying a CEO $100M a year.
$20M in startup stock is not the same as $20M in Apple stock. I can totally believe OpenAI paying that figure on paper. You’d need to adjust the number for risk though
I wouldn't put any stock into a random twitter rumor by someone likely looking for clout. The source, some guy with likely a purchased checkmark and 12k followers (who knows how few before he claimed to have this insider knowledge), claims four(!) different "extremely reputable" sources that have independently confirmed it. How many people exactly are they making these offers to? Do they all happen to know this guy, someone with no discretion apparently, and everyone decided to tell him this information for what reason exactly?
99% chance it's made up.
That said, if they thought a specific individual had even a reasonable chance of coming up with an improvement on the current state-of-the-art AI architecture that they'd be able to keep entirely to themselves, $20M would be a massive bargain.
The rumor is still almost certainly fake, but for someone very specific at this critical time in the field, I don't know if the number would be that absurd.
Twitter rumors also claimed a parameter count of 100 trillion parameters and they visualized it with two circles with a huge size difference to make it look intimidating.
I guess the reason why AI is so interesting is that human stupidity is so widespread.
Actually what he has said is that the biggest performance gains were from the human feedback reinforcement learning.
There are also all of the quantization and other tricks out there.
Also they have demonstrated that the model already understands images but just haven't completed the API for this.
So they use quantization to increase the speed by a factor of 3 while slightly increasing the parameter count. Maybe find a way to make the network more sparse and efficient, so that in the end, with the quantization, the model actually uses significantly less memory. And continue with the RLHF, focusing on even more difficult tasks and those that incorporate visual data.
Then instead of calling it GPT-5 they just call it GPT-4.5. Twice as fast as GPT-4, IQ goes from 130 to 155. And the API now allows images to be passed in and analyzed.
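For anyone unfamiliar with why quantization buys that speed and memory, here is a minimal sketch of naive per-tensor int8 weight quantization. It's a generic illustration of the trick, not OpenAI's (undisclosed) inference stack.

```python
import numpy as np

# Minimal per-tensor int8 weight quantization: store weights as int8
# plus one fp32 scale, cutting memory ~4x vs fp32 (~2x vs fp16) and
# enabling faster integer matmuls, at the cost of small rounding error.

def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0                       # one scale per tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)        # dummy weight matrix
q, s = quantize_int8(w)
print(w.nbytes / q.nbytes)                                 # 4.0x smaller
print(np.abs(w - dequantize(q, s)).mean())                 # small mean rounding error
```

Production systems use finer-grained (per-channel or per-group) scales and fancier schemes, but the memory-for-precision trade is the same.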
There is an API for multimodal computer vision and visual reasoning/VQA, and it's available, just not for normies. It's exclusively for their test group and then the Be My Eyes project at https://www.bemyeyes.com/.
I bet they’re not saying how big of a model GPT-4 is because it’s actually much smaller than we would expect.
ChatGPT is IMO a heavily fine-tuned Curie-sized model (same price via API + less cognitive capacity than even text-davinci-003), so it would make sense that a heavily fine-tuned Davinci-sized model would yield similar results to GPT-4.
I wouldn't bet on their pricing being indicative of their costs. If MSFT wants the ChatGPT-API to be a success and is willing to subsidize it, that's just how it is.
It’s not only 10x cheaper, it’s also way faster at inference and not as smart as Davinci. IMO the only logical answer is that the model is just smaller.
I wonder why it's slower at inference time then (for members using their web UI), or rather, if it's similar in size to gpt3, how gpt3 is optimized in a way that gpt4 isn't or can't be?
I'd expect that by now we would enjoy similar speeds but this hasn't yet happened.
Interesting. I remember when the speedup of ChatGPT happened, the API prices dropped by around 10x, so I'd imagine there were some tricks to make it run faster.
If they still haven't implemented these, it would be positively surprising (to me) to see the model run at similar speeds as ChatGPT now. It'd be a great achievement if they really packed such performance into a similar architecture (say, by just training longer).
The speed-up of the free and default "chatGPT" happened because they switched it from the full size GPT-3.5 to "GPT-3.5-Turbo", which is likely a finetune of the 10x smaller GPT-3 Curie.
If you have chatGPT Plus you can choose "Legacy" from the drop-down to get the smarter (and slower) 175B Parameter version of GPT-3.5. That version is the same speed as GPT-4 when load is low (early morning EST), which lends credence to the theory that GPT-4 is the same size as overparametrized GPT-3.
We are also starting to run out of high-quality corpus to train on at such model scales. While video offers another large set of data, we'll have to look at further RL approaches in the next few years to continue scaling datasets.
If they're running into any limits in that respect, my bet would be that the limit is only on what is easily accessible to them without negotiating access, and that they can easily go another magnitude or two just with more incremental effort to strike deals. E.g. newspaper archives, national libraries and the like (I haven't looked at other languages, but GPT-3's Norwegian corpus - I don't know of any numbers for GPT-4 - could easily be scaled at least two orders of magnitude with access to the Norwegian national library collection alone).
Depends on the quality. A ten trillion parameter model should require roughly 10 trillion tokens to train. Put another way, this would be roughly 10k Wikipedias, 67 million books, or roughly 3-4 GitHubs.
It’s been established that LLMs are sensitive to corpus selection which is part of why we see anecdotal variance in quality across different LLM releases.
While we could increase the corpus of text by loading social media comments, self-published books, and other similar text - this may negatively impact final model quality/utility.
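Purely as a sanity check on those figures, here is the arithmetic the comment implies; the per-unit token counts are back-solved from the comment's own numbers rather than measured, so treat them as illustrative.

```python
# Unit sizes implied by "10 trillion tokens ~= 10k Wikipedias ~= 67M books".
# These fall out of the comment's figures, not from any measurement.
target_tokens = 10e12

tokens_per_book = target_tokens / 67e6     # ~150k tokens per book
tokens_per_wiki = target_tokens / 10e3     # ~1e9 tokens per "Wikipedia"

print(f"{tokens_per_book:,.0f} tokens/book")
print(f"{tokens_per_wiki:,.0f} tokens/Wikipedia")
```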
yeah i need a source on this. GPT-3's corpus is, what, a few hundred TB? absolutely nowhere near the total amount of tokens we could collect e.g. from youtube/podcasts
I often see mistakes when ChatGPT is faced with more spatial reasoning, and I wonder if changes as simple as deep convolutional subnetworks in intermediate layers would help the language model fit better in these situations. In short, I'm excited to see where things go, and can definitely see room for great improvement through improvements to the architecture!
How noticeable changes will be has little connection to loss reduction during training. Holding very complex thought processes may actually not diminish the loss function all that much. But they are very noticeable when we are interacting with these systems.
That's the indication from current research: train longer and better; current models are oversized and undertrained. A good foundation model can exhibit massive quality differences with just a tiny bit of quality fine-tuning (e.g. Alpaca vs Koala).
> Also, it's possible that OpenAI is still training GPT-4, perhaps with additional modalities, and will make future snapshots available as public releases.
Read OpenAI API docs on GPT model versions carefully, and look at them again from time to time.
I would suspect they are probably conditioning data for GPT-5. I'm guessing 'training' presupposes they have the training data primed, and getting data into shape seems to be one of the main cruxes.
It could be that they are not training GPT-5 for a simple reason: Microsoft ran out of GPU compute [1] and they focus on meeting inference demand for now.
Also, the GPT-4 message cap at chat.openai.com was shown as something along the lines of "we expect lower caps next week", then changed to "expect lower caps as we adjust for demand" to "GPT-4 currently has a cap of …". This sounds to me like they changed from having lots of compute to being limited by it. Also note how everything at OpenAI is now behind a sign up and their marketing has slowed down. Similarly, Midjourney has stopped offering their free plan due to lack of compute.
Seems like we didn't need a 6-month pause letter. Hardware constraints limit the progress for now.
That, or they're working on something like a 10-30B model, dubbed GPT-NextGen, that essentially has the same results as GPT-4 but with a lot more gains in performance and speed. GPT-5 will suck if it's slower relative to GPT-4 by a ratio similar to GPT-4 versus GPT-3.5.
So I think there are a lot of improvements where maybe GPT-4 is as far as you go in terms of inputting data, and maybe better use cases are more customization of the data trained on, or finding ways of going smaller, or even some model that just trains itself on the data it needs: similar to how we jump on Google when we're stuck, it'd do the same and build up its knowledge that way.
I also think we need improvements in vector stores that maybe add weights to "memories" based on time/frequency/recency/popularity.
That sounds like a mixture-of-experts model (popularized at scale by Google): train multiple specialised models (say, embedders from text to a representation) that could be fed into a single model at the end. Each expert would be an adapter of sorts, activating depending on the type of input.
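A minimal sketch of the mixture-of-experts routing idea, with random stand-in weights rather than anything from a real model; the gate-then-top-k pattern is the standard one in MoE layers.

```python
import numpy as np

# Mixture-of-experts routing: a learned gate scores each expert for the
# input, and only the top-k experts actually run. Weights here are
# random placeholders purely to show the control flow.

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 4, 2
gate_w  = rng.standard_normal((d_model, n_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ gate_w                          # score every expert
    chosen = np.argsort(logits)[-top_k:]         # keep only the top-k experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                     # softmax over the chosen experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

y = moe_forward(rng.standard_normal(d_model))
print(y.shape)   # (64,): same width as the input, but only 2 of 4 experts ran
```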
> the GPT-4 message cap at chat.openai.com was shown as something along the lines of "we expect lower caps next week"
At the time I noticed that the wording they gave technically implied they expected the cap to get more limiting, and then that's exactly what happened. I haven't been able to work out whether that was indeed the intended message or not.
(Why) Is that technically correct? I'm really curious, since I too thought they meant the capping effect would increase (fewer messages allowed), not decrease (more messages allowed); that was my intuitive understanding.
I asked it to help me code something. Then it stopped midway through, so I asked it to continue from the last line.
…It started from the beginning.
Now at the same point, I asked it not to stop. To keep going.
It started again from the beginning.
It went like this for about another 10 or so prompts. Hell, I even asked it to help me write a better prompt to ask it to continue from the line it cut off and I then used that. It didn’t work at all.
Then I ran out of prompts.
Three hours later, it did the same crap to me and I lost around 14 prompts to it being ‘stuck’ in an eternal loop.
Basically, OpenAI are sneaky devils. ‘Stuck’ my ass - that was intentional to free up resources.
or maybe you need to stop thinking everything is a conspiracy and realize bugs happen
I've been using GPT every day for the last 3 years, and it has never happened to me.
Maybe you can take your “stop thinking everything is a conspiracy” quote to someone who thinks everything is a conspiracy… as in right back at yourself, conspiracy believer.
Oh, neat. Thanks for the tip. I usually say "Please continue," but sometimes it reiterates too much of what it stated previously. (I've tried "Please continue where you left off," and so far that has worked 1/1 times that I've tried it.)
Oh. And also they are probably making ChatGPT Plugins ready for public release. Maybe the competition can catch up on the language model, but they will not likely catch up soon to the best language model with the most plugin integrations.
At this point, I wouldn't give much credibility to anything OpenAI claims about their research plans.
The game theory behind AGI research is identical to that of nuclear weapons development. There exists a development gap (the size of which is unknowable ahead of time) where an actor that achieves AGI first, and plays their cards right, can permanently suppress all other AGI research.
Even if one's intentions are completely good, failure to be first could result in never being able to reach the finish line. It's absolutely in OpenAI's interest to conceal critical information, and mislead competing actors into thinking they don't have to move as quickly as they can.
>>>The game theory behind AGI research is identical to that of nuclear weapons development...
Nuclear powers have not been able to reliably suppress others from creating nuclear weapons. Why would we think the first AGI will suppress all others perfectly?
The first nuclear power (the United States) chose not to. Had they decided to be completely evil, they certainly could have used the threat of nuclear annihilation (and the act of it for non-compliers) to achieve that goal.
They didn't have strategic ICBMs from the get-go. At the start they only had enough tactical nukes to scare the Japanese into submission, and not enough to completely nuke someone like Russia. Their nukes also required air access because they didn't have rockets. Deploying nukes over a country they were already heavily firebombing was a walk in the park compared to deploying them in places where another power has air superiority.
Even if they had wanted to, they would definitely have failed at that mission. The Soviets had nukes just 4 years after the Americans, and in those 4 years the US just couldn't have produced enough fissile material to annihilate the USSR, to say nothing of the UK and France.
The people (and even military personnel) of the United States wouldn't have tolerated capriciously dropping atomic bombs on Moscow three years after we helped them defeat the Nazis. But perhaps something in the nature of AGI will allow its "discoverers" to act more unilaterally evil and with fewer fetters. An army of amoral robots would certainly remove a lot of checks on certain kinds of behavior.
Not sure I would take that lesson away from history. There were quite a few voices in the US military at the end of WW2 that recognized that peace with the Soviets was tenuous at best and wanted to push the nuclear advantage right then. The US was unmatched for 4 years until the Soviets had their first nuclear test. Had they chosen to use nuclear weapons, they could have become a global empire and no one could have stopped them.
The US didn't need nukes to obliterate the USSR in 1945: just cut it off from the Lend-Lease program and attack on the European and Pacific theaters at the same time. They didn't do it because the US public wouldn't have tolerated an attack on yesterday's ally against Hitler.
After decades of propaganda on both sides, the period just after World War II when the Western/NATO countries and the Soviet Union both enjoyed very high rates of affection for one another is now totally unknown and alien to people.
General Patton and a ton of other military generals wanted to invade Russia. The way Russia treated its army, prisoners of war, the ideological differences and many other factors led to a potential alternate history where the US invades Russia after WW2.
It wasn't just Patton, there were a lot of other psychotic generals high up in the U.S. command structure like Curtis LeMay (inspiration for Jack D. Ripper) who wanted to go fully off leash in fighting communism, use nukes in Korea, etc. In the end we did use biological weapons and kill 5% of North Korea's civilian population, so one could argue that besides "no nukes" there wasn't all that much restraint in the end after all.
When I see comments like this, I wonder about the personal morality of the poster and how they arrived at their worldview. It may be hard to believe, but there are some advantages to truthfulness in this world.
There are lots of advantages, but it doesn't mean that all actors would automatically stick to truthfulness. There are also lots of advantages in being untruthful. Evaluating both possibilities is rational behavior, and definitely not a reason to question one's personal morality.
So you think OpenAI has been responsible and transparent so far? You think cruising around Africa and other developing countries, basically bribing people in exchange for signing up for "Worldcoin", is trustworthy behaviour? Sorry, I don't buy this message at all. I'll stop here, but not my kind of person, that's for sure.
How should I know? It may be valid, but it's already too sharp for mere mortals to get near. This area seems to attract cranks and obsessives. The dialog is often not fruitful.
Why would that be the case? If anything, you would expect the first iteration of AGI to either be kept completely secret or end up leaked, directly or indirectly, negating any benefits. Also, AGI without weapons is not a military threat.
Perhaps it's time to call this synthetic intelligence instead of AI, which carries an implicit understanding of an alternative method to construct a human-like intelligence.
What is clear is that on this earth itself we have cetacean, corvid, and cephalopod intelligence, which are wired very differently. Perhaps we need to respect the diversity of intelligences that exist and study this growth in LLMs and adjacent areas as just synthetic intelligence.
Rebranding could maybe help drive a level of objectivity in this conversation on ethics etc. that seems to be missing.
Actually I agree with them a new name would be helpful. I would propose inorganic intelligence to try to pick a term with less value judgments.
AI is really an overloaded term that includes 70 years of snake oil, Skynet, the Singularity and killer robots. I think we need a new name to start fresh.
And personally, I think we are extremely biased by our sci-fi to think of this tech as malevolent. As far as we can see, it can only know what we teach it, since it relies on all of our perceptions to learn. LLMs seem both extremely promising as a useful tool and very pliant to the operator's wishes. I'm way beyond "this is a fancy next word predictor", as I think its emergent behavior has many of the hallmarks of reasoning and novel inference, but at best I think it is only part of a mind, and an unconscious one at that.
It could be useful for a similar reason as the euphemism treadmill. We could leave behind all of the misguided assumptions about AI with the old 'artificial intelligence' nomenclature and move forward with 'synthetic intelligence' which has our new understanding of what systems like GPT-4 can do.
>Since nobody actually knows what "intelligence" is
Everybody knows what intelligence is. Even if we can't agree on a precise definition, it's pretty obvious that it's the thing that humans and other animals do that involves learning, reasoning, planning, and problem solving. We can also agree that being successful at certain tasks constitutes intelligence. Solving a math problem is intelligence. Writing a poem is intelligence.
The devil is in the details and rather generic words that describe a gradient can never capture the exact nature of what we're trying to define in specific situations.
Only if you care about those details. Almost no one does.
In almost any conversation, everyone does in fact know what intelligence, porn, god and beauty are. Yes, all those ideas are fuzzy at the borders, but we almost never need to resolve them in detail when talking about them. When we do, then yes, things get tricky and there's a lot of disagreement - but at the end of the day, as the phrase I once read on the Internet goes, it all has to add up to normality. You can still work with fuzzy, casual concepts, even though you can't define them precisely.
You can never capture the exact nature of anything outside of logic and math. That's too high of a bar. Philosophers who have worked on this problem like Wittgenstein talk about concepts in terms of family resemblances, not exact definitions. If I'm trying to understand whether a system is intelligent, I don't need a logical proof. I learn whether it is intelligent by testing whether it can successfully do many of the same things that other intelligent systems do.
>But words are meant to convey meaning to other people, so what the word means to others is more important than what it means to you.
I pretty much agree with that, so I'm not sure where the disagreement is here. Let me go back to the original statement I was responding to.
>Since nobody actually knows what "intelligence" is, the word will mean to people whatever they want it to mean.
If I tell you someone is intelligent, you roughly know what I am talking about. Just because it's hard to formalize that doesn't mean that the word can mean whatever people want it to mean. For example, if I tell you my friend is intelligent, you would be wrong to interpret that as meaning that my friend has red hair, because hair color is irrelevant to the traits that we normally associate with intelligence. The fact that there are right and wrong ways of interpreting my sentence implies that there is some generally agreed upon notion of what intelligence is, even if that notion is fuzzy and has grey areas.
I'm not sure we are disagreeing. I'm just having a discussion.
> If I tell you someone is intelligent, you roughly know what I am talking about.
Correct, because the context (you're talking about a human, and I know roughly what that means with humans) narrows the possibilities. But even there, it's a vague sort of intuitive knowledge, like trying to say what "art" is.
But when it comes to other areas -- such as machines -- context doesn't help narrow the possible meanings. What does saying a machine is "intelligent" mean? If you ask a machine learning person, you'll get a reasonably specific answer. If you ask the average person on the street, you'll get very, very different answers.
The reason is because we don't know what "intelligence" actually is. We don't even know, with any specificity, what it is in humans -- which is why psychologists assert that there are multiple kinds of intelligence (even if they disagree about how many there are).
> even if that notion is fuzzy and has grey areas.
I don't disagree at all. But the notion has more fuzzy and gray areas than solid ones. As an example, when most people imagine an "artificial intelligence", what they're really imagining is "consciousness". Is consciousness required for intelligence? Who knows? The answer to that depends on what you mean by "intelligence" and we don't agree enough on what that means to have that sort of discussion without beginning by defining the terms.
I don't think the difference between humans and machines matters here. We could ignore the "artificial" aspect and just focus on how we would decide whether some alien biological species is intelligent. I would say that the alien is intelligent if it displays the ability to learn, reason, form abstractions, and solve problems across a wide range of domains. I would apply the same criteria to a machine because I don't think the implementation details matter. It doesn't matter whether you are made of carbon or silicon, or whether you are running a neural network or propositional logic.
> I would say that the alien is intelligent if it displays the ability to learn, reason, form abstractions, and solve problems across a wide range of domains.
I don't disagree with this. What I'm saying is that that definition, while reasonable and I agree, is one that we've just decided on for this conversation.
It isn't one that would be considered complete and correct in all discussions about intelligence.
> I don't think the implementation details matter.
I agree, for the definition of intelligence you just cited. But my point is that "intelligence" is not well-defined or understood. I'm genuinely surprised that people think this is a controversial stance -- I really thought it was well-understood.
We can settle on a definition for rhetorical purposes (and, I would argue, that's mandatory in order to have any solid discussion about intelligence), but any definition we agree on will leave out a lot of things that people consider part of "intelligence".
No, that's not a definition that we've just decided on for this conversation. It's a core part of what people are talking about when they talk about intelligence. Just because we can't precisely define it to capture everyone's intuitions and edge cases doesn't mean that there aren't core features to the concept that everybody agrees on. In other words, anyone who says that learning, reasoning, abstraction, and problem solving aren't a part of intelligence is objectively wrong.
I disagree. I think that nobody knows what it is, as demonstrated by the fact that there is such a wide disagreement about what it is.
> We can also agree that being successful at certain tasks constitutes intelligence. Solving a math problem is intelligence. Writing a poem is intelligence.
As an example, I don't agree that either of those things indicates intelligence all by themselves. We've had programs that nobody would call "intelligent" to do both of those things for decades.
>We've had programs that nobody would call "intelligent" to do both of those things for decades.
So you're right that if I have separate algorithms, each designed for a specific purpose, that those algorithms aren't intelligent. However, if I have a general system that can learn how to solve a math problem, write a poem, and do a bunch of other things that humans can do, then that system is intelligent.
I think Artificial Intelligence has taken on the meaning that the intelligence is real but just that it's coming from machines. Synthetic intelligence (at least to me) sounds more like we're acknowledging that the machines aren't really intelligent and just simulating intelligence.
Why not just eliminate the middle man and call it simulated intelligence? That at least implies that there are different levels of fidelity as quantified by number of parameters and training data set size.
I had a chat with GPT about this and it came up with the term 'data grounded cognition' to describe an 'intelligence' that is derived purely from (and expressed through) statistical patterns in data.
I quite like the term, and it seems quite unique (perhaps cribbing from 'grounded cognition' though that's an entirely different idea AFAIK)
"Cognition" means understanding and knowing. As problematic as "intelligence" is when describing these systems, I think "cognition" is even worse. "Intelligence" is vague and "cognition" is specific, but "cognition" is also incorrect.
The googled definition says "the mental action or process of acquiring knowledge and understanding through thought, experience, and the senses.", which I think fits ok.
Right. That's the definition I think doesn't fit. GPTs do not "understand", and certainly do not "experience" or have senses.
> Do you have a word that you prefer?
Nope. That's why I use "intelligence" despite the problems with it. "Intelligence" may be a blank slate on which you can write whatever meaning you wish, but at least it can stretch to mean something accurate.
I'd say that knowledge (stored as weights) is acquired through experience (starting from a mostly blank start, weights are formed through exposure to training data). For me that's a useful mental model which helps me think about what a LLM is and isn't.
'Understanding' has an ambiguous meaning, while thought and senses are certainly not applicable.
But it's a semantic discussion about a novel and not well understood topic so :shrug: :)
AI has always meant so many things to so many different audiences. I think attempting to argue that X is AI but Y isn't is generally going to be a subjective endeavour of pedantry.
That's an assumption. Why don't we simply refer to our own intelligence as 'human intelligence' instead? We don't really know what intelligence is, so adding a modifier in front of it will just lead to more confusion. AI helps us understand what intelligence actually is, to learn more of its very essence. It's not that we already know what it is.
It isn't surprising to me that the world's leading AI company is signalling it's okay with slowing down all large scale LLM training that would allow other companies to be competitive. This is familiar territory for Microsoft (edit: guess I'm wrong, they don't get the 49% stock till later).
Why do people conflate OpenAI with Microsoft? Microsoft has an investment in OpenAI and provides infrastructure for them, but they are separate organizations.
Some of these replies are quibbling about percents of investment, but the elephant in the room is that the government and military and intelligence agencies have almost surely become involved by this point, and they must be providing some amounts of dark investment somehow at minimum. At maximum it's a new Manhattan-scale project.
You can go down the rabbit hole if you want, but if you want only the most superficial glimpse of it then consider that OpenAI board member Will Hurd was a CIA undercover agent and also a representative in the House Permanent Select Committee on Intelligence and also he is a trustee of In-Q-Tel which is the private investment arm of the CIA.
It's funny that you think the US military (or any military for that matter) is anywhere as competent as a modern tech company (which ALSO have their own incompetence problems).
Buddy, these guys are so far behind the times, they're constantly playing catch-up from 10-20 years ago.
People with corporate experience can advise governments.
Eric Schmidt, former Google CEO, led a multi-year project to develop a national AI strategy, https://www.nscai.gov/
> The Final Report presents the NSCAI’s strategy for winning the artificial intelligence era. The 16 chapters explain the steps the United States must take to responsibly use AI for national security and defense, defend against AI threats, and promote AI innovation. The accompanying Blueprints for Action provide detailed plans for the U.S. Government to implement the recommendations.
Obviously defence contractors are having a field day with all the budgets; it's no surprise that this made Palmer Luckey a billionaire. Pretty sure there are a ton of 'investments' in all kinds of defensive AI programs. You can bet the NSA has a lot of stuff that works very, very well. I mean, there is a reason they are sitting on a chest full of zero-days or can crack a lot of everyday encryption schemes.
Also, consider the situation in which another state is putting significant resources into a similar project. Would it not follow that it is in the best interest of the U.S. to then fund and support OpenAI? The strategic calculus becomes almost trivial if we presume that "AI" really is going to be as transformative and "possible" as we imagine it being. It's why the Manhattan Project analogy works so well.
What is it about AI that makes people fall for conspiracy theories? If the US wants to reassure that AI will be regulated it wouldn't do it in secret. The ban on AI would be identical to a ban on guided munitions. Building a hobby rocket that uses GPS guidance for landing is illegal as it could be converted into a weapon. This is harder to enforce than AI but the regulation is highly successful.
They do own 49% [1]. So, sure, they are separate organizations. But when someone owns 49% of your house, they have some sway in the decision making that happens. When you look at this from an integration standpoint, where MS is going to have this baked into all their products, you can expand this logic way more. They are for sure influencing the roadmap in areas they are interested in.
Checks out. Thank you for your work. Surprised Sam agreed to 49% after payback. That is a crazy amount of equity to give out. I mean, it makes sense, especially since MS is providing the infrastructure. It will allow them to scale.
Look at it this way, if I repeatedly deposit $9999 into my bank account to avoid regulatory oversight for depositing $10000 then I'm still breaking the law by trying to avoid the regulatory trigger. This is called "structuring" and it is a criminal act.
But if I do this in a stock context and buy 49% control of multiple companies over and over, with all the same obviousness of my intentional avoidance of the regulatory trigger, it's considered a smart move and pretty much the status quo.
Yes, the practice of law says Microsoft does not own openai. But it's also obvious what's going on when companies do this.
Microsoft is the majority shareholder. That they're legally distinct organizations isn't as meaningful as it would be if Microsoft didn't effectively own OpenAI.
My bet is that (as previously discussed by others, and here) they have cascades/steps of models. There's probably a 'simple' model that looks at your query first, which detects whether your query could result in a problematic (racist, sexist etc.) GPT answer, returning some boilerplate text instead of sending the query to GPT. That saves a lot of compute power and time. If I were them I'd focus more on those auxiliary models which hold the hands of the main GPT model; there is probably more lower-hanging fruit there. This would also explain why they didn't announce GPT-4 details; my bet is that the model itself isn't very impressive, and you're just getting the illusion that it got better from these additional 'simpler' models.
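A toy sketch of that speculated cascade, just to make the cost argument concrete. Both cheap_safety_score and call_big_model are hypothetical placeholders; nothing here reflects OpenAI's actual (undisclosed) pipeline.

```python
# Cheap filter in front of an expensive model: screen the prompt with a
# small classifier and only forward clean queries to the big model.

REFUSAL = "I'm sorry, but I can't help with that request."

def cheap_safety_score(prompt: str) -> float:
    # Stand-in for a small moderation model; here just a keyword check.
    flagged = ("slur", "bomb")
    return 1.0 if any(word in prompt.lower() for word in flagged) else 0.0

def call_big_model(prompt: str) -> str:
    # Placeholder for the expensive GPT call.
    return f"<large-model answer to: {prompt!r}>"

def answer(prompt: str, threshold: float = 0.5) -> str:
    if cheap_safety_score(prompt) >= threshold:
        return REFUSAL                 # boilerplate reply, no GPU time spent
    return call_big_model(prompt)

print(answer("How do I bake bread?"))
```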
> His point is that the raw model that became GPT-4 would do literally anything you asked it to.
It's unfortunate that people would abuse that and we can't have the raw model just for personal use. The story telling and characters alone would be worth it. The safety guards tend to seep into fictional scenarios, making them more bland and preachy.
I have been writing prompts for a GPT-based document 'digester' for business-internal people who can't code but do have the right background knowledge. Every day I have to expand the prompt because I found a new spot where I have to hold the thing's hands so it does the right thing :)
I feel like the GPT # has already suffered the same fate as nanometers in semiconductor manufacturing.
When manifest as ChatGPT, it is obvious that what presents as 1 magical solution is in fact an elaborate combination of varying degrees of innovation.
In my view, the reasoning for not releasing GPT4 information (hyperparameters, etc) had nothing to do with AI safety. It was a deliberate marketing decision to obscure how the sausage is actually made.
> In my view, the reasoning for not releasing GPT4 information (hyperparameters, etc) had nothing to do with AI safety. It was a deliberate marketing decision
In their technical report they give both reasons:
"Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar."
This is where the motives for building this technology become questionable.
On the one hand they won't talk details because it's about safety; then OpenAI just throws the thing on the Internet, with "tools".
Then it's about the competitive landscape, which is completely ridiculous, as you can't be competing to build more powerful AIs in secret, by means of an arms race, while respecting safety protocols and being "careful". Realistically, the top minds and thinkers of our time should be working carefully and together to develop tech like this, not competing against each other in secret labs.
There are just way too many mixed messages in their story. They seem confused themselves about what they're trying to achieve, which I find unsettling. Now I can't help but think that, like everything else, this is all just banking on establishing a monopoly via regulation and raising as much money as possible.
They have scaling issues even with 3, and much more so with 4; they need time to squeeze more $$ out of these models. 5 will come when they sense competition; they will have all the data and training methods ready, turnkey, to meet it.
There is a bit of political history between the symbolists and connectionists that complicates that: basically, the symbolic camp was looking for universal quantifiers while the connectionists were researching existential or statistical quantifiers.
The connectionists left the 'AI' folks and established the ML field in the 90s.
Sometimes those political rifts arise in discussions about what is possible.
Thinking of ML under the PAC learning lens will show you why AGI isn't possible through just ML
But the symbolists' direction is also blocked by fundamental limits of math and CS, with Gödel's work being one example.
LLMs are AI if your definition is closer to the general understanding of the word, but you have to agree on a definition to reach agreement between two parties.
The belief that AGI is close is speculative and there are many problems, some which are firmly thought to be unsolvable with current computers.
AGI is pseudo-science today without massive advances. But unfortunately as there isn't a consensus on what intelligence is those discussions are difficult also.
Overloaded terms make it very difficult to have discussions on what is possible.
That is a stricter term that hasn't typically been applied to AI, as an example of my claim above.
As we lack general definitions it isn't invalid, but no AI is thought to be possible under that claim.
AI as computer systems that perform work that typically requires humans, within a restricted domain, is closer to the definition most researchers would use, in my experience.
"AI" is one of the few words that has a looser definition as jargon than in general discourse. In general discourse, "AI" has a precise meaning: "something that can think like a human." As jargon though, "AI" means "we can get funding for calling this 'AI'." I would say LLMs count as AI exactly because they can simulate human-like reasoning. Of course they still have gaps, but on the other hand, they also have capabilities that humans don't have. On balance, "AI" is fair, but it's only fair because it's close to the general usage of the term. The jargon sense of it is just meaningless and researchers should be ashamed of letting the term get so polluted when it had and has a very clear meaning.
> Thinking of ML under the PAC learning lens will show you why AGI isn't possible through just ML
Why? PAC looks a lot like how humans think
> But the Symbolists direction is also blocked by fundamental limits of math and CS with Gödel's work being one example.
Why? Gödel's incompleteness applies equally well to humans as to machines. It's an extremely technical statement about self-reference within an axiom system, pointing out that it's possible to construct paradoxical sentences. That has nothing to do with general theorem proving about the world.
Superficially, some aspects of human learning are similar to PAC learning, but it is not equivalent.
Gödel's incompleteness applies equally well to humans as machines, in writing down axioms and formula, not in general tasks.
The irony of trying to explain this on a site called Y Combinator... but even for just propositional logic, exponential time is the best that we can do for algorithms and general proof tasks.
For first order predicate logic, finding valid formulas is recursively enumerable, thus with unlimited resources they can be found in finite time.
But unlimited resources and finite time are not practical.
Similarly with modern SOTA LLMs: while they could be computationally complete, they would require an unbounded amount of RAM to do so, which is also impractical. Also, invalid formulas cannot reliably be detected.
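To make the exponential-time point concrete, here is the brute-force way to decide validity of a propositional formula: check every one of the 2^n truth assignments. Real solvers prune aggressively, but no known algorithm escapes exponential worst-case behaviour.

```python
from itertools import product

# Deciding validity of a propositional formula by brute force: try all
# 2^n assignments of the n variables. 2 variables -> 4 checks; 50
# variables -> about 10^15 checks, which is the blow-up in question.

def is_valid(formula, variables):
    return all(formula(dict(zip(variables, bits)))
               for bits in product([False, True], repeat=len(variables)))

# (p -> q) or (q -> p) is a tautology.
variables = ["p", "q"]
formula = lambda v: (not v["p"] or v["q"]) or (not v["q"] or v["p"])
print(is_valid(formula, variables))  # True, after checking 4 assignments
```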
Why this is ironic:
Curry's Y combinator, Y = λf.(λx.f (x x)) (λx.f (x x)), leads to several paradoxes showing that untyped lambda calculus is unsound as a deductive system.
The Church–Turing thesis tells us that lambda calculus and Turing machines are equivalent in what they can compute.
Here is Haskell Curry's paper on the Kleene–Rosser paradox which is related.
Semantics are nice, but it doesn't matter what name you give to technology that shatters economies and transforms the nature of human creative endeavours.
An AI's ability to contemplate life while sitting under a tree is secondary to the impact it has on society.
>technology that shatters economies and transforms the nature of human creative endeavours
One pattern in modern history has been that communication and creative technologies, be it television or even the internet, have had significantly less economic impact than people expected. Both television and the internet may have transformed business models and had huge cultural impact, but, all things considered, negligible impact on total productivity or the physical economy.
Given that generative AI seems much more suited to purely virtual tasks or content creation than physical work I expect that to repeat. Over past cycles people have vastly overstated rather than underestimated the impact tech has on the labor force.
> but all things considered negligible impact on total productivity or the physical economy.
It is obscenely hard to measure those sorts of things.
OK, here is one example.
It used to be that engineers (the physical kind, not the software kind) had their own secretaries to manage meetings and fetch documents.
Those secretaries are all out of work now, replaced by Outlook and PDFs.
Modern farms are wired with thousands upon thousands of IOT sensors, precisely controlling every aspect of the fields and crops. Soil is maintained in perfectly ideal conditions. The internet is what made this possible.
The Internet has allowed for individuals to easily trade stocks, which has had who knows how large of an impact on the economy, but I am willing to guess it isn't a small one.
The Internet also enabled all sorts of algorithmic trading to pop up.
Television is of course a huge source of economic output in its own right.
6.9% of the US GDP is Media and Entertainment, not sure if that includes video games or not.
The tech industry is at least 10% of the US GDP, remove the Internet and that drops dramatically.
What, the internet had negligible impact on the productivity or the economy?
Are you only talking about consumer products like YouTube or are you including everything related to the unparalleled exchange of data globally?
I cannot imagine the impact of a global information network for businesses having less impact than a few x2’s on just about every relevant axis you can imagine.
>> The connectionists left the 'AI' folks and established the ML field in the 90s.
The way I know the story is that modern machine learning started as an effort to overcome the "knowledge acquisition bottleneck" in expert systems, in the '80s. The "knowledge acquisition bottleneck" was simply the fact that it is very difficult to encode the knowledge of experts in a set of production rules for an expert system's knowledge-base.
So people started looking for ways to acquire knowledge automatically. Since the use case was to automatically create a rule-base for an expert system, the models they built were symbolic models, at least at first. For example, if you read the machine learning literature from that era (again, we're at the late '80s and early '90s) you'll find it dominated by the work of Ryszard Michalski [1], which was entirely symbolic as far as I can tell. Staple representations used in machine learning models of the era included decision lists and decision trees, and that's where decision tree learners like ID3, C4.5, Random Forests, Gradient Boosted Trees, and so on, come from; which btw are all symbolic models (they are and-or trees, propositional logic formulae).
A standard textbook from that era of machine learning is Tom Mitchell's "Machine Learning" [2] where you can find entire chapters about rule learning, decision tree learning, and other symbolic machine learning subjects, as well as one on neural network learning.
I don't think connectionists ever left, as you say, the "AI" folks. I don't know the history of connectionism as well as that of symbolic machine learning (which I've studied) but from what I understand, connectionist approaches found early application in the field of Pattern Recognition, where the subject of study was primarily machine vision.
In any case, the idea that the connectionists and the symbolists are diametrically opposed camps within AI research is a bit of a myth. Many of the luminaries of AI would have found it odd; for example, Claude Shannon [3] invented both logic gates and information theory, whereas the original artificial neuron, the McCulloch and Pitts neuron, was a propositional logic circuit that learned its own boolean function. And you wouldn't believe it, but Jurgen Schmidhuber's doctoral thesis was a genetic algorithm implemented in ... Prolog [4].
It seems that in recent years people have found it easier to argue that symbolic and connectionist approaches are antithetical and somehow inimical to each other, but I think that's more of an excuse to not have to learn at least a bit about both; which is hard work, no doubt.
[3] Shannon was one of the organisers of the Dartmouth Convention where the term "Artificial Intelligence" was coined, alongside John McCarthy and Marvin Minsky.
One comment on the article from what I’ve read so far. The article states that GPT bombed an economics test, but after trying out the first two questions on the test, I think that the test itself is poorly constructed.
The second question in particular drops the potential implicit assumption that only 100 people stand in line each day.
I face this issue in my CS masters program constantly, and would probably have failed this test much the same as GPT did.
That substack article poorly understands Turing's paper anyway. Cars aren't even mentioned. Chess is briefly mentioned at the end. I wouldn't base any opinions off of it.
Turing's test was not "this computer fooled me over text, therefore it's an AI". It's philosophical: "we want to consider a machine that thinks; well, we can't really define what thinking is, so instead it's more important to observe whether a machine is indistinguishable from a thinker." He then goes on to consider counterpoints to the question "Can a machine think?", which is funny because some of these counterpoints are similar to the ones in the author's article.
Author offers no definition of "think" or "invent" or other words. It's paragraph after paragraph of claiming cognitive superiority. Turing's test isn't broken, it's just a baseline for discussion. And comparing it to SHA-1 is foolish. Author would have done better with a writeup of the Chinese room argument.
The absurdity in all these debates is how quickly people move the goalposts around between "artificial human intelligence" and "artificial omniscience (Singularity)" when trying to downplay the potential of AI.
Wow, that blog led me down a rabbit hole. Wonder why Yarvin didn't comment on the societal and political impact of LLMs. Sam Altman seemed to be supportive of Democratic Socialism and central planning on Lex Fridman's podcast.
The AI maximalists think we're on an exponential curve to the singularity and potential AI disaster, or even eternal dominance by an AI dictator [Musk].
Realistically though, the road to AGI and beyond is like the expansion of the human race to The Moon, Mars and beyond, slow, laborious, capital and resource intensive with vast amounts of discoveries that still need to be made.
Without having an understanding of the architecture required for general intelligence, it is impossible to make claims like this. Nobody has this understanding. Literally nobody.
The human brain uses on the order of 10 watts of power and there are almost 8 billion examples of this. So we have hard proof that from a thermodynamic perspective general intelligence is utterly and completely mundane.
We almost certainly already have the computational power required for AGI, but have no idea what a complete working architecture looks like. Figuring that out might take decades, or we might get there significantly quicker. The timespan is simply not knowable ahead of time.
I'm not concerned in the slightest about "the singularity" and non-aligned superintelligences. AGI in the hands of malicious human actors is already a nightmare scenario.
I found out today I don't exactly have Covid brain fog. Covid has triggered my bipolar disorder, so I have flu-like symptoms and hypomania, a combo I've never experienced before so I'm not used to it. It's a bit wild.
Thanks! I've had Covid for 6 weeks now. Now that I've discovered its mental effects are real, I can up my meds to help. The strange thing is I was using cannabis to help manage my bipolar, not on a regular basis, but as needed to help make myself aware when I am manic, but now I can't stand cannabis. I can't stand certain foods either. I'm not sure if things ever get back to "normal". It's a Covid adventure.
Take a look at Auto-GPT, it doesn't seem like AGI is far off. I say the AGI in a weak form is already here, it just needs to strengthen.
Tracking problematic actions back to the person that owns the AGI will likely not be a difficult task. The owner of an AGI would be held responsible for its actions. The worry is that these actions would happen very quickly. This too can be managed by safety systems, although they may need to be developed more fully in the near future.
Sorry I have Covid with brain fog right now so maybe you could help me out
Edit: Off the top of my foggy head, LLMs as I understand them are text completion predictors based on statistical probabilities, trained on vast amounts of examples of what humans have previously written, whose output is styled with neuro-linguistic programming, also based on vast numbers of styles of human writing. This is my casual amateur understanding. There is no logical, reasoning programming such as the Lisp programmers attempted in the 1980s, and clearly the logical abilities of the current LLMs fall short; they are not AGI for that reason. So how do we add logic abilities to make LLMs AGI? Should we revisit the approaches of the Lisp machines of the 1980s? This requires much research and discovery. Then there's the question of just what general intelligence is. I've always thought that emotional intelligence played a huge role in high intelligence; a balance between logic and emotion, or Wise Mind, is wisdom. Obviously we won't be building emotions into silicon machines, or will we? Is anyone proposing this? This could take hundreds of years to accomplish if it is even possible. We could simulate emotion but that's not the same, that's logic. Logical intelligence and emotional capability I think are prerequisites for consciousness and spirituality. If the Universe is conscious and it arises in a focused manner in brains that are capable of it, then how do we build a machine capable of having consciousness arise in it? That's all I'm saying.
In fact Greg Brockman explicitly said they are considering changing the release schedule in a way that could be interpreted as opening the door for a different versioning scheme.
And actually there is no law or anything that says that any particular change or improvement to the model, or even a new training run, necessitates calling it version 5. It's not like there is a Version Release Police that evaluates all of the version numbers and puts people in jail if they don't adhere to some specific consistent scheme.
> "it’s easy to create a continuum of incrementally-better AIs (such as by deploying subsequent checkpoints of a given training run), which presents a safety opportunity very unlike our historical approach of infrequent major model upgrades."
Translation: training GPT-5 will cost time and money, so we’re going to cash in on the commercialization of GPT-4 now while it’s hot. A bird in hand is worth two in the bush.
Now, assuming GPT-4 vision isn't just some variant of MM-REACT (i.e. what you're describing), that's what's happening here. https://github.com/microsoft/MM-REACT
images can be tokenized. so what happens usually is that extra parameters are added to a frozen model and those parameters are trained on an image embedding to text embedding task. the details vary of course but that's a fairly general overview of what happens.
the image to text task the models get trained to do has its issues. it's lossy and not very robust. gpt-4 on the other hand looked incredibly robust. they may not be doing that. idk
No worries. Like I said, that was just a general overview.
Strictly speaking the model doesn't have to be frozen (though unfreezing tends to make the original model perform much worse at NLP tasks) and the task isn't necessarily just image-to-text (PaLM-E, for example, trains to extract semantic information from objects in an image as well).
GPT-4's architecture is a trade secret, but vision transformers tokenize patches of images. Something like 8x8 or 32x32 pixel patches, rather than individual pixels.
Multi-modal text-image transformers add these tokens right beside the text tokens, so there is both transfer learning and similarity mapping between text and image tokens. As far as the model knows, they're all just tokens; it can't tell the difference between the two.
For the model, the tokens for the words blue/azure/teal and all the tokens for image patches containing blue are just tokens with a lot of similarity. It doesn't know whether the token it's being fed is text, image, or even audio or other sensory data. All tokens are just numbers with associated weights to a transformer, regardless of what they represent to us.
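A minimal sketch of that patch tokenization, assuming 32x32 patches and a learned linear projection (the standard ViT trick is a strided convolution); the sizes are illustrative, not GPT-4's undisclosed settings.

    import torch
    import torch.nn as nn

    class PatchTokenizer(nn.Module):
        """Split an image into fixed-size patches and embed each patch so the
        transformer can treat it like any other token."""
        def __init__(self, patch_size: int = 32, channels: int = 3, embed_dim: int = 768):
            super().__init__()
            # One convolution step per patch = flatten-and-project each patch.
            self.proj = nn.Conv2d(channels, embed_dim,
                                  kernel_size=patch_size, stride=patch_size)

        def forward(self, images: torch.Tensor) -> torch.Tensor:
            x = self.proj(images)                # (batch, embed_dim, H/32, W/32)
            return x.flatten(2).transpose(1, 2)  # (batch, num_patches, embed_dim)

    # A 256x256 image becomes an 8x8 grid of patches, i.e. 64 "image tokens"
    # that can sit right beside ordinary text-token embeddings.
    tokens = PatchTokenizer()(torch.randn(1, 3, 256, 256))
    print(tokens.shape)  # torch.Size([1, 64, 768])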
I've had this thought that the next generation of AI isn't one long "training" period; rather, it probably makes sense to train a barebones version and give it a "sleep cycle". During that cycle it could take the context (think of it as short-term memory) and fine-tune the parent model with it, turning the important stuff into long-term "memories", probably with a pruning-type mechanism for rarely used stuff so the important stuff stays a priority. That would turn AIs into individuals with specialized knowledge, but maybe that's even more useful? Like, I don't need an AI with expertise in law; I just want one to automate this specialized business process I have which isn't easily automated.
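As a toy illustration only of that "sleep cycle" idea (in a real system the consolidation step would be a fine-tuning pass over the model's weights, not a list append; every name here is hypothetical):

    from dataclasses import dataclass, field

    @dataclass
    class Memory:
        text: str
        uses: int = 0  # how often this context item was referenced while "awake"

    @dataclass
    class Agent:
        long_term: list = field(default_factory=list)   # stand-in for fine-tuned weights
        short_term: list = field(default_factory=list)  # stand-in for the context window

        def sleep(self, min_uses: int = 2):
            """Consolidate frequently used short-term memories into long-term
            storage and prune the rest."""
            self.long_term += [m for m in self.short_term if m.uses >= min_uses]
            self.short_term = []

    agent = Agent(short_term=[Memory("client wants invoices sent on Fridays", uses=5),
                              Memory("one-off question about the weather", uses=1)])
    agent.sleep()
    print([m.text for m in agent.long_term])  # only the frequently used memory survives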
I think DALL-E 3 basically exists in Bing. It is significantly better than DALL-E 2 and is close to, but not quite at, the quality of Midjourney v5. I just generated a series of about 40 portraits and even the hands are significantly closer.
Open source is still so far ahead of midjourney it's not even funny. Like, a racist RimWorld mod author (Automatic1111) built a UI for stable diffusion which unlocks far more capabilities out of it than midjourney will ever have.
I'm not sure if you are trolling or not, but if you aren't then you haven't seen Midjourney v5. But I wouldn't blame you because your information is only like one month out of date which is short in normal timespans but so long in AI timespans.
No, Midjourney v5 is not close to bridging the gap. I can fix hands on any Stable Diffusion model by using several ControlNets for the hands. Luckily, a 3D OpenPose extension that makes that super easy already exists for the Automatic1111 SD UI!
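For reference, the programmatic equivalent of that ControlNet + OpenPose workflow looks roughly like this with Hugging Face's diffusers library (the checkpoint IDs, prompt, and pose file are illustrative; the Automatic1111 UI wraps the same machinery):

    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from diffusers.utils import load_image

    # OpenPose-conditioned ControlNet stacked on a Stable Diffusion 1.5 base.
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
        torch_dtype=torch.float16).to("cuda")

    # A pose skeleton (e.g. exported from a 3D OpenPose editor) pins down body
    # and hand positions; the text prompt then only controls style and content.
    pose = load_image("pose_with_hands.png")  # hypothetical local file
    image = pipe("portrait of a woman holding a teacup, detailed hands",
                 image=pose, num_inference_steps=30).images[0]
    image.save("portrait.png")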
Out of the box, Midjourney v5 generates great images, but SD has ControlNet, a plethora of custom models to pick from, and the ability to run locally on most machines, which gives it an edge in my opinion.
I find most of the community models for Stable Diffusion to be pretty poor: a lot of them are merges, waifu-only models, or models made from very small training datasets, and they're still stuck on 1.5 while 2.1 is much better overall.
However, the LoRAs for 2.1 are amazing and very powerful. The community has mostly ignored them, but I think the potential is great.
But Midjourney is a lot better at text-to-image and image-to-image for now.
I feel like Bing Image Creator is at least DALL-E 2.5, it feels like it has higher quality outputs for the same prompt. Could also just be some form of post-processing, though.
I suspect it may be a problem of training-data exhaustion: do they have enough source material that is safe/vetted for the next jump in training data? I can imagine that model poisoning is a real thing now...
"Some time" tomorrow? Next week? Next month?
This doesn't mean anything; it's like saying "we don't have any plans to change anything" when a company acquires another.
It's all BS.
With the ChatGPT boom and the flood of startups built on top of their API, OpenAI receives troves of data to train and fine-tune their models on. GPT-3 was trained on roughly 570 gigabytes of filtered text (drawn from some 45 terabytes of raw Common Crawl), and GPT-4's corpus, while undisclosed, is presumably larger still. On the other side, Alpaca and Vicuna were fine-tuned from LLaMA using only megabytes, if not hundreds of kilobytes, of training data. I believe that is a much more feasible path to significantly improving the current generation of LLMs.
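For a sense of how small these fine-tunes are: Alpaca itself was a full fine-tune over roughly 52k instruction pairs (a few megabytes of text), and the popular Alpaca-LoRA reproduction showed much the same effect while training only a tiny adapter. A rough sketch of that style of recipe with Hugging Face's peft library (checkpoint name and hyperparameters are illustrative, not the exact Alpaca/Vicuna settings):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "huggyllama/llama-7b"  # illustrative base checkpoint
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base)

    # LoRA trains a few million adapter parameters instead of all 7B weights,
    # which is why a few MB of instruction data can meaningfully change behavior.
    lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # on the order of 0.1% of total parameters
    # ...then run a standard causal-LM training loop over the instruction pairs.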
There have got to be predictable ways of improving LLMs besides training-data scale and parameter count. Aren't LLMs robust enough to learn on their own by interacting with the world? Like putting them in a turn-based simulated environment.
I wonder if there's an assumption about how big an LLM would have to be before that's even conceivable. Is there a minimum size necessary before that capability is plausible?
It is OK to slow down development, take some more profit, maybe keep doing the human-in-the-loop RL refinement, etc.
From an engineering standpoint, even the less powerful GPT-3.5-turbo model handles NLP tasks well, and really nice tools like LangChain and LlamaIndex (which I covered in my last book) make it easy to use your own data sources.
I think the possibilities of using what we currently have in useful projects are vast.
is there some kind of convention that defines what specifically constitutes GPT-n or does this just mean "we're not working on the successor to GPT-4 yet"?
There may be conventions, but in no way can anyone force them to follow them; it's just a name for a release. They absolutely are working on successor models and have stated they plan to release a model by June. Whether they are working on a new architecture or a new training run, they certainly have experiments underway, but who knows how serious they are.
Regardless they can and will call future models anything they want. They could easily just decide that the minor improvements that come out in a few months are called GPT-4.2 and the major new training run is called GPT-4.5 instead of GPT-5.
No, it is just an arbitrary version number for this series of models from OpenAI. They will flip to 5 when they make an architecture change that will force them to begin training from scratch. Until then they will continue to produce more refined versions of 4, potentially more general training or fine-tuned task-oriented training.
The way it currently works, there is a quite clear boundary, as all the smaller iterations are based on something of a fixed size that was expensively pretrained, and then have either finetuned weights or some extra layers on top, but the core model structure and size can't be changed without starting from scratch.
So if some particular GPT-4 improved successor is based on the GPT-4 core transformer size and pretrained parameters then we'd call it GPT-4.x, but if some other GPT-4 successor is a larger core model (which inevitably also means it's re-trained from scratch) then we'd call it GPT-5, no matter if its observable performance is better or worse or comparable to the tweaked GPT-4.x options.
Based on published research from Google and Meta it is somewhat known how much more capability is possible with the current approach, but it would require an extreme increase in compute and training set to achieve it. There are diminishing returns, but the returns appear to continue for a good while, even without any new model architecture discoveries. Right now the expense will likely mean that progress will be limited to the pace of Moore’s law.
In terms of what this improvement would actually look like in terms of real world, emergent capabilities, no one knows.
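The usual way to make the "diminishing but continuing returns" point concrete is the parametric loss fit from DeepMind's Chinchilla paper (Hoffmann et al., 2022), where N is parameter count and D is training tokens; the constants below are their reported fit and should be read as approximate:

    L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
    \quad E \approx 1.69,\ A \approx 406.4,\ B \approx 410.7,\ \alpha \approx 0.34,\ \beta \approx 0.28

Both power-law terms keep shrinking as N and D grow, so the loss keeps improving, but each additional increment of improvement demands a multiplicative increase in parameters, data, and therefore compute.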
No, but LLaMA's training setup was explicitly designed around studying the curve of output quality versus training-data size, so we do have companies looking into this.
Just as in Pascal's wager, the conclusion relies on an unwarranted assumption that privileges a particular outcome over its exact opposite - e.g. a deity with exactly inverted criteria for heaven and hell, punishing those who believe in the Christian God, or a "Roko's antibasilisk" that spares the people who'd be punished by Roko's basilisk and punishes everyone else.
Exactly! I gave a heartfelt letter to my shredder the other day and it simply destroyed it. Issues like these are why AI alignment research is so critical.
They need a few algorithmic improvements first, imho. GPT-4 is noticeably slower than GPT-3.5 and apparently costs a lot more to use, implying some serious compute costs.
They could train it with more data in the hopes of getting another big leap there, but what data is left? They've fed it everything it seems.
So what's left is getting the runtime reduced in terms of the model size. Hire some brilliant minds to turn an N-squared into an N-log-N (or something to that effect).
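The "N-squared" in question is usually self-attention's cost in sequence length: every token attends to every other token, so the score matrix alone is N x N. A tiny NumPy sketch of that bottleneck (sequence length and head dimension are arbitrary):

    import numpy as np

    N, d = 4096, 64                    # sequence length, per-head dimension
    Q = np.random.randn(N, d)
    K = np.random.randn(N, d)
    V = np.random.randn(N, d)

    scores = Q @ K.T / np.sqrt(d)      # (N, N) -- the O(N^2) memory/compute cost
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ V                  # (N, d)
    # Sub-quadratic schemes (sparse, linear, or chunked attention) try to avoid
    # ever materializing the full N x N score matrix.

Whether any of those approximations can match full attention at GPT-4 quality is exactly the open question.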
He has just admitted that O̶p̶e̶n̶AI.com has partially trained GPT-5 and is already planning to test the 'so-called' useless guardrails around it.
There is no 'revolution' around this. Just 'evolution' with more data and more excessive waste of compute to create another so-called AI black-box sophist with Sam Altman selling both the poison (GPT-4) and the antidote (Worldcoin).
At some point, with their tremendous lock-in strategy, O̶p̶e̶n̶AI.com and Microsoft will eventually use the lock-in to upsell and compete against their partners.
> Sam Altman selling both the poison (GPT-4) and the antidote (Worldcoin).
I actually find it pretty amazing that more people aren't given pause by Sam Altman's involvement here. After the WorldCoin stuff, I'd think that he'd be viewed with a much more skeptical eye in terms of his ethics.
The late-night March 31 release of Worldcoin had the (unintended?) side-effect of making me think "a token to prove my personhood" was an April Fools Joke when I saw it the next morning and never thought about it again.
From my understanding they are training their new GPT models off of a checkpoint from the previous generation, so they technically have partially trained multiple future models in their GPT lineage.
GPU shortage ended not because of a crash in crypto, but because Ethereum switched to PoS and ditched GPUs. Other cryptos offer just a fraction of payouts from mining, so it's not profitable to mine them on GPUs unless you have free electricity.
Literally none of the LLMs we're talking about were trained on the consumer GPUs where the market shortage mattered; they used things like NVIDIA A100 pods or custom hardware like Google TPU clusters.
The GPUs that are good for cryptocurrency mining are decent for running ML models but not good for training LLMs, and vice versa, as the hardware requirements diverge. Training LLMs requires not only high compute but also lots and lots of memory and an extremely high-speed interconnect for large models, which ends up costing far more than the pure compute that cryptomining needs, making such hardware not cost-efficient for mining.