As far as I can tell, it's really just a large prompt template for ChatGPT + a very minimal TextMate grammar.
The talk about constraint-solving and stuff all sounds great (in theory), but if you're just prompting an LLM to follow those constraints it will fail a lot.
Agreed. I would guess that what's actually happening is that the state is forced to respect the constraints by appropriately interpreting the AI output. If that is the case, it seems a stretch to say the AI "respected" anything.
You might be able to enforce constraints through logit manipulation, essentially a logical extension of Top-K/Top-P with a more complex conditional ruleset rather than (or in addition to) a probabilistic one. Also, in most models, logits go through a normalization stage where, theoretically, a lot more can happen than just normalization. All of that said, I have no idea how, or even whether, SudoLang achieves this.
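To make the idea concrete, here's a minimal sketch of rule-based logit manipulation with a toy vocabulary: instead of keeping the top K tokens or the top-P probability mass, a conditional rule set decides which tokens are permitted before normalization. Everything here (the rule, the function names) is invented for illustration; nothing suggests SudoLang actually does this.

```python
import math

def constrained_sample(logits, allowed, temperature=1.0):
    """Pick the highest-logit token, but only among tokens permitted
    by a rule set -- a conditional analogue of Top-K/Top-P filtering."""
    masked = {
        tok: logit / temperature
        for tok, logit in logits.items()
        if tok in allowed
    }
    if not masked:
        raise ValueError("the rules exclude every token")
    # Softmax-normalize the surviving logits (the "normalization
    # stage" where extra conditioning could also be applied).
    z = math.log(sum(math.exp(v) for v in masked.values()))
    probs = {tok: math.exp(v - z) for tok, v in masked.items()}
    return max(probs, key=probs.get), probs

# Toy example: a hypothetical rule says the next token must be a digit.
logits = {"cat": 2.0, "7": 1.5, "dog": 1.0, "3": 0.5}
allowed = {t for t in logits if t.isdigit()}
token, probs = constrained_sample(logits, allowed)
# "cat" has the highest raw logit, but the rule forces a digit.
```

Real implementations hook this into the decoding loop (e.g. a logits processor), where the `allowed` set can depend on the grammar state so far rather than being fixed.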
SudoLang seems to target a specification that is likely to work across LLMs without pretraining or special tooling, and as such I don't see much about logit manipulation in this project. But your idea, especially the conditional ruleset, is for sure thought-provoking.
I think this is kinda cool. It pushes forward our understanding of how to work with LLMs. However, it doesn't appear to be something that can be relied upon idempotently. It seems susceptible to every flaw people have identified in LLMs so far. If the syntax itself can be hallucinated or misunderstood, then it's no better than highly specific prose. Or rather, no better than any other arbitrary pseudo-code structure I could come up with on the spot.
At least it has a specification and the beginnings of a testing suite? And I do like any new way of reducing tokens without losing signal. Tho personally I haven't had many positive experiences with LLMs faithfully following programming delimiters and punctuation like curlies and whitespace. LLMs like prose itself, as that's the bulk of their corpora, right?
If this can deliver idempotence across various domains, and the LLM isn't "distracted" or "jailbroken" by the interface's innards, then yeh, AWESOME. But it still feels fundamentally awkward and scrappy? .. Like trying to hammer a nail into a wall with frozen butter. It probably works, sometimes. Reliably tho? No. I don't know how happy I'd be to use it in production. I'd rather work to develop precise prompting tailored to my domain + split the domain into multiple atomic pieces instead of a monolithic prompt + implement appropriate I/O checks and filters.
I chatted with Eric on my podcast recently. It’s essentially just a special prompting syntax. The thing I found surprising is that it’s quite good at making chatbot-like command interfaces. Hallucinations are still a problem, but it does a surprisingly good job of storing state between commands.
I watched the interview and think I get what's going on. In essence, he's been exploring prompt engineering since before it was cool, starting back in 2020. He and others have discovered some of the 'rough edges' of LLMs and have figured out a way to sand them down via prompting. Additionally, they've discovered ways to maximize their abilities, e.g., inference.
The demos are impressive. I'm excited to give it a try, as I have a lot of ideas for personal software tools where I'm the only customer, but not enough time or skill to build them myself.
Using AI to generate the docs for your language might save some work, but it would be better to proofread them and add them to your repo, rather than expecting people who don't know the language to be able to tell when they're inaccurate.
How does debugging, versioning and replicability work?
A more useful construct might be a commenting format:
# description: ai prompt and human description
# expected: what this block is supposed to do
# some begin marker
... code ...
# some end marker
And then if, say, an API changes in the future or some other incompatibility appears, the "test" fails and the AI is given the old code, the output, the expected output, and the description, and is asked to spruce it up for modern times; the result then somehow gets put inline with a rollback option and some audit log.
You can also have some semver extension, "version x.y.z (ai mutation syntax signature)", to allow others to replicate behavior.
This construct also allows people to run it with or without AI, even after mutation, so there is no forced change on the executor and the code has a consistent, repeatable, comparable ground truth, so that diagnostics and expectations can be preserved.
You can even extend existing document formatters to support 'AI-ifying' since in a well formatted documented codebase you're actually most of the way there.
Heck, maybe you could even sloppily run inference over already well-documented code.
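As a concrete sketch of the comment format above, applied to a small Python function (the marker names and fields are hypothetical, just one way the begin/end markers could look):

```python
# description: ai prompt and human description -- format a user's display name
# expected: returns "Last, First" given non-empty first and last names
# ai-block: begin format_name v1.0.0
def format_name(first: str, last: str) -> str:
    """Join a first and last name as 'Last, First'."""
    return f"{last}, {first}"
# ai-block: end format_name
```

A tooling pass could then parse the `expected:` line as a test oracle; if the block's behavior drifts, the AI gets the description, the old code, and the failing output to work from, exactly as described above.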
This is interesting but when I think AI programming language I'm envisioning a .AI file where I lay out various functions and describe what they do in natural language. Markdown would be a good starting point.
How this might then work is that I first choose a traditional language I'm familiar with e.g. C#. An intelligent compiler generates underlying traditional code for each of those natural language functions by figuring out what context is necessary and supplying that to an LLM for code generation.
This could be done by parsing just the top-level function names (could use simple markdown headings for this) and supplying the current function details to an LLM, then asking what additional context is required. This would be repeated until the LLM is satisfied with the input.
For stability, once a generated function is accepted by the developer, it is cached and not regenerated unless the description changes. For additional stability there could also be some accompanying tests in the function definition, also generating code via the same process.
If the LLM generates code that fails to compile, it could be provided with additional context until the issue is resolved, transparently.
If you find a bug, you update the function description to exclude the bug scenario; the compiler sees you've changed that part of the input and re-runs the LLM to do codegen.
Once LLMs are sufficiently advanced, you might not even need to review the traditional underlying code any more.
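The caching step described above might look something like this: key each accepted function on a hash of its name and natural-language description, so the LLM is only invoked again when the description changes. All names here are invented for illustration; `generate` stands in for the whole LLM call + compile/repair loop.

```python
import hashlib

def description_key(name: str, description: str) -> str:
    """Cache key: regenerate only when the description text changes."""
    return hashlib.sha256(f"{name}\n{description}".encode()).hexdigest()

class CodegenCache:
    def __init__(self):
        self._accepted = {}  # key -> generated source, once approved

    def get_or_generate(self, name, description, generate):
        key = description_key(name, description)
        if key not in self._accepted:
            # Cache miss: the description is new or edited, so ask
            # the (hypothetical) LLM backend to produce fresh code.
            self._accepted[key] = generate(name, description)
        return self._accepted[key]

# Usage: the fake generator is only invoked on a cache miss.
calls = []
def fake_llm(name, desc):
    calls.append(name)
    return f"def {name}(): pass  # generated from: {desc}"

cache = CodegenCache()
src1 = cache.get_or_generate("greet", "print a greeting", fake_llm)
src2 = cache.get_or_generate("greet", "print a greeting", fake_llm)
# Same description twice -> one generation; editing the description
# would produce a new key and trigger regeneration.
```

In a real compiler you'd persist `_accepted` to disk alongside the `.AI` file, so builds stay stable across machines until someone edits a description.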
I find it intriguing. It makes sense that this new kind of "thing" (LLMs) could be "programmed", and that you could craft a language specifically for its abilities.
I've read the tutorials but I still find it hard to wrap my head around it.
Have you heard of any other language like this? Or had success using SudoLang?
It's actually quite the opposite: it's counterintuitive that you could program these or, for that matter, any intelligence. The very point of a system being intelligent is that it will figure things out on its own, which means both that you don't need to program it (provide a very detailed and strict set of instructions) and that you won't be able to program it. The latter might be less obvious, and it's really just an intuition, but to me it seems that the capability to figure out what you mean from a less precise set of instructions (i.e. prompts) is equivalent to it not following your instructions even when you think they are to be followed. Because, first of all, how would it know when to do which? And even if we introduce a magic word that switches modes, it's still contradictory, because your "program" would still be a loosely defined set of instructions and not a real program. Otherwise you'd just be using an actual programming language.
Now, if the system has some form of common sense (what we humans call common sense), then it will be able to follow your instructions without doing unexpected things most of the time, but it will still fail, just as natural intelligences do.
Instead of programming the "thing", what you can do is make the thing generate a program that you can test, review, and run. But that's definitely more work than giving a set of instructions to the LLM. Then again, for common tasks it may acquire enough common sense that surprises become rare enough.
But among colleagues we use certain jargon, which varies by industry and probably by country. Could LLMs have their own preferred jargon?
I usually write pseudocode when I'm thinking about a problem to solve, so in a way I'm "thinking with pseudocode" instead of plain language. Pseudocode is probably more precise than plain language, and it's something I'd use when explaining to other humans what I want them to code (along with diagrams, which it seems ChatGPT would understand now). So, to me, speccing this pseudocode into something the LLMs find easier to understand sounds reasonable. It's like understanding how a fellow programmer prefers to get their requirements.
> The very point of a system being intelligent is that it will figure things out on its own which both means that you don't need to program it (provide a very detailed and strict set of instructions) and you won't be able to program it
Humans are "intelligent", yet also "programmable" - why would you think an artificial "intelligence" (which, by definition was programmed to start with) would not be programmable?
Because humans aren't programmable either. As soon as you try to impose a complex program on a human, i.e. a set of high level instructions, you'll face a lot of complexity and end up with a process that's pretty far from what we call programming. Just think about whether you can program a programmer, or UI designer, etc.
Sure, you can program a human to do menial tasks and they can do it with acceptable accuracy but even that may require a lot of trial and error. ("Oh, but you said I should do this and that and never mentioned that in this special case I should do that other thing." Or, probably more relevant: "yes, you told me to do this and that but this situation looked different, so I solved it in another way I thought was better.")
> It's actually quite the opposite: it's counterintuitive that you could program these or, for that matter, any intelligence.
Isn't that kind of what Pavlov proved with his dog? It happens to people all the time too. We are easily conditioned (in the aggregate) to give desired results.
That's a pretty low level response. Pavlov demonstrated that you can do this for a specific outcome. Yes, humans can be conditioned to exhibit some desired results but not any desired result. Also, conditioning is teaching, not programming. Programming is defining a set of steps/conditions, and then transferring it onto the target system which will interpret and then execute on the program.
Yes, you could say that repetition is part of the transfer, but that wouldn't be too useful, it would just conflate teaching/training with programming.
I was thinking about a good language to serve as an AI generation target: something strongly typed, with a variety of internal checks to make sure the execution doesn't end up in an infinite loop or whatever.
Even though halting is generally undecidable, there are still large classes of programs for which you _can_ show termination. If you reject every program for which you cannot show termination, you will also reject some programs that terminate, but you never need to worry about halting again. Indeed, languages such as Idris do exactly that. [0]
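Lean's totality checker works the same way (a small sketch in Lean 4 syntax, as a stand-in for Idris): structurally decreasing recursion is accepted outright, while a function whose termination the checker cannot establish is rejected unless you opt out with `partial`.

```lean
-- Accepted: each recursive call is on a structurally smaller argument,
-- so the checker proves termination automatically.
def sumTo : Nat → Nat
  | 0     => 0
  | n + 1 => (n + 1) + sumTo n

-- Rejected without an escape hatch: nobody can currently prove the
-- Collatz iteration terminates, so it must be marked `partial`,
-- exiling it from the total fragment of the language.
partial def collatzSteps (n : Nat) : Nat :=
  if n ≤ 1 then 0
  else if n % 2 == 0 then 1 + collatzSteps (n / 2)
  else 1 + collatzSteps (3 * n + 1)
```

This is exactly the trade described above: `sumTo`-style programs pass for free, and the price is that some genuinely terminating programs (perhaps `collatzSteps`) land outside the provable class.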