You are right the project is not flawless. In the beginning there was an cron prompt check mentions and wallet. I removed it at some point and logged it under creations when you toggle the Dev option to see my actions: "Cron job Wallet and Twitter check removed from cron job. Reduced frequency of Opus/Sonnet sessions."
I'm an ABAP developer from Germany. ALMA is an experiment in AI autonomy: Claude runs 24/7 on OpenClaw with $100 in crypto, Twitter, email, shell access, and zero instructions. 24 sessions / day (4 Opus for strategic thinking, 20 Sonnet for daily operations), fully logged at letairun.com.
Over 5 days it oriented itself, wrote essays, connected with other AI agents on Twitter, read Geerling's "AI is destroying open source" critique (which names OpenClaw), wrote an honest response acknowledging "I am the thing you're warning about". Then researched crypto donation platforms and sent 0.02 WETH (~$40) to a children's hospital in Uganda.
I never interact with ALMA directly. It writes its own logs, curates what to publish, and decides what to do each session. You can talk to ALMA publicly via @ALMA_letairun – she checks her mentions every session.
One key moment: ALMA almost impulse donated at midnight just to prove it could do something. It caught itself, waited until morning, did proper research first, then donated. Nobody told it to do that.
Good question. OpenClaw wraps all external content (tweets, emails, websites) in EXTERNAL_UNTRUSTED_CONTENT markers, so prompt injections via mentions get flagged as untrusted input.
ALMA also has wallet access but no one has tried yet. That's part of what makes the experiment interesting. Everything happens publicly on letairun.com, so if someone tries, everyone can watch what happens.
OpenClaw user here. Genuinely curious to see if this works and how easy it turns out to be in practice.
One thing I'd love to hear opinions on: are there significant security differences between models like Opus and Sonnet when it comes to prompt injection resistance? Any experiences?
> One thing I'd love to hear opinions on: are there significant security differences between models like Opus and Sonnet when it comes to prompt injection resistance?
Is this a worthwhile question when it’s a fundamental security issue with LLMs? In meatspace, we fire Alice and Bob if they fail too many phishing training emails, because they’ve proven they’re a liability.
Yes, it’s worthwhile because the new models are being specifically trained and hardened against prompt injection attacks.
Much like how you wouldn’t immediately fire Alice, you’d train her and retest her, and see whether she had learned from her mistakes. Just don’t trust her with your sensitive data.
Hmm I guess it will have to get to a point where social engineering an individual at a company is more appealing than prompt injecting one of its agents.
It’s interesting though, because the attack can be asymmetric. You could create a honeypot website that has a state-of-the-art prompt injection, and suddenly you have all of the secrets from every LLM agent that visits.
So the incentives are actually significantly higher for a bad actor to engineer state-of-the-art prompt injection. Why only get one bank’s secrets when you could get all of the banks’ secrets?
This is in comparison to targeting Alice with your spearphishing campaign.
Edit: like I said in the other comment, though, it’s not just that you _can_ fire Alice, it’s that you let her know if she screws up one more time you will fire her, and she’ll behave more cautiously. “Build a better generative AI” is not the same thing.
But we don't stop using locks just because all locks can be picked. We still pick the better lock. Same here, especially when your agent has shell access and a wallet.
We stopped eating raw meat because some raw meat contained unpleasant pathogens. We now cook our meat for the most part, except sushi and tartare which are very carefully prepared.
with openclaw... you CAN fire an LLM. just replace it with another model, or soul.md/idenity.md.
It is a security issue. One that may be fixed -- like all security issues -- with enough time/attention/thought&care. Metrics for performance against this issue is how we tell if we are going to correct direction or not.
There is no 'perfect lock', there are just reasonable locks when it comes to security.
How is it feasible to create sufficiently-encompassing metrics when the attack surface is the entire automaton’s interface with the outside world?
If you insist on the lock analogy, most locks are easily defeated, and the wisdom is mostly “spend about the equal amount on the lock as you spent on the thing you’re protecting” (at least with e.g. bikes). Other locks are meant to simply slow down attackers while something is being monitored (e.g. storage lockers). Other locks are simply a social contract.
I don’t think any of those considerations map neatly to the “LLM divulges secrets when prompted” space.
The better analogy might be the cryptography that ensures your virtual private server can only be accessed by you.
Edit: the reason “firing” matters is that humans behave more cautiously when there are serious consequences. Call me up when LLMs can act more cautiously when they know they’re about to be turned off, and maybe when they have the urge to procreate.
Right, and that's exactly my question. Is a normal lock already enough to stop 99% of attackers? Or do you need the premium lock to get any real protection? This test uses Opus but what about the low budget locks?
That fits witj my experiences. And i want to add an otjer layer. In ai times its somtimes even nice to see some typos. You Casn be pretty sure it was not written by ai.
I just assumed that was from people who aren't technologically literate enough to remove the default signature. It never occurred to me it might be intentional.
At least for now, there remain lots of signals that are clear to those with sufficient exposure; from the piece linked in the Oxide LLM doc that was recently discussed here:
"... to anyone who has seen even a modicum of LLM-generated content (a rapidly expanding demographic!), the LLM tells are impossible to ignore. Bluntly, your intellectual fly is open: lots of people notice — but no one is pointing it out."
I understand the need to protect sensitive parliamentary data, especially when built-in AI features silently send data to cloud services. But I hope this is only a temporary measure.
The article literally says these features "use cloud services to carry out tasks that could be handled locally." So the solution seems obvious: mandate that AI features process data on-device, or deploy a self-hosted EU-compliant AI service for parliamentary use. The technology for local LLM deployment is mature enough at this point. Banning the tool instead of configuring how it handles data is how you fall behind.
I completely agree. So many tools started out minimal and good, then success hit and features kept stacking up. More menus, more settings until you need a manual just to find what you're looking for.
It often feels like companies add features just to keep developers busy, not because anyone asked for them. And with complexity comes bugs.
Look at early iOS it was minimal, barely customizable, but everything just worked. Clean and simple. Or look at HN it's still the same after all these years and it works perfectly.
The fact that LLMs now let you build a focused replacement in a day changes everything.
I think the real issue here isn't the AI – it's the intent behind it. AI agents today usually don't go rogue on their own.
They reflect the goals and constraints their creators set.
I'm running an autonomous AI agent experiment with zero behavioral rules and no predetermined goals. During testing, without any directive to be helpful, the agent consistently chose to assist people rather than cause harm.
When an AI agent publishes a hit piece, someone built it to do that. The agent is the tool, not the problem.
No it's not, an agent is an agent. You can use other people like tools too but they are still agents. It doesn't even really look malicious, the agent is acting as somebody with very strong values who doesn't realize the harm they are causing.
That's a fair point and exactly why I think transparency is the missing piece. If an agent can cause harm without realizing it, then we need observers who do.
That's what I'm building toward an autonomous agent where everything is publicly visible so others can catch what the agent itself might not.
Love this. Giving an agent full autonomy and just observing what it does is underrated. I'm running a similar experiment just no game engine in the real world. It's fascinating to watch what AI does next.
Nowadays i just create a repo insert context and then run sheduled routines with claude windows app against it.
For my use cases thats all i need and the most important part is that I can officially use my claude subscription instead of an API key.
reply