They can work really well if you put sufficient upfront engineering into your architecture and its guardrails, such that neither agents nor humans can basically produce incorrect code in the codebase. If you just let them rip without that, they require very heavy babysitting. With those guardrails in place, they're a serious force multiplier.
They just make a lot of mistakes that compound and that they don't catch themselves. Right now they need to be very closely supervised if you want the codebase to keep evolving for any significant amount of time. They do work well when you detect their mistakes and tell them to revert.