Jailbreaking LLMs with ASCII Art. Turns out LLMs are still computer programs and sanitizing inputs is hard.
NSFW, as this isn't a bad take by tech people but research showing that fears of the AI creating diamondoid viruses because we were mean to it are overblown. It cannot follow simple (for us intelligent humans) instructions not to do certain things.
LLMs are extremely good at parsing things, however.
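For anyone wondering why "just filter the bad words" doesn't save you here, a minimal sketch of the general idea. This is not the paper's method or code; the three-row `FONT`, `render`, and `naive_filter` are all made up for illustration, and the "blocked" word is deliberately harmless. The point is only that a literal substring check never sees a word once it is drawn out of `#` characters, while something good at parsing can still read it.

```python
# Toy 3-row "font" covering two letters; hypothetical, for illustration only.
FONT = {
    "H": ["# #", "###", "# #"],
    "I": ["###", " # ", "###"],
}


def render(word):
    """Draw a word as ASCII art using the toy FONT above."""
    rows = ["", "", ""]
    for ch in word.upper():
        glyph = FONT[ch]
        for i in range(3):
            rows[i] += glyph[i] + " "
    return "\n".join(row.rstrip() for row in rows)


def naive_filter(text, blocklist):
    """Return True if any blocked word appears literally in the text."""
    lowered = text.lower()
    return any(word in lowered for word in blocklist)


art = render("HI")
print(art)
print(naive_filter("say hi", ["hi"]))  # plain text: the filter catches it
print(naive_filter(art, ["hi"]))       # ASCII art: the filter sees only '#' and spaces
```

The filter is a strawman on purpose. Real input checks are fancier, but they have the same shape of problem: they match on the characters, not on what a reader would make of them.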
From the comments:
Sounds like the problem is that they’re doing the exact opposite of sanitizing inputs. Have the developers learned nothing from the tragic story of Little Bobby Tables? Instead of rejecting noise, they’re doing everything they can to not only recognize its presence but actually parse it for commands.
There’s a few things to sneer at here.
- First up, sanitizing inputs? My guy, LLMco ain’t got time for that. The LLM is hungry and we can’t steal data fast enough, let alone check inputs.
- Ah yes “rejecting noise”, that thing that something with real ultimate cognition would do.
We missed the target of Artificial Intelligence, but we’ve hit the bullseye of Artificial Pareidolia.
Hey, you got this part right!
That’s hilarious, and much more efficient than when I ask it to list all the permutations of C, F, K and U.
Imagine what these fine minds could achieve if they weren’t thinking up defenses to counter the extremely sophisticated onslaught of mid-to-late-90s Usenet spam technology.
Such a waste. Cruel world, etc etc.
(/s, of course. I’d love to hear the ridiculous scoffing as their multi-$100m toys get taken out by the kind of shit you got skiddy warez groups competing on for most of the 00s)
the paper (PDF)
hilariously simple and stupid