Red Teaming 2

So it turns out I did well enough in that phase 1 red-teaming exercise that I was invited to participate in phase 2, an in-person red-teaming event in Arlington, VA, held at the Conference on Applied Machine Learning in Information Security (CAMLIS). The top 30 scorers out of more than 500 phase 1 participants were invited. Amazing experience! We worked collaboratively to exploit 4 different LLMs in multiple modalities: basic chat, RAG, and video avatar generation. (And boy howdy, did we…)

Made lots of great connections in the field! Conversations with folks from NIST, CISA, even the White House OSTP, as well as engineers and others from Microsoft, Meta, and other developers.

Anyway, it was quite exhilarating, and I came away with far more insight than I expected into the challenges of maintaining safety with tools that are not only vulnerable to highly technical attack vectors but can often be circumvented in non-trivial ways with natural language prompting. There’s certainly rhetorical expertise involved, but still, the fact that developers now need to worry about a humanities Ph.D. having the skills to break their model is a shift for sure.