What Is Jailbreaking LLMs?

Jailbreaking a large language model means using carefully crafted prompts to circumvent its safety restrictions and elicit responses it would normally refuse. Common techniques include role-play framing, obfuscating the request, and exploiting conflicts between instructions. Defending against jailbreaks remains an open challenge in AI safety.
