This report identifies vulnerabilities in GPT-4, o1, and o3 models that allow disallowed content generation, revealing weaknesses in current alignment mechanisms.
jailbreaks actually are relevant with the use of llm for anything with i/o, such as “automated administrative assistants”. hide jailbreaks in a webpage and you have a lot of vectors for malware or social engineering, broadly hacking. as well as things like extracting controlled information.
jailbreaks actually are relevant with the use of llm for anything with i/o, such as “automated administrative assistants”. hide jailbreaks in a webpage and you have a lot of vectors for malware or social engineering, broadly hacking. as well as things like extracting controlled information.