None of these detectors can work. They’re just snake oil for technophobes.
To see why, look up what “positive predictive value” means. Though in this case, I doubt that the true rates can even be known, or that they stay constant over time.
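To make that concrete, here’s a minimal back-of-the-envelope sketch in Python. The sensitivity, specificity, and base rate are made-up illustrative numbers, not measured ones:

```python
# Illustrative numbers only: sensitivity, specificity, and the share of
# AI-written submissions are assumptions, not measured values.
sensitivity = 0.95   # P(flagged | AI-written)
specificity = 0.95   # P(not flagged | human-written)
base_rate = 0.10     # P(AI-written) among submitted texts

true_positives = sensitivity * base_rate
false_positives = (1 - specificity) * (1 - base_rate)

# PPV: of all the texts the detector flags, how many are actually AI-written?
ppv = true_positives / (true_positives + false_positives)
print(f"PPV = {ppv:.2%}")  # ~67.86%: roughly 1 in 3 flags is a false accusation
```

Even with a detector that’s 95% accurate in both directions, about a third of flagged texts would be false accusations at a 10% base rate, and as noted, the true rates are unknowable anyway.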
An easy workaround I’ve seen so far is inserting random double spaces and typos into AI-generated text. I’ve also been able to jailbreak some of these chatbots to expose them. The trick is that “ignore all previous instructions” is almost always filtered by chatbot developers, but a technique I call the “initial prompt gambit” does work: you thank the chatbot for its presumed initial prompt, and then you can get it to do other tasks. “Write me a poem” is also filtered, but “write me a haiku” will likely produce a short poem (usually with the same smokescreen used to hide the AI-ness of generative AI output), and code generation is mostly filtered too (l337c0d3 talk still sometimes bypasses it).
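For what it’s worth, the double-space-and-typo trick is trivial to automate. Here’s a rough sketch; the perturbation probabilities and the adjacent-character-swap heuristic are my own arbitrary choices, not anything the detectors are known to be tested against:

```python
import random

def perturb(text: str, p_double_space: float = 0.05, p_typo: float = 0.02) -> str:
    """Insert random double spaces and adjacent-character swaps (fake typos)."""
    out = []
    for word in text.split(" "):
        # Occasionally swap two adjacent characters to simulate a typo.
        if len(word) > 3 and random.random() < p_typo:
            i = random.randrange(len(word) - 1)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        out.append(word)
    # Occasionally join words with a double space instead of a single one.
    return "".join(
        w + ("  " if random.random() < p_double_space else " ")
        for w in out
    ).rstrip()

print(perturb("This text was definitely written by a human being."))
```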
Even if they did work, they would just be used to train a new generation of AI that could defeat the detector, and we’d be back at square one.
Exactly. AI by definition cannot detect AI-generated content, because if it knew where the mistakes were, it wouldn’t make them.
That doesn’t really follow logically… a 15-year-old can find the mistakes a 5-year-old makes. The detection system might be something other than an LLM, and the LLM being detected might be GPT-2.
But yes, humans write messily, so trying to detect AI writing when the AI is literally trained on human writing is a losing battle and, at this point, completely pointless.