• ulterno@programming.dev
    1 day ago

    Another point: the reason Google’s AI is able to identify CSAM is that it has such material in its training data, flagged as such.

    In that case, it would have detected the training material itself as a ~100% match.
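    To make the “~100% match” point concrete: much known-content detection works by hashing files and checking the hash against a reference list, so reference/training material trivially matches itself. This is a minimal sketch using an exact cryptographic hash; real deployments (e.g. PhotoDNA-style systems) use perceptual hashes that also survive re-encoding and resizing. The function and hash list here are hypothetical illustrations, not Google’s actual pipeline.

    ```python
    import hashlib

    def flag_known_content(data: bytes, known_hashes: set) -> bool:
        """Return True if this file's hash appears in a reference hash list."""
        digest = hashlib.sha256(data).hexdigest()
        return digest in known_hashes

    # Hypothetical reference list built from previously flagged material.
    known = {hashlib.sha256(b"example-flagged-bytes").hexdigest()}

    flag_known_content(b"example-flagged-bytes", known)  # True: the reference material matches itself
    flag_known_content(b"unrelated-bytes", known)        # False: not in the list
    ```

    An exact hash only catches byte-identical copies, which is exactly why the reference data would show up as a perfect match while modified copies need perceptual hashing or a classifier.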

    What I don’t get, though, is how it ended up being openly available: if it were properly tagged, they would presumably have excluded it from the open-sourced data. And now I see it would also not be viable to have an open-source, openly scrutinisable AI deployment for CSAM detection, for the same reason.

    And while some governmental body got a lot of backlash for trying to implement this kind of AI scanning on chat services, Google gets to do it all it wants, because it’s e-mail/GDrive, it all sits on their servers, and you can’t expect privacy there.


    Considering how many stories of people having problems due to this system are coming up, are there any statistics on legitimate catches by this model? I suspect not, because why would anyone use Google services for this kind of stuff?

    • arararagi@ani.social
      1 day ago

      You would think, but none of these companies actually makes its own dataset; they buy from third parties.