• ulterno@programming.dev
    1 day ago

    Another point: the reason Google’s AI is able to identify CSAM is that it has such material in its training data, flagged as such.

    In that case, it would have detected the training material itself as a ~100% match.
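    To make the “~100% match” point concrete: much known-content detection works by hashing files and checking the hash against a reference list, so reference/training material trivially matches itself. This is a minimal sketch using an exact cryptographic hash; real deployments (e.g. PhotoDNA-style systems) use perceptual hashes that also survive re-encoding and resizing. The function and hash list here are hypothetical illustrations, not Google’s actual pipeline.

    ```python
    import hashlib

    def flag_known_content(data: bytes, known_hashes: set) -> bool:
        """Return True if this file's hash appears in a reference hash list."""
        digest = hashlib.sha256(data).hexdigest()
        return digest in known_hashes

    # Hypothetical reference list built from previously flagged material.
    known = {hashlib.sha256(b"example-flagged-bytes").hexdigest()}

    flag_known_content(b"example-flagged-bytes", known)  # True: the reference material matches itself
    flag_known_content(b"unrelated-bytes", known)        # False: not in the list
    ```

    An exact hash only catches byte-identical copies, which is exactly why the reference data would show up as a perfect match while modified copies need perceptual hashing or a classifier.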

    What I don’t get, though, is how it ended up being openly available: if it were properly tagged, they would presumably have excluded it from the open-sourced data. And now I see it would also not be viable to have an open-source, openly scrutinisable AI deployment for CSAM detection, for the same reason.

    And while some governmental body got a lot of backlash for trying to implement this kind of AI scanning on chat services, Google gets to do it all it wants, because it’s e-mail/GDrive, it all sits on their servers, and you can’t expect privacy there.


    Considering how many stories of people having problems due to this system are coming up, are there any statistics on legitimate catches by this model? I suspect not, because why would anyone use Google services for this kind of stuff?

    • arararagi@ani.social
      1 day ago

      You would think, but none of these companies actually makes its own dataset; they buy from third parties.