Oh yeah I see what you mean. I struggle with discerning them though.
I worry that the training data for deepfakes is suspiciously normative. There seem to be no neurodiverse, queer, or (physically) disabled people in those training sets.
Well first deepfakes need to die. It’s a dangerous tech that should not exist at all and does not need any more research.
To be fair, I haven’t dug into deepfake models, but I assume you would train them on the specific person you’re trying to deepfake. I mean, for basic video stuff going with a pre-trained model may be OK, but for audio there’s no way you can get away with it ;)
There are also specific ML models for audio that sound pretty convincing in replicating a specific person’s voice.
I guess it works fine for very formal speech, where most voices sound alike. But I have strong doubts that it would capture the subtleties of popular/queer speech.
Actually the opposite: among the best ones are cartoon characters from TV shows. See https://fakeyou.com/ for example, and that is probably still far from what can be done with a bit more effort.
Edit: https://www.wired.co.uk/article/simpsons-deepfake-voice-actors-ai
You’re precisely proving my point that you need a huge sample of voice from that person in order to “replicate” them ;)
EDIT: For example, in the case of The Simpsons, they have 25 seasons of voice data to train the model.