GitHub Copilot is already in clear violation of the GPL repos it’s using. But yes, I 100% agree that if it’s currently inadequate, then enforcement should be expanded. This backfires on no one except those who use GPL projects maliciously.
It is clearly not in violation; I am not sure where you get that impression. Just as GitHub, the software itself, is not in violation of the GPL.
And the article gives some very good examples of how expanding copyright could backfire very badly.
How is it not in violation? It’s reading repos and reproducing snippets from them without crediting the authors or the licenses.
Reading public code is not a copyright violation, and neither is reproducing tiny snippets from it. The latter falls under fair use and/or doesn’t even have sufficient complexity to fall under copyright in the first place, e.g., you can’t copyright “1+1=2”.
And if you use Copilot to reproduce more complex code, then the programmer, not the tool, is committing the copyright violation.
Steelmanning your argument, you could think that Copilot itself is a derivative work of the code it read, but AFAIK this isn’t the case, as it builds its own database out of that code and then only references this database. You might have a slightly stronger argument that this database is a derivative work, but as far as I can tell there is nothing in the GPL that forbids creating a code database and reading from it. If there were, then GitHub itself (a giant code database) would be in violation of the GPL.
The GPL specifies that derived works have to be licensed under the GPL, and similarly for other licenses. Their ML model wouldn’t exist without the GPL code, ergo it’s a derived work. GitHub is not comparable at all, because the code hosted there is just data, not a core part of its functionality.
Feel free to disagree, but my (somewhat limited) understanding of such AI models says that the model data is not a core part of its functionality either.
Edit: It’s like saying “the internet” is a core part of Google’s search algorithm’s functionality.