I’m not familiar with GitHub Copilot’s internal workings, copyright law, or the author of the article. However, some ideas:
GitHub Copilot’s underlying technology probably cannot be considered artificial intelligence; at best, it’s a context-aware copy-paste program. However, it probably does what it does because of the programming habits of human developers and how we structure our code. There are established design patterns, ways of doing things, that most developers follow: certain names we give to certain variables, certain patterns we reach for in a given scenario. If you think of programming as a science, you could say that the optimal code for a language’s common scenarios has probably already been written.
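To make that concrete with a trivial, hypothetical example: ask any two Python developers to read a file line by line, and they will almost certainly converge on the same idiomatic snippet, the “already written” optimum (the filename here is made up):

```python
# The canonical way to read a file line by line in Python; most devs,
# asked independently, would write something nearly identical.
with open("data.txt") as f:        # context manager: the established pattern
    for line in f:                 # lazy, line-by-line iteration
        print(line.rstrip("\n"))   # strip the trailing newline
```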
Human devs’ frequent reuse of 1) the tutorial/example/sample code of frameworks, libraries, and so on and 2) StackOverflow answers strengthens this hypothesis. Copilot is so useful (allegedly), and so blatantly copies, for example, GPL code (allegedly), simply because a program trained on a dataset of crowdsourced, near-optimal solutions to the problems devs face will more often than not take that optimal solution and suggest it in its entirety. There’s no better solution, right? From what I’ve heard, GitHub Copilot is built on an “AI” specializing in language modeling and autocompletion. It may very well be that the “AI” simply asks: when a dev types this code, what usually comes next? Oh, that? Then let’s suggest that.
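As a rough sketch of that “what usually comes next?” idea, here is a toy frequency-based suggester in Python. To be clear, this is my own illustration under a big assumption: the real Copilot uses a large neural language model, not a lookup table, and the `corpus` below is made up.

```python
from collections import Counter, defaultdict

# Hypothetical training corpus of tokenized code snippets (invented).
corpus = [
    ["for", "i", "in", "range", "(", "n", ")", ":"],
    ["for", "i", "in", "range", "(", "len", "(", "xs", ")", ")", ":"],
    ["for", "i", "in", "items", ":"],
]

# Count which token most often follows each two-token context.
follows = defaultdict(Counter)
for tokens in corpus:
    for a, b, nxt in zip(tokens, tokens[1:], tokens[2:]):
        follows[(a, b)][nxt] += 1

def suggest(a: str, b: str):
    """Return the continuation seen most often after (a, b) in the corpus."""
    seen = follows.get((a, b))
    return seen.most_common(1)[0][0] if seen else None

print(suggest("i", "in"))  # -> 'range': the majority continuation wins
```

If the corpus contains one dominant solution to a problem, a model like this will reproduce it verbatim, which is exactly the behavior being alleged.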
There’s no real getting around this issue, as developers probably do the same thing when they write their own code: just use the best solution, right? However, for many algorithms, developers know how they work and implement them from that knowledge, not because the algorithm in most code looks like the algorithm in FOSS project XYZ. They probably won’t use the same variable names either. Of course, it could be argued that the end product is the same, but the process isn’t. This is where the ethical dilemma comes up: how can we ensure that the original solvers of a problem are credited, or gain some sort of material benefit? Copilot probably cannot simply include the license and author of the code it has taken when suggesting snippets, because of how the dataset may be structured. How could it credit the code snippets it uses? Is what it does ethical?
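Just to make the question concrete, here is a purely hypothetical sketch of what crediting could look like if suggestions carried provenance metadata. Every field and name here is invented, and nothing suggests Copilot’s dataset actually retains this information:

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    code: str
    source_repo: str   # hypothetical provenance field
    author: str        # hypothetical provenance field
    license: str       # hypothetical provenance field, e.g. "GPL-3.0"

def render(s: Suggestion) -> str:
    """Prepend an attribution comment so the license travels with the snippet."""
    return f"# Adapted from {s.source_repo} by {s.author} ({s.license})\n{s.code}"

print(render(Suggestion("def fib(n): ...", "example/xyz", "Jane Doe", "GPL-3.0")))
```

Whether the training pipeline could even preserve this mapping back to a source is exactly the open question.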
I do agree with the article that Copilot does not currently violate the copyright of code protected by the GPL or other licenses, simply due to exceptions in how copyright licenses apply, i.e. the fine print. I don’t know what a possible solution could be.
As for Julia Reda, she is a former Member of the European Parliament who specialized in copyright law from the perspective of the Pirate Party… tl;dr: pro-copyleft, but ultimately anti-copyright in general.
Thanks for stating my point more eloquently :)