• nutomic@lemmy.ml
    3 years ago

    How is a programmer who uses copilot going to know that the snippet they are being suggested comes from a GPL-licensed project? At the moment that's impossible, so it can't be the standard assumption that the output is public domain.

    • poVoq@lemmy.mlOP
      3 years ago

      How is a programmer going to know that the person who posted code on stackoverflow hasn’t taken it from a GPL-licensed project? But that is beside the point, and irrelevant to the question of whether the ML model itself is (legally speaking) a derivative of the training data used. IANAL, but this is currently not the case under existing copyright legislation around the globe.

      As for the output itself: there is a legal concept in copyright law that really small snippets of text or sound cannot be copyrighted. If the AI then assembles genuinely new code and functionality from these snippets (theoretically feasible, but not what the co-pilot does), then the resulting code is in the public domain, because (IANAL) a machine currently cannot hold copyright, and the legal case for its owners being able to claim copyright AFAIK hasn’t been fully established in court. But if a human programmer uses a tool like the co-pilot to assemble these snippets, he or she can claim copyright on the result.

      But if the result is nearly indistinguishable from a copyrighted piece of code, then that programmer will not be able to prove that it wasn’t in fact a copyright violation, and thus in practice it is.

      • nutomic@lemmy.ml
        3 years ago

        Posting code on stackoverflow doesn't magically put it in the public domain, as copilot allegedly does.

        “(theoretically feasible, but not what the co-pilot does)”

        I am not considering what copilot could or might do in the future. I am talking about what it does now, and that is generating exact copies of 10+ lines, including license texts, which it certainly didn't assemble on its own.

        • poVoq@lemmy.mlOP
          3 years ago

          No one is claiming that the co-pilot is magically putting all code it suggests in the public domain. That is just a strawman argument.

          If the code snippets it suggests have insufficient technical complexity to be considered a copyrightable piece of information, then, like any other such text snippet (regardless of the source), they are in the public domain. This is half-line or single-line auto-completion-level stuff.

          If the programmer chooses to keep pressing the autocomplete button so that a sufficiently complex piece of code is pasted into their editor, then that programmer has to be aware that this is likely a copyright violation, just as if he or she were cutting and pasting large pieces of code from stackoverflow or any other source where the license isn’t clear.

          • nutomic@lemmy.ml
            3 years ago

            Will copilot warn the original author of the stolen code in that case, so that they can sue the copyright violator? Why does copilot even allow inserting more than one line in that case? If you are right, that means it is actively encouraging copyright violation, which puts it on the same level as thepiratebay.org.

            • poVoq@lemmy.mlOP
              3 years ago

              Will your preferred code editor warn the original author that you just cut and pasted some copyrighted code into it? How would it even know?

              It allows inserting more than one line because it is dumb and cannot know whether the piece of code it referenced is copyrighted or who wrote it. It just looks at the immediate context around your cursor, then looks at its database, where it says “usually these three words are followed by these other three words or letters”, and then suggests that (very simply put).
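
              To make that simplified picture concrete, here is a toy sketch in Python. It is only an illustration of the "these three words are usually followed by that one" idea, not how the co-pilot is actually built (the real tool uses a large neural language model rather than a literal lookup table). The names `build_table` and `suggest` and the tiny corpus are made up for the example.

```python
from collections import Counter, defaultdict


def build_table(corpus, n=3):
    """Map each run of n tokens to a count of the tokens seen right after it."""
    table = defaultdict(Counter)
    for line in corpus:
        tokens = line.split()
        for i in range(len(tokens) - n):
            context = tuple(tokens[i:i + n])
            table[context][tokens[i + n]] += 1
    return table


def suggest(table, text_before_cursor, n=3):
    """Return the most frequent follower of the last n tokens, or None."""
    context = tuple(text_before_cursor.split()[-n:])
    followers = table.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical "training data": note that no author or license is stored.
corpus = [
    "for i in range ( n ) :",
    "for i in range ( len ( xs ) ) :",
    "for x in xs :",
]
table = build_table(corpus)
print(suggest(table, "for i in"))
```

              Note that the table only stores which tokens tend to follow which; nothing in it records where a line came from or under which license, which is why such a tool has no way of telling you.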

              And no, it is not anywhere close to the Piratebay ;)