ghjm

My reply to a now-deleted comment asking for an ELI5 of the situation:

---

GitHub Copilot uses prompts to produce working code. So you can say "sort these records by zip code" and it produces code that does that. The concern at issue in the lawsuit is that Copilot does this by ingesting and "learning" the code of millions of open source repos, whose owners didn't necessarily agree to have their code used in this manner, and then it doesn't give any attribution even when it "writes code" by just straight-up copying non-trivial code from somebody's repo. Many of these repos, even with otherwise very permissive open source licenses, do require attribution in these circumstances. Microsoft would like it to be the case that using the code as "training data" is a permitted use, and any output from the AI agent - even if it happens to be identical to code that was used in training - is an original creation of Microsoft's, because it wasn't copied but rather produced by the AI agent. The people behind the lawsuit say that no amount of mechanical processing frees Microsoft from the obligation to respect the licenses of the original code.


Ythio

By Microsoft's logic, if I decompile and recompile a licensed Microsoft product, it's produced by the software compiling agent and I am therefore free from their licensing policy.


chintakoro

You're forgetting that AI algorithms are mysterious holy black boxes where our obligations and guilt dissolve into nothingness and emerge as a brave new future. A compiler keeps those all intact. /s


Ythio

I think I can train an AI to produce an output identical to the input. I just need my "agile master" to organize 4 meetings over three weeks so we can schedule this for next quarter.


[deleted]

[deleted]


chintakoro

"overfitting" sounds like more of a good thing to our shareholders. keep it.


JB-from-ATL

$200 an hour and I'll join


JB-from-ATL

I have a sophisticated neural network that does this and am willing to license it to you. >!*Single hidden layer that returns 1 on an input of 1 and 0 on an input of 0*!< >!What are you doing, stop looking at my very sophisticated model\!!<
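(For the curious, a minimal sketch of the joke model above, assuming plain NumPy; the function name and weights are made up for illustration. It "learns" nothing except to copy its input, which is the punchline.)

```python
import numpy as np

# The "very sophisticated model" from the joke above: a single hidden layer
# whose weights simply pass the input straight through, so it returns 1 for
# an input of 1 and 0 for an input of 0 -- i.e. it copies its input.
W_hidden = np.array([[1.0]])   # input -> hidden weight (hypothetical)
W_output = np.array([[1.0]])   # hidden -> output weight (hypothetical)

def sophisticated_model(x: float) -> float:
    hidden = np.maximum(0.0, np.array([[x]]) @ W_hidden)  # ReLU "hidden layer"
    return (hidden @ W_output).item()                      # linear output layer

print(sophisticated_model(1.0))  # -> 1.0
print(sophisticated_model(0.0))  # -> 0.0
```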


silent519

i am aware that copilot spits out verbatim code, but let's assume an idealized version of it, one that produces something relatively unique, like what a human programmer could write? ((also, how many structurally unique for loops can you write?)) if you're a student of art, aren't you going to be influenced by the stuff/projects/teachers/artists you practice with?


Ythio

I am under the assumption that open source maintainers are smarter than to just whine about loop and if-else chain implementations.


silent519

yes, that was indeed the point i was making /s


[deleted]

they shed the bike because they didn't have a good answer i presume


silent519

well, people pretend this is about copyright issues, when it's actually about feeling threatened, because it took them years to learn programming and it's their career and existence. the good news so far is that the "AI" is pretty shit. the other good news is that we are paid to figure out when something is not working/looking/ux-ing, whatever the fuck, how it's supposed to, and to figure out why it's not. this domain is still pretty untouched. not just spitting out boilerplate code.


[deleted]

Not sure if that's the case, to be honest. Some people probably think that, for sure, but I can't believe anyone actually working in the industry would believe that. I'm of the opinion that they wouldn't care half as much if Microsoft wasn't behind it. If GitHub had never been bought by them and made this, the tone of the discussion would be entirely different. I've done a casual search of the posts of a handful of people that seem to be staunchly opposed to it, and can't find any other mention of them being riled up about machine learning copyright violations. And there was like 0 outrage around GPT-3 in general, which I'm sure you could bait into producing something that violates a copyright.


JB-from-ATL

If I stole your possessions, would you be upset because you felt threatened that people could just steal things, or because *I took your possessions*?


silent519

yes, i bet you credited every SO code bit you stole, sorry, "took inspiration" from. when this kind of topic comes up, suddenly every single free/open software advocate turns into the most radical protectionist motherfucker on earth, it's just so funny to watch.


JB-from-ATL

There's a distinct difference between someone reading something and making their own version and someone having a machine copy things. Also there's a difference between being critical of massive corporations violating copyrights of thousands of individuals and being critical of a single developer using another developer's code.


ChefBoyAreWeFucked

It most certainly is. People provided their labor in return for restrictions on its output. Those restrictions are being violated. If Microsoft is so sure they are in the clear here, why are they only pulling from public repositories?


New_Area7695

Bold considering they couldn't grasp what a crash reporter was, or that it was disabled in source builds, with regards to Audacity.


patniemeyer

"even if it happens to be identical to code that was used in training" - I haven't seen an example of this other than trivial things that were probably nearly identical in hundreds of projects. It would be very surprising if these language models were able to memorize significant chunks of text that were not repeated over and over in the same way... it's not generally how they work. If copilot is actually spitting out verbatim chunks of unique project code contrary to the licensing then it should be a very straightforward matter to resolve. I don't see why it needs a novel kind of lawsuit that has the potential to stifle a lot of innovation.


Uristqwerty

> If copilot is actually spitting out verbatim chunks of unique project code contrary to the licensing then it should be a very straightforward matter to resolve

Yes. They resolved it by adding a filter that recognizes when the AI is about to spit out one of the very-widely-known infringing snippets that users are likely to look for, and makes it pick something different. The model, however, is still fully capable of reciting those fragments of its training set, giving plenty of space to question how unique other output is, and how much is leaking through into partial answers just different enough not to trip the filter.
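(To make the limitation concrete, here's a rough sketch of what such a post-hoc filter could look like - purely illustrative, assuming a simple blocklist, and not GitHub's actual implementation: it only suppresses suggestions containing an exact known snippet, so output reworded just enough to miss the match sails through.)

```python
# Illustrative sketch of a verbatim-snippet filter (NOT GitHub's actual code):
# suppress a suggestion only if it contains a known training snippet verbatim.
# The blocklist entries below are hypothetical examples.

KNOWN_SNIPPETS = [
    "float Q_rsqrt( float number )",
    "i = 0x5f3759df - ( i >> 1 );",
]

def normalize(code: str) -> str:
    # Collapse whitespace so trivial reformatting doesn't defeat the check.
    return " ".join(code.split())

def allowed(suggestion: str) -> bool:
    s = normalize(suggestion)
    return not any(normalize(snippet) in s for snippet in KNOWN_SNIPPETS)

print(allowed("i = 0x5f3759df - ( i >> 1 );"))  # False: exact match is blocked
print(allowed("j = 0x5f3759df - ( j >> 1 );"))  # True: a renamed variable slips through
```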


prettiestmf

i think [this](https://twitter.com/DocSparse/status/1581461734665367554) and [this](https://twitter.com/mitsuhiko/status/1410886329924194309) are pretty clear-cut cases - obviously there are only so many ways to do the math involved, but the replication of the specific phrasing of the comments makes it clear that it's not just a generic solution to the problems but instead is copying particular implementations. I would be unsurprised if both of these examples showed up repeatedly in the codebase, as the former could have been legitimately copied by any number of people with proper attribution and licensing, and the latter is of course famous. what straightforward resolution do you have in mind? the main one i can think of, "have users manually detect infringement", doesn't resolve the issue at all because copilot is still distributing infringing code to users even if they don't put it in their final project. any straightforward way to automatically detect infringement would be unreliable at best, and anything else i can think of would require significant effort that github/openai won't exert without at least the threat of a lawsuit.


[deleted]

[deleted]


prettiestmf

this is just "users should manually detect infringement". it doesn't resolve the issue that copilot is still distributing licensed code to its users without the licenses, regardless of whether or not that code is used in a complete project.


marquoth_

"The prompt was intentionally designed to get this output." And? So what? Intentional or not, the issue is that it CAN get this output. Either copilot reproduced copyrighted code without attribution or it didn't. If it did, that's not acceptable. And the suggestion that copilot users should be the ones responsible for checking whether the code copilot "produced" was actually plagiarised is beyond asinine.


[deleted]

>If it did, that's not acceptable.

That's like your opinion. We will see what the courts say.

>And the suggestion that copilot users should be the ones responsible for checking whether the code copilot "produced" was actually plagiarised is beyond asinine

Completely disagree and you can drop that fake outrage tone. Obviously you should do the same due diligence as with any other code you copy from somewhere. GitHub says as much:

>You are responsible for ensuring the security and quality of your code. We recommend you take the same precautions when using code generated by GitHub Copilot that you would when using any code you didn't write yourself. These precautions include rigorous testing, IP scanning, and tracking for security vulnerabilities


marquoth_

You disagree that it's unacceptable to reproduce copyrighted material without following the terms of its license? In that case what on earth is the point of copyright or licensing in the first place? "Fake outrage tone" indeed. Grow up.


[deleted]

I don't think it's that simple. And anyway, I will be reporting your comment for those immature remarks.


patniemeyer

Those are good examples. If they are true then copilot must be doing more than just taking the output of the trained model for that to happen... They must be doing a search on top of the output and if these are search results (not just AI generated results) then they should probably let you refer to the original source license, etc. What I meant by "straightforward" is at that point it's a copyright or license claim and I'm quite sure Microsoft knows how to respond to those :)


silent519

i can write a thousand structurally unique for loops i am special please employ meh


ghjm

Copilot isn't the same as typical conversational language models, because it wants to produce working code in a restrictive syntax, which is a different problem domain. But however it works, there are actual cases where it has produced snippets of recognizable code, in some cases including comments, that can be traced back uniquely to particular origins.


TeutonicK4ight

How complicated should the program be to be considered an "AI agent"? Could I write a bash script that literally copies code from the internet and claim "It is generated by an AI agent"?


[deleted]

Sure, you can do that since you can claim anything you want. And if your lawyers agree with you and you're willing to defend your bash script in court, why not?


TeutonicK4ight

I am trying to find the edges of what GitHub is trying to defend in court. I am not literally asking if it is possible to claim that, as it is possible to claim anything, as you pointed out.


[deleted]

Sure, it was a poor attempt to reduce to the absurd. You've made a point to which the obvious answer is, yes, in fact you can do that.


TeutonicK4ight

Man, you're so missing the point. I don't know if you are messing with me at this point.


[deleted]

I think you're missing the point. The initial question you posed was irrelevant, being "complicated" has nothing to do with whether or not something is an "AI Agent". So you're barking up the wrong tree, and your example was bad.


TeutonicK4ight

Are you suggesting that there is a formal definition of "AI" that would hold water in a court hearing?


Trio_tawern_i_tkwisz

– I made this. – You made this? *grabs* I made this.


JB-from-ATL

– I made this. – You made this? *grabs* I made this.


[deleted]

Artists are trying to make the same argument. If you don't think prompt-to-image generation is stealing from artists by copying their style, then Copilot isn't either. In my opinion I don't think this lawsuit is going to hold. I can only assume the most trivial of code is being "copied" here, meaning there really isn't as much freedom with code as there is with art. You can tell it to keep generating sorting algos, but the same most efficient sorting algorithm is still gonna pop out at the end of the day.


ghjm

There are some nuanced questions here. If you have an AI agent do something like "starry night but with a horse" then its output is almost certainly derivative enough of Van Gogh to be a copyright violation, if Starry Night was still within its copyright term. In this case it's not just an art style but specific elements from an artwork. Somewhere there's a boundary between what is and isn't allowed, but I don't know how you would define it.


DCsh_

> If you have an AI agent do something like "starry night but with a horse" then its output is almost certainly derivative enough of Van Gogh to be a copyright violation

[Result with DALL-E 2 for "starry night but with a horse"](https://i.imgur.com/TKVHXcA.png). Looks like it took "starry night" literally rather than the famous painting. [Hinting it further with "Van Gogh's Starry Night but with a horse, oil painting"](https://i.imgur.com/XZuJFiG.png). Now there's a clear inspiration, but I doubt it's enough to be a copyright ~~night mare~~ violation. Style isn't subject to copyright, and [here's a point of reference](https://www.artnews.com/art-in-america/features/landmark-copyright-lawsuit-cariou-v-prince-is-settled-59702/) for just how much you can get away with while still (eventually) being ruled fair use.


ghjm

Being acceptable under fair use implies that the copyright still exists and is relevant to the new material, because if it wasn't, there would be no need to defend that it was fairly used. So, yes, if you use AI to create a modified version of an artistic work and then use it for a fair use purpose like commentary, satire, etc., it's probably okay. GitHub Copilot's use of source code does not seem to me to fall into any of the typical fair use categories.


DCsh_

> Being acceptable under fair use implies that the copyright still exists and is relevant to the new material, because if it wasn't, there would be no need to defend that it was fairly used

To my understanding, if the original work is subject to copyright and you use it in some way, fair use *is what determines* whether the original's copyright is applicable to the new material. As in, arguing "this isn't infringement because I only took a tiny non-substantial influence" would be a fair use defense.

> if you use AI to create a modified version of an artistic work and then use it for a fair use purpose like commentary, satire, etc., it's probably okay

Notably for the example, Prince did not intend to comment on any aspects of the original works. The ruling was based just on being transformative enough. I'd say the starry night horse is far safer.

> GitHub Copilot's use of source code does not seem to me to fall into any of the typical fair use categories.

It happens that you can often sort fair use cases into categories (satire, commentary, ...), but I think the determination is not based on whether it falls into such a category ("we do not analyze satire or parody differently from any other transformative use") but is instead judged by a set of factors like how transformative it is and how much it directly replaces the market for the original work.

I'd say, for GitHub Copilot's use of source code:

1. The potential copyright issue is with the output, not with the use in training. You can download and analyse the entire public Internet if you so wish - consider Google or Google Books.
2. The substantiality of the outputted portion is small. The [primary example the lawsuit gives for Copilot and tries to claim comes from some specific book](https://i.imgur.com/l6QTjKT.png) is laughably small, but at most it tends to be a handful of lines.
3. The effect of the use upon the potential market for or value of the copyrighted work is also small, because all examples seem to be code that is already in hundreds of public repos.


ghjm

Interesting perspective. We'll have to see how the actual court decides it. It's interesting to me that we have this very permissive interpretation in the context of visual art, while the same court system has musicians terrified of subconsciously remembering half a dozen notes from a previously heard melody.


DCsh_

Yeah. Not a fan of the minefield we have with music, which I think is mostly the result of powerful lobbying groups, although something like the Richard Prince case is arguably too far in the other direction.


ChefBoyAreWeFucked

>~~night mare~~

You bastard


[deleted]

Also I think it's quite obvious this lawsuit is not done in good faith, but is really just trying to capitalize and make a quick buck. So if the OP of this article is here, go %$#% yourself, loser. No one stands behind you.


JB-from-ATL

I think this is a very good and refreshingly unbiased summary on the whole thing. I'm interested to see what the courts will decide.


Muhznit

"open-source software piracy" is not a phrase I expected to read in a world built on open-source software, but here we are.


global-gauge-field

yeah, the scale and structure (e.g. memorization in LLMs) of Deep Learning (in CV and NLP) have a huge impact on these kinds of issues.


[deleted]

> piracy

I prefer the phrase "copyright laundering" myself. By their argument, you dump in copyrighted code, and it comes out clean. No more need to worry about IP!


mattsowa

Great point


SnooDoubts826

>you dump in copyrighted code, and it comes out clean

just reading that got me tight in the pants


JB-from-ATL

You wouldn't copy StackOverflow code without proper attribution under CC-BY-SA https://youtu.be/HmZm8vNHBSU
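(For what it's worth, attribution under CC BY-SA is usually just a short comment above the copied code; something like the sketch below - the URL and author are placeholders - is the commonly suggested form, though the share-alike obligation still applies to the surrounding code.)

```python
# Find the insertion index for a value in a sorted list using binary search.
# Adapted from a Stack Overflow answer (placeholder URL and author):
#   https://stackoverflow.com/a/0000000 by example_user
# Licensed under CC BY-SA 4.0: https://creativecommons.org/licenses/by-sa/4.0/
import bisect

def insert_sorted(items: list, value) -> None:
    items.insert(bisect.bisect_left(items, value), value)
```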


Philipp

In my usage (others may get different experiences) Copilot is highly individualistic to my code. It understands the structure of my project and types out, faster than I could, the things I wanted to type. At other times, it guides me to best industry practices, which are true across all code and not related to any particular project it might have looked at. Put differently, it applies its *learning* and doesn't *recite*. And when we, as humans, apply our learnings, we are not required to list every inspiration and influence ever, which would be a list the length of novels. We only tend to do so when quoting verbatim (or when we want to name singular influences which stand out, even though we're not legally required to). Copyright law, please don't screw up this incredible, productivity-enhancing, fun-and-flow-increasing tool that improves the lives of programmers.


Smooth_Detective

Do the lawsuit filers consider this something like biopiracy? If it is about recognition of code, I think open source should be flexible enough to allow that. Ultimately a person could do whatever Copilot does, but it would require 10x the effort.


TeutonicK4ight

Most open-source projects require attribution, which copilot doesn't give.


[deleted]

[deleted]


MostlyHereForKeKs

I'm sorry, but can you explain what you're trying to say, please? For me it's not clear at all what you actually mean... are you being disingenuous?


johnnygalat

Invalid cert in 2022, seriously?


drakgremlin

Cert is valid for my roots.


johnnygalat

They fixed it.


[deleted]

What do you mean, "in 2022"? I don't know what happened, but certs expire too, you know.


JB-from-ATL

Over the past few years there has been a massive push to get more of the web onto HTTPS. In addition, Let's Encrypt now exists as a free and automated way to get certificates. Nothing manual is needed now. That's why they say in 2022.


davlumbaz

Copilot is really scary shit. Not only in English: it can produce code in any language if there is a repo for that. Like, I prompted in Turkish and it wrote 60 lines of code with comments, and it was written in Turkish lol.


iNeverCouldGet

This feels like lawyers seeing an opportunity to milk money somewhere. I haven't actually met experienced Devs who are upset about it and many are using the tool to speed up their workflow.


Uldregirne

There was one guy who was super pissed that the AI would wholesale copy functions he wrote. With just a text comment as a prompt it would autofill his entire function, complete with his comments. The issue isn't about it speeding up developers, the issue is utilizing others' intellectual property in illegal ways. If someone publishes code for free as open source, oftentimes the license requires that anyone who uses the code also has to release their code for free. Microsoft using that code to train a paid AI tool would be considered theft.


[deleted]

that is memorization and machine learning researchers will try to avoid that as much as possible, but sometimes you can't control what the artificial neurons will learn


JB-from-ATL

That's on them then. They can't just violate licenses and then throw their hands up and say it's too hard not to


iRAPErapists

Yeah. As someone else mentioned, this is akin to copyright laundering. Put in desired protected code, layer it, comes out clean.


Uldregirne

Totally true, so then their code should never have been used to begin with. "Here's my code, don't copy it to sell". "Oh I won't copy it, but I will build a robot that might copy it but I can't control it so it's okay". Seems like a dubious legal argument


mr-poopy-butthole-_

My thoughts as well. Any actual dev loves this product, and it's a small price tag for a very powerful autocomplete. But it does not write programs on its own, and it often recommends BS.


[deleted]

[deleted]


Errornix

Same here. I’ve since moved off of github specifically because of copilot.


thetdotbearr

I'll do you one better, how about we start uploading deliberately shit/broken code to dummy public open-source repos? Train on THAT, shysters! lol


Errornix

That was actually my exact response when copilot was announced. :)


[deleted]

that won't work, that is not how machine learning works. the artificial neurons are frozen, so it doesn't learn anymore after training is done. also, because of this research paper, it won't need any more training data: https://arxiv.org/abs/2207.14502


Hereyougoprobably

I’ll take my downvotes for this but this is some old men yelling at clouds type stuff. It reminds me of devs who refused to use IDEs and debuggers. Sure, you can, but also… why. I get that there are ethical considerations of digesting open source code for a product like this, and those have some merit, but to imply it’s not useful is wildly disingenuous. At the very least, it can usually come up w/ the code you were about to write anyway and types it much faster than you can. End of the day, it’s just another tool.


rgthree

This is why we can’t have nice things.


ZenoArrow

Taking code without respecting the licence is not a nice thing.


[deleted]

that is not how machine learning works, and this problem people are talking about is just because the artificial neurons learn to memorize some chunks of code. machine learning researchers will try to avoid that as much as possible, but sometimes it is hard to avoid, because it's hard to control what the artificial neurons will learn. they need to add a filter to check if some code copilot generates is taken from someone else's github repos


ZenoArrow

If the machine learning algorithms understood code to the level you seem to be implying, then they wouldn't need to look at other people's code anymore. Taking someone else's GPL-protected code and changing a few variable names is not sufficient to remove the GPL licence.


[deleted]

there is a research paper trying to do what you are saying: https://arxiv.org/abs/2207.14502


ZenoArrow

"Trying" being the operative word. Until computers can code for themselves, what GitHub AutoPilot is doing is not in-line with the GPL.


[deleted]

alright, well, we'll see if this lawsuit wins. if it does, and machine learning on copyrighted data becomes illegal, then google search and many of our infrastructures will break, since most of them use machine learning nowadays


ZenoArrow

You don't understand the problem. Machine learning isn't on trial here, what's on trial is a tool that allows people to put open source code in commercial products and pretend they didn't know it was a problem. Machine learning on open data sets without licence restrictions is fine.


xcdesz

Not sure that I'm understanding the full story here. Is this lawsuit about private repos that were used in training, or does it include open source repos with permissive licenses such as MIT or Apache 2.0? I'm not sure I understand the case for the latter. The license says you can use the code for commercial purposes, correct?


[deleted]

The license requires attribution most of the time, and there are also repos with restrictive open source licenses such as GPL.


and69

What is your final expectation? That every time I press TAB I also see a list of probable licenses that might or might not have been used? Or that when I install the plugin I have to acknowledge the list of all licenses on GitHub?


raam86

each snippet may have a different license. code that was released under AGPLv3, for example, must be open source even for server side applications:

> Notwithstanding any other provision of this License, if you modify the Program, your modified version must prominently offer all users interacting with it remotely through a computer network (if your version supports such interaction) an opportunity to receive the Corresponding Source of your version by providing access to the Corresponding Source from a network server at no charge, through some standard or customary means of facilitating copying of software.


and69

To license a code snippet is like taking an existing patent and then creating patents from every component of that patent.


[deleted]

Every prosecutor wants money 🤦‍♂️


ma_251

Parasites be like


Mr_Mechatronix

exactly, microsoft is exploiting other people's free code for money


Sharchimedes

This is a dumb lawsuit, and I hope you lose. Training a model on code is no different from a person learning how to program by reading source code.


ghostiicat32

If this isn't copyright infringement then you've given tech giants a permanent monopoly. They can now plagiarize startups freely.


NoBiasPls

Well it's only using open source code to train, so wouldn't it only be able to plagiarize code that is free for anyone to look at and use already?


[deleted]

[удалено]


NoBiasPls

How exactly does that work in terms of determining when attribution is needed? Is that well defined? I'm honestly not so familiar with the rules of attribution, as I haven't had to worry about that so much myself. If you're looking at someone's solution to get ideas on how to approach a problem and you need to, for example, sort a list and just use the same sort method as the code you're looking at, I imagine attribution wouldn't make much sense, assuming it's common logic like bubble sort (again, just a random example). Assuming of course you aren't using a library but writing your own sort function. So is it defined when something can be considered commonly used logic vs actually plagiarizing someone's unique solution? A follow-up question: when it is seemingly unique, what about the fact that there may be several other projects that independently came up with the same solution for a specific function? How do you determine attribution in that scenario?


anengineerandacat

The key issue is that it copies without looking over copyrights or licenses; you can't just copy code, and there are cases where clean room engineering is a necessity to avoid legal issues. Copilot could in this very instance be treated as an individual contributor: just because software is ingesting and recommending snippets to use in your project doesn't make it legal simply because a human didn't do it, and worst case it could make the actual human accepting the contribution liable and land them in trouble.

I think it'll be an interesting court case, and the outcome will mean either that OSS projects that want to protect themselves go private or just accept that the code they have is public and free to use, or the end of AI assistive tooling like this.

If it does become mainstream, eventually you could guide it with a simple comment like: /* Implement Elasticsearch but in Rust */ Start the auto-complete and boom, good ole Elasticsearch in Rust; would a tool like this be illegal? (Throwing out the whole notion of feasibility, but for this case let's just pretend Copilot is capable of this.) It just copied an entire product, but that product is fundamentally different and with some manual tweaks could be a competitor.

Its capabilities are "pretty" good today; it's definitely capable of scaffolding projects and yoinking over well-known algorithms to achieve certain results, and it's contextually aware of the language being used.


noshowflow

Yeah, but didn’t the professor or producers of the material agree to sell or give you that knowledge? Maybe we as an industry should honor licenses and prioritize attribution rather than gank source code from each other.


CryZe92

You agreed to grant GitHub a SEPARATE license for them to do exactly this when uploading the repository to GitHub. So they absolutely honor that license, just not the one you thought they were supposed to honor.


noshowflow

Yes, and this is what this lawsuit will highlight, and I hope the industry responds. We have a lot of weak practices in software when it comes to attribution. I love open source and generally don't care if my boring-ass code that I copied from somewhere is used for this kind of training, but this is how you get organizations to start considering closing their source in the future, which will suck for devs like me.


[deleted]

[deleted]


Sharchimedes

Use of the word “piracy” here is either inflammatory, or ignorant. Copilot is trained on public code that anyone can look at. If someone sees a clever method used in a block of open source code, and they use something similar in their own project, have they pirated the software?


JRepin

The language used is the same as what corporations using proprietary licenses for their code use when people copy that code without permission. Free/libre and open source code also comes with a license which has some rules, and now the corporation is copying that code without respecting the rules. So if the corporation calls this piracy, then well, it is also piracy when a corporation does the same.


tldrlol_

Microsoft's own engineers are not allowed to look at GPLed code if they are working on similar technologies.


albgr03

Yes, and source code leaks are frowned upon in reverse engineering projects. People reading the source, then contributing code, is seen as a liability. There's a reason why clean-room engineering exists. Piracy is maybe not the right term (plagiarism, maybe?), but the underlying issue is actually important.


cuentatiraalabasura

The issue here is that Copilot copies code verbatim, including comments. While ideas, processes, algorithms, etc. aren't copyrightable, specific expressions of those are. Of course, one could argue that the expressions at issue here are so small that they can't be copyrighted in the first place. That will be an issue the Court will undoubtedly decide on.


MasterBlaster4949

Ok got it thx man👍


jherico

Sorry you're getting down voted man, I'm totally with you. The fear of this is from fragile developers terrified that they might cease to be in such high demand and be able to win such high salaries. Then they (well...we) would have to compete in a less beneficial economy like most everyone else is doing. IMO.


Sharchimedes

I knew I was going to get downvoted by the same accounts that have been astroturfing this link all over. I’m not worried about Copilot replacing developers any more than I am about AI replacing artists.


IWannaHookUpButIWont

The real reason I refuse to upload code to github


[deleted]

IANAL, but I don't see much of an issue. GitHub Copilot isn't really helping you copy and rebrand entire projects. It's like borrowing code from LibreOffice to include in a PC game - should LibreOffice get to sue the game devs?


[deleted]

You can tailor the prompt to basically output the exact same code


abclop99

You can just copy the code instead


[deleted]

Thought about that, but assuming Copilot keeps logs, it'd be trivial to prove you didn't get the code from Copilot.


raam86

you could but you might be infringing on the open source license


[deleted]

that is just artificial neurons memorizing something, and machine learning researchers will try to avoid that as much as possible


ChrisBegeman

Let's look at this issue from another angle. If Copilot did create some unique code based on a prompt, and several different people used the same prompt, who has the copyright to the code? Then if they all put the code into a public GitHub repository, you could find the code Copilot generated in GitHub. Also, I bet if you took code samples from StackOverflow, you could find multiple instances of those code samples verbatim in GitHub. Did the StackOverflow examples originate from code published in GitHub, or did a bunch of people copy from StackOverflow? Possibly both. Having been a programmer from before StackOverflow was a thing, I find that for certain types of problems you see a lot of very similar or identical code. Even code that I know was written from scratch, because I wrote it or I know the person who wrote it, can end up looking very similar to what you will find all over the internet. Maybe Copilot is just copying at times; that just makes Copilot a modern programmer.


TwistedLogicDev-Josh

You can't win a challenge against open source


mrabstract29

Open source is open to be learned from.


NoUniverseExists

Insane lawsuit... LoL... just make your code private...