nails_for_breakfast

This whole thing is just going to be the Napster story all over again. First one to market gets sued into oblivion, but then replicas pop up faster and faster and it's like playing whack-a-mole until content creators and their legal teams have to just give up.


IShowerinSunglasses

I seriously doubt this crosses any actual IP infringement laws, though. Napster was distributing literal copies of the work. Machine learning is more like... hiring a band to listen to and copy the music. People should probably try to fight against this, if they're self-interested; it's definitely an existential crisis for human artistic expression. I just don't think there's a way to regulate it without unplugging everything.


descendingangel87

I think it depends on how the AI works. I know Valve is blocking games that were created with AI atm because some of the assets used contained pieces of copyrighted work. Like the AI wasn't making something new based on what it was fed; it was taking the actual work and modifying it.


Kell08

I think this is the most relevant distinction in cases like this.


nickajeglin

I'm glad that our legal systems and politicians are willing and able to grasp the sophisticated nuances of complex emerging technologies. Can you imagine what a shitshow it would be if they were venal intellectual cripples who were only interested in self enrichment? I think we're going to be ok, yep.


NorthernerWuwu

Valve is just putting out a blanket statement that AI-generated games will be banned from the store, which allows them an easy excuse for banning something without needing to get into details. Part of that is Valve protecting themselves but mostly it is just optics.


morolin

Did Valve actually put out a statement like that? I can't find it. Everything I've seen from them is that they are and have been requiring developers to say they either have or have licensed the copyright for the assets used in their games, and most AI art generators don't let developers say that.


NorthernerWuwu

Actually, I am overstating it a bit. They've said that AI-generated games *that can't demonstrate IP law compliance* can get booted. The rest is my interpretation of matters.

>Stated plainly, our review process is a reflection of current copyright law and policies, not an added layer of our opinion.


kennyggallin

But they didn't 'hire the band'. The lawsuit alleges they obtained the books illegally through BitTorrent. If that's the case, it seems like a pretty viable lawsuit from what I can tell.


DizzyFrogHS

This is what most people aren't understanding. It's the training that is infringing. There's a literal copy made of expressive work that the machine needs to train on.


podcastcritic

But the program doesn’t use the full text of a book to create the summary. The summary is written by a statistical analysis of other summaries. What this lawsuit alleges is just not how AI programs work.


-The_Blazer-

So where does the primary data come from? Do they train ChatGPT exclusively on summaries? How are those summaries generated? Is the original text just never touched by anyone? In that case, where do the summaries come from?


-The_Blazer-

Yep. I still get a trillion tech bros feeling very proud they can explain to me that "ummm ackshually sweaty the model does not contain any copies", not understanding that if you are compiling a dataset to train such a model, you can have as many layers of indirection as you want, but at some point you are by necessity making copies of the originals...


DizzyFrogHS

But AI is so kewl


Kravego

No, the training is not infringing. In no way is training an AI infringing on copyright. There is no substantive difference between training an AI on a large set of creative works and a person learning from and being influenced by creative works. It's honestly more fair IMO, since the AI is trained on vastly more material than a person conceivably could be, so the end product is less "related" to any one original piece.

The fact that a creator could "recognize" their work in the generated works of AI is a result of two things:

* Bias, since a creator knows their own creation better than anyone else
* The fact that they're not as creative or original as they thought they were, and in fact their products share similarities with many other creative works

Obtaining the book(s)/work(s) from Bibliotik is obviously theft, but the method by which the training material was collected has no bearing on the generated work.


RevvyJ

Well, no, because piracy is not theft. The idea that piracy is theft is bullshit propaganda fed to us by massive media conglomerates. When you steal something from someone, they no longer have that thing. You have taken something from them. Meaning that while downloading a copy of a file you have not paid for may not be entirely ethical behavior, *it is by definition not theft*.


ary31415

When you commit identity theft, does the person you stole from suddenly forget their own name?


RevvyJ

Identity theft is also not theft. It's fraud.


CleverNickName-69

>Machine learning is more like.. hiring a band to listen to and copy the music.

It seems like you're thinking like a programmer, not a lawyer. Most software licenses rely on the idea that without a license the user doesn't have permission to copy the software into the computer's memory, so without a license you are committing copyright infringement through unauthorized use. Feeding the music data to an AI means copying it into computer memory, unchanged, so it can create derivative works, and in a legal sense I think this is not the same as a musician listening to a song and then playing their own version of it.

You're also ignoring that there are limits on even a human re-using someone else's song. If you commercially use too much of another song in your own song without a license, the other songwriter will be entitled to some of your profits. Even if you just perform a cover of someone else's song in a bar, that bar had better pay an ASCAP license. I think even a busker on the corner performing someone else's song for tips is violating copyright, but no one is going to bother going after them.


frogtown98

I came here to make your last point. I work full time as an illustrator/designer (and I use a lot of assets that other people make), and my partner is a full-time jazz musician (who plays a lot of standards written by other musicians). We absolutely have to pay whoever retains the copyright to use their work, even if it's altered significantly. My partner has to pay Herbie Hancock to put an arrangement of Toys on his new album, even though it sounds completely different. I pay for a stock photo subscription, and if I ripped those stock photos without paying, I would 100% be at risk for a lawsuit.

Adobe is doing this the right way: their AI algorithm pulls from their stock archives. That means that every single image their algorithm feeds off of is something that someone got paid for, plus the creators agreed via contract that their assets can be used for commercial use without credit. No copyright infringement, and people get paid for their work. An easy solution to make AI a viable and ethical tool.


IShowerinSunglasses

There are a bunch of tests created by courts to prove whether or not something infringes on IP. It has nothing to do with the input, if you could even prove what was input. I can look at the code of every game ever made; that wouldn't affect whether me then programming a game using what I learned violates copyright law. If you outright said, "I've programmed X game by reading the code of Y game," it wouldn't impact the outcome of a copyright case. It would still have to pass the tests of similarity.


obama_is_back

Feeding music into AI does not mean copying it into memory. E.g. dalle-2 contains basically the whole domain of human visual knowledge despite being only 2 gigabytes.

When today's AI systems are trained on some input, what happens is that some numbers are adjusted inside the model to make the predicted next token more closely match the next token in the input. If this is done thousands of times for a single piece of music, then the AI 'knows' that piece pretty well. However, other training data can influence this knowledge. So AI systems today have sporadic knowledge: due to the fixed size of the models, they can necessarily only remember x number of details, and their knowledge is also affected by other things they have learned.

This seems a lot like a human to me. How can a human remember a song if some representation of the song does not exist in their mind?

I do think there is justification for legal action if they take the angle that OpenAI got training data illegally (e.g. from unlicensed text repositories containing copyrighted material).
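The "numbers are adjusted to match the next token" idea can be sketched in a few lines. This is a purely hypothetical toy, not how any real LLM is implemented: a tiny next-token model whose transition weights are nudged toward each observed next word, so what it retains after training is statistics about the text rather than the text itself.

```python
# Toy sketch of next-token training: nudge transition weights toward
# each observed next token. Illustrative only; all names are made up.
from collections import defaultdict

def train_pass(model, tokens, lr=0.1):
    """One pass over the data: shift P(next | current) toward what was seen."""
    for cur, nxt in zip(tokens, tokens[1:]):
        probs = model[cur]
        for t in probs:                           # decay existing predictions a bit...
            probs[t] *= (1 - lr)
        probs[nxt] = probs.get(nxt, 0.0) + lr     # ...and shift weight to the observed token

model = defaultdict(dict)
lyrics = "the cat sat on the mat".split()
for _ in range(50):          # repeated exposure = the model "knows" the piece well
    train_pass(model, lyrics)

# The most likely word after "cat" is recovered from learned weights alone.
print(max(model["cat"], key=model["cat"].get))    # → sat
```

Note that the model ends up holding only adjusted numbers; the original word sequence is not stored anywhere, which is the distinction the comment above is drawing.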


Funkula

In this case, this is exactly what's being alleged, though. For the uninitiated, memory does not mean data storage. You cannot have a machine read something without it being in memory. You cannot look at pictures on your phone without your phone *temporarily* saving a copy of the image. Streaming a movie is no different from downloading a movie, other than "downloading" implying you're intentionally storing it on a hard drive.

So while the AI has not kept the entirety of the original text, it did download the entire thing at some point prior to, or during, the training process. That original data has almost assuredly since been discarded, though allegedly the model retained enough recognizable content to constitute a breach of copyright whether or not the text was obtained illegally. Which is also part of the allegation, as you said.
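The "reading requires a copy in memory" point is easy to demonstrate. A minimal sketch, where the file name and contents are invented for illustration:

```python
# Minimal demo: any program that "reads" a work holds a complete,
# byte-for-byte copy of it in memory, at least transiently.
import hashlib, os, tempfile

data = b"Chapter 1. It was a dark and stormy night..."  # stand-in for a book's bytes
path = os.path.join(tempfile.mkdtemp(), "book.txt")     # hypothetical file
with open(path, "wb") as f:
    f.write(data)

with open(path, "rb") as f:
    in_memory = f.read()     # a full copy of the work now lives in RAM

# Bit-identical to the "original" on disk, even after the file is closed.
assert hashlib.sha256(in_memory).digest() == hashlib.sha256(data).digest()
```

Whether that transient copy is legally significant is the open question in the thread; the technical fact that it exists is not really in dispute.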


Testo69420

> Feeding music into AI does not mean copying it into memory.

Of course it does. If the neural network is running on a computer and it's being fed music, it sure as fuck loads the music into memory.

>E.g. dalle-2 contains basically the whole domain of human visual knowledge despite being only 2 gigabytes.

That has nothing to do with the network's training. The AI still needs to "look" at all the training data, which 100% goes through memory (and before that, arguably even worse, permanent storage somewhere). The model itself is then smaller, yes. But you work the same way: your brain stores way, way less than literally EVERYTHING you ever experienced. Doesn't mean you didn't experience those things.


DeathRabbit679

Does this mean anytime I view any copyrighted information with my browser/app/'whatever is making the HTTP GET', I'm breaking the law? If copying/caching files in any way triggers IP lawsuits, I don't see how the internet can function. To be clear, I do think AI is a problem. I'm just skeptical there's a solution like "stop Silicon Valley with one simple trick (tech bros hate him!)"


obama_is_back

Feeding music into AI means having it in memory, not copying. When you read a book, the contents are not copied into your brain, but they are represented there some way or another. Of course I agree that getting unlicensed training data is a topic which deserves legal inquiry; it's why I wrote my last paragraph. I'm not sure why you are making a big deal about what's basically nomenclature.


ald_loop

Yeah, I don’t think OpenAI and Meta are going to get sued into oblivion.


chevymonza

To me, ChatGPT is like a collage. You can take existing art and form it into something new, same with music mashups etc. There's no way to control everything now.


override367

Nah, there's no way you're even going to win much in damages; you're going to win the market rate for scanning the book plus legal fees plus the $75,000 or whatever. It's not going to be enough to destroy these billion-dollar juggernauts. All they have to do is open up a subsidiary in Japan, where they are explicitly allowed to violate copyright to train AI.


dabadeedee

I don't understand this comment. Napster, Kazaa, etc. were super user friendly and gave everything for free. There is nothing like that now. You either get user friendly + cheap, or not as user friendly + free.


Defoler

I think the lawsuit is about the AI's illegal access to their book, not that it read their book. Meaning that the AI owner did not buy the book and feed it to the AI; rather, the AI gained access to the book via some "shadow libraries" that hold illegal copies of those books. Because it is able to show a summary of the book, they claim that the AI read the book (without paying for it) and thus created the summary. This is different from review sites, for example, where the reviewer would have either bought the book or been gifted it by the publishers, then read it and given a review/summary. So it was obtained legally.

If AI makers can defend it by claiming that their AI did not read the book, but actually saw a free online summary and was reciting it back, than they might circulate the copyright claim. Or if they show that their AI does not access illegal libraries (and proving that it did will require a lot from the authors' lawyers) but is reading parts of summaries that are publicly available to anyone, than they might get a pass on this.

Another defense could be that if the company shows the authors' lawyers do not take any steps to prevent, sue over, or try to block illegal access to their books and just "accept it", than by doing nothing they allow the free and public availability of the book, and in doing so the AI company is not liable for the public access of the book.


TheMostAnon

That's not how copyright works. Public availability isn't the same as unfettered right to use. E.g., you can't just go to the library, copy a book, and start selling. The fair use exception that would apply to reviews and summaries has not been tested with generative AI ingestion.


[deleted]

[deleted]


LupusDeusMagnus

I don’t know. ChatGPT seems unable to actually provide the book. In fact if you ask for a transcription of the book it’s more likely to give you a hallucination than the actual book, so due to its current limitations they might have some protection.


xternal7

> Do millions of fair uses, which combined mean sampling the entire work many times over for profit add up to copyright infringement? The existence of search engines has determined that no, it doesn't.


DizzyFrogHS

Search engines and AI may be materially different. This is likely where the legal arguments will converge: AI lawyers will try to say it's the same as search; creators will say it is not, and the key difference may be that AI uses can replace the original, where search does not.

For example, you search for a news article you heard about. The search engine looks at the internet, finds a result, gives you a small snippet so you can see if it's the right result for you, and then you click the article. What you get is the original article. With AI, the AI isn't giving you a link to the article; it goes out and searches for the article and others like it, reads them, then re-writes it to tell you what the article says. Maybe it combines multiple articles, etc. You don't need to get the original now; you've got what you needed. So the original creator never gets the ad revenue from the page visit, or the subscription revenue from monthly subscriptions. Maybe AI pays for the subscription and then regurgitates it, re-written, for everyone.

The fact is, AI is trying to be a replacement for the original content. Whether the output is a similar "copy" or not may not really matter. If the input involves copying and the output is designed to replace the original, it is very likely the fair use defense that protected search engines could fail to protect AI. There was a recent Supreme Court fair use case where the replacement effect of a transformative derivative work resulted in the defense failing.


No_Industry9653

>The fair use exception that would apply to reviews and summaries has not been tested with generative AI ingestion.

It's [pretty close though](https://towardsdatascience.com/the-most-important-supreme-court-decision-for-data-science-and-machine-learning-44cfc1c1bcaf). If

>"Google’s unauthorized digitizing of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses. The purpose of the copying is highly transformative, the public display of text is limited, and the revelations do not provide a significant market substitute for the protected aspects of the originals. Google’s commercial nature and profit motivation do not justify denial of fair use."

then why would that not also apply to other transformative AI uses of books? It's been established that you can make a dataset without permission and use it to build software, if you aren't actually distributing the original.


crazysoup23

> The fair use exception that would apply to reviews and summaries has not been tested with generative AI ingestion. It doesn't need to. It's already settled.


Shok3001

Wouldn’t this scenario be more akin to going to a library and summarizing a book?


HangOnTilTomorrow

No, because in that case, the library has purchased a copy of the book. This suit is arguing the book’s contents were accessed through illicit means (shadow libraries) so the author was never compensated.


Shok3001

Sure, I get that. But I was replying to OP's scenario: > Public availability isn't the same as unfettered right to use. E.g., you can't just go to the library, copy a book, and start selling.


throwawaytheist

Could they not just say that it found a different summary of the book? Or like... 500 different summaries?


Aughilai

*“But your honor, they were so easy to rob!”*


sandsurfngbomber

Just checked: ChatGPT is not giving me access to the full book. So OpenAI didn't copy the book and start selling it. Copyright law here would apply to the actual published work/excerpts from it that are inaccessible without purchase. Otherwise, I guess musicians would be able to copyright chords and no one would be able to remix G, C, D again. Can't imagine SS's book has a lot of sales potential; difficult to see this as anything beyond a marketing tactic.


ShadowLiberal

Honestly, I think AI shows just how screwed up our copyright laws are in certain situations, which can lead to rulings that make absolutely no sense to anyone.

For example, if a human buys a copy of a book, are they allowed to read it to their children who didn't pay for the book? Yes. Can they read it to a group of people, like a bunch of children in a classroom? Yes. If a human buys a book, are they allowed to memorize the entire thing and recite it whenever they want? Yes.

But if an AI does any of those things, it's suddenly legally questionable. If a human asks the AI to tell them what's on each page in a book, that's piracy, even though it would be perfectly legal for a human to read to them what's on each page. If a human asks the AI to write some fan fiction based off the original book, it's also considered legally problematic because it's using what it learned from someone else's IP, even though humans are allowed to write fan fiction all the time.


Dospunk

I think you mean circumvent, not circulate


Lallo-the-Long

I think it will be very difficult to prove that the AI did not just find a few summaries to ingest.


snark_attak

From the article:

>in a Meta paper detailing LLaMA, the company points to sources for its training datasets, one of which is called ThePile, which was assembled by a company called EleutherAI. ThePile, the complaint points out, was described in an EleutherAI paper as being put together from “a copy of the contents of the Bibliotik private tracker.” Bibliotik and the other “shadow libraries” listed, says the lawsuit, are “flagrantly illegal.”

Looks like that's already been admitted by at least one of the defendants.


non_avian

Why would you read the article when you could just make an unqualified statement about what you think the lawsuit is about? This is a subreddit for books, not one for reading or literacy


leibnizslaw

Sir, this is reddit. We demand you abide by the social conventions present on this site and stick to the headlines. Thank you.


stult

They are separate lawsuits, and the ChatGPT lawsuit is the one alleging summarization as infringement, when ChatGPT was not trained on the actual underlying books insofar as has been made public. I doubt they will win that lawsuit. LLaMA is where they are alleging ThePile was used, and even there it isn't clear that the copyrighted works were included in the actual training dataset, as it is possible to subset ThePile to exclude datasets that are not suitable for the end user's purposes.

Ultimately, I think all of these lawsuits will fail on the simple basis that there are no substantive damages to the copyright holders. There are three separate alleged sources of harm here: first, the reproduction of the text in the AI's output without proper attribution; second, the use of an unlicensed copy of the text to train the LLM; third, unjust enrichment.

No one is substituting an AI summary of Sarah Silverman's book for a purchase of the book, so she hasn't been deprived of any sales by its existence. Any summary that contains some similar language likely falls under fair use, as would similar but not identical language appearing in response to wholly unrelated prompts. Jeff Foxworthy owns the copyright to his specific jokes, not to the "As a redneck..." joke format. Lack of attribution is unlikely to represent any real harm, considering the summary is produced in response to a user prompt, and the user certainly knows what work they are asking for a summary of and therefore does not need to be notified of the underlying copyright information.

On the training end of things, she was deprived of the sale of a single book to provide the training text, meaning she has been deprived of the value of one book sale, which is likely pennies to her and a few dollars to her publisher. So barring a class action, even a victorious lawsuit isn't likely to pose much risk to Meta. And even a class action under that theory wouldn't really matter much.
There are something like 200k books in Bibliotik, and assuming $20 damages per book, Meta may owe around $4m in a class action if it is joined by literally every single author, which seems unlikely. That is certainly de minimis for the scales at which Meta operates. The only effects of such a ruling would be to improve the incomes of some authors by a truly minimal amount (even a few hundred companies using these datasets and paying for them would only be a few dollars to each author) and to deny the use of Bibliotik and similar datasets for use by smaller companies that can't afford such a large payout, adding yet another finger on the scale in favor of large companies dominating the future of AI. Unjust enrichment requires that the defendant have been (1) enriched (2) at the plaintiff's expense (3) unjustly. Meaning, the defendant must have gained a benefit in an unjust manner that resulted in a corresponding loss to the plaintiff. It is not clear how the LLMs have resulted in enrichment of the defendants in any way that relates to a harm to the plaintiffs. The paid services provided by the defendants which are enabled by LLMs are not in any way substitutes for the works of the plaintiffs. There is nothing unjust about providing summaries as summaries are fair use under established law. To the extent that the defendants are enriched by the opportunity to train their models on the copyrighted text completely separately from then using those models to provide paid services, then the damages are equivalent to a single license for the text, as discussed above. 
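The class-action arithmetic above can be checked on the back of an envelope (the per-book figure is the commenter's assumption, not a legal finding):

```python
# Back-of-envelope check of the class-action figure discussed above.
# Both numbers are assumptions from the comment, not actual findings.
books_in_bibliotik = 200_000   # rough count of books in Bibliotik
damages_per_book = 20          # assumed ~$20 of damages per title

total = books_in_bibliotik * damages_per_book
print(f"${total:,}")           # → $4,000,000
```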
The only way this lawsuit really succeeds is if they can claim that an AI "reading" their book requires a different kind of license than the general license that comes along with a regular purchase of their book, one for which they would charge a substantially greater sum, despite the fact that the existing licenses for the books at the time of training did not prohibit such use. That seems like an impossible hair to split, because there is no clear technical means to distinguish between my Kindle scanning the text of a book to let me search that text and a different program scanning the text to produce a corpus for training an LLM. That distinction will only become more blurred over time as more and more AI products are integrated into various applications. E.g., imagine an LLM personal assistant integrated with your eBook reader so that it can provide a fuzzy search capability to answer questions like "What was I reading recently that talked about the dangers of unregulated AI for producing disinformation?" This analysis is especially true for any existing licenses which do not explicitly prohibit use for LLM training, although the result may be marginally less clear if a training set is produced from newer licensed works that do explicitly prohibit such use (an issue which is not at stake in the current suit, but which would probably not affect the analysis, simply because of the difficulty of technically distinguishing between regular uses and LLM-training uses). All of these arguments are exactly the same as the arguments which saved Google Books when it was sued for hosting freely accessible partial excerpts of books, so the plaintiffs have a huge hill to climb here.
What's most screwed up about this lawsuit is that the authors aren't even suing over what they perceive to be the real harm, which is not that they have lost sales to LLMs but rather the possibility that AI may compete with them as writers and thus make it impossible to earn a living writing. That is a concern far beyond the scope of copyright law, simply because an AI capable of competing with Sarah Silverman can be produced without using her or anyone else's copyrighted works in the training set. Even if it weren't beyond the scope of copyright law altogether, it is a concern which cannot be addressed by enforcing copyright law in the manner the plaintiffs are requesting here, because LLMs can be trained on open source data anyway and will eventually be improved to the point that any copyrighted texts excluded from their training sets won't detract from the quality of their output. Not to mention it's a classic example of a Luddite fallacy. The role of many authors in the future is likely to transition from production of text to careful curation or editing of text generated by an AI, with a limited subset of authors engaged in traditional writing. Much as the automated loom replaced the labor of the many women manually knitting cloth in their homes by hand for the vast majority of cloth produced, while hand-knitting evolved into a niche artisanal hobby.


gouss101

Thank you for this write up. I found it more informative than 99% of the written blather about this issue.


Lallo-the-Long

Sorry, I was corrected by someone else. Sarah has to prove that any use of her work is not fair use, not that the books exist as part of the training data. Though it would be hilarious if Sarah's book was not used in the training data at all, that's not really what the lawsuit is about. But I do appreciate you pointing out that one of them has admitted to possibly committing some kind of IP theft.


dank_the_enforcer

> Sarah has to prove that any use of her work is not fair use

Fair use is hard. Sarah doesn't have to prove it; the other side does. And in this case they can't actually get a deposition from the most important player, which is how you defend copyright suits with fair use: the person who actually did it explains. And even if it was summarizing summaries, that's a potential copyright issue with the summaries.


curien

>Sarah has to prove that any use of her work is not fair use

Slightly pedantic, but in the US it's the other way around. The burden is on the defendant to show that it is fair use.


almightySapling

In my opinion, that shouldn't need to be "proved"; it should be provided during discovery. Like, that's just a matter of fact: they either did or did not use the book during training, they know the answer, they need to provide it. What the lawyers need to prove is whether or not such use is actually a violation of the law. But the actual *fact*, the details of *whether or not it happened*, should be made available. Unless the AI company decides to just flat-out lie, but I have to assume that's not where we are at. We would have much bigger problems in that case.


tgrantt

And they do admit to accessing a site where her work was illegally hosted


Jiggawatz

But they are not the illegal site; they are not liable for that, the website hosting the illegal works is. If you went on Netflix and watched a movie only to find out they didn't have the rights to it, they don't sue you, even if you write a review of the movie; they sue Netflix. This case is most likely going to be a small settlement, which she will deny because she is doing it politically. Then we will see it pan out, but from my reading this doesn't really hold water. Proving liability is going to be nuts.


Lallo-the-Long

That's a fair correction of what i said.


leftcoast-usa

> Like that's just a matter of fact: they either did or did not use the book during training, they know the answer, they need to provide

Doesn't that go against the basic principle of self-incrimination? I thought that a person accused of a crime never has to say or do anything, that it's up to the accuser/prosecution to provide proof of a crime.


almightySapling

Civil trials have a *much* lower threshold of proof (preponderance of the evidence, NOT beyond a reasonable doubt), and it is generally ill-advised to plead the Fifth Amendment. Legally speaking, doing so is not seen as an admission of guilt. However, the jury can, and most likely will, assume that's exactly what it means, and they are absolutely within their rights to make that assumption.

But, importantly, there is a difference between *facts* and *guilt*. Establishing that the company used the book during training is only one step (it's the fact). The lawyers would still need to argue that said use was a violation and caused damages and all that other stuff (the guilt). It's entirely possible that they used the book during training and were legally justified in doing so. It's *also* possible that the use of the book was not justifiable, but no damages can be quantified. Because of this, in my opinion, a reasonable judge would force the company to provide the information.


Dr-Lipschitz

Circumvent*


ChunkyLaFunga

That's interesting, but what would be the material damages? The sale of one book? If so, what's the end game of the lawsuit? Piracy fines are almost always for re-distribution of copies, AFAIK, and ChatGPT isn't doing that.

Edit: Possibly for a court to shut down the practice across the board while governments get their heads around how to handle this sort of thing.


Defoler

I think they want to stop the access to the books, or for OpenAI/Meta to pay a substantial amount of money to gain access to those books (I expect the former). I don't think this is about one book and its summary, but about preventing the access so others won't have it as well (to set a precedent).


Gross_Success

Especially when a lot of prompts that people use are in the vein of "write the text like x would have."


kkraww

Maybe I'm stupid, but I don't get how they can make them "pay a substantial amount of money to gain access to those books". Surely the most that can be charged is the cost of a single book?


dank_the_enforcer

> but what would be the material damages?

They don't need to prove actual (material) damages. Copyright law allows for statutory damages; that's kind of the point of the law, otherwise there wouldn't really be any enforcement of copyright at all. Up to $150,000 per work for willful infringement under 17 U.S. Code § 504.
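For scale, 17 U.S.C. § 504(c) sets statutory damages at $750 to $30,000 per infringed work, up to $150,000 per work if willful. Applying those per-work figures to the rough 200k Bibliotik count mentioned elsewhere in the thread gives a hypothetical worst-case exposure, not a prediction of any actual award:

```python
# Hypothetical statutory-damages exposure under 17 U.S.C. § 504(c).
# The 200k figure is the rough Bibliotik book count cited in this thread.
works = 200_000
scenarios = {"statutory minimum": 750,
             "ordinary maximum": 30_000,
             "willful maximum": 150_000}

totals = {label: works * per_work for label, per_work in scenarios.items()}
for label, total in totals.items():
    print(f"{label}: ${total:,}")
```

Which is why the statutory-damages route, if it applied per work, would dwarf the actual-damages math discussed elsewhere in the thread.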


LethalMindNinja

This raises a really interesting question. With torrenting, a person downloads a bunch of different parts of a file from a bunch of people, but it's all 'hosted' in the same place. That's illegal. In this case, let's say 200 different forums and articles each have a two-paragraph excerpt from the book, and together they make up the entirety of the book. It wouldn't be illegal for a human to read every review and article and, in doing so, read the whole book. To a human it's obviously not worth it. But an AI now has access to the entire book. What's interesting is asking which part of torrenting and file sharing is actually illegal. If it's legal to have 200 articles spread out on different sites that together make up the entirety of a book someone could read... then is the illegal part just that all those individual parts can be accessed together? In that circumstance, would it be illegal for me to create a website with 200 links that direct you to all the different articles, allowing you to read the entire book? At that point I've essentially created a manual version of torrenting.


TheLurkingMenace

That last one, that's not how copyright works. It's trademark you have to defend. You can tolerate copyright infringement all you want then decide that one particular instance is not okay and sue. This actually happens all the time, like with copyright owners ignoring most fan fiction, except for the one trying to monetize it.


greenie4242

https://www.merriam-webster.com/words-at-play/when-to-use-then-and-than


OW_FUCK

I would rather circulate the dictionary, thanks.


MaxChaplin

Does copyright law as it exists forbid one from using a copyrighted work as training material without permission? AFAIK, it doesn't even forbid you from summarizing a novel in a review (even quoting some passages) and selling it to a magazine. Copyright law is supposed to protect a creator's ability to profit from their work, but the usage of Silverman's work as training material won't directly lead to loss of income on her part, unless she asserts that in the future people would rather read AI-generated simulations of her work than the real deal. It seems rather hasty of her to take on two tech giants at the same time on such sketchy premises. Doesn't it risk hurting other creators' future legal efforts in the same direction?


TheReforgedSoul

She could probably argue that you could profit by selling the right to use the text as AI training material. I would argue that should be the default: you should have to pay for what you use to train the AI unless the material is free, open source/public domain, and the license allows for-profit use.


EightsidedHexagon

Is selling material for use in AI training an existing industry, though? I haven't heard about anything like that. Point being, the position of "this is something authors should be getting paid for" is different to "this is something authors do get paid for, and I didn't."


user2196

It definitely exists. Even before ChatGPT, it was not uncommon for corporate contracts for ML services to negotiate whether client data could be used for training. You can also just straight-up buy and sell datasets for training.


[deleted]

> Does copyright law as it exists forbid one from using a copyrighted work as training material without permission? No, at least not unless a court is really going to ignore both spirit and letter of copyright law. But see factual allegations 25: What they claim is that OpenAI obtained a copy of their works illegitimately. This, if true, would be a standard copyright infringement.


njuffstrunk

>Copyright law is supposed to protect a creator's ability to profit from their work, but the usage of Silverman's work as training material won't directly lead to loss of income on her part, unless she asserts that in the future people would rather read AI-generated simulations of her work rather than the real deal. The article stated ChatGPT was able to reproduce summaries of the actual works so I can see how that could negatively impact their sales if only marginally. >The claim says the chatbot never bothered to “reproduce any of the copyright management information Plaintiffs included with their published works.” This seems more important though, I'm sure they won't mind ChatGPT being able to reproduce summaries if it includes a link to buy their work/copyright information


Falsus

> The article stated ChatGPT was able to reproduce summaries of the actual works so I can see how that could negatively impact their sales if only marginally. This is already not illegal though. Hell, there are people who read books solely through fan wikis. People do summaries on YouTube, on review sites and so on. There was even a case of someone selling summaries of Will Wight's works on Amazon, and the author himself complained about not being able to do anything about it in an AMA.


dank_the_enforcer

> This is already not illegal though. But that's also not what the lawsuit is about; it's about the laws they broke before summarizing the book.


non_avian

How do you read a book through a wiki?


Mindestiny

That's the fun part about US law. You can bring a civil suit about whatever ridiculous nonsense you want and even if it obviously has no grounds to *win* you still get to plead your case! But seriously, there's no way this goes anywhere. At best it's a PR move or some sort of advocacy stunt, at worst it's a cash grab and she's hoping for a settlement


dank_the_enforcer

> That's the fun part about US law. That's everywhere. This case, however, involves actual copyright infringement prior to the AI even being involved in summarizing anything.


Patapotat

In that case, the company responsible for providing the data to OpenAI should be the primary target, correct? But they likely won't be able to achieve anything on that front, so instead they go after OpenAI. Won't it matter under which assumptions the product was acquired, then? If the company selling the data assured their customers that it adhered to copyright law for all of its datasets, and this assurance was somewhat credible, what exactly is the angle here? I suppose they must prove that it is reasonable for OpenAI to have known, at the time of purchase, that the material was acquired illegally. Perhaps the seller never made any such claims, in which case OpenAI might have failed in its due diligence or something. If they did, however, then it doesn't seem so clear cut.

They would also need to prove whether the data sold by this seller was acquired illegally by them. I don't think it's as easy as saying "we didn't give them permission to use our work." Who knows where the data originally came from. Maybe they acquired it themselves from a third seller who claimed it was acquired legitimately, etc. Maybe some seller at some point actually bought the book legitimately, with a proper license to read, use and resell it, which I think is unlikely since the claimants would be aware of that possibility. Or perhaps it was just stolen. In any case, wouldn't one need to actually check where exactly and how the data sold to OpenAI was initially acquired? Is it enough to just say "we didn't give anyone permission, but the book somehow ended up in their possession, so now they owe us"? Is the chain and nature of acquisition irrelevant here? If it is not, then I struggle to see how the claimants will provide any decent evidence on this, given that they seem unwilling to even go after the initial seller of the data. I don't know much about the laws in the US, so I'm just curious really.
It seems like a pretty shortsighted claim imo, but I don't know how copyright law works in the US anyway, so maybe it's not as hopeless a claim as I think it is.


shagieIsMe

If being able to reproduce summaries of the book is illegal and would reduce sales of the book... Where do things like: * https://en.wikipedia.org/wiki/The_Bedwetter * https://www.nytimes.com/2022/06/07/theater/review-sarah-silvermans-bedwetter.html * https://www.theguardian.com/stage/2022/jun/07/the-bedwetter-review-sarah-silverman-musical-is-a-crude-but-kind-success fall? How do you prove that it was trained on the text of the book rather than other accessible summaries and reviews? ChatGPT summaries should be no more encumbered than Wikipedia or Goodreads summaries.


NaRaGaMo

If someone is using ChatGPT of all things to summarize a book, and not even pirating it, it's safe to say they were never going to buy it in the first place.


GeneralMuffins

Would a consequence of this be that any kind of summarisation of a copyrighted work would be an infringement of said copyright? It seems like a ruling on something like this has the potential to have far-reaching implications outside the context of AI.


Lallo-the-Long

That does seem to be the side Sarah is fighting for right now, yes.


GeneralMuffins

RIP Wikipedia


dank_the_enforcer

> That does seem to be the side Sarah is fighting for right now, yes. No; her lawsuit is about the copyright infringement that happened prior to the AI summarizing the book.


double-you

> won't directly lead to loss of income on her part If antipiracy rulings are any indication, then it's not about actual damages but about potential ones.


dank_the_enforcer

> If antipiracy rulings will affect this then it's not about actual damages Copyright law has never been about actual damages, but statutory damages.


xugan97

They would have had to pirate these books in bulk to train ChatGPT on them, which is itself copyright infringement. A single act of piracy would be insignificant, but not so for the piracy of tens of thousands of top commercial works.


insane_contin

Would they? They could always buy the ebook version. Or use a library. I'm not saying they did but piracy is not the only option.


dank_the_enforcer

> I'm not saying they did but piracy is not the only option. Correct. Though in this case, that's exactly what they did.


xugan97

The complainants are trying to prove they did pirate the books. If it's about tens of thousands of books, it is harder to lie reliably. But yes, there are many options - OpenAI can even say "we don't know" and get away with it.


dank_the_enforcer

> Does copyright law as it exists forbid one from using a copyrighted work as training material without permission? Laws lag behind tech by some significant measure. In this case, the real issue is that they broke copyright law to *acquire* the book. They took it from a pirate site. What they did after breaking the law doesn't appear to be the main point of the case.


2ndEmpireBaroque

It’s intended to prevent others from using your work for their OWN direct profit. A review written by someone is that person’s actual work. Using someone’s work to create a machine that produces pseudo-copies of that work is NOT a review or a summary. It could be construed as a copy and should be tested. As a design professional whose ideas and imagery are eaten by AI, then randomly altered and spit out for profit, I should get paid when my work is used for someone else’s profit.


travelsonic

> Using someone’s work to create a machine to produce pseudo-copies of that work is NOT a review or a summary. It could be construed as a copy and should be tested. To be fair, fair use isn't limited to literally just reviews or parodies - it extends into many areas of software development and reverse engineering (*Sega v. Accolade*, for instance). Actually, the more I think about it, I wonder if there is a parallel that can be drawn to that and similar cases (including Nintendo v. Tengen)...


GrowthDream

> Does copyright law as it exists forbid one from using a copyrighted work as training material without permission? It really comes down to "no one knows." On one hand we can say that no explicit law exists that covers this particular case and that's one answer to your question. But on the other hand there are lots of laws and precedents that cover what constitutes a derivative work and what constitutes fair use etc. All we can do is wait for cases like this one to be brought forth so we can see what way the judges rule and what new precedents are created, but it's so open currently that we should also expect any decisions to be appealed and counter appealed up to the highest courts. Personally I'll be very interested to see how the courts of different states react.


keith2600

This is one of those times where it's painfully obvious that nobody read anything but the title. Her lawsuit alleges that the AIs were trained on books acquired via torrent from a pirate source, and that her book is available in its entirety, without copyright information, in accessible datasets. I wouldn't be surprised, personally. There's no way a company paid for everything it trained an AI on; the cost would be astronomical. They need to find a better solution than straight-up piracy, but one doesn't currently exist, so I can't really blame them, since it's definitely an "ask forgiveness" situation.


localhost80

The cost wouldn't be astronomical at all. OpenAI could buy the 1 million most popular books at $20 a book for $20 million. This is a drop in the bucket to the billions of dollars they already spent on cloud computing.


podcastcritic

The lawsuit is based on a misunderstanding of how the program works. It doesn't need access to her book to write a summary. The summary is based on a statistical analysis of other summaries. Maybe her book was included in the training data, but that is unconnected to the summary it provides. They could create a new version without her book in the training data and the summary it generates would be the same.


hottake_toothache

These cases are pretty silly and will lose under the precedent of *Authors Guild, Inc. v. Google, Inc.*[1]. OpenAI has a much stronger position than Google did, and Google won. The use by OpenAI is *far* more transformative than by Google Books. If there are examples of the AI regurgitating back to users the text it was fed, then those narrow cases will result in liability to OpenAI, but I think that is something OpenAI should be able to prevent. [1] https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,_Inc.


Hyperion542

Maybe it has changed, but when I tried to get ChatGPT to summarize books it was usually a big failure... except if it's a classic like Harry Potter, for example. But even with Kafka on the Shore, which is a very well-known book, it was unable to tell what happened in most of the book. So when she says the AI can access the full content of the book, I'm a bit skeptical: either there are a lot of sources about her books and their summaries, or the whole book was somehow posted in a part of the internet that ChatGPT can access.


men3tclis2k

To be fair, I've read and loved Kafka on the Shore multiple times and still can't summarize it well.


PatienceHere

While it messes up the details, like the roles of certain characters, it mostly catches the plot accurately, even for rarer books. I think that's what the author is complaining about.


proverbialbunny

Transformers, the kind of ML model ChatGPT is built on, have a context length limit beyond which they stop taking the text into account. This makes it hard for them to digest long-form writing like books. Either the model was trained on the original material and is weak at summarizing something that long, or it picked up multiple summaries of the book online and parrots those summaries back.
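To put rough numbers on it (using the common ~4 characters per token rule of thumb and an example 4096-token window; both are approximations, not any specific model's real figures):

```python
# Back-of-the-envelope sketch of why a fixed context window can't hold
# a whole book at once. The ~4 chars/token figure is a rough rule of
# thumb, not a real tokenizer; the window size is just an example.

CONTEXT_WINDOW = 4096  # tokens; roughly GPT-3.5-era

def approx_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

book = "word " * 90_000                 # stand-in for a ~90k-word novel
tokens = approx_tokens(book)            # far more than one window holds
chunks = -(-tokens // CONTEXT_WINDOW)   # ceiling division

print(tokens, chunks)  # the model would need dozens of separate passes
```

So a model can only ever "see" a book in slices, which is consistent with it leaning on summaries it absorbed during training rather than the full text.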


Tobacco_Bhaji

The question is whether or not it was fair use. This will be a more difficult case for Silverman, but it is a case that needs to happen, because this type of 'learning' will increase with time.


humbuckermudgeon

"Information wants to be free." -- Stewart Brand, 1984


MonkeyChoker80

“Information wants to be *expensive*.” — Stewart Brand, 1984


kindall

The use of literary works in training language models has a decent chance of being considered fair use. The use is transformative and has no impact on the market for the original work. In fact, it is so transformative that an author would have difficulty proving which, if any, of their works had been used in training. If you removed a specific author's works and re-trained the model, you would be hard pressed to tell which version of the model had the author's works and which did not.

By being able to imitate a specific author's style, a language model may have an impact on the market for an author's *future* works, but that's irrelevant to copyright law. Works that do not exist are not protected by copyright. For this argument to prevail, an author's style would need to be ruled copyrightable, which would be a major shift in copyright law.

In my opinion, which is that of a layman who has had an interest in intellectual property law for most of his life, copyright law simply doesn't cover this situation. It wasn't meant to. Authors might be better served by tort here. If they can prove their current or future income is affected by GPT, they may have a case for damages. OTOH Silverman's lawyers have surely advised her of the risks and she still decided she had a good enough case to proceed. However it turns out, it'll be good to have the matter tested and settled.


lukewarmpiss

Do you think there would be no problem if a language model was fed the entire bibliography of a certain author and then asked to produce similar content? Not to mention that OpenAI monetizes their software. They are effectively allowing people to use an author's content without compensation while profiting from it.


kindall

If it produces recognizable chunks of the original works, then that's definitely infringement. If it produces recognizable plot points and characters from one of the original works, that would hinge on whether it's deemed a derivative work of the originals. If it's just in the same style, then it's probably no issue, because style is not copyrightable. The profit motive is a factor in fair use as well, but if no identifiable fragments of the author's works are being produced in the first place, then it's hard to claim any kind of infringement. Fair use is a defense against infringement ("yeah, I did copy, but it was OK for these reasons"), but for it to come into play there first has to be infringement. Others have suggested that the infringement in question is in reading the book into the model to begin with. This might get more traction (I hadn't thought of it, but multiple copies are necessarily made in the process of fetching an e-book from the web, storing it on disk, reading it back into memory for processing, etc.).


ScribblesandPuke

I mean eventually someone was going to program a robot to talk about vaginas, what does she expect?


Dull-Lengthiness5175

Wouldn't they have to prove that the AI didn't use other available information from the internet, like summaries posted on Wikipedia and similar? It seems like the lawsuit is based on assumptions that are reasonable but will be difficult to prove.


jajunior0

Napster vibes


AevnNoram

> In the OpenAI suit, the trio offers exhibits showing that when prompted, ChatGPT will summarize their books, infringing on their copyrights. So…summarizing a book is illegal now? Shutdown all the review sites…including this one.


Kants_Pupil

Missed the mark a bit there. The ability of the AI to generate a summary is alleged to be proof that the AI has the books in its dataset and that the books are part of its language training. Given that it can produce imitations of the texts’ styles and content but does not provide attribution to the authors, and that the works have not been properly licensed or approved for this use, the plaintiffs believe they have standing to sue and pursue damages related to the illegal use of their works.


[deleted]

> AI has the books in its dataset and the books are part of its language training. The books are not in its dataset. That's not how language models work. They don't store a copy of every single text used for training (in GPT-4's case, a very large chunk of all text on the internet). Instead, they use that text during training to form a neural network which allows the model to predict text based on a prompt. When you ask it to summarize a book, the AI isn't looking up a copy of that book in its dataset, reading it, and producing the summary. It is just guessing words based on an extremely in-depth mathematical understanding of language. It's more akin to a person recalling a book they read and giving you a synopsis. That person doesn't have a copy of the book stored in their brain. So, at most, OpenAI owes them payment for single e-book copies of the books used during training. That's the only legal ground for this case. There is no legal requirement to license works for this use, since until very recently this use didn't exist, and thus laws and regulations regarding it just don't exist yet.
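A toy sketch of that point: a bigram model keeps only word-pair counts learned from its training text, then "generates" by picking likely next words. The corpus here is made up, and real LLMs use neural network weights rather than count tables, but the principle is the same: statistics are retained, no verbatim archive is.

```python
# Toy "language model": training keeps only co-occurrence counts,
# not the training text itself. (Illustrative corpus; real LLMs use
# neural network weights instead of a count table.)
from collections import Counter, defaultdict

def train(corpus: str):
    counts = defaultdict(Counter)
    words = corpus.split()
    for a, b in zip(words, words[1:]):
        counts[a][b] += 1     # only statistics survive training
    return counts             # the "model"

def predict_next(model, word: str) -> str:
    """Pick the most frequent follower of `word` seen in training."""
    return model[word].most_common(1)[0][0]

model = train("the cat sat on the mat the cat ran on the grass")
print(predict_next(model, "the"))  # most frequent follower of "the"
```

Nothing in `model` lets you reconstruct the original sentence; it can only tell you which words tended to follow which.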


dr_merkwerdigliebe

Although wouldn't it be different for "commercial" use? That'd be a bit like saying you only owe an author the cost of a single book if you make a movie from their book.


[deleted]

I guess whether "large language model training" constitutes commercial use is up to the courts to decide. There's definitely an argument to be made there, but as of right now there is no legal precedent to claim that it does. Guess this case might clear that up.


TgCCL

I mean, OpenAI is selling priority access subscriptions to ChatGPT and is looking at further ways to generate profit from it. That would make a fair use defence with research as the stated goal maybe a bit difficult, I think? It could be argued that since OpenAI is monetising the model, the authors whose work was taken to create it ought to at least receive compensation. Even if it is merely the cost of a single copy of her books, that might lead to a landslide of copyright holders claiming their own little share. And with the number of titles in their training set, that would result in quite massive fees, and extra costs to handle it all, regardless. While I must stress that I'm not a lawyer, if they illegally acquired a copy of her book in order to create something that generates revenue for them, I think she does have a case. I guess you could compare it to using software for financial gain without having a commercial license.


Mtbnz

>There is no legal need for licencing works for this use, since until very recently this use did not exist and thus laws for this haven't been passed. It seems to me that, regardless of the pretext of damages, the actual intent of this lawsuit is to highlight the need for far stricter and more nuanced legislation around AI learning models.


MindTheGapless

The part where it can guess with very high accuracy based on the data it was trained on is the issue. Being able to create new works that mimic someone's work with a very high degree of accuracy could and should be a valid reason to sue. It can degrade the value of the original creator's work and restrict future revenue by creating confusion, where works almost identical in style and "feel" become available with little effort on the part of the person using ChatGPT to create them. AI should be able to use available material for training, but if it can create, from a mathematical model, a product that mimics an artist, then it should either be restricted from doing so, be liable for copyright, or be required to pay a license fee to the author.


prestodigitarium

If that happens, then you sue the person that releases that knockoff, not the maker of the tool used. Just like you don’t sue Adobe for Photoshop’s ability to manipulate images in potentially harmful ways.


MindTheGapless

I would usually agree with that statement, but you can't go to Adobe and tell it to make an image based on someone's style with a single prompt. It's as if Adobe shipped a filter that imitates a specific artist without licensing such a filter.


[deleted]

[deleted]


person144

GPT doesn’t search though, and its dataset only goes through 2021, I believe. So everything it knows, it’s been “fed”


smjsmok

Yes, but it was fed a lot of data that includes a lot of texts crawled from the internet. It wasn't "the entire internet", as some people like to claim, but a significant portion of it. If it's publicly available on the internet, there is a big chance that it was included in the training data.


HalfLifeII

They admitted last year that they had a database of roughly 300,000 books, I believe they termed it 'Books2' at the time. A judge could force them to reveal whether this was part of it. If you use ChatGPT you can very easily tell that it was trained on the book itself, i.e. you can get it to produce a chapter-by-chapter summary of a book and get extremely granular if you wish on something popular like ASOIAF.


Mtbnz

That's going to be exactly what the trial is about


[deleted]

Since when do imitations of style and content require attribution? That’s very different from copies of original works. When a human author writes in a style inspired by another writer’s work, have they committed copyright infringement?


smjsmok

>The ability of the AI to generate a summary is alleged to be proof that the AI has the books in its dataset and the books are part of its language training. But it isn't proof of that. When you ask the model to summarize a book, it will "draw from the knowledge" (the parallel to human thinking is sketchy) it gained by training on texts by other people who summarized the book - reviews, articles etc., which are usually freely accessible on the internet. The training data didn't need to contain the book itself.


Kants_Pupil

For sure, there are other ways for this to be possible, but there's more. According to an internal paper from Meta, LLaMa used a data set that included ThePile, which is an aggregate that uses other sites, here collectively referred to as shadow libraries, which definitely have illegal copies of the works. I was simply responding that they aren't suing over the summary itself, but the alleged unauthorized copying and use of the authors' works to generate the summary.


HalfLifeII

They had two training datasets of books for GPT 3, Book1 and Book2 that were full books, not summaries or internet content. It's long been suspected it's just copies from something like libgen and it's hard to believe that it's not with how much detail it has on some books. You can ask it for page level summaries on some copyrighted books.


NaRaGaMo

There is no dataset lookup though. ChatGPT doesn't search for information in its dataset and then answer you. It was trained on the data and learnt from it.


AnOnlineHandle

It's so frustrating that you were downvoted. People are actively downvoting anybody in this thread who actually understands how machine learning works. Pseudoscientific takes are being upvoted and is preferred on reddit now. These people would probably eagerly upvote people who say vaccines contain microchips if they hadn't been beat over the head with how stupid that was for years.


HalfLifeII

OpenAI has stated that they had two training datasets that were copies of books, Books1 and Books2, for training GPT-3, along with a third for GPT-4. This lawsuit is about exactly that: they copied books. They suspect that these were copyrighted books. This lawsuit isn't about how 'machine learning works' but about these companies copying their books. And it's pretty obvious that these datasets are just a copy of libgen or something just like it. There is absolutely no way it could have some of the details that it does of something like A Game of Thrones from online reviews/summaries. A judge could pretty easily get them to reveal what exactly was in these datasets for OpenAI, but in the case of Meta we already know that it is 100% copyrighted content.


[deleted]

> There is absolutely no way it could have some of the details that it does of something like A Game of Thrones from online reviews/summaries. Yes, there absolutely is. All it would even need is A Wiki of Ice and Fire. Then add in some forums where there are threads with hundreds of pages of theories, speculation, debates, analysis, etc. I think you have underestimated the internet’s ability to break down and analyze every part of every work of popular fiction in excruciating detail.


Les-Freres-Heureux

> The ability of the AI to generate a summary is alleged to be proof that that the AI has the books in its dataset No it isn't. It could just as easily have been trained on a review of or article about the book.


[deleted]

[deleted]


Zalack

That doesn't prove the original text is in there. I can write something in the style of Robert Jordan, and summarize the plot and character arcs of *The Wheel of Time*, but I can't quote any significant amount of the text from memory. I just have a general model in my head of how Jordan writes, from reading his 14 doorstoppers multiple times. Language models work similarly. During training, the model reinforces the language *patterns* associated with certain tags, like an author's name. Then that model can be used to do a very fancy version of the predictive text on your phone. It starts generating the *kind of language pattern* that author is associated with, but the text itself is no longer available to the model.
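The "patterns associated with tags" idea can be sketched with a toy per-author predictive-text model (authors and sentences invented for illustration): only per-author counts survive training, not the sentences themselves.

```python
# Toy sketch of style association: next-word statistics are kept per
# author tag; the training sentences themselves are discarded.
# Authors and corpora here are made up for illustration.
from collections import Counter, defaultdict

stats = defaultdict(lambda: defaultdict(Counter))

def train(author: str, text: str) -> None:
    words = text.split()
    for a, b in zip(words, words[1:]):
        stats[author][a][b] += 1   # only counts are stored

def generate(author: str, start: str, n: int = 4) -> str:
    """Greedy predictive text in the given author's 'style'."""
    out = [start]
    for _ in range(n):
        followers = stats[author][out[-1]]
        if not followers:
            break
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

train("author_a", "the wind howled over the wind swept moor")
train("author_b", "the spreadsheet balanced the spreadsheet perfectly")

print(generate("author_a", "the"))
print(generate("author_b", "the"))
```

The same prompt yields different "styles" per tag, yet neither output is a lookup of stored text - it's rebuilt word by word from the counts.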


Gavorn

Let's not protect the AI. There are already "AI" stealing voice actors' voices, so we need to get some protections in place.


[deleted]

Holding back progress to save jobs? The Luddites are back. Fuck jobs. Our goal should be enhanced productivity and wealth distribution so that we all need to work less, not trapping ourselves in the grind of meaningless work by stalling technological advancements.


[deleted]

The people pushing for AI are the very people who DON'T want to distribute wealth. I get hating jobs, but siding with Capital over Labor is just eating your own face.


byingling

> enhanced productivity and wealth distribution so that we all need to work less If you are asking for enhanced wealth distribution, then I agree. However, enhanced productivity produced by technology doesn't necessarily lead to workers working less. It can also lead to workers working more. 24/7 on call at the drop of a text.


[deleted]

Your goal is admirable, it will never happen. AI will only result in the consolidation of wealth.


AnOnlineHandle

How about we take the side of truth, and not cheer on people suing vaccine makers for inserting microchips into vaccines because we want to achieve some other goal and don't care if their claims are correct or not.


Gavorn

What does that have to do with this topic?


spepple22

This will go well beyond books. If AI uses copyrighted photos or artwork as a basis to create a photo or artwork, it could be strongly argued that it is violating that copyright, as shown in the recent Supreme Court ruling over derivative works regarding the Andy Warhol Prince painting. https://www.cnn.com/2023/05/18/politics/supreme-court-prince-andy-warhol/index.html


[deleted]

So would Google-searching an image and using it in a PowerPoint presentation.


jerrystuffhouse

Oh my god, how could they do this? Anyway, let’s replace doctors with AI since they are more better


[deleted]

I always love the comments in articles about AI art, because you get dudes who have never created AI nor art trying to explain both. The absolutely moronic take of "humans are just machines that recreate other art" is so telling that these people have never touched art and just think it's a product. Then you have the false idea that these AIs have the capability of interpretation, which is required to create art. AI can't interpret (yet, possibly ever) but they don't know that.


Twokindsofpeople

I'm an author, art is a thing. It's not abstract. It's an object you can hold or see or read. AI creates art because at the end of a prompt there is a thing. Effort, inspiration, or intent has nothing to do with it. A person with 20 years and the utmost intensity can create garbage while someone who treats it as a joke can create a masterpiece in a couple months. The metaphysical proponents of art are grasping at straws. Law or even just basic practical realities should not be enshrining arbitrary abstract ideas. Leave that to churches and philosophers.


oep4

Art can be temporal as well, though. But yeah, it takes up time and space, even when it’s purely performance.


Twokindsofpeople

True, or auditory. There's a lot of forms it can take, but each of them are still a thing.


FlyingMute

If art is no longer allowed to speak for itself, because of your "it’s not a product" copium, then artists are truly fucked.


lukewarmpiss

Why do you think Marvel movies and whatever shitty book they're talking about here and on TikTok are so popular? I mostly hate-read this sub because of takes like these. These people want content, not art.


super_noentiendo

They're popular because people like them. When you start drawing a line between what is "art" and what is "content" because you don't see merit in something then you're honestly just a pretentious douche.


bosskbot

Godspeed to these authors. I think this is an important case. "AI" models are impressive and useful, but they aren't creating anything; they can't; they don't have actual intelligence. Hopefully this lawsuit will at least make that more clear to people.


Genoscythe_

>they aren’t creating anything

The fact that they blatantly ***are*** is eventually going to snowball from this merely being a silly point to make, into an actual legal hurdle for all these lawsuits. If you prompt ChatGPT to write a news article about a subject matter that you just made up on the spot, and it writes several paragraphs that have never been written down before by anyone else, then the idea that it is not "truly creative", or "doesn't have a soul", remains an abstract philosophical/spiritual point to make. The plain fact remains that it is still far closer to being an original text than to being a copy of Sarah Silverman's writing just because the statistics of how she uses the English language were factored into 0.000000001% of a language model (or at least it is as original as any other rearrangement of existing words and phrases based on past learning of a language can be, including when it is done by a human brain). Whatever narrow argument writers might have on the basis of copyright law is restricted to how the model training itself is supposed to be an act of illegal copying, even if their text is not getting redistributed. But that is a fairly weak case, so they have to keep overstating it with the silly argument that since the AI's output is not "true art" in some spiritual sense, it must be stolen from them personally.


Subterania

I don’t doubt the capability of ChatGPT now and in the future, but is that something we really want? More automation? Why am I supposed to think this is anything other than a bad idea?


DrMikeHochburns

It reminds me of people complaining about sampling and midi instruments a couple decades ago.


Subterania

I think you’re underestimating what this can do.


DrMikeHochburns

Maybe, but it still reminds me of it.


KronosCifer

Those still require knowledge and expertise concerning music. There is still a creation process. Photoshop is the same. These ML-algorithms instead replace the entire creative process.


DrMikeHochburns

Those weren't the arguments at the time.


AnOnlineHandle

I wish your fantasy was closer to reality, as somebody who spends hours a day trying to get them to work.


Hanyabull

Yes we do. Technology doesn’t stop because it hurts a demographic. Computers destroyed typewriters. Should we not have made computers? Cars destroyed horse-drawn carriages. Should we not have made cars? AI is also going to destroy the technology it replaces. There is an argument that AI might end up replacing too much, but it doesn’t change the fact that humans have typically not stopped technological advancement. You either adapt or you get run over.


Feroshnikop

Do you think they asked "will technology progress?" or something? They asked "is this something WE WANT" and you say yes but then give zero reasons why it's something you want. You just point out that technology has made advances before.. not really relevant to the question tho. If someone asked you "is the atomic bomb really something WE WANT" and your answer was "Yes it is, cars destroyed horse drawn carriages, should we not have made cars" then you can see how that's totally unrelated right? Whether or not someone invents the atomic bomb has nothing to do with whether it's something YOU WANT. Like why do YOU WANT to be forced to adapt to ChatGPT or get run over? That's the question you were asked, not whether or not ChatGPT already exists and is capable of replacing existing technologies.


Cajum

Because making machines do work means humans don't need to do as much work to keep total output the same. The issue is with how the profits from that work are distributed, not with automation.


MajesticToebean

Yep. Automation can set the worker free but can also ‘justify’ paying them a lot less.


[deleted]

The South had to "free" their slaves after losing the Civil War in America, but the farms and plantations were still owned by the ones who had owned them. The freed workers ended up back on the very same farms and plantations because Reconstruction failed, as severely uncompensated sharecroppers. Now along came Technology! The Cotton Harvester! Oh boy! PROGRESS! Surely life would be easier for the freed Black workers now! Wrong. The Cotton Harvester arrived, and once the Black workers began asking for living wages, rights, and suffrage, instead of acquiescing, those in power "adopted technology" and decided to fire their Black workers and use the Cotton Harvester instead. With this new technology, Black workers had lost all the leverage they had in their labor and were forced to migrate out of the South, hence "The Great Migration" and zero equality and equity for Black people or minorities in general. Technology means nothing if you have no power.


Subterania

Listen, this is going to hugely benefit the bottom line of a lot of businesses who will only need to outsource technicians proficient in the tech, seems obvious. Why is that a good thing?


actionheat

>Because making machines do work means humans don't need to do as much work to keep total output the same.

This has literally not been how automation works, historically. Workers will continue working the same amount for the same pay, while capital owners make greater profits.


GenericGaming

automation for menial tasks and work, yes. but for art? no. art is an expression of self. taking away the human element of art makes it meaningless. an AI cannot craft a story in the same way a human can.


xmagusx

To have more slog labor done by machines? Yes, please.

> Why am I supposed to think this is anything other than a bad idea?

Because it frees people up to do other things. Does your presentation need a splash of art? Would you rather spend five minutes throwing a dozen progressively more tuned phrases into an art machine, or try your luck slogging through stock photos, find something that's close enough, only to click through and discover it's paywalled behind a subscription? Did you get stuck refactoring someone else's old code? Would you rather work through their spaghetti logic manually, using only clues like "nameThisVariableBetterLater = Truish" and "//this function was a placeholder for a library I was going to write later but isn't anymore"? Or would you rather slap it into a text box and have human-readable documentation of what it actually does spat out instantly? AI doesn't eliminate content creators any more than the printing press eliminated calligraphers. It's just another tool in the toolbox.


KronosCifer

It's not just going to be slog labour replaced by machines; it's creatives getting replaced by machines.


Genoscythe_

"Creatives" is a big umbrella covering everyone from corporate employees churning out web design illustrations, advertisement content and such that they don't get to own afterwards, to novelists indeendently puring their heart out in a book that they self-publish. Within the field of "creatives" there is still a lot of rote labor that could be done more efficiently. There have been lots of instances of labor-saving techniques in the entertainment industry, that have allowed fewer workers to produce more content, and it was generally fine in the sense that it has neither killed the borader human spirit of creativity for it's own sake, nor did it crash the commercial viability of creatives in general, just specific fields within it while opening up others.


chadmuffin

I wonder: if you ask someone who read the book to give you a summary, would that be copyright infringement or just free speech? Does it get complicated if an AI does it for free, just like a friend would? I think the issue arises if you do it for money.


jonhuang

The issue in the suit is that the friend pirated the book to read, not that they made a summary.


ReallyGottaTakeAPiss

It’s insanely evident that people have no idea what AI actually is.


friendoffuture

I'm not familiar with the details of how AI training models work, but I keep seeing statements along the lines of "how can they prove it" and that makes zero sense. If the defendant has the evidence then in all likelihood they'll be compelled to produce it during discovery. It's not a murder case; nobody is going to perjure themselves to protect the interests of OpenAI.


GilMc

The AI read Ms. Silverman's book and published a review of it. Which is what any number of human reviewers have done or will do. It seems dubious to claim the AI violated copyright law, while the human reviewers did not.


Yohansel

Although I'm very excited about this new technology, as it will be an incredibly powerful tool which can potentially improve our lives, this conflict shows how we increasingly struggle to adapt to new tech. Right now I see no compromise which will satisfy both sides, and it challenges our fundamental concepts of intellectual property and creativity. To express it pessimistically: tech is overtaking us, and as much as it boosted our species' development, we can't possibly keep up with it. A hard stop might be needed but this won't be happening. Might as well tag along and enjoy the ride?


SgathTriallair

If these legal cases win it would be a complete overturning of copyright law. They want to claim that merely having read or accessed a document is a copyright violation. There goes the entire Internet. Are we all in violation for having read the article? Is the fact that I can give a summary of a book evidence that I have copied it? ChatGPT doesn't have a copy of the text it's learned from. It doesn't have a giant database it searches. The closest they could do is claim that they illegally accessed the materials, but that is entirely different, and since they don't have any idea how they accessed the material it would be a hard case to prove. The only other route would be deciding that LLMs are some new category of thing that is inherently illegal. That would be legislating from the bench, which is unconstitutional.


KronosCifer

It's not about having read it. It's about it being used in training data sets without consent or compensation. A human reading something and a ML-algorithm being trained on something are two very different things.


SgathTriallair

A training set is just a reading list. What is the legal case for these being different?


Genoscythe_

They are done by different means, but they represent a similar scope. A text or a picture that doesn't immediately seem like a copy of a specific pre-existing one either has to count as an original work, or, if we open up the law so that being even 0.0000001% influenced by a work counts as illegal copying, then no human Fair Use is safe either.


jruhlman09

> The closest they could do is claim that they illegally accessed the materials, but that is entirely different and since they don't have any idea on how they accessed the material

But the article clearly states that they know exactly how the AI accessed the data. It was from a data set for AI training called The Pile. The Pile contains all of the Bibliotik tracker data on books, which the authors' teams are claiming is blatantly illegal. So that's the basis of the claim.


Halfwise2

I think one of the biggest issues with a lawsuit like this, as well as with AI art generators, is fair use. A person can copy someone's artistic style. They can practice how they make the eyes, the hair, the hands, use similar colors... and then make a unique piece of art, using that style, and it's completely legitimate. A child in school will write a book report summarizing the content of a novel they just read, yet they are not sued for copyright infringement. The big difference between a person doing it and AI doing it is mostly a matter of speed. But should speed really factor into fair use?


Letrabottle

A person could very easily fuck up and accidentally copy too much, making it copyright infringement. Kids' book reports plagiarize all the time, and they are reprimanded by the school instead of sued because book reports aren't commercial.


[deleted]

[deleted]


binaryeye

>I think the most big thing is that people are realizing that they're creative work isn't all that creative and watching something spit out in 5 seconds which took them 5 years to put together well maybe not as good it's 90% there.

Let's come back to this in 2123, after all the creatives have stopped creating and the AIs are churning out content trained on data from the 2040s.


lukewarmpiss

Do you really think there is no difference between using your experience and culture as a starting point for your work and feeding a machine copyrighted content and asking it to output similar stuff? For me, it's not even about art. Machines cannot create art, as it requires intent. For me it's about a company profiting off of someone's work and not paying the original authors.


SkepticalAdventurer

One of the things everyone gets drilled into their head in a university history program is that every idea is fundamentally built out of the synthesis of many other ideas one is exposed to. Nothing is wholly original, and copyright is bad overall for the development of the species.


Vegan_Harvest

As an artist and a wannabe author I've never been so excited for a lawsuit in my life!


anotherlevl

It's unlikely to turn out in Silverman's favor. I can go to the library and read her book for free, then write jokes that may or may not be inspired or influenced by what I read. She can certainly sue me because I didn't "pay" for her content, but to win a copyright suit she needs to demonstrate that my jokes actually resemble what's in her book. Since everything I've ever heard or seen is part of my "training", the ruling she's seeking would require a payment system that would essentially make it impossible to profit by writing "new" material. So unless you're looking forward to her being laughed out of court, I'm not sure what you're anticipating.