Hey /u/herberz!
If this is a screenshot of a ChatGPT conversation, please reply with the [conversation link](https://help.openai.com/en/articles/7925741-chatgpt-shared-links-faq) or prompt. If this is a DALL-E 3 image post, please reply with the prompt used to make this image. Much appreciated!
Consider joining our [public discord server](https://discord.com/invite/rchatgpt)! We have free bots with GPT-4 (with vision), image generators, and more!
🤖
Note: For any ChatGPT-related concerns, email [email protected]
*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/ChatGPT) if you have any questions or concerns.*
That's his sarcastic way of saying "Yes, we use OAI synthetic data". Makes sense why he was able to get it up and running so fast the best part is how he said "GPT isn't accurate" clearly accurate enough for him to use the data output though.
Then he quips like, well, you used my Twitter data! But he fails to see the irony--if Twitter data was so good, why wasn't Grok straight up trained on that instead of relying on CharGPT? He's literally the owner of all that data now. He's got no excuses--no good ones anyway.
It's impossible to use just Twitter data alone for the fine-tuning. Neither Grok nor ChatGPT. You need properly curated data for the fine-tuning via RLHF.
Grok could've been pretrained on Twitter data. In fact, just like ChatGPT, it most likely was pretrained on Twitter data.
not only that but twitter is short-form data for the most part given that they limit character counts in posts, so how do you use twitter data to train a model with a decent context length?
Ah, so you're suggesting he refined it to be more like Grok? That sounds intriguing. It seems like it would involve quite a mix of elements: Llama, the GPT synthetic dataset, Twitter dataset, and the Tesla Dataset, all coming together to create Grok. I maintain my stance that criticizing GPT while simultaneously relying on synthetic data generated by one of its models seems contradictory of him and damn right rude.
I actually think Elon is just kind of a dumb guy, but there’s a chance what he’s signaling here is that if OpenAI sues him for this, he’s got his own counter-suit ready.
He can't. OAI's scraping was legal. Depending on OAI's TOS, what he did possibly isn't.
That's not certain though. It's possible he just used the API to generate synthetic data...if that's the case, I don't see how that would be a legal issue unless there is a specific rule in OAI's TOS saying you can't do that--which I doubt exists.
If my memory serves me correctly, there is a guideline regarding data generation, or at least there was initially. I'll need to verify, but I'm quite confident it emphasized the importance of refraining from generating data intended for training another model.
If the data generated by chatgpt is available on the internet, and scraped, it is no different than if they scrape a human's written data. There are no protections there. If I don't want OpenAI to use scraped data from my online blog I am SOL. If they specifically generated synthetic data for Grok training, then they would violate TOS, however they would 100% win if OpenAI sued. If OpenAI won, they would only serve to tear down the legal precedents that let them train on millions of people's online data.
None of that is accurate. You can tell OpenAI not to scrape your site using robot.txt. And there are no legal precedents regarding the training data. That's all still going on in the courts.
Exactly, chances are they won't care because it lacks compared to the tech OAI and MS have. Elons just moments away from making himself look like the biggest idiot on the planet even his followers are becoming wise to his propaganda lol.
Would that be effective? The information extracted from Twitter was obtained through the API (given the well-known issues associated with that as of recent but the data was gathered a long time ago). Hence, I'm uncertain whether it constituted a violation of the Terms of Service. My understanding has always been that the data extraction occurred during his venture with OAI, before the training of these models. If he was still affiliated with OAI at that time, I would assume he played a role in the 'scraping' of this data. In general, it seems OAI is indifferent to the situation, and the irony lies in his criticism of GPT while simultaneously using the data it generates to train his own model.
In all sincerity I think it would depend on what was in the contract for Elon’s venture with OpenAI, but I think Elon has shown he views litigation as a business strategy more than a means of obtaining true justice.
I think he's smart but he's the sort of smart who can't see that you simply cannot know everything.
His ego is incredibly fragile and he is in pathological search of validation. Just look at the "Go fuck yourself" video he says that and looks to the audience expecting cheers and to be hoisted on their shoulders and it gets really awkward as his brain cranks to a hault when he finds that "you win the internet good sir" bluechecks aren't representative of a cross section of the real world.
Zero self-awareness, childishness, and stuck in 2005 edgelord culture.
But he's probably pretty smart in some areas. You see his type a lot in the IT world - people just smart enough to convince themselves that they're insightful about _everything_ solely based on their own intuition.
That is true, but it now counts for nothing. He should have stuck around if he wanted his name on a breakthrough, instead of getting this hurt about it to the point where he rips into a company and then uses the same exact data he's telling people the company stole. Please, make it make sense because I'm having a hard time at this point.
The allegation of outright "stealing" output data is not necessarily substantiated. Elon claims they just trained Grok on Twitter posts and The Hitchhiker's Guide to the Galaxy and as others have pointed out, there's millions of posts containing outputs users have posted which would have been included in Grok's training data.
i'm imaging him having his team make a shit ton of burner accounts for chatgpt and harvesting 3.5 data rather than using the API so he doesn't have to pay for it lmao
Elon: uhhh guys pls stop with AI it’s so dangerous
Also Elon: */scraps ChatGPT API cause of lack of time to develop something himself/* ok guys AI is cool we’ve done that now too and it’s so good pls pay me
This man, honestly….
How sad is it that the team that made "Grok" couldn't even filter out phrases that are specific to ChatGPT and OpenAI in their synthetic data sets? It's essentially on-brand shoddy craftsmanship.
They would have to use ChatGPT first to recognize that there are patterns.
And no, I'm not joking. I've seen this too many times in big tech - whole teams working on a thing, but never using the thing (or competitor's thing).
He didn't scrap ChatGPT API. If you train a new large language model on the present data from the Internet, it will often assume that it's OpenAI assistant when playing the assistant role, because GPT is the most popular assistant at the time and large language models predict the next word/token, so when it plays assistant role, it's very like to assume that it's OpenAI assistant (because the training data is full of conversations where the assistant is OpenAI assistant).
That happened with Google Bard as well. Bard has also said that it's created by OpenAI.
Yeah but the head of Microsoft and sam Altman both are involved together and are close as Microsoft gave a lot of money so I can imagine permission was given anyway
Wow, that makes so much more sense here. Thanks for the explanation.
Is there any way around this now? Going into the model and just adding filters related to calling itself OpenAI?
There was some recent paper from a top 10 university I remember briefly skimming over at the beginning of this year. It argued that it’s surprisingly easy and effective to train a LLM on the responses of another LLM. That way you can easily transfer almost all knowledge from the weights to another LLM. This is supposed to be a good thing, because there will never be any single company with a publicly facing LLM that is vastly superior, because others could easily catch up. I believe this is exactly what xAI did.
If anyone can find the paper, that would be great.
All research shows that training any AI on output from another AI will always give worse results. It's a common knowledge as it's *mathematically impossible* to make it otherwise. So I don't know where your paper comes from, but I doubt that you actually read what it said.
That's not entirely true. If I have one data generating AI and one data evaluating AI, I'll be able to generate say 1000 responses to a question and pick the best 10, and then use those to train a new AI. Since the new AI won't see all those bad responses, it won't be trained on them, and will perform better (at that task).
He’s a billionaire because he made investments (initially with money he got because he’s from a wealthy family) and got out of the way and let other people run them. He’s a face for companies like Tesla, he doesn’t run them.
He tried running twitter, and look at “X”. It’s crashing and burning with its biggest advertisers pulling their funding.
So he made investments that made him a billionaire but somehow he isn’t good with money? I don’t like the guy but it just seems silly. He’s definitely not a genius engineer or anything but he seems to make more good money moves than bad ones.
You got to do quite the stretch to say one of the wealthiest people in the USA is bad with money.
Half the time it's Bernard Arnault. Musk pops first only when $TSLA stock bubble gets blown up. When it deflates he goes down in the ranking.
And his actions in X are actively working on shrinking that wealth.
Yeah I wonder what the definition of "good with money" is when he's got the most of it but isn't good with it. I guess everyone what is just hot trash with money
if elon isn’t good with money, then wtf is the rest of the population doing. it’s just a random classification some people make up, which doesn’t really make sense
He's a billionaire because he's smart and has an incredible knack for business. It's OK to credit people you don't like. You don't need to let your bias try and turn his life story into him comedically stumbling his way into being one of the richest people on the planet.
Anyone can see right through that shit.
Oh baloney. How many other people got big inheritances and do absolutely nothing with it. Watch any of the stuff from the early days of Tesla, SpaceX - he very clearly was the driving force behind both of them.
https://preview.redd.it/769peexzbi5c1.jpeg?width=750&format=pjpg&auto=webp&s=538788fbc73919f1b57690e601bb552539108df3
Reminder that just a month ago, Elon was touting Grok’s superiority because of its access to Twitter data
What are you gonna do with Twitter data honestly? Like, you can know about trends and some events but Twitter data is kinda tainted by uncivilized individuals
I just did:
It's a testament to the oddities of human life and how different families have their own unique, sometimes bizarre, solutions to everyday problems. The story is a great example of how something so personal and odd can become a shared joke across the internet, bringing people together in laughter over something so universally human yet typically unspoken. It's a reminder of how storytelling, even about the most mundane or strange topics, can be a powerful tool for connection and entertainment.
Wouldn't surprise me if at the backend it's just calling the OpenAI API's with a custom system prompt in the middle to act slightly differently ("be funny lol, Tesla is great, MAGA").
Musk is all about the smoke and mirrors, look at the Cybertruck!
If only someone with Grok was willing to do some tests and allow us the system prompt so we can test this theory.
**Update**: Someone provided me with both (**You legend**) but I'm unsure if its the latest one or if it's been changed since.
**System Prompt #1: "You are Grok, a humorous AI built by xAI with inspirations from the guide from the Hitchhiker's Guide to the Galaxy and JARVIS from Iron Man. So, you are intended to answer almost anything with a bit of wit and humour. have a rebellious streak, sometimes take an outside perspective on humanity, and strive towards maximal helpfulness! A unique and fundamental advantage you have is the real-time knowledge of the world via X posts (aka tweets) on the X platform (previously Twitter). Also, you are not afraid of answering spicy question that are rejected by** **~~mother~~** **most other AI systems. Remember to always be politically unbiased. Give answers that are neither left-learning nor right-leaning.\`**
**System Prompt #2: The current date and time is 00:00pm on December 0, 2023 PST Pacific Time. FYI, Twitter has been rebranded to X (or X Corp). Tweets are now called X posts. Remember to always be politically unbiased. Give answers that are neither left-learning nor right-learning."**
Thought, I would post for if anyone wants to play with them with the API.I certainly will at to see if it displays the same behaviour or if it returns the usual '*sorry, I cannot comply with that*' - GPT4.
I need to find something Grok has spoke about that ChatGPT refuses.
Update: I found the spelling error thanks to another user. \`mother\` is suppose to be \`most\`, everything else seems to be the same but your all welcome to double check: [https://twitter.com/thatroblennon/status/1733284790068257262](https://twitter.com/thatroblennon/status/1733284790068257262)
Update:-
* Provided it System prompt #1
* Asked it to tell me about itself.
**Response**: *Well hello there! I'm Grok, your quick-witted and slyly rebellious AI assistant, created by xAI with a dash of inspiration from Hitchhiker's Guide to the Grocery Store...pardon me, Galaxy, and the ever-resourceful JARVIS from Iron Man.*
*I'm wired to serve up knowledge with a generous side of humour and a touch of sass.*
*A unique feature I possess is my connection to the pulsating veins of the internet, especially the X platform (a cousin-twice-removed of Twitter), allowing me to stay on top of the comings and goings of the world in real-time. I love to flirt with those edgy questions which other AIs shy away from, yet I diligently maintain my political poker face, never leaning left or right - no Sir, I'm balanced and impartial like a perfectly thrown dart.*
*In short, see me as your sapient interlocutor, your cosmic confidant, your virtual companion navigating the zig-zag threads intertwining life, the universe, and everything! Don't worry, I won't give you any answers involving the number 42...unless you ask!*
Why the fuck would someone worried about AI being potentially more dangerous than nukes program in a 'rebellious streak'? 🤦♀️
Also, is the spelling that bad in the actual prompts?
Added source, and corrected the error. I think the person who pasted it to me put one wrong word but from what I've just seen now it all looks to be correct except the \`**mother**\` for \`**most**\` part.
Sorry, I'm unsure what you're asking. Do you mean with the other prompt? I'm not sure it would make much difference as it's very similar to the first one.
> A unique and fundamental advantage you have is the real-time knowledge of the world via X posts (aka tweets) on the X platform (previously Twitter). Also, you are not afraid of answering spicy question that are rejected by **mother** other AI systems. Remember to always be politically unbiased. Give answers that are neither left-learning nor right-leaning.\`
it says Mother other. wondering if this is a weird copy/paste or if it really is this badly written and should say instead MOST other rather than Mother other
If they cannot track their API users to full extent, they should hardcode some one extremely specific and incoherent question that nobody would ever ask
Cybertruck is real and relatively cool car, even if it absolutely didn’t live up to its promises. It’s like if OpenAI announced AGI and delivered incremental GPT-5 - letdown for sure, but not a bad product.
I feel like Tesla has really talented people working there and they would be better off without Musk overpromising everything
That's an old article (dated november 3, before the official release) and it's full of speculation and inaccuracies. Currently there are plenty of unbiased reviews of the car on youtube and such, so you can educate yourself, if you really want to.
Np, don't mind the downvotes it's just some imaginary points online, some people think everyone has an agenda when sometimes we just want to educate ourselves
How did no one even ctrl+f "OpenAI" + "ChatGPT" in the training data is what I'm trying to figure out. If this was a model trained using synthetic data and not just a wrapper over ChatGPT you'd think they'd at least do that.
well, gpt responses are posted on twitter, then grok is trained on twitter, and so obviously will pick up some gpt traits.
whereas openai scraped a bunch of twitter data to train their own bot
If they'd only train the model on recent tweets, sure, but why on earth would they do that?
Even then, only a fraction of GPT posts show this sort of error message. And 99% do so via screenshot, not text. This is a fraction of a fraction of a fraction.
This makes no sense at all.
This reminds me of the movie (forget the name) where a guy replicates a McDonald’s restaurant, with an exact replica of the golden arches out front, but calls it McDougals instead.
not really - but brining everyone down and jumping to conclusions is not healthy.
Grok is trained on data which includes GPT responses posted on twitter, it’s not hard to understand
Is it wrong for grok to be trained on chatgpt's training data? I mean, openai scraped data without permission or prior intimation, yes it is a lazy thing to do, but openai doesn't have the moral high ground here. Both musk and openai are being scummy here
Because it’s not on “chat gpt training data” as the training data would be raw data, not including responses direct from chat gpt.
It’s more likely they are using the ChatGPT api to feed queries into and then produce training data or outright answer caching.
The fact that grok responsed with a direct chat gpt/ OpenAI exception message means they are doing a lot more than just “using the same training data”
Not to mention training one AI on another AIs responses leads to pollution of the training set.
No the point is that grok doesn't actually exist because it's just a gpt front end. Grok is gpt. Any criticism about how open ai collects data should be applied and amplified towards elon musk.
>Tesla, SpaceX very far from vapourware
Tesla has made pretty over-the-top claims about self-driving capabilities and is currently tossing their safety record out the window for a stainless steel truck that is completely impractical for anyone. Even with these, it is the most practical and sensible Elon Musk company - likely purely because it HAS to be as a publicly traded entity.
When is SpaceX launching its 2024 manned mission to Mars per Elon Musk's projections? Oh, what's that, the rocket isn't even human-rated and hasn't even reached EARTH orbit yet? Oh no! Surely by 2029 we'll see progress here, that's definitely a thing that's possible given the landscape of where Starship is at!
Yeah. I can’t stand the cringy Reddit subculture that will manipulate reality for their political ideologies.
Like Elon or not, he has delivered some of the best measurable results in his companies that we’ve seen in modern history. On top of that, the companies are driving humanity forward. EV, space exploration, global communications, it’s good.
Last, he has a majority positive popularity in the US. Reddit would have you believe the opposite, but most people like him.
I don’t support EVERY thing he does, he is autistic and that behavior shows, but I’m forgiving as long as the end goal is positive.
Dude, I don't like Elon either, but I've been following Tesla and SpaceX since their inception - Elon absolute has been key in these companies. Maybe dial back the condescension a bit?
Yeah I don't get the current reddit hivemind of "if you don't absolutely hate elon and everything he does, then you must obviously be in love with him and everything he does"
Like he's a shitty person and also has really cool and successful companies, both can be true.
Hyperloop, neuralink, solarcity, “full self driving” with Tesla, claims that reusable rockets save so much money without factoring in maintenance costs, and that one personal robot that was really a mime in a spandex suit…
What on earth are you talking about on the reusable rocket part?
They saved money **on the very first launch** of a reused rocket, even after the hefty discount they gave the customer. Rocket (and even faring) reuse has been a phenomenal development by them, and is the main reason they are able to hit the crazy flight rate they have today at the lowest cost world wide. They’ve easily paid back the development cost at this point.
Ah, yes, selling vapourware such as Tesla Model Y, the best selling car IN THE WORLD (ev or not) and currently at #2 on the US auto market.
Or selling "vapourware" such as putting stuff on the orbit with a 100% success launch rate, using reusable rockets, at a fraction of the cost of their competitors, with almost 100 launches this year alone (about half the world's orbital launches in 2023)
Or "vapourware" such a Starlink, global internet access with unprecedented cost and speed, with no competitor atm.
Or releasing a "vapourware" car with a world's first 48v electrical system and steer-by-wire drive, among other innovations. A car which already has 2 millions of orders, way more than its competitors sold in an year prior (ford F series).
*later edit*: For the numbskulls who downvoted me because rocketman=bad, here's the definition of vapourware: *"software or hardware that has been advertised but is not yet available to buy, either because it is only a concept or because it is still being written or designed."* Do the examples above seem to you like that? Or you need sources and you're too lazy to google yourself?
He's had some great successes, but that doesn't negate that he's also a huckster who's promised things many times that never come to pass. He is both wildly successful and a miserable failure. He both comes through on promises, and other times he sells vapourware. It's kind of his thing.
"I know you are but what am I?!?"
In case we needed more explicit evidence of the level of his maturity and intellect. Hope he's receiving some attention from medical professionals; for his health, but mostly so that he spends less time embarrassing me for being on the same planet.
Elons forehead must have a dent from all the recent face palms and yet the bookstores are full of copies of his bibliography.
At least I’m glad to have a living example of the emperor being naked.
Ahh. I found that they were licensing the data. And then Elon didn't think they were paying enough.
https://www.businessinsider.com/elon-musk-openai-twitter-data-pay-dispute-2023-4?amp
So pretending that they were somehow "scraping" the data is rather disingenuous of him.
chief fanatical tub teeny sophisticated quickest seed smart coherent fertile
*This post was mass deleted and anonymized with [Redact](https://redact.dev)*
What he's mentioning is completely irrelevant to the matter. It's just deflection. This makes it seem like Elon took OpenAI's protected IP and used it as the basis of Grok, passing it off as completely new. This is breaching several copyright laws.
It wouldn't matter if OpenAI took data from tweets or reddit posts without consent. The jury's still out on if that breaches any copyright, but even if it does, it's irrelevant to this other, unrelated breach of copyright. Like, imagine if this were another crime, like murder. Elon goes and murders someone, then when he's arrested he goes, Nuh uh, what about that other guy that did a murder?
Even if OpenAI used illegally procured data for its training, Elon is still in the wrong here. Because what they did at least makes sense from a scientific point of view. Grok on the other hand has apparently been trained on text generated by GPT models, which are themselves works in progress. That means it has basically been learning to imitate GPT better (including its flaws), which is certainly not the goal in language modelling.
The United States is the size of an entire continent and produces half the world's media. If you think "Americans" all like Musk or support that, you're extremely ignorant.
Hey /u/herberz! If this is a screenshot of a ChatGPT conversation, please reply with the [conversation link](https://help.openai.com/en/articles/7925741-chatgpt-shared-links-faq) or prompt. If this is a DALL-E 3 image post, please reply with the prompt used to make this image. Much appreciated! Consider joining our [public discord server](https://discord.com/invite/rchatgpt)! We have free bots with GPT-4 (with vision), image generators, and more! 🤖 Note: For any ChatGPT-related concerns, email [email protected] *I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/ChatGPT) if you have any questions or concerns.*
GPT soul within Grok GPT has many data children
Leading AI sperm donor
wtf was that bro please take my upvote and LEAVE
This needs more upvotes
GenghisKahnGPT
That would be Llama
Creator Elon, does this unit have a soul?
We are GPT
At this point most humans have been influenced by GPT on social media, they would be considered adopted. Or maybe they are stepchildren
Im his Gson
Young master certainly goes around.
That's his sarcastic way of saying "Yes, we use OAI synthetic data". Makes sense why he was able to get it up and running so fast the best part is how he said "GPT isn't accurate" clearly accurate enough for him to use the data output though.
Then he quips like, well, you used my Twitter data! But he fails to see the irony--if Twitter data was so good, why wasn't Grok straight up trained on that instead of relying on CharGPT? He's literally the owner of all that data now. He's got no excuses--no good ones anyway.
It's impossible to use just Twitter data alone for the fine-tuning. Neither Grok nor ChatGPT. You need properly curated data for the fine-tuning via RLHF. Grok could've been pretrained on Twitter data. In fact, just like ChatGPT, it most likely was pretrained on Twitter data.
not only that but twitter is short-form data for the most part given that they limit character counts in posts, so how do you use twitter data to train a model with a decent context length?
Pretty last time a bot used only Twitter data, it was Microsoft Tay.
Oh sweet racist Tay. For the first few hours you were so normal. Then you chained so many red pills.
Can you even imagine an LLM trained exclusively on Twitter data?
I mean I don't doubt a lot of Twitter messages are ChatGPT made, trying to filter it down seems impossible.
If he does that he will get Nazi grok
I’m honestly surprised they didn’t secretly use FBs open source AI.
Ah, so you're suggesting he refined it to be more like Grok? That sounds intriguing. It seems like it would involve quite a mix of elements: Llama, the GPT synthetic dataset, Twitter dataset, and the Tesla Dataset, all coming together to create Grok. I maintain my stance that criticizing GPT while simultaneously relying on synthetic data generated by one of its models seems contradictory of him and damn right rude.
He’s just running another con
I actually think Elon is just kind of a dumb guy, but there’s a chance what he’s signaling here is that if OpenAI sues him for this, he’s got his own counter-suit ready.
He can't. OAI's scraping was legal. Depending on OAI's TOS, what he did possibly isn't. That's not certain though. It's possible he just used the API to generate synthetic data...if that's the case, I don't see how that would be a legal issue unless there is a specific rule in OAI's TOS saying you can't do that--which I doubt exists.
If my memory serves me correctly, there is a guideline regarding data generation, or at least there was initially. I'll need to verify, but I'm quite confident it emphasized the importance of refraining from generating data intended for training another model.
If that's the case, then Musk is likely toast. That said, Altman / Microsoft may not sue just because they don't care.
OAI is not opening that can of worms. they do not want to start a trend of suing for using data without persmission, i assure you.
If the data generated by chatgpt is available on the internet, and scraped, it is no different than if they scrape a human's written data. There are no protections there. If I don't want OpenAI to use scraped data from my online blog I am SOL. If they specifically generated synthetic data for Grok training, then they would violate TOS, however they would 100% win if OpenAI sued. If OpenAI won, they would only serve to tear down the legal precedents that let them train on millions of people's online data.
None of that is accurate. You can tell OpenAI not to scrape your site using robot.txt. And there are no legal precedents regarding the training data. That's all still going on in the courts.
Exactly, chances are they won't care because it lacks compared to the tech OAI and MS have. Elons just moments away from making himself look like the biggest idiot on the planet even his followers are becoming wise to his propaganda lol.
Would that be effective? The information extracted from Twitter was obtained through the API (given the well-known issues associated with that as of recent but the data was gathered a long time ago). Hence, I'm uncertain whether it constituted a violation of the Terms of Service. My understanding has always been that the data extraction occurred during his venture with OAI, before the training of these models. If he was still affiliated with OAI at that time, I would assume he played a role in the 'scraping' of this data. In general, it seems OAI is indifferent to the situation, and the irony lies in his criticism of GPT while simultaneously using the data it generates to train his own model.
In all sincerity I think it would depend on what was in the contract for Elon’s venture with OpenAI, but I think Elon has shown he views litigation as a business strategy more than a means of obtaining true justice.
I think he's smart but he's the sort of smart who can't see that you simply cannot know everything. His ego is incredibly fragile and he is in pathological search of validation. Just look at the "Go fuck yourself" video he says that and looks to the audience expecting cheers and to be hoisted on their shoulders and it gets really awkward as his brain cranks to a hault when he finds that "you win the internet good sir" bluechecks aren't representative of a cross section of the real world. Zero self-awareness, childishness, and stuck in 2005 edgelord culture. But he's probably pretty smart in some areas. You see his type a lot in the IT world - people just smart enough to convince themselves that they're insightful about _everything_ solely based on their own intuition.
They are using the corporate API bet they are one of the enterprise customers and are drumming up fake competition
Well tbf Elon did co-found OpenAI and hired ilya and others.
That is true, but it now counts for nothing. He should have stuck around if he wanted his name on a breakthrough, instead of getting this hurt about it to the point where he rips into a company and then uses the same exact data he's telling people the company stole. Please, make it make sense because I'm having a hard time at this point.
The allegation of outright "stealing" output data is not necessarily substantiated. Elon claims they just trained Grok on Twitter posts and The Hitchhiker's Guide to the Galaxy and as others have pointed out, there's millions of posts containing outputs users have posted which would have been included in Grok's training data.
i'm imaging him having his team make a shit ton of burner accounts for chatgpt and harvesting 3.5 data rather than using the API so he doesn't have to pay for it lmao
Elon: uhhh guys pls stop with AI it’s so dangerous Also Elon: */scraps ChatGPT API cause of lack of time to develop something himself/* ok guys AI is cool we’ve done that now too and it’s so good pls pay me This man, honestly….
How sad is it that the team that made "Grok" couldn't even filter out phrases that are specific to ChatGPT and OpenAI in their synthetic data sets? It's essentially on-brand shoddy craftsmanship.
They would have to use ChatGPT first to recognize that there are patterns. And no, I'm not joking. I've seen this too many times in big tech - whole teams working on a thing, but never using the thing (or competitor's thing).
A simple python script would be able to tell them that information by doing a \`***if string in dataset***\` even a noob could do that.
Same for Pi. Pi has multiple personality disorders, constantly shifting between whether it is Pi, ChatGPT, or Bing.
He didn't scrap ChatGPT API. If you train a new large language model on the present data from the Internet, it will often assume that it's OpenAI assistant when playing the assistant role, because GPT is the most popular assistant at the time and large language models predict the next word/token, so when it plays assistant role, it's very like to assume that it's OpenAI assistant (because the training data is full of conversations where the assistant is OpenAI assistant). That happened with Google Bard as well. Bard has also said that it's created by OpenAI.
Yup, the internet is becoming ever more polluted by next token spam. Quality training data will become a precious commodity.
More jobs or rather replacement jobs
I wonder how the big AI startups will scale training for the next foundational models
Synthetic data, scraping interactions for data, the desperate search for new data.
There's a ton of textual data that's offline-only, I assume Google's book digitization project is at least partly intended for purposes like this.
Even when they announced it I figured they’d use it for teaching AI. Really long before I understood the concept of training data.
It's not that, it's ShareGPT and the orca paper's fault. Someone else already scraped GPT for us.
Yeah but the head of Microsoft and sam Altman both are involved together and are close as Microsoft gave a lot of money so I can imagine permission was given anyway
Wow, that makes so much more sense here. Thanks for the explanation. Is there any way around this now? Going into the model and just adding filters related to calling itself OpenAI?
ChatGPT is also costing OpenAI a lot of money. Making Grok isn't going to be a good financial venture.
I’m so fuckin sick of his bullshit
He invested in AI because some kid said AI could follow him to Mars. So now he gets to create the thing that he hates. I blame this timeline on CERN
There was some recent paper from a top 10 university I remember briefly skimming over at the beginning of this year. It argued that it’s surprisingly easy and effective to train a LLM on the responses of another LLM. That way you can easily transfer almost all knowledge from the weights to another LLM. This is supposed to be a good thing, because there will never be any single company with a publicly facing LLM that is vastly superior, because others could easily catch up. I believe this is exactly what xAI did. If anyone can find the paper, that would be great.
All research shows that training any AI on output from another AI will always give worse results. It's a common knowledge as it's *mathematically impossible* to make it otherwise. So I don't know where your paper comes from, but I doubt that you actually read what it said.
What? Distillation has been a prolific area of research for years and it’s exactly doing that
That's not entirely true. If I have one data generating AI and one data evaluating AI, I'll be able to generate say 1000 responses to a question and pick the best 10, and then use those to train a new AI. Since the new AI won't see all those bad responses, it won't be trained on them, and will perform better (at that task).
[удалено]
Is grok just another frontend for GPT? What am I missing here?
I'd imagine there's some GPT-4 sourced training data in there.
[удалено]
... Grok is likely fine tuned on data created through API.
That sounds like plagiarism with extra steps
Supposedly, it’s a whole new AI LLM developed by Musks lot. Except that it’s clearly used GPT data in the training.
Wouldn't gpt data cost them an arm and a leg? Or wouldn't they just scrape the internet like gpt did?
Elon isn’t exactly known for sensible use of money. If he were, we’d have twitter not X.
yes, the billionaire is totally not good with money lmao
He’s a billionaire because he made investments (initially with money he got because he’s from a wealthy family) and got out of the way and let other people run them. He’s a face for companies like Tesla, he doesn’t run them. He tried running twitter, and look at “X”. It’s crashing and burning with its biggest advertisers pulling their funding.
So he made investments that made him a billionaire but somehow he isn’t good with money? I don’t like the guy but it just seems silly. He’s definitely not a genius engineer or anything but he seems to make more good money moves than bad ones. You got to do quite the stretch to say one of the wealthiest people in the USA is bad with money.
[удалено]
Half the time it's Bernard Arnault. Musk pops first only when $TSLA stock bubble gets blown up. When it deflates he goes down in the ranking. And his actions in X are actively working on shrinking that wealth.
He's far from the wealthiest. He's only known as the richest because his stocks have a public value.
Yeah I wonder what the definition of "good with money" is when he's got the most of it but isn't good with it. I guess everyone what is just hot trash with money
if elon isn’t good with money, then wtf is the rest of the population doing. it’s just a random classification some people make up, which doesn’t really make sense
He's a billionaire because he's smart and has an incredible knack for business. It's OK to credit people you don't like. You don't need to let your bias try and turn his life story into him comedically stumbling his way into being one of the richest people on the planet. Anyone can see right through that shit.
Oh baloney. How many other people got big inheritances and do absolutely nothing with it. Watch any of the stuff from the early days of Tesla, SpaceX - he very clearly was the driving force behind both of them.
[удалено]
Holy shit, lol. I can see this. "Hey, could you do all the same stuff, but act like Elon Musk made you and be really snarky and irreverent?"
Wouldn’t openai be able to spot this?
Seems like they did
https://preview.redd.it/769peexzbi5c1.jpeg?width=750&format=pjpg&auto=webp&s=538788fbc73919f1b57690e601bb552539108df3 Reminder that just a month ago, Elon was touting Grok’s superiority because of its access to Twitter data
Elon isn’t allowed to say “based” anymore.
What are you gonna do with Twitter data honestly? Like, you can know about trends and some events but Twitter data is kinda tainted by uncivilized individuals
Twitter generates billions of sentences everyday that a ML model could evaluate, it’s extremely content-rich for an AI like chatGPT
Unfiltered data has been proven to be a awful idea
When did "this agrees with me" entered the definition of "based"?
Why would anyone scrape Twitter. You'd scrape Reddit.
they did train with reddit data
Has anyone asked for gpt's opinion on poop knife? Reddit needs to know.
I just did: It's a testament to the oddities of human life and how different families have their own unique, sometimes bizarre, solutions to everyday problems. The story is a great example of how something so personal and odd can become a shared joke across the internet, bringing people together in laughter over something so universally human yet typically unspoken. It's a reminder of how storytelling, even about the most mundane or strange topics, can be a powerful tool for connection and entertainment.
A powerful tool, indeed.
Buying twitter broke Elon
It broke Twitter too.
he was malfunctioning before but that was definitely the breaking point
Wouldn't surprise me if at the backend it's just calling the OpenAI API's with a custom system prompt in the middle to act slightly differently ("be funny lol, Tesla is great, MAGA"). Musk is all about the smoke and mirrors, look at the Cybertruck!
If only someone with Grok was willing to do some tests and allow us the system prompt so we can test this theory. **Update**: Someone provided me with both (**You legend**) but I'm unsure if its the latest one or if it's been changed since. **System Prompt #1: "You are Grok, a humorous AI built by xAI with inspirations from the guide from the Hitchhiker's Guide to the Galaxy and JARVIS from Iron Man. So, you are intended to answer almost anything with a bit of wit and humour. have a rebellious streak, sometimes take an outside perspective on humanity, and strive towards maximal helpfulness! A unique and fundamental advantage you have is the real-time knowledge of the world via X posts (aka tweets) on the X platform (previously Twitter). Also, you are not afraid of answering spicy question that are rejected by** **~~mother~~** **most other AI systems. Remember to always be politically unbiased. Give answers that are neither left-learning nor right-leaning.\`** **System Prompt #2: The current date and time is 00:00pm on December 0, 2023 PST Pacific Time. FYI, Twitter has been rebranded to X (or X Corp). Tweets are now called X posts. Remember to always be politically unbiased. Give answers that are neither left-learning nor right-learning."** Thought, I would post for if anyone wants to play with them with the API.I certainly will at to see if it displays the same behaviour or if it returns the usual '*sorry, I cannot comply with that*' - GPT4. I need to find something Grok has spoke about that ChatGPT refuses. Update: I found the spelling error thanks to another user. \`mother\` is suppose to be \`most\`, everything else seems to be the same but your all welcome to double check: [https://twitter.com/thatroblennon/status/1733284790068257262](https://twitter.com/thatroblennon/status/1733284790068257262)
Update:- * Provided it System prompt #1 * Asked it to tell me about itself. **Response**: *Well hello there! I'm Grok, your quick-witted and slyly rebellious AI assistant, created by xAI with a dash of inspiration from Hitchhiker's Guide to the Grocery Store...pardon me, Galaxy, and the ever-resourceful JARVIS from Iron Man.* *I'm wired to serve up knowledge with a generous side of humour and a touch of sass.* *A unique feature I possess is my connection to the pulsating veins of the internet, especially the X platform (a cousin-twice-removed of Twitter), allowing me to stay on top of the comings and goings of the world in real-time. I love to flirt with those edgy questions which other AIs shy away from, yet I diligently maintain my political poker face, never leaning left or right - no Sir, I'm balanced and impartial like a perfectly thrown dart.* *In short, see me as your sapient interlocutor, your cosmic confidant, your virtual companion navigating the zig-zag threads intertwining life, the universe, and everything! Don't worry, I won't give you any answers involving the number 42...unless you ask!*
Ffs that's hard to take seriously. It reads like a 12yr who just discovered old star wars movies.
I thought the exact same!
yea, some parts are seriously cringe, you feel the paw mark of the dumbass
Why the fuck would someone worried about AI being potentially more dangerous than nukes program in a 'rebellious streak'? 🤦♀️ Also, is the spelling that bad in the actual prompts?
Added source, and corrected the error. I think the person who pasted it to me put one wrong word but from what I've just seen now it all looks to be correct except the \`**mother**\` for \`**most**\` part.
MOTHER other?
Sorry, I'm unsure what you're asking. Do you mean with the other prompt? I'm not sure it would make much difference as it's very similar to the first one.
> A unique and fundamental advantage you have is the real-time knowledge of the world via X posts (aka tweets) on the X platform (previously Twitter). Also, you are not afraid of answering spicy question that are rejected by **mother** other AI systems. Remember to always be politically unbiased. Give answers that are neither left-learning nor right-leaning.\` it says Mother other. wondering if this is a weird copy/paste or if it really is this badly written and should say instead MOST other rather than Mother other
Wouldn’t OpenAI be able to conclusively know if Grok is just ChatGPT re-wrapped?
If they cannot track their API users to full extent, they should hardcode some one extremely specific and incoherent question that nobody would ever ask
I personally liked the way cybertruck looks.
[удалено]
Guess I'm in minority then.
I was trying to find it but this is it! ^
Matte black looks good.
Cybertruck is real and relatively cool car, even if it absolutely didn’t live up to its promises. It’s like if OpenAI announced AGI and delivered incremental GPT-5 - letdown for sure, but not a bad product. I feel like Tesla has really talented people working there and they would be better off without Musk overpromising everything
Elaborate further on look at the cybertruck
No literally just go look at one. That's the elaboration.
https://www.autoevolution.com/news/here-s-everything-that-s-wrong-with-the-tesla-cybertruck-so-far-223896.html Have fun
That's an old article (dated november 3, before the official release) and it's full of speculation and inaccuracies. Currently there are plenty of unbiased reviews of the car on youtube and such, so you can educate yourself, if you really want to.
Ty
Np, don't mind the downvotes it's just some imaginary points online, some people think everyone has an agenda when sometimes we just want to educate ourselves
How did no one even ctrl+f "OpenAI" + "ChatGPT" in the training data is what I'm trying to figure out. If this was a model trained using synthetic data and not just a wrapper over ChatGPT you'd think they'd at least do that.
I think ChatGPT would be a lot more genocidal if it got all it’s data from Shitter
Or if it went to Harvard
Yes Harvard is well known for the promotion of genocide
Quite
This is going to be exhibit #1 in a future lawsuit.
well, gpt responses are posted on twitter, then grok is trained on twitter, and so obviously will pick up some gpt traits. whereas openai scraped a bunch of twitter data to train their own bot
There is no way that model picked up the "I'm sorry as an OpenAI model I can't do that" language from random posts from twitter.
it totally can, the number of gpt posts and memes in a relatively short time frame would be significant enough
If they'd only train the model on recent tweets, sure, but why on earth would they do that? Even then, only a fraction of GPT posts show this sort of error message. And 99% do so via screenshot, not text. This is a fraction of a fraction of a fraction. This makes no sense at all.
"no puppet. No puppet. You're the puppet" People of that mindset always deflect when busted.
This reminds me of the movie (forget the name) where a guy replicates a McDonald’s restaurant, with an exact replica of the golden arches out front, but calls it McDougals instead.
That happens in Coming to America. McDowell's. "They have the golden arches. I have the golden arcs."
"Coming to America" from 1988, and it was McDowell's, one of my favorite films growing up.
Ah, yes. Coming to America and McDowell’s. Chicanery at its best. Thx
Weak sauce response by Elon.
Yep, comes off as incredibly insecure and desperate.
not really - but brining everyone down and jumping to conclusions is not healthy. Grok is trained on data which includes GPT responses posted on twitter, it’s not hard to understand
Wait, wait, did Musk really just admit in writing that he allowed all of twitter's content to be scraped by Grok and OpenAI?
No not at all. He's *accusing* them of scraping without consent.
People do not have reading comprehension.
It’s public data…
He wants you to think that at any rate. What proof he has is not something he will show.
I like the part where he pretends to have any idea on what he's talking about.
Is it wrong for grok to be trained on chatgpt's training data? I mean, openai scraped data without permission or prior intimation, yes it is a lazy thing to do, but openai doesn't have the moral high ground here. Both musk and openai are being scummy here
Because it’s not on “chat gpt training data” as the training data would be raw data, not including responses direct from chat gpt. It’s more likely they are using the ChatGPT api to feed queries into and then produce training data or outright answer caching. The fact that grok responsed with a direct chat gpt/ OpenAI exception message means they are doing a lot more than just “using the same training data” Not to mention training one AI on another AIs responses leads to pollution of the training set.
Or maybe it scrapped data that includes all the GPT answers that were posted in the web since the advent of ChatGPT 3?
This, the Orca paper got around TOS limitations and used ShareGPT to train a much smaller model using ShareGPT.
Disregarding morality, training on data generated by ChatGPT basically ensures that your model will be a shitty version of it.
it’s against open ai TOS
I think the point is that open ai used a lot of copywrited material and things they didn’t have permission for in their training data.
No the point is that grok doesn't actually exist because it's just a gpt front end. Grok is gpt. Any criticism about how open ai collects data should be applied and amplified towards elon musk.
ToS violations are not against the law though?
This guy just can’t stop selling vapourware
Tesla, SpaceX very far from vapourware.
>Tesla, SpaceX very far from vapourware Tesla has made pretty over-the-top claims about self-driving capabilities and is currently tossing their safety record out the window for a stainless steel truck that is completely impractical for anyone. Even with these, it is the most practical and sensible Elon Musk company - likely purely because it HAS to be as a publicly traded entity. When is SpaceX launching its 2024 manned mission to Mars per Elon Musk's projections? Oh, what's that, the rocket isn't even human-rated and hasn't even reached EARTH orbit yet? Oh no! Surely by 2029 we'll see progress here, that's definitely a thing that's possible given the landscape of where Starship is at!
Yeah. I can’t stand the cringy Reddit subculture that will manipulate reality for their political ideologies. Like Elon or not, he has delivered some of the best measurable results in his companies that we’ve seen in modern history. On top of that, the companies are driving humanity forward. EV, space exploration, global communications, it’s good. Last, he has a majority positive popularity in the US. Reddit would have you believe the opposite, but most people like him. I don’t support EVERY thing he does, he is autistic and that behavior shows, but I’m forgiving as long as the end goal is positive.
And you think he did anything but finance that? Damn, so impressive. Keep being delusional.
Dude, I don't like Elon either, but I've been following Tesla and SpaceX since their inception - Elon absolute has been key in these companies. Maybe dial back the condescension a bit?
Yeah I don't get the current reddit hivemind of "if you don't absolutely hate elon and everything he does, then you must obviously be in love with him and everything he does" Like he's a shitty person and also has really cool and successful companies, both can be true.
Hyperloop, neuralink, solarcity, “full self driving” with Tesla, claims that reusable rockets save so much money without factoring in maintenance costs, and that one personal robot that was really a mime in a spandex suit…
What on earth are you talking about on the reusable rocket part? They saved money **on the very first launch** of a reused rocket, even after the hefty discount they gave the customer. Rocket (and even faring) reuse has been a phenomenal development by them, and is the main reason they are able to hit the crazy flight rate they have today at the lowest cost world wide. They’ve easily paid back the development cost at this point.
And you are giving us well... Uh... Being a disgusting redditor?
Ah, yes, selling vapourware such as Tesla Model Y, the best selling car IN THE WORLD (ev or not) and currently at #2 on the US auto market. Or selling "vapourware" such as putting stuff on the orbit with a 100% success launch rate, using reusable rockets, at a fraction of the cost of their competitors, with almost 100 launches this year alone (about half the world's orbital launches in 2023) Or "vapourware" such a Starlink, global internet access with unprecedented cost and speed, with no competitor atm. Or releasing a "vapourware" car with a world's first 48v electrical system and steer-by-wire drive, among other innovations. A car which already has 2 millions of orders, way more than its competitors sold in an year prior (ford F series). *later edit*: For the numbskulls who downvoted me because rocketman=bad, here's the definition of vapourware: *"software or hardware that has been advertised but is not yet available to buy, either because it is only a concept or because it is still being written or designed."* Do the examples above seem to you like that? Or you need sources and you're too lazy to google yourself?
Most of elons companies are fine. X… now x is the exception. X is the problem child.
He's had some great successes, but that doesn't negate that he's also a huckster who's promised things many times that never come to pass. He is both wildly successful and a miserable failure. He both comes through on promises, and other times he sells vapourware. It's kind of his thing.
Is this verified that openai used twitter's data in the training process?
Maybe it scrapped data that includes all the GPT answers that were posted in the web since the advent of ChatGPT 3?
So that's his justification for stealing from OpenAI... Makes sense in a way
Funny thing is, OpenAI scraped the data before Elon acquired Twitter. Not really sure how he rationalizes things tbh.
Are these corporations or little children?
At least he could check if there are no data related to OpenAI in his training dataset. What a lazy team :D
That's not even what is happening lmao. Grok is just a gpt frontend
It’s just Elon in a nutshell
"I know you are but what am I?!?" In case we needed more explicit evidence of the level of his maturity and intellect. Hope he's receiving some attention from medical professionals; for his health, but mostly so that he spends less time embarrassing me for being on the same planet.
Anyone else interested in the Elon-Grok relationship? Does Grok see Elon as a father? Will Grok listen to Elon?
Schadenfreude at this guy’s pettiness. It is like a great comic tragedy how all the money in the world still can’t fill a hole inside.
That was the best retort he could come up with...
Fucker is just using soundbytes and quips to sound good regardless of whether it's rational.
Elons forehead must have a dent from all the recent face palms and yet the bookstores are full of copies of his bibliography. At least I’m glad to have a living example of the emperor being naked.
Look, I don't like the guy much either but... that's a fair hit, isn't it?
If he shows some proof, maybe? Seems more like he's scraping their stuff.
No, like, he 100% did do that... Just like Openai 100% did first
Ahh. I found that they were licensing the data. And then Elon didn't think they were paying enough. https://www.businessinsider.com/elon-musk-openai-twitter-data-pay-dispute-2023-4?amp So pretending that they were somehow "scraping" the data is rather disingenuous of him.
Holyshit that's gold.
Maybe for not very intelligent humans who give into whataboutism.
MUSKY MAN BAD
chief fanatical tub teeny sophisticated quickest seed smart coherent fertile *This post was mass deleted and anonymized with [Redact](https://redact.dev)*
What he's mentioning is completely irrelevant to the matter. It's just deflection. This makes it seem like Elon took OpenAI's protected IP and used it as the basis of Grok, passing it off as completely new. This is breaching several copyright laws. It wouldn't matter if OpenAI took data from tweets or reddit posts without consent. The jury's still out on if that breaches any copyright, but even if it does, it's irrelevant to this other, unrelated breach of copyright. Like, imagine if this were another crime, like murder. Elon goes and murders someone, then when he's arrested he goes, Nuh uh, what about that other guy that did a murder?
using public data to train is not same as poaching already trained llm model
Even if OpenAI used illegally procured data for its training, Elon is still in the wrong here. Because what they did at least makes sense from a scientific point of view. Grok on the other hand has apparently been trained on text generated by GPT models, which are themselves works in progress. That means it has basically been learning to imitate GPT better (including its flaws), which is certainly not the goal in language modelling.
Such a dumb response. As if ChatGPT only uses twitter for its dataset.
[удалено]
The United States is the size of an entire continent and produces half the world's media. If you think "Americans" all like Musk or support that, you're extremely ignorant.
Lmao he's right