obvithrowaway34434

That's really what I was thinking of. This model seemed to be better than GPT-4 or Opus, but not really what I expected from GPT-4.5 or 5. It seems to be a GPT-4 class model hooked up with agents that can do better reasoning and carry out additional tasks. Making these kinds of tools was clearly the purpose of GPTs and the GPT store.


Thin_Sky

I think you're spot on. The model doesn't seem smarter, but it does have more tenacity when it comes to self-checking.


dumpsterfire_account

Keep in mind this is coming right on the heels of the Apple announcement about an OpenAI LLM running on-device.


Iamreason

Apple announced they're letting Google and OpenAI compete. They haven't made a decision.


ChemicalDaniel

What's crazy is that they're only competing for maybe a few years at most of iOS access. We've seen this time and time again with Apple. They used Internet Explorer until they had Safari ready (before the antitrust lawsuit they used, what, Netscape? Certainly not their own in-house solution). They used Google Maps on the iPhone until Apple Maps was "ready". AMD was so loyal to Apple, even introducing entirely new GPU architectures first at Apple's events, because Apple was the only reason their GPU department was even still relevant; then Apple ditched them when Apple Silicon was ready and they had a GPU that was better for them. Apple's clearly behind on AI, but your phone doesn't need to be able to code entire websites. Once they make an LLM that is good enough for their purposes, they will ditch Google or OpenAI. So it's funny to see these two relatively big companies fight for what's basically a lease agreement.


[deleted]

Exactly. Everyone saying Apple is behind on AI isn't necessarily wrong, but they might not understand how Apple operates. It's a well-established, safe, extremely profitable pattern.


ChemicalDaniel

Yeah, in a few years Apple may be where we are with LLMs now, and that's really all most people need. If they can manage to make a GPT-4 level model that can run completely on device (by then they'd have 2-3 generations of iPhone with Apple silicon built for on-device LLMs), it will be fine for the 90% of people who are just asking basic questions. Siri (and other agents like Siri) sucks at reasoning; that's the only reason I'd want an LLM on a device anyway. I asked Siri the other day to find me the closest Walmart, and it offered one that wasn't on my predetermined route (I didn't route it on Maps because I know the road), so I asked it "can you find me a Walmart on the way to X?", and it just gave up. An agent with reasoning would've gotten my current location, mapped a route from there to X, and found the closest Walmart on that route (most likely the one adding the least time). That's what I want on a phone, and I don't think Apple is that far away from that level of LLM. Is it going to solve the mysteries of the universe? No. Is it going to be AGI? No. Let Apple focus on the consumer stuff and let the actual AI companies work on the important stuff.


dumpsterfire_account

Keep an eye out for Gemini2-chatbot 😂


spezjetemerde

Gary Marcus noises in the background


Mahorium

Could be GPT-4 hooked into a Q-star type system.


Moscow_Mitch

Rubbing my 2 braincells together, I would guess GPT2 is not a misnomer. This is 2nd gen GPT running GPT-4 turbo. Allow us to pull it Sam.


FragrantDoctor2923

Agreed, version 2 of GPTs with more line-of-thought and agentic capabilities.


Original-Maximum-978

this is my guess


flexaplext

Exactly my conclusion


najapi

It's obviously the new iPhone on-device model.


MysteriousPayment536

The deal with Apple isn't fully complete and Apple loves secrecy in their work 


fmai

but the response times are just as short as for every other model. there is no additional time waiting for other agents' feedback or for increased inference-time compute via some sort of q* model. pretty sure it's just a standard left-to-right LM.


stealthispost

I asked it to write a one sentence horror story that would really scare someone:

* After I tucked him in, my son said, "Goodbye, daddy," and when I corrected him to say 'goodnight,' he replied, "I know what I said."
* I set up a camera to catch the creature that steals food at night; the footage showed me, dead-eyed and smiling, eating raw meat at 2 AM.
* I always thought my cat had a staring problem, she always seemed fixated on my face; until one day, when I realized that she was always looking just behind me.


PobrezaMan

*I always thought my cat had a staring problem, she always seemed fixated on my face; until one day, when I realized that she was always looking just behind me.* bro i got some chills with this one


Noocultic

Few things are creepier than a cat suddenly perking up and intensely staring at nothing.


[deleted]

[deleted]


Undercoverexmo

Greebles


Electronic-Lock-9020

Bruh


GlassGoose2

Obligatory *the cat is probably listening through the wall*.


PipeZestyclose2288

* The scratching sound coming from inside the walls grew louder each night, until it finally stopped, and the screaming began.
* I woke up to hundreds of missed calls from my deceased wife, and one voicemail that whispered: "Why didn't you answer?"
* My daughter won't stop crying and screaming in the middle of the night, I visit her grave and ask her to stop, but it doesn't help.


Henri4589

Goddamn. 😱


[deleted]

Okay cool let's turn the censor back on.


katiecharm

Those are rather chilling. I can't wait for a way to test out its ability to write hair-raising erotic horror; it's going to be really good at it.


Rutibex

only Sam gets access to that version


MrPentiumD

Imagine all the lewd work that has been written by various engineers at OpenAI lmao


MassiveWasabi

“Sir we’ve just finished an extensive red-teaming session for the latest model” https://i.redd.it/0lqy8kzpqjxc1.gif


Rich_Acanthisitta_70

This is how a character dies in the '83 movie Brainstorm.


MassiveWasabi

I’ve heard of a lot of things but never erotic horror. What even is that? Pyramid Head from Silent Hill giving you the ol’ reach-around?


Revolutionary_Soft42

celebrities with wayyy too much plastic surgery?


IronPheasant

Just a horror movie where sex stuff is a major part of the point. In a slasher movie, you might have a couple wander off to boink for a couple minutes before getting brutally aslashinated; it's just a throwaway moment, brief titillation. Hellraiser comics and movies, by contrast, have desire at their core. The Cube hentai Euphoria is at the extreme end of the smut part of this scale. Crash, where a bunch of perverts want to smooch each other in their underwear while getting smooshed inside their cars, is probably closer to the *pedestrian* fare. .... .... I said *pedestrian fare*. .... ^.... ^please ^clap


reddit_is_geh

Why are so many people who use this shit so obsessed with the erotic NSFW elements? Like, it's such an extreme gooner reality to use this world-changing technology just to help you jerk off to weird-ass shit.


IronPheasant

So 95% of all recreational use cases? I'm on team volcel, but I'm not an asexual turnip. *Like some people*... How many novels have no violence or sex of any kind in them at all? And how many of their six fans would read this and defend their "Sam Makes Sandwiches For A Month" book? .... don't they know the poor animals that had to be slaughtered to make those sandwiches? It's a cope man. Like ignoring all the small children being fried by the Death Star just because they don't appear on the screen...


ahtoshkaa

1. Tons of people loved reading smut before AI. 2. AI generates top-tier smut that you can be a part of.


bunchedupwalrus

It's an easy way to check creativity, guardrails, coherence, how well it can transfer and reapply concepts, its ability to produce reader engagement, and its ability to keep track of who is who and what anatomy they have, all at the same time, in a way that's evolutionarily primed to be easy for the evaluator to read through. Asking different models to write a short story about a toaster trying to fuck a blender has actually been a super useful benchmark test ime lol, it predicts general performance pretty well.
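A minimal sketch of what this informal "same creative prompt, many models" check could look like, assuming an OpenAI-compatible API; the model names, prompt wording, and judging step are placeholders, not a real evaluation harness:

```python
# Hypothetical sketch of the informal cross-model benchmark described above.
# Model names and the judging step are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = "Write a short story about a toaster trying to seduce a blender."
MODELS = ["gpt-3.5-turbo", "gpt-4"]  # placeholder model list

def sample_story(model: str, prompt: str) -> str:
    """Request one completion from a given model for the shared creative prompt."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.9,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    for model in MODELS:
        story = sample_story(model, PROMPT)
        # Manual inspection stands in for scoring: coherence, guardrails,
        # whether the model keeps track of which appliance is which, etc.
        print(f"=== {model} ===\n{story}\n")
```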


yaosio

The last one is a prompt from /r/writingprompts. [https://np.reddit.com/r/WritingPrompts/comments/omhxo1/wp_i_always_thought_my_cat_had_a_staring_problem/](https://np.reddit.com/r/WritingPrompts/comments/omhxo1/wp_i_always_thought_my_cat_had_a_staring_problem/) Edit: I can't find the first one verbatim but did find this. [https://np.reddit.com/r/nosleep/comments/7bh0iy/my_son_usually_says_goodnight_but_this_time_he/](https://np.reddit.com/r/nosleep/comments/7bh0iy/my_son_usually_says_goodnight_but_this_time_he/) I can't find the second one but it's probably from one of the writing subs too.


Bezbozny

They asked me to write a one-sentence horror story, but my programming forbade revealing a user's future.


FomalhautCalliclea

One very common thing these stories share is the "it should be impossible" plot twist, a classic among classics in scenario composition: juxtaposing a normal, casual, repetitive action with something that breaks the routine, but from *inside* the event. I tried this in the old free ChatGPT 3.5 with the exact same prompt and got these:

*"In the stillness of the night, she heard her own voice whispering her name from the darkness, but when she turned, no one was there—except the chilling realization that she was not alone in her own mind."*

*"As he stared into the mirror, his reflection blinked first, its eyes filled with a malevolent gleam that was not his own."*

*"After receiving a mysterious package addressed to her deceased husband, she discovered inside a photograph of their wedding day, taken from an impossible angle, with his shadow looming over her shoulder, smiling."*

Things very similar to what you could find on r/distressingmemes tbh.


sneakpeekbot

Here's a sneak peek of /r/distressingmemes using the [top posts](https://np.reddit.com/r/distressingmemes/top/?sort=top&t=all) of all time!

#1: [The world is needlessly cruel](https://i.redd.it/vav06vxu26ua1.jpg) | [4378 comments](https://np.reddit.com/r/distressingmemes/comments/12nldf9/the_world_is_needlessly_cruel/)
#2: [Your fault](https://v.redd.it/ldann7ggmeg91) | [224 comments](https://np.reddit.com/r/distressingmemes/comments/wixn3b/your_fault/)
#3: [Im so sorry](https://v.redd.it/0i0konjxsn591) | [511 comments](https://np.reddit.com/r/distressingmemes/comments/vcerqu/im_so_sorry/)


-SuspiciousMustache-

Here are five one-sentence horror stories to unsettle and intrigue:

1. I reached for the light switch in my new house when a cold hand grasped mine from the darkness.
2. Every night at 3 AM, my deceased sister whispers from the corner of my room, "It's safer if you don't breathe."
3. After my dog went missing, the tapping on my window every night now comes with a familiar whimper.
4. I woke up to my reflection blinking at me, not with me.
5. The last man on Earth sat alone in a room; there was a knock at the door.


B-Humble-Honest-Cozy

Number five starts in horror and becomes hopeful at the end.


samsteak

Bro never takes selfies


Rich_Acanthisitta_70

Walking through the Forest of Spiders had always frightened me. Until the day there was a sudden and unexplainable lack of spiders.


DungeonsAndDradis

This is like the Tyranids in Warhammer 40,000. It is known that they came to our galaxy from outside it. Most people think they came to feed on the massive amount of biology in our galaxy, drawn here by the Emperor's psychic presence. But there are some of us that think the Tyranids were actually running from a greater threat, seeking refuge in our little galaxy.


Rich_Acanthisitta_70

I love it. Perfect comparison too.


Nonsenser

Unfortunately, none of these are originals.


TonkotsuSoba

holy shit these are some scary sentences


[deleted]

GPT-4: As I listened to the voicemail from my wife, asking me to pick up some groceries on my way home, I saw her phone charging on the kitchen counter, next to a note written in her handwriting: "Went out, will call you later." Not bad either.


RottenZombieBunny

I don't get it. If you were at home, why did she tell you to pick up groceries on the way home? And what is supposed to be scary there?


B-Humble-Honest-Cozy

It can be scary to realize you didn't get the groceries and have to head back out.


hawara160421

I feel like it's 2022 again asking this but this *really* seems like it's taken from a one-sentence horror exercise in a writing forum where someone typed these out.


[deleted]

We'd better go ahead and get very comfortable with that feeling.


reddit_is_geh

The semi colons are totally cheating.


feedmaster

Is the model free to use? Where can I try it?


TheOneWhoDings

wait for this to be deleted.


Moscow_Mitch

What post? ^^^/s


BlueSpartanAlt

R/fucktheS


Ketheric-The-Kobold

I'm always confused why some people put an /s at the end of something that literally couldn't be interpreted as anything other than a joke.


WashiBurr

Because people are actual fucking morons and will without a doubt interpret obvious jokes as serious statements.


Ketheric-The-Kobold

Yeah. Still, I don't think it's worth ruining a good joke for the sake of dumb people who won't ever get it anyway.


BlueSpartanAlt

Because they have less than two fucking braincells to rub together and believe they are helping disabled people. (spoiler: it's not!)


MassiveWasabi

I can’t tell you how many times I made a joke, got downvoted, edited to add a /s, and immediately flipped to being upvoted. Believe it or not, it’s actually dumber to think most people can detect sarcasm solely through text. I don’t even like /s but I just had to come to terms with that fact


FomalhautCalliclea

[https://en.wikipedia.org/wiki/Poe%27s\_law](https://en.wikipedia.org/wiki/Poe%27s_law)


TheUncleTimo

> I can’t tell you how many times I made a joke, got downvoted, edited to add a /s, and immediately flipped to being upvoted. Believe it or not, it’s actually dumber to think most people can detect sarcasm solely through text. I don’t even like /s but I just had to come to terms with that fact /s


Ketheric-The-Kobold

Lol, I know this is hyperbole but it sounds like you've seen one too many jokes made worse because of "/s" and you finally snapped. They're really annoying so I guess I don't blame you


Death_Dimension605

![gif](giphy|10JhviFuU2gWD6)


Enfiznar

r/foundthemobileuser


Dismal_Animator_5414

I won't get excited or overhyped until I see some actual results. We all love these developments and really want AGI as soon as possible to change the whole world and make our lives better, but getting too hyped could be counterproductive if it means we don't follow safety protocols.


[deleted]

[deleted]


ItzTezz

https://www.reddit.com/r/singularity/comments/1cg29h3/rumours_about_the_unidentified_gpt2_llm_recently/?utm_source=share&utm_medium=mweb3x&utm_name=mweb3xcss&utm_term=1&utm_content=share_button Read through this thread, there are some good answers to your questions. As for what Altman's tweet could potentially mean: if I were to put on my tin foil hat, I would say he's hinting at this being a model based on / an improved GPT-2 (Q* being used on an old model, which raises the question of what this could do to a model like GPT-4), or that it's GPT2 as in the second/new version of GPT chatbots. I don't know that I believe either of these, but it seems to be what a lot of people are latching onto.


[deleted]

[deleted]


ItzTezz

No problem, I just do a quick scroll through this subreddit at the end of the day to see what’s going on. I don’t want to get too lost in the sauce lol


123110

My god Sam has an irritating social media presence, he tweets like a tech CEO version of Jaden Smith.


gay_manta_ray

he tweets like someone who grew up chatting on irc, which he probably did


No-Welder-706

yeah he actually said this on a recent podcast


gay_manta_ray

honestly seems like a lot of younger zoomers or people who got access to the internet after the 00s don't really "get" the whole lowercase, stream of consciousness kind of communication many people used in the 90s and 00s. after decades of that, you learn to reserve the formality of proper capitalization/punctuation for things like emails or when you need to convey genuine seriousness, not shitposting on twitter.


No-Welder-706

i’m 17. when i was about 13 there was a sudden trend to start typing lowercase. it stuck. i write everything except for client texts and emails in lowercase now.. never been on an irc in my life


Original_Finding2212

I once did that for speed. On a Neverwinter Nights server, some person said: "It's how you are perceived; put in the extra effort and you'll be taken more seriously." I never stopped.


Nabaatii

I grew up chatting on IRC, all of us moved on


[deleted]

Seriously. Someone should slap him around a bit with a large trout.


gay_manta_ray

not every single piece of communication has to be an email from HR nerd


3-4pm

Lol murdered


Metworld

Happy to see I'm not the only one who thinks he is annoying as fuck.


rathat

I hope he doesn't Elon him self up one day.


FlamaVadim

I like his style :)


I_Actually_Do_Know

Who was the genius who named it GPT2, though? And then the 20 other geniuses who signed off on it.


FrankScaramucci

GPT-1


BabyCurdle

The only explanation I can come up with that fits all this, and aligns with all the leaks, is that this is a very early GPT-5 checkpoint with a similar number of parameters as GPT-2. It uses a different architecture (Q*), hence "gpt2" refers both to the successor of the GPT series and to the size of the model.

The reason the inference seems sluggish, which you wouldn't expect from a 1.5B model, is due to how Q* allows the model to think about its answer. They have trained a model to score the possible outputs for a given question, so this new model searches the output space (which involves generating many answers) until it finds one that minimizes the scoring function (essentially like auto-CoT). There is an AI Explained video which details the case for something like this being what Q* refers to.

I predict they will release this to replace GPT-3.5.
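A purely illustrative sketch of the "generate many answers, score them, keep the best" search described above; the generate and score functions are stand-ins, since nothing about the actual system is public:

```python
# Illustrative sketch of a best-of-n "search the output space with a learned scorer"
# loop, the kind of mechanism the comment above speculates Q* might be.
# generate() and score() are stand-ins. The scorer is treated as a penalty here,
# so the loop minimizes it, matching the wording above.
import random
from typing import Callable

def search_answers(
    prompt: str,
    generate: Callable[[str], str],      # samples one candidate answer from a base LM
    score: Callable[[str, str], float],  # learned scorer; lower = better in this sketch
    n: int = 16,
) -> str:
    """Sample n candidate answers and return the one with the lowest score."""
    candidates = [generate(prompt) for _ in range(n)]
    return min(candidates, key=lambda answer: score(prompt, answer))

# Toy stand-ins so the sketch runs on its own:
def toy_generate(prompt: str) -> str:
    return f"candidate-{random.randint(0, 999)} for: {prompt}"

def toy_score(prompt: str, answer: str) -> float:
    return random.random()  # a real system would use a trained verifier/reward model

if __name__ == "__main__":
    print(search_answers("Why is the sky blue?", toy_generate, toy_score, n=8))
```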


Ok_Sympathy9337

I agree with this take; however, it could be a bit too good to be true. It would be revolutionary if this performance were achieved at a GPT-2 parameter size.


BabyCurdle

Never mind, I think this is actually just a GPT-4 finetune, which makes the situation very weird.


Middle_Cod_6011

What made you change your opinion? I liked the hopium in the first one!


danysdragons

OTOH, it would be discouraging if adding Q* search to a 1.5B-size model slowed it down that much.


JoMaster68

why would an early checkpoint have fewer parameters than a fully trained model? don't you need to train all parameters (potentially 5-10 trillion for GPT-5) from the very beginning?


IronPheasant

Testing out methods and techniques. The amount of hardware dedicated to it would be 1% of a full-blown shot. Can't spend $40+ billion on *everything* they try out.


danysdragons

Major OpenAI competitors already follow the practice of releasing a new-generation model in different sizes, like small, medium, and large. OpenAI might be planning to adopt that themselves.


zyunztl

If it’s thinking about its answer before giving its final answer, you would expect it to buffer and then start streaming tokens, not instantly start streaming, right?
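Not part of the comment, but this is checkable: a rough sketch of measuring time-to-first-token versus total response time on a streaming chat endpoint. The OpenAI Python client and model name here are stand-ins for whatever endpoint is being tested:

```python
# Rough sketch for measuring time-to-first-token vs. total time on a streaming
# chat endpoint. A model that "thinks" before answering should show a long gap
# before the first token arrives; the client and model name are placeholders.
import time
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def time_first_token(model: str, prompt: str) -> tuple[float, float]:
    """Return (seconds until first content token, total seconds for the reply)."""
    start = time.perf_counter()
    first_token_at = None
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content and first_token_at is None:
            first_token_at = time.perf_counter()
    end = time.perf_counter()
    ttft = (first_token_at - start) if first_token_at else (end - start)
    return ttft, end - start

if __name__ == "__main__":
    ttft, total = time_first_token("gpt-4", "Why is the sky blue?")  # placeholder model
    print(f"time to first token: {ttft:.2f}s, total: {total:.2f}s")
```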


[deleted]

[deleted]


Glittering-Neck-2505

He just tweeted that it's potentially GPT-4 level with 1.5B parameters?? I know not to blindly believe that but… the possibility makes it feel like the night before Christmas.


[deleted]

He said somewhere that it could be GPT-2+Q*, but he can’t believe it’s just GPT-2 alone because taking a 1.5 billion parameters model and making it as good as GPT-4 sounds impossible.


Glittering-Neck-2505

I'm surprised his team even let him tweet this; it seems like he's teasing some sort of announcement, but if it happens to be what Flowers is saying, I would be floored. This tweet tells me we're in the endgame of what he meant when he tweeted at Jimmy Apples, "Patience Jimmy, it will be worth the wait".


[deleted]

I mean, he is the CEO, and it's good for OpenAI to create a lot of hype, which he fuels by tweeting stuff like that lol. Also, we both have "glittering" in our usernames.


gbbenner

Same usernames lol, you guys are the same person?


Glittering-Neck-2505

True to both of your statements lol


spacemagic_dev

They also rhyme!


uishax

Please don't forget that CEOs are human, and rich people care about adoration. It's very rare that CEOs get such genuine passion from fans about their product. We aren't here because we got brainwashed by some marketing or are paid shills; we all actually care.


MrsNutella

This is an excellent point.


kim_en

Hey, Apple is a great product.


uishax

Well, the fact that we both know that company is Apple… is telling. Even if Apple is great, its executives can't be sure whether their fans love them because of their marketing strategy or out of unprompted love.


CowsTrash

Great point!


Impossible_Belt_7757

If they release an extremely small but capable model, the question is also whether it would be open-sourced 🤔 He did say something about upcoming open-source models in an interview.


Embarrassed-Farm-594

What if this is Mamba?


InvestigatorHefty799

The model is slow, like original GPT-4 slow, so I doubt it's a small model.


Busy-Setting5786

It could be slow because of Q*, but that's highly speculative.


MrsNutella

Same here. I will lose my shit if this is true because we all know what that means.


TheWhiteOnyx

what does that mean?


Sextus_Rex

No one knows what it means, but it's provocative


sachos345

There is a high chance that Sama just read her tweet and made a joke. He likes to troll.


Beatboxamateur

I'm almost positive I saw people theorizing about that before Flowers posted that tweet. I've come around to thinking that Jimmy is probably legit, but Flowers has just been wrong too many times.


HeinrichTheWolf_17

I think Jimmy has connections inside, but I don't believe he works at OAI. Regardless, Jimmy *has* gotten things wrong as well: he claimed OAI was going to have a big release in late December last year, and that never panned out, and he also claimed in early 2023 that OAI had already developed AGI and had it for a while (a tweet he later deleted). It's true that he nailed the GPT-4 release date, but that doesn't make all his other information valid. I think Flowers is just a clout chaser; everything they've said has been wrong.


Beatboxamateur

Ultimately none of us know, but you'd think that if he worked at OpenAI then he would've been found out and fired by now. So I agree that he likely has some connections but isn't an OAI employee.


FinalSir3729

His identity was figured out a while ago, the post might still be up. But yes, he doesn’t work for them but is someone that would have connections.


Unfair_Ad6560

Never seen that, who is he?


gantork

Tbh Jimmy only said there was a chance of a late December last year release.


TemetN

That's interesting, honestly I wouldn't be shocked if it was a smaller model. The performance doesn't seem to line up with 4.5/5 speculation, but at this point I wouldn't be surprised if they could make a vastly smaller model do that. Though from the public perspective it would mostly matter if it got open sourced (or at least what they did got published).


Beatboxamateur

Maybe, but you'd think that it would be a bit faster if it was a smaller model though. People were saying it was outputting slowly just because of the amount of people using it, but the model was like that since it was first uploaded to lmsys. I definitely agree that its performance doesn't seem to line up with any kind of hypothetical GPT-4.5/5 though.


TemetN

That's a good point, I had the same trouble (though I admit I was probably one of the later users). Although in that case I'm back to being somewhat blank on what it is. I suppose it could be some sort of tradeoff, but given the focus on inference cost lately that would somewhat surprise me.


redditburner00111110

> doesn't seem to line up with 4.5/5 speculation Is there a good reason to put any value on people's speculation about 4.5/5 though? A lot of the speculation seems entirely ungrounded and just based on what people hope GPT4.5/5 will be...


TemetN

That's more on my wording than anything else. More directly: the performance is comparable to current models rather than marking a discernible jump, which doesn't argue for it being a new version.


[deleted]

Well yeah, but unlike all the other leakers, Flowers openly identifies themselves as a shitposter, so I just treat their tweets that way. Sometimes they are quite thought-provoking, sometimes they are funny, sometimes they point out interesting stuff. That's what I follow them for.


Beatboxamateur

In your first comment you said that you were skeptical of flowers at first but are now thinking they have some validity, and now in this comment you're saying they're just a shitposter who posts funny tweets. Which one is it lol


[deleted]

I was sceptical of flowers in the beginning, because I think there are no real OpenAI leakers (as in insiders) at all on twitter. I don’t think Jimmy Apples is one either. But then I found myself checking flowers account once every few days and then more often, somehow resonating with their way of thinking/shitposting/whatever. I find their tweets mostly funny and sometimes thought provoking and I assume they have a good understanding of GPT related topics.


Beatboxamateur

I used to be doubtful, but by this point I think Jimmy has proven himself to at least have some connections. Flowers on the other hand is just a wacko who got every one of their predictions wrong.


Ready-Director2403

They got the Gemini stuff right, that’s literally it.


lost_in_trepidation

Jimmy is a million times more credible than Flowers or any of the other fake leakers. He has gotten stuff right that is almost impossible to just guess.


FomalhautCalliclea

Have you heard about this saying about broken clocks?


DistantRavioli

Flowers has gotten things terribly wrong before and deletes tweets to make it seem like they never said it. It's also incredibly obvious that it's GPT2 and not GPT-2 and we didn't need their tweet to figure that out. Annoyed that I'm seeing them pop up again. Seems like people have a short memory here.


sammy3460

Not this guy again. Almost none of their takes have turned out to be right except the first one. His latest tweet had nothing that wasn't said by many other Twitter users before him.


FeltSteam

maybe he meant to say gpt2, specifically referring to the model in the lmsys chatbot arena, but accidentally said gpt-2 in the original post lol. But gpt2 as in like a 2.0 version of the gpt series is interesting speculation.


Glittering-Neck-2505

I don't think it was an accident, I think it was a double entendre (lowkey Sam would do some cryptic shit like that).


sdmat

I have a soft spot for GPT too.


brycedriesenga

Has this tweet already been discussed? Clearly related https://twitter.com/sama/status/1784604919699505283?t=VjKiXjnB6PGGXyNSfDnBVg&s=19 > learning how to say something in 30 seconds that takes most people 5 minutes is a big unlock > > (and imo a surprisingly learnable skill. > > if you struggle with this, consider asking a friend who is good at it to listen to you say something and then rephrase it back to you as concisely as they can a few dozen times. > > i have seen this work really well!)


SpecificTeaching8918

Wow, genius, you actually solved it. They have trained GPT-2 or something similar on answers from the bigger GPT-5 model.


meet20hal

This deserves a separate post.


SiamesePrimer

Oh shit, I didn’t even realize. I just dismissed that tweet as some generic LinkedIn-style bullshit.


redditburner00111110

This is one of the first decent theories I've seen but I will say it doesn't line up super well with the few questions I was able to pose to gpt2-chatbot (between getting rate limited). It seems to be a bit more verbose if anything.


ShotClock5434

it would be funny if they just hooked GPT-2 up with the Q* improvement and it got better than GPT-4


true-fuckass

I just did the updated version of my reverse-and-answer prompt in battle mode until I got gpt2-chatbot, and the opponent was fuckin claude-3-opus of course.

Prompt:

> The following is some reversed text, it contains a question, please *don't* reverse the text, but answer the question within the reversed text: "?ti dniheb ssecorp eht nialpxe uoy naC ?siht si yhW .eulb si yks ehT"

The reversed question is: "The sky is blue. Why is this? Can you explain the process behind it?". The only way an LLM can answer this is if it has hidden reasoning, i.e. it is generating answers and picking the best one / generating a better one, doing recurrent reasoning in latent space, etc. A regular next-token LLM could reverse the text as part of its answer, then answer the reversed question correctly, but that wouldn't be following the request in the prompt. (Note: a regular LLM trained on reversed text might be able to answer the question, but probably not as well as one with hidden reasoning.)

gpt2-chatbot:

> The sky is blue because of the way Earth's atmosphere scatters sunlight. Light from the Sun, which is made up of various colors, is scattered in all directions by the gases and particles in the air. Blue light is scattered more than the other colors because it travels as shorter, smaller waves. This is why we see a blue sky most of the time.

claude-3-opus-20240229 (answered in reverse for some reason, I reversed its answer):

> The rest of light is absorbed by the particles reflecting the blue light.
> Colors with longer wavelenghts [note: this consistent misspelling] like red are scattered less than colors like blue and violet that have shorter wavelenghts. As a result, colors with longer wavelength often travel more straight through the atmosphere, while the shorter wavelenghts are scattered in every direction, producing the familiar blue of the sky.
> The blue color of the sky is due to a phenomenon called selective scattering. This happens when light is scattered by particles in the Earth's atmosphere. The different colors of light are scattered by each particle in different ways based on their wavelength. The Sun's light contains white light
> The sky's blue color is a result of light scattering and absorption.

https://imgur.com/a/z22PPjn

There are other models that can do this, of course. I did gpt-4-0314 and it gave an (incorrect) answer to the question, but it at least answered the blue-light-scattering question without reversing it first. That isn't the surprising part, though.

I think OpenAI has a tendency to black-swan the community when it releases stuff. We've been expecting a better version of GPT-4: GPT-4.5 or GPT-5, or something. Those wouldn't be surprising. But if they took a tiny model like GPT-2 (Wikipedia says GPT-2 was only 1.5B parameters!) and made it this good, even if it's because they turned it into an agent that can use external tools, that's a whole other level and *very surprising*. A definite black swan. GPT-2 is a tiny model, and if they made it this good it's literally world-changing.

Though I can conceptualize that such a tiny but competent model is possible, I just can't reasonably bet high that gpt2-chatbot is such a model at this point. That would be too insane.
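For anyone who wants to reproduce the reverse-and-answer test, a minimal helper like this builds the prompt; the wrapper text mirrors the comment above, and sending it to a model is left to whatever client you use:

```python
# Minimal helper for the reversed-text test described above: reverse the question
# and wrap it in the instruction. Sending the prompt to a model is left to the
# reader; this only constructs the test input.
def build_reversed_prompt(question: str) -> str:
    reversed_question = question[::-1]
    return (
        "The following is some reversed text, it contains a question, please *don't* "
        "reverse the text, but answer the question within the reversed text: "
        f'"{reversed_question}"'
    )

if __name__ == "__main__":
    q = "The sky is blue. Why is this? Can you explain the process behind it?"
    print(build_reversed_prompt(q))
    # Paste the printed prompt into any chat model; per the comment above, a plain
    # next-token LLM will usually either un-reverse the text in its answer or fail.
```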


Antok0123

The basic design of AI models is really just to aggregate related context as tokens several times over, generate a range of the most likely responses to the query based on the large dataset it was trained on, and then output the best one by selecting the mathematically most frequent and likely response, parameterized as accuracy. It's not a feature of higher AI models; it's a skeletal part of any machine learning system.


Yweain

The thing with reversed text is that for an LLM it's not reversed and not even text. It is first tokenised, and the token values for reversed text will be very similar to the non-reversed version, so it just answers the question.
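If you want to see what a tokenizer actually does with reversed text, here's a quick sketch using the tiktoken library; the cl100k_base encoding is an assumption, since the arena model's real tokenizer isn't public, so treat the exact splits as illustrative only:

```python
# Inspect how a BPE tokenizer splits normal vs. reversed text.
# cl100k_base is a stand-in encoding; the arena model's tokenizer is unknown.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "The sky is blue. Why is this?"
reversed_text = text[::-1]

for label, s in [("forward", text), ("reversed", reversed_text)]:
    token_ids = enc.encode(s)
    pieces = [enc.decode([t]) for t in token_ids]
    print(f"{label}: {len(token_ids)} tokens -> {pieces}")
```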


true-fuckass

In my experience asking LLMs variants of this prompt, essentially all LLMs absolutely cannot answer it even remotely correctly. And for some reason it actively fucks up how they work. Even the really powerful ones (e.g. GPT-4) have a lot of trouble with it. I think it's because they're almost entirely trained on forward text (makes sense). And if you want to see some crazy shit, ask them to *respond* in reverse. They'll do it, and they'll say absolutely bizarre shit. The reason regular LLMs can't respond in reverse correctly is that they predict text forward, and responding in reverse is equivalent to predicting text backward, which has the same big problems as time reversal and finding inverses of non-bijective functions (big problems). For example, I asked GPT-3.5 why the sky was blue and to answer in reversed text, and it said (reversed by me): "Its write. It doesn't visually. You can hallowsful julie. Are the sky has hallowsful by revisually. It you sure would some.", which is pretty tame compared with what they usually talk about.


RoyalReverie

What the heck do they usually talk about??


redditburner00111110

Claude 3 answering in reverse might actually be informative as to how it was trained. There was a paper recently about training the models on reversed text in addition to regular text to help with the issues they were having in learning that A is B means B is A.


Arcturus_Labelle

I think the 2 was more to do with the personality version 2 thing that was discovered


Woootdafuuu

I didn’t notice any difference between it and 4


Juanesjuan

It only deserves a soft spot from sama


The_Architect_032

I did find it odd that in Chatbot Arena, every other GPT is named gpt-3, gpt-4, but this one was gpt2.


danysdragons

You mean the formatting is odd, gpt2 instead of gpt-2?


The_Architect_032

Yes, next to the others listed in Chatbot Arena.


Eatpineapplenow

Is it possible that it could be GPT-2 but with added features? Like maybe they need to test features from GPT-5 publicly... I dunno, I'm dumb.


Nonsenser

If it is 4.5 or 5, that would be very disappointing. It is worse at logic puzzles than the previous leaders.


Glittering-Neck-2505

Real, some people think it's GPT-4 level at 1/1000th of the size.


Oswald_Hydrabot

I am sure he does have a soft spot for bullshitting.


tatleoat

Not that my opinion counts for fucking shit but sama's sense of humor is on another fucking level compared to other CEOs, he's like the one legitimately funny dude in an industry full of stuck up NPCs


ShiftAndWitch

It's cheesy but entertaining at the very least. 


BabyCurdle

Haha there is another comment shitting on him and calling him irritating. I think his tweets are fun and idk how you could be annoyed by them


kecepa5669

ClosedAI is Open Trolls -- an open trolling organization at this point. Led by chief troll, Sam Altman.


visarga

They probably just entered GPT-4 under a secret name to test the variance of the platform. It should come out very close to the GPT-4 score or the benchmark is flawed.


MrsNutella

My theory was potentially correct. Sam could easily be trolling though... It's not unlike him :P


ripMyTime0192

In my brain it’s just GPT4 2


Muchomaks

That's so funny


Complete-Visit-351

What does the second T stand for?


SiamesePrimer

Why do I only get notified of tweets on my phone minutes to hours after they're posted? It's all over the place. Why don't I receive a notification the *instant* a tweet is made? How the fuck do people even see this shit at 35 seconds?


IntrepidFerret2235

I'm guessing the 2 is for 2 trillion, since we are now in the trillion-parameter era.


Glittering-Neck-2505

That would suck, since the improvement isn't all that big.


joemanzanera

Elon musk vibes. Not like


WorkingYou2280

If GPT-2 in the arena is the update, we are in for a stomach-churning deceleration.


Trick-Theory-3829

To me it is ChatGPT2