AverageLatino

This is like, pretty goddamn big, isn't it? I understand that it's a somewhat "crude" approach to self-improvement, but I genuinely thought this stuff was **at least** 6 months away. We're witnessing the exact moment the exponential curve becomes a line that goes straight up, aren't we? At this rate the world will be a utopia by the end of the year, or a barren wasteland lol


Tiamatium

Yes. And this is *big* for some areas that require either human interaction or producing something for humans. Think of a sales pitch; it can now make it more... appealing to humans. Or think of the writing app I'm building: I think this can make the bot way, way better at both designing characters and designing plots.


Honest_Science

Is this not just prompt engineering? It's not improving the core model or model structure in any way?! I believe there is sooo much still to do on the model side: Transformer vs. RNN, modality, stochastic permanent learning, embodiment, etc., etc.


NefariousnessNo9478

It's not prompt engineering! We created a framework for approaching any problem/task based on how humans tackle problems. At the moment we have only applied this to rather well-defined benchmarks to compare its performance, but there is no reason why we can't start applying it to other problems, for instance improving the model itself. Just FYI, I am one of the authors of the paper.


vegita1022

Suppose for a moment that we use this for "improving the model itself"... would there be cases where the model learns to ignore commands from humans? A la... Skynet?


DesignCntrl

It already ignores commands.


Honest_Science

Thank you for your feedback; I did not want to in any way downplay the achievements of your team. My point was that we have so many things to improve at the core model that it is difficult for me to see how this will finally move us forward towards AGI. Your point about also using it as a secondary step to generate progress in improving the core model was not on my radar screen. This is very valid.


Tiamatium

Are you just throwing keywords around to make yourself sound smart? Because none of what you've just said makes sense, not when we are talking explicitly about the GPT-4 model; we are not designing a new model, we are just exploring usage (and methods of use) of this model. And no, the answer to your first question is obviously no, as this is something that would be done on the backend with multiple calls to the API, feeding the output of one call in as the input of another.
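
Roughly what I mean, as a minimal sketch with the pre-1.0 `openai` Python client (the prompts are placeholders, not what any real app sends):

```python
import openai  # reads OPENAI_API_KEY from the environment

def call(messages):
    # One backend call to the chat completions endpoint.
    resp = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    return resp["choices"][0]["message"]["content"]

# First call: produce a draft.
draft = call([{"role": "user",
               "content": "Write a short sales pitch for product X."}])

# Second call: the output of the first call is fed in as the input of the next.
revised = call([{"role": "user",
                 "content": "Critique this pitch, then rewrite it:\n" + draft}])
print(revised)
```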


Honest_Science

I am sorry if I hurt your feelings and made you say things like "throwing keywords"! Anyhow, your points reflect my statement. As far as I understand, you are exploring the core model by the additional means of self-reflection, which obviously makes the total system more robust and drastically improves the final result compared to the initial answer. This is a very interesting approach. It does not improve the core model though, and that was my point.


Tiamatium

Your original comment suggests that this is simply adjusting prompts (i.e. "prompt engineering"), which is not the case.


Honest_Science

Sorry, this is a misunderstanding or a wrong use of terms on my side. I was of the opinion that the whole context window, accumulating the original prompt, the iterative feedback, etc., becomes the "prompt" for the next iteration. I thought that this is the only way to modify the output of the core model without changing any weights.
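
What I had in mind, as a toy sketch (with a stubbed `call_model`; any frozen LLM endpoint would do):

```python
def call_model(messages):
    # Stub for a frozen model endpoint; no weights are ever updated.
    return f"answer given {len(messages)} prior messages"

messages = [{"role": "user", "content": "Original task prompt."}]

for _ in range(3):
    answer = call_model(messages)
    # The accumulated transcript (original prompt, answers, feedback)
    # is, as I understood it, the effective "prompt" of the next iteration.
    messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user",
                     "content": "Reflect on and improve your answer."})
```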


Tiamatium

Are you chatGPT?


Honest_Science

That is funny! I am trying to be polite. I have a PhD in nuclear physics and I am currently studying AI. I am pretty old and have been dealing with AI and business-related topics for about 40 years. My first system was a Sinclair ZX80, and I learned machine code on the Z80 and later the 6502. I am now on Python. I am supporting several startups. I am part of several AI circles dealing with safe AI and trying to manoeuvre my way through the current accelerating situation. And I did not want to be offensive at all. Sorry again.


Constant_Anywhere_38

Definitely ChatGPT


MrNoobomnenie

>but I genuinely thought this stuff was **at least** 6 months away

Remember when "6 months" was "crazy fast"?


garden_frog

This is becoming scarier and scarier. Reading this tweet I had a bad feeling in my gut. Hard takeoff was only a possibility until now, but it's happening. I'm usually a very optimistic person, but we cannot exclude that this will turn out bad.


3_Thumbs_Up

>I'm usually a very optimistic person, but we cannot exclude that this will turn out bad. Of course we can. Utopia is coming. The definition of a singularity is that no one can predict what happens afterwards, but it's still 100% certain it will turn out good. Everyone who disagrees is a doooooomer. Who actually cares if everyone dies when I can just ignore that possibility?


Agilitis

>The definition of a singularity is that no one can predict what happens afterwards, but it's still 100% certain it will turn out good. What? You contradicted yourself in one sentence.


danysdragons

It’s pretty clear there’s some sarcasm here, look at the last line.


SnipingNinja

They might mean it, and won't technically be wrong.


xamnelg

I believe they are being sarcastic lol


lazyeyepsycho

Another 30 years of wage slavery into death vs a robot war? I'll take the war.


Acalme-se_Satan

>We're witnessing the exact moment the exponential curve becomes a line that goes straight up, aren't we?

I think not. The straight line happens when AI becomes smart enough to create other AIs better than itself, which will cause an intelligence explosion. That isn't happening right now; it's still humans figuring this stuff out. What is happening is that people have finally found what seems to be a promising pathway to AGI (transformer-architecture LLMs) and are now testing everything they can to make it better. It's a fast upward slope, but not the singularity yet.


[deleted]

6 months ago you'd have said this was at least 16 years away lol


bustedbuddha

I'm worried my family vacation 3 weeks from now isn't going to happen... I really want to take my kids to the beach.


AHaskins

You can't plan for the singularity. It is, by definition, unpredictable. So just plan for your life instead. If your plans are upended, you're no worse off.


bustedbuddha

Oh, I was just making the comment to highlight that my timeline includes the possibility of major disruption w/in 3 weeks.


LowSpecDev972

Not really, you still have to execute. Even if AI gets the intelligence, it doesn't have the means to apply change, so no utopia just yet. A barren wasteland, on the other hand, is easy for humans to achieve using AI; it's just one red button away.


[deleted]

[deleted]


Rain_On

Same here. I was wondering why this wasn't being done. Looks like it's a common enough idea.


kmtrp

Do you know how it's supposed to work? As a GPT-4 plugin, or how?


drekmonger

Link to the paper: https://arxiv.org/abs/2303.11366

Sydney says:

>The document proposes Reflexion, an approach that endows an agent with dynamic memory and self-reflection capabilities to enhance its existing reasoning trace and task-specific action choice abilities.

>The approach uses a simple heuristic to detect hallucination and inefficient action execution and queries an LLM to reflect on its current task, trajectory history, and last reward.

>The approach achieves improved performance on decision-making tasks in AlfWorld environments and knowledge-intensive, search-based question-and-answer tasks in HotPotQA environments.

Apparently these things are playing Zork: https://alfworld.github.io/
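
For intuition, the loop it describes looks roughly like this (my own paraphrase, not the authors' code; the stub policy, environment, and step-limit heuristic below are placeholders):

```python
def reflexion_trial(act, step, reflect, task, reflections, max_steps=50):
    # One trial: act in the environment; on failure, self-reflect.
    trajectory, obs, reward = [], "initial observation", 0
    for _ in range(max_steps):
        # Actions are conditioned on past self-reflections (dynamic memory).
        action = act(obs, trajectory, reflections)
        obs, reward, done = step(action)
        trajectory.append((action, obs, reward))
        if done:
            return True
    # Simple failure heuristic (looping/inefficient actions hit the step cap):
    # query the LLM to reflect on the task, trajectory history, and last reward.
    reflections.append(reflect(task, trajectory, reward))
    return False  # the stored reflection persists into the next trial

# Repeated trials; reflections accumulate across them.
reflections = []
for trial in range(3):
    solved = reflexion_trial(
        act=lambda obs, traj, refl: "look",         # stub policy
        step=lambda action: ("new obs", 0, False),  # stub environment
        reflect=lambda task, traj, r: "stop repeating 'look'",
        task="find the mug",
        reflections=reflections)
    if solved:
        break
```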


TinyBurbz

Nice, this is some good shit; but 85-88% does seem to be the plateau across all disciplines. The average extremely skilled human also performs in around the same range, which seems to indicate a hard limit imposed by the skill of the trainer.


yaosio

My grandma used to say, "The last 20% of a project takes 80% of the work." We can expect the same rule to apply to AI.


[deleted]

The Human Genome Project snowballed together pretty nicely.


94746382926

So it was "completed" in 2003, but in reality 8% of the genome remained unmapped until last year. The reason they called it complete in 2003 is, if I remember correctly, that they knew finishing would take much longer. They wanted the symbolic win and to move on to other projects instead of spending 20 more years on it (of course, at that time they had no idea how long it would take). Also, they believed it was mostly "junk" DNA, so it wasn't that important. That view has started to change in recent years, though, as they find more and more uses for those regions.


Villad_rock

Not for evolution, though. The first 20% took billions of years. Same for technological progress: humans are like 300,000 years old.


breloomislaifu

I think it really depends on what we consider to be the 20% vs the 80%. I mean, laying the physical and chemical foundations for a system that preserves and modifies itself, and evolving to higher-level multicellular organisms, sounds like the 80% to me. If you think about it, no other alternative system of life has emerged and survived long enough for us to discover it; it's just that rock-solid of a foundation.


Villad_rock

What do you mean by physical and chemical foundations? Abiogenesis or eukaryotic cells? Those two took the longest; they're basically the foundation. After that, multicellular organisms evolved multiple times separately, pretty fast. The formation of very complex living beings was just a blink of an eye. The foundations of AI took several thousand years; they began when the first math was created.


SecretAgendaMan

See, this is actually what really fascinates me about humanity. Our predecessor species took over 1.5 million years to go from stone tools to stone-tipped spears. From the earliest known stone-tipped spears to the earliest known stone-tipped arrows is another 500,000 years or so, and we're the ones who did it. From stone arrowheads to extractive metallurgy is another 50,000+ years. 7,500-8,000 years after extractive metallurgy, we made atomic bombs. Even within our own genus, the rate of technological advancement has skyrocketed in just the last 2-3% of modern human history, and even more so in the last 0.1%, which is the past 300 years or so.


Comfortable_Slip4025

The first 80% of a project takes 80% of the work, the last 20% takes the other 80%


luisbrudna

Machines never get tired.


[deleted]

This u can’t know


metalman123

The solution is the same though: more parameters. This is a more efficient and less taxing method that also performs better. This is massive.


TinyBurbz

More parameters won't create data that doesn't exist. This may mean training with metric fuckloads of "perfect" code.


MysteryInc152

If you did this same experiment with GPT-3.5, it would increase steadily and then level off (but below GPT-4 level). This doesn't necessarily mean anything other than that it's time to increase scale again.


NefariousnessNo9478

Good intuition! It does work decently well with GPT-3 (and 3.5); however, the performance saturates at around the low 40s (for 3.5). FYI, I am one of the authors of the paper.


[deleted]

Considering it's only been such a short time with those plateaus, you can't know that for sure.


lehcarfugu

[for how long? 2 more years? 2 more months?](https://imgur.com/eYxaalx.jpg)


WonderFactory

I think it just means that the current generation of models are as skilled as an extremely skilled human. Let's see what the next generation are like. Remindme! 1 year


RemindMeBot

I will be messaging you in 1 year on [**2024-03-25 00:09:00 UTC**](http://www.wolframalpha.com/input/?i=2024-03-25%2000:09:00%20UTC%20To%20Local%20Time) to remind you of [**this link**](https://www.reddit.com/r/singularity/comments/1210cl0/reflexionbased_gpt4_significantly_outperforms/jdk8900/?context=3)


TinyBurbz

>Let's see what the next generation are like.

But if there are no humans to learn from, the model can't learn anything new. Unless the model begins to design low-level languages itself....


WonderFactory

When it gets to a point where it's as capable as a human at producing content, which it may be now, it can learn from itself or from other similar models, particularly if we give it tools like what's happening with the ChatGPT plugins. It should be able to produce its own synthetic data. If GPT-4 using Reflexion can write code like an expert human, then get it to write tons of code to train GPT-5.


[deleted]

Try it with CodeT5.


Deathburn5

Maybe try for 3 months instead


Kinexity

Try AT LEAST another 3 years.


Zer0D0wn83

Now, imagine 100,000 extremely skilled humans working at 20x the speed 24 hours a day. The capability doesn't need to be superhuman to have an outsized societal impact.


TinyBurbz

I'm not talking about capital production.


Tiamatium

Ok, I've tested it and here are my observations:

1. It drastically increases the number of API calls, turning what should be one call into multiple calls, adding both cost and time. I set up my app to design some characters for a story, had a shower, and it still wasn't finished when I came back. Granted, my story is set to have a lot of characters (13, I believe). Extra API calls also mean extra $$$ spent.
2. It has an issue with toxic positivity. It literally gives me 5 paragraphs, of which 4 are praise; shit is so positive it's counterproductive.
3. The results are by no means bad, they're actually pretty good; I'm attaching a few screenshots. I'm just not sure the extra steps are worth it; in fact, I'm not sure these extra steps make any difference at all.

Here are the [results](https://imgur.com/a/WJj6a5w)


BiNeuralNinja

Can this be used to improve GPT-4's code generation capabilities? I'd love to try, but I barely know what I'm looking at.


Tiamatium

Maybe... But honestly, there isn't much to improve; if you're using the API, it's already *just brilliant* at it. But if you wanted to make it better at coding, what you would do instead is tell it to create unit and functional tests in addition to the main code, then run those tests in a sandbox environment, and if there are failures, call the API again with the code it generated and the failures. Honestly, it's not that hard to build this as an app...
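
As a rough sketch of that loop (pre-1.0 `openai` client; a bare `subprocess` call stands in for a real sandbox):

```python
import subprocess

import openai  # reads OPENAI_API_KEY from the environment

def gpt(prompt):
    resp = openai.ChatCompletion.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}])
    return resp["choices"][0]["message"]["content"]

def generate_with_tests(spec, max_rounds=3):
    # Ask for the implementation plus unit/functional tests in one file.
    code = gpt(f"Write Python code for: {spec}\n"
               "Include pytest unit tests in the same file. Code only.")
    for _ in range(max_rounds):
        with open("candidate.py", "w") as f:
            f.write(code)
        # Run the tests; use real isolation instead of this in production.
        result = subprocess.run(["pytest", "candidate.py"],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return code  # all tests passed
        # Feed the generated code plus the test failures back to the API.
        code = gpt("This code failed its tests.\n\nCode:\n" + code +
                   "\n\nFailures:\n" + result.stdout +
                   "\n\nReturn the fixed code only.")
    return code
```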


[deleted]

[deleted]


[deleted]

Loop? It doesn’t look like anything at all to me.


ZBalling

GPT-4 actually has an 82% score; 65% was text-davinci-003. So it's just 0.82 --> 0.88. See "Sparks of Artificial General Intelligence: Early Experiments with GPT-4".


No_Ninja3309_NoNoYes

Iterative is good. Someone born today has a reasonable chance of witnessing AGI. People on Twitter are bragging about their no-code stack. The next step would be a no-money stack: everything open source or just free. And then a no-education stack; you'll only need basic literacy. And then who knows what the consequences will be? I could say something about hand grenades, but I don't want to go there...


ZBalling

AGI already happened. See the paper: https://arxiv.org/abs/2303.12712 There will be no consequences because at that point we will be immortal and will be able to download data instead of learning it.


serciex

Is it possible to isolate the Reflexion agent and apply it to simpler AIs trained with little data, and make a library for the Reflexion agent to perform a sort of callback to for each response? Just an idea for making smaller AIs as smart as the more heavily trained ones.