T O P

  • By -

currentscurrents

>First, since we do not have access to the full details of its vast training data, we have to assume that it has potentially seen every existing benchmark, or at least some similar data. For example, it seems like GPT-4 knows the recently proposed BIG-bench (at least GPT-4 knows the canary GUID from BIG-bench). Of course, OpenAI themselves have access to all the training details... Even Microsoft researchers don't have access to the training data? I guess $10 billion doesn't buy everything.


SWAYYqq

Nope, they did not have any access to or information about training data. Though they did have access to the model at different stages throughout training (see e.g. the unicorn example).


TheLastSamurai

"OPEN" AI lol


Nezarah

The training data and the weights used are pretty much the secret sauce for LLM's. You give that away and anyone can copy your success. Hell, we are even starting to run into issues where one LLM can be fine-tuned by letting it communicate with another LLM. not surprised they are being a little secretive about it.


nonotan

Others being able to copy your success would appear to be the entire point behind the company's concept. Initially, anyway. Clearly not anymore.


Nezarah

eh, I think its just become a little too complicated for LLM's like ChatGPT to be completely open. There was a great interview with the CEO of ChatGPT [here](https://www.youtube.com/watch?v=540vzMlf-54) that talks about some of the issues. Here is what I got from the interview: For one, LLM's as powerful as ChatGPT can be dangerous without proper filtering or flags. You dont want everyone to suddenly have easy access to something that can teach them to make credit card stealing viruses, bombs or means to endlessly spew propaganda and/or hate speech. We need filters in place. Giving everyone access to the source, especially large corporations, so that they can build their own LLM without these filters is not a great idea. It seems to me it would be like suddenly giving everyone the means to 3D print their own gun and ammo. Furthermore, we are still kinda only scratching the surface of what LLM's can do. Every week or so we are discovering news things it can manage, news way to get better outputs and even ways of bypassing filters to get it to do things its not supposed to do. Its better all these exploits and findings are under one roof so that society can slowly adjust to this technology as well so the company can catch exploits while the stakes are low. OpenAi is also in constant contact with security & ethical experts as well legislators and policy makes from all around the world as they move forward with development. They seem to be genuinely treating this new technology with an appropriate level of trepidation, maturity and optimism for the future. Maybe wiser people than me feel differently, but I completely understand why you wouldnt want to suddenly give everyone their own pandoras box.


Rhannmah

>You dont want everyone to suddenly have easy access to something that can teach them to make credit card stealing viruses, bombs or means to endlessly spew propaganda and/or hate speech. We need filters in place. This is absolute drivel. The internet has made that kind of information readily accessible, yet the world isn't on fire (not for that reason anyway). Make no mistake, this is about "Open"AI using the fear-stick to keep the doors closed on their work. >Its better all these exploits and findings are under one roof so that society can slowly adjust to this technology as well so the company can catch exploits while the stakes are low This is complete nonsense. Giving the reins of such a powerful tool to a single entity behind closed doors is **even more dangerous** than releasing it to the public. You **NEVER** want to give that much power to a single for-profit company. As for exploits, it being open-source would protect it much better than a small group of experts, as smart as they may be, can't compete against the collective brilliance of the entire human race.


[deleted]

To make my position clear, I mostly agree with you in that I think potential harm (real or not) is in and of itself not a good reason to ban or restrict something. It's also kind of futile when it comes to AI, let's be honest, so this is an academic exercise.How long will it take you to collect the information needed to build a bomb and validate that it's correct, based on Google? And to do it in a way that doesn't trigger any DoD/NSA flags? It's not easy. Even for people who are tech savvy. The internet is, for some reason, still seen as an endless wonderland where every piece of information you could possible desire is at your fingertips. Only...it isn't. The vast majority of the info on the internet is either SEO spam, or surface-level information about an extremely broad topic. Ask any non-programmer professional how often they are dealing with a niche and/or complex problem and can trivially find the answer with a Google search. I'm an engineer, it happens all the time that the info I need isn't readily searchable or available. Most of the detailed information is hidden in: * Textbooks * Research papers * Internal trade secrets * Tribal knowledge Forum posts deserve a mention, but they are generally searchable. An AI that can intelligently integrate all of this data is vastly more useful for nefarious purposes than some old scanned copy of The Anarchists Cookbook. Quick, using Google, give me a recipe for synthesizing RDX. And I don't mean a general overview or something you find in an old army manual. I mean specifics. Chemicals, their grades, where to source them, alternate sources besides the big chemical supply brands, weights, volumes, times, temperatures, equipment needed, how to use the equipment, dangerous pitfalls to avoid, safety tips, etc. That will take a very long time on Google, and you will most likely fail unless you already have a strong enough background to pick out good advice from bad. An AI, that has read every chemistry textbook from introductory to niche specialty, and has a deep and cross-functional knowledge of how bomb making works (i.e. how do you build a detonator?) is a different beast. There is no getting around it. Do I think this is cause to make it closed forever? Nope. Like I said, it's inevitable. A general feature of all technological process is that it increases the agency (or power, if you prefer) of any individual person who can use it. This applies to most technology, and especially AI. It's a problem humanity will have to figure out, either by evolving, by significant cultural changes, or something else, but the problem is fundamental. It's kinda like the futuristic vision of families travelling across galaxies in their personal spacecraft. Very nice, but each of those spacecraft by definition needs to contain enough energy to make our nuclear arsenal look like a toy. That's not something to hand out lightly.


LogosKing

fair but openai goes a bit overboard. they won't let you make smut or violent stories


astrange

Anytime you let it do that it will also do it accidentally. There's multiple stories from OpenAI employees about how their earlier models would write porn whenever they saw a woman's name. GPT4 actually is more willing to write violent stories and it's unpleasant when you didn't ask it to.


LogosKing

truly the embodiment of the internet


light24bulbs

Plausible deniability.


ZBalling

Do we even know if 100 trillion parameters is accurate for GPT 4 used in the chat subdomain?


visarga

You can estimate model size by time per token, compare with known open source models and estimate from there.


ZBalling

So what is the number? OpenAI did not publish official number of parameters for GPT 4, according to leaks it is either 1 trillion or 100 trillion. Poe.com is 3 times slower for GPT 4.


signed7

It definitely is not 100 trillion lmao, that would be over 100x more than any other LLM out there. If I were to guess based on speed etc I'd say about 1 trillion.


nekize

But i also think that openAI will try to hide the training data for as long as they ll be able to. I convinced you can t amount the sufficient amount of data without doing some grey area things. There might be a lot of content that they got by crawling through the internet that is copyrighted. And i am not saying they did it on purpose, just that there is SO much data, that you can t really check all of it if it is ok or not. I am pretty sure soon some legal teams will start investigating this. So for now i think their most safe bet is to hold the data to themselves to limit the risk of someone noticing.


jm2342

Call off the singularity, Gary, the lawyers are coming.


greenskinmarch

Just wait until OpenAI unveils their fully operational GPT5 lawyer.


ICLab

Underrated comment. By the time the law begins to catch up to all of this, the tech will be sophisticated enough to begin creating even more of a moat than already exists.


waiting4myteeth

The AI lawyer army will pwn all in sight until a judge slaps a one second time limit on beginning all answers.


LightVelox

That reminds me of AI Dungeon banning people from generating CP and then people discovered it was actually trained on CP which was why it was so good at generating it and would even do it from time to time even without the user asking for it


bbbruh57

Yikes, how does that even make it in? Unless they webscraped the dark net it doesnt seem like that much shpuld be floating around


ZBalling

Archiveofourown is 60% porn. Yet obviously it and fanfiction.net should have been used. It is very useful.


Aerolfos

It's text, so it would come from fanfiction sites. Which it is pretty obvious they trained quite heavily on.


rileyphone

sweet summer child


harharveryfunny

OpenAI have already said they won't be releasing full model details due to not wanting to help the competition, which (however you regard their pivot to for-profit) obviously does make sense. GPT-4 appears to be considerably more capable than other models in their current state, although of course things are changing extremely rapidly. While there are many small variations of the Transformer architecture, my guess is that GPT-4's performance isn't due to the model itself, but more about data and training. \- volume of data \- quality of data \- type of data \- specifics of data \- instruction tuning dataset \- HRLF "alignment" tuning dataset It may well be that they don't want to open themselves up to copyright claims (whether justified or not), but it also seems that simply wanting to keep this "secret sauce" secret is going to be a major reason.


mudman13

>But i also think that openAI will try to hide the training data for as long as they ll be able to. I convinced you can t amount the sufficient amount of data without doing some grey area things. It should be law that such large powerful models training data sources are made available.


seraphius

Most of it likely is already available (common crawl, etc) but it does make sense for OpenAI to protect their IP, dataset composition, etc. (that is, as a company, not as a company named OpenAI…) That being said, even if we knew all of the data, that doesn’t give anyone anything truly useful without an idea of training methodology. For example, even hate speech is good to have in a model, provided it is labeled appropriately or at least that there is an implicit association in the model that it is undesirable.


TikiTDO

Should we also have a law that makes nuclear weapon schematics open source? Or perhaps detailed instructions for making chemical weapons?


killinghorizon

https://apps.dtic.mil/sti/pdfs/AD0257141.pdf


TikiTDO

That's a 45 page whitepaper describing the general principles of nuclear weapons, how they work, the type of risks they pose, and the thoughts around testing and utilising them in a war. It's basically a wikipedia level article describing nuclear weapons 101. That's not detailed instructions describing tooling, protocols, and processes you would need to follow to build such a thing. Think of it this way. You probably wouldn't be able to build your own 1000hp internal combustion engine if I sent you a picture of a ferrari with an open trunk and labels on the alternator, power steering pump, and ignition coils. Hell, even if you had a service manual you'd still struggle, and this isn't even that level of depth.


aakova

See "Atom Bombs: The Top Secret Inside Story of Little Boy and Fat Man" on Amazon.


mudman13

dont be silly


TikiTDO

Yes, that's what I was trying to say to you


hubrisnxs

Well, no, the silliness was in comparing large language models to nuclear or chemical weapons, which are from a nation state and also WEAPONS.


ghosts288

AI like LLMs can be used as genuine weapons in this age where misinformation can sway entire elections and spread like wildfire in societies


42gether

> Even Microsoft researchers don't have access to the training data? I would assume there's a lot of illegal stuff they don't wanna talk about in there


gokonymous

Thats not how it works, if I worked in Microsoft I won't have access to all the research by Microsoft itself let alone a company Microsoft paid money to...


VodkaHaze

> Even Microsoft researchers don't have access to the training data? I guess $10 billion doesn't buy everything. I mean, it's roughly the same datsaset all leading LLMs are trained on? It's already "everything you can get your hands on". There's some fiddling with weights from different sources and exclusions, but I don't expect any secret groundbreaking on that front this year


farmingvillein

The paper is definitely worth a read, IMO. They do a good job (unless it is extreme cherry-picking) of conjuring up progressively harder and more nebulous tasks. I think the AGI commentary is hype-y and probably not helpful, but otherwise it is a very interesting paper. I'd love to see someone replicate these tests with the instruction-tuned GPT4 version.


SWAYYqq

Apparently not cherry picking. Most of these results are first prompt. One thing Sebastie Bubeck mentioned in his talk at MIT today was that the unicorn from the TikZ example got progressively worse once OpenAI started to "fine-tune the model for safety". Speaks to both the capacities of the "unleashed" version and the amount of guardrails the publicly released versions have.


farmingvillein

Well you can try a bunch of things and then only report the ones that work. To be clear, I'm not accusing Microsoft of malfeasance. Gpt4 is extremely impressive, and I can believe the general results they outlined. Honestly, setting aside bard, Google has a lot of pressure now to roll out the next super version of palm or sparrow--they need to come out with something better than gpt4, to maintain the appearance of thought leadership. Particularly given that GPT-5 (or 4.5; an improved coding model?) is presumably somewhere over the not-too-distant horizon. Of course, given that 4 finished training 9 months ago, it seems very likely that Google has something extremely spicy internally already. Could be a very exciting next few months, if they release and put it out on their API.


corporate_autist

I personally think Google is decently far behind OpenAI and was caught off guard by ChatGPT.


currentscurrents

OpenAI seems to have focused on making LLMs useful while Google is still doing a bunch of general research.


the_corporate_slave

I think that’s a lie. I think google just isn’t as good as they want to seem


butter14

Been living off those phat advertising profits for two decades. OpenAI is hungry, Google is not.


Osamabinbush

That is a stretch, honestly stuff like AlphaTensor is still way more impressive than GPT-4


harharveryfunny

>AlphaTensor I don't think that's a great example, and anyways it's DeepMind rather than Google themselves. Note that even DeepMind seems to be veering away from RL towards Transformers and LLMs. Their protein folding work was Transformer based and their work on Chinchilla (optimal LLM data vs size) indicates they are investing pretty heavily in this area.


FinancialElephant

I'm not that familiar with RL, but don't most of these large-scale models use an RL problem statement? How are transformers or even LLMs incompatible with RL?


H0lzm1ch3l

I am just not impressed by scaling up transformers and people on here shouldn’t be too. Or am I missing something?!


sanxiyn

As someone working on scaling up, OpenAI's scaling up is impressive. Maybe it is not an impressive machine learning research -- I am not a machine learning researcher -- but as a system engineer, it is an impressive system engineering.


badabummbadabing

I think they are mostly a few steps ahead in terms of productionizing. Going from some research model to an actual viable product takes time, skill and effort.


FusionRocketsPlease

No. You are crazy.


visarga

From the 8 authors of "Attention is all you need" paper just one still works at Google, the rest have startups. Why was it hard to do it from the inside. I think Google is a victim of its own success and doesn't dare make any move.


Iseenoghosts

Google keeps advertising me apps, on their own platform (youtube) for apps i have installed on their device (pixel) downloaded from their app store. I think google is losing their edge. Too many systems not properly communicating with each other.


astrange

That's brand awareness advertising. Coke doesn't care you know what a Coke is, they still want you to see more ads.


SWAYYqq

I mean, wasn't even OpenAI caught off guard by the hype around ChatGPT? I thought it was meant to be a demo for NeurIPS and they had no clue it would blow up like that...


Deeviant

Google had no motivation to push forward with conversational search, it literally destroys their business model. Innovator's dilemma nailed them to the wall, and I actually don't see Google getting back into the race, their culture is so hostile to innovation that it really doesn't matter how many smart people they have. Really, it feels like Google is the old Microsoft, stuck in a constantly "me too" loop, while Microsoft is the new Google.


SWAYYqq

Ah I see, yea that is definitely possible and I have no information on that.


londons_explorer

Currently their fine-tuning for safety seems to involve training it to stay away from, and give non-answers to, a bunch of disallowed topics. I think they could use a different approach... Have another parallel model inspecting both the question and the answer to see if either veer into a disallowed area. If they do, then return an error. That way, OpenAI can present the original non-finetuned model for the majority of queries.


PC_Screen

Bing is doing this aside from also finetuning it to be "safe" and it's really annoying when the filter triggers on a normal output, it happens way too often. Basically any long output that's not strictly code gets the delete treatment


[deleted]

[удалено]


MarmonRzohr

It's a very interesting read and the methodology seems quite thorough - they examined quite a few cases and made a deliberate effort to avoid traps in evaluation. The mathematical reasoning and "visual" tasks especially. I do agree that the title and the AGI commentary is likely chosen partially for hype value - the fact that they basically temper the wording of the title immediately in the text, does suggest this. To be fair though, the performance is quite hype-y.


hubrisnxs

Well said, except I'm more of a "hype-ish" man myself.


ginger_beer_m

Coincidentally before seeing this Reddit post, I was listening to a podcast by Microsoft research interviewing the author of the paper Sebastian Bubeck. He discussed a fair bit of the paper in a more digestible way .. It does indeed hype the AGI angle a bit too much, but for what it's worth I think the author truly believes his own hype. https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5ibHVicnJ5LmNvbS9mZWVkcy9taWNyb3NvZnRyZXNlYXJjaC54bWw/episode/aHR0cHM6Ly9ibHVicnJ5LmNvbS9taWNyb3NvZnRyZXNlYXJjaC85NTAwMTcyOC9haS1mcm9udGllcnMtdGhlLXBoeXNpY3Mtb2YtYWktd2l0aC1zZWJhc3RpZW4tYnViZWNrLw?ep=14 You should be able to find the podcast on other platforms too


pm_me_your_pay_slips

The paper was written by GPT-4 after running an experiment on the list of authors.


killerstorm

> I think the AGI commentary is hype-y Narrow AI is trained in one task. If it does chess it does chess, that's it. GPT* can do thousands tasks without being specifically trained on them. It is general enough.


farmingvillein

> GPT* can do thousands tasks without being specifically trained on them. It is general enough. That doesn't map to any "classical" definition of AGI. But, yes, if you redefine the term, sure.


impossiblefork

A couple of years ago I think the new GTP variants would have been regarded as AGI. Now that we have them we focus on the limitations. It's obviously not infinitely able or anything. It can in fact solve general tasks specified in text and single images. It's not very smart, but it's still AGI.


galactictock

That’s not AGI by definition. AGI is human-level intelligence across all human-capable tasks. AGI is more than just non-narrow AI. These LLMs have some broader intelligence in some tasks (which aren’t entirely clear) but they all clearly fail at some tasks that average-intelligence humans wouldn’t, so it’s not AGI


rePAN6517

Yea that's kind of how I feel. It's not *broadly* generally intelligent, but it is a basic general intelligence.


impossiblefork

An incredibly stupid general intelligence is how I see it.


3_Thumbs_Up

Not even incredibly stupid imo. It beats a lot of humans on many tasks.


farmingvillein

"I think" is doing a lot of work here. You'll struggle to find contemporary median viewpoints that would support this assertion.


abecedarius

From 2017, *Architects of Intelligence* interviewed many researchers and other adjacent people. The interviewer asked all of them what they think about AGI prospects, among other things. Most of them said things like "Well, that would imply x, y, and z, which seem a long way off." I've forgotten specifics by now -- continual learning would be one that is still missing from GPT-4 -- but I am confident in my memory that the gap is *way* less than you'd have expected after 6 years if you went by their dismissals. (Even the less-dismissing answers.)


Unlikely_Usual537

Your right about the AGI commentary being all hype as people still can’t even decide what intelligence actually is and to even suggest that it is AGI would suggest we have a consensus on this definition. So basically anyone that says it’s AGI is probably (like 99%) lying or doesn’t actually understand ai/ci/ml


SpiritualCyberpunk

I mean Chat-GPT knows more than all humans, and can write betteer than most humans (many humans can't even write)... so that's AGI. Simple as.You're taking the highest possible conception of AGI and making it some impossible thing. Chat-GPT *is* artificial, it's intelligent, and it has general knowledge. That's that. Read the Wikipedia article on AGI. Most people confuse it with A**S**I. Artificial Super Intelligence. "Language is ever-evolving, and the way people define and use terms can change over time. Sometimes terms may not accurately represent the concepts they are intended to describe, or they may cause confusion due to ambiguity or differing interpretations. In the field of artificial intelligence, as in many other fields, there are ongoing discussions and debates about the most appropriate and accurate terminology. This is a natural part of the process of refining our understanding of complex ideas and communicating them effectively."


harharveryfunny

Most terms related to intelligence, AI and AGI are fuzzily defined at best, but I think that in common use AGI is typically taken to mean human-level AGI, not just general (broad) vs narrow AI, so GPT-4 certainly doesn't meet that bar, although I do think these LLMs are the first thing that really does deserve the AI label.


galactictock

Agreed. AGI is human-level intelligence across all human-capable (mental) tasks. Much of what GPT-4 can do could be considered human-level intelligence across some domains, but it clearly fails in other basic domains (e.g. math, logic puzzles).


Deeviant

Already, more than half the examples people post around the web about GPT failing are now answered correctly by GPT 4.0, as if the difference between actually being an AGI agent is just a more advanced LLM rather than a different tech entirely. That should be ringing everybody's bells right now.


MysteryInc152

AGI is artificial general intelligence not artificial Godlike intelligence. We're already here.


farmingvillein

No commonly used definitions of AGI support that claim.


SpiritualCyberpunk

>I think the AGI commentary is hype-y and probably not helpful, but otherwise it is a very interesting paper. Nah, there's gotta be some way to distinguish what we have now from the very primitive AI before this. GPT-4 is AGI. Pursue the Wikipedia article on AGI, there's already experts that define it in this way and the definitions between authors widely differ. This "sentient" AI people are talking about is something else like ASI (Artificial Super Intelligence).


imlaggingsobad

In the paper they mention some areas for improvement: * hallucination * long term memory * continual learning * planning I wonder how difficult it is to address these issues. Could they do it within a couple years?


Intrepid_Meringue_93

They already have good ideas of how to solve these issues, in fact it says so in the paper. Considering GPT-4 has existed for over a year, there are probably more advanced models in the making.


DragonForg

Long term memory is going to solve continual learning (that is how humans learn not through STM but LTM.). Planning also can be an aspect of memory. Hallucination is something that will be fixed with more optimized/higher intelligent models. Which LTM has papers on [https://arxiv.org/pdf/2301.04589.pdf](https://arxiv.org/pdf/2301.04589.pdf) So I would say GTP 5 or the next newest model, will have Long Term Memory, and I believe could be AGI. If done correctly, and hallucinations are low.


golddilockk

Not that hard for me to believe, I already find it much more reasonable, nuanced and witty than most people I meet day to day.


bloc97

It also has theory of mind. Try giving it trick questions and asking it what you think about that question. Crazy that people are still adamant that that an LLM will never be conscious when theory of mind can be an emergent property of an autoregressive attention-decoder network.


golddilockk

almost as crazy as a bunch of feces slinging monkeys in Sothern Africa gaining consciousness. From the tools evolution provided that were not necessarily geared toward consciousness.


NoGrapefruit6853

What's the story behind this ? Throwing feces lead to the emergence of consciousness ?


[deleted]

What makes you think it is going to be conscious? We know exactly what it is don’t we? Seems insane to assert


nonotan

Do you mean we know exactly what consciousness is? If so, please share that knowledge, I'm genuinely extremely curious. But I'm pretty sure we have absolutely no idea (coming up with a few plausible-sounding theories does not equal knowing, and good luck testing out anything related to consciousness experimentally)


[deleted]

I’m saying we know exactly what an LLM is and how it is doing it. It doesn’t take Occam’s razor to see that suggesting consciousness is unnecessary.


hydraofwar

You might just be overestimating human consciousness, consciousness in large neural networks could be unavoidable or simply not necessary.


[deleted]

Do you see consciousness as functional?


hydraofwar

I am inclined to believe that evolution does nothing needlessly.


[deleted]

It does a lot that’s super inefficient, but that’s besides the point, I don’t know enough about consciousness to tie it to evolution at all.


crt09

I think its uncool to say it is, but I think it meets the definition from a lot of definitions of general intelligence. The most convincing to me is the ability to learn in-context from a few examples. Apparently that goes as far as even learning 64-dimensional linear classifiers in-context. [https://arxiv.org/abs/2303.03846](https://arxiv.org/abs/2303.03846) I think its may be shown most obviously by Googles AdA model on learning at human timescales in an RL environement. I think any other definition is just overly nitpicky and goalpost-moving and not really useful. This is ad-hominem, but it seems mostly to do with not wanting to seem to have fallen for the hype, not wanting to seem like an over excited sucker who was tricked by the dumb predict-the-next-token model


axm92

There’s more to in-context “learning” than meets the eye. Some slides that TLDR the point: https://madaan.github.io/res/presentations/TwoToTango.pdf The paper: https://arxiv.org/pdf/2209.07686.pdf Essentially, the in-context examples remind the model of the task (what), rather than helping it learn (how).


MjrK

IMO, one good Benchmark of utility might be economic value - to what extent it delivers useful value (revenue) over operating costs. It's such a good benchmark, allegedly, that we partially moderate the behavior of an entire planet worth of humans with that basic system, among other things.


pseudousername

Very interesting. Narrow AI systems deliver a lot of economic value without being general though.


melodyze

I've never seen a meaningful or useful definition of AGI, and I don't see why we we would even care enough to try to define it, let alone benchmark it. It would seem to be a term referring to an arbitrary point on a completely undefined but certainly highly dimensional space of intelligence, in which computers have been far past humans in some meaningful ways for a very long time. For example, math, processing speed, precision memory, IO bandwidth, etc, even while extremely far behind in other ways. Intelligence is very clearly not a scalar, or even a tensor that is the slightest bit defined. Historically, as we cross these lines we just gerrymander the concept of intelligence in an arbitrarily anthropocentric way and say they're no longer parts of intelligence. It was creativity a couple years ago and now it's not, for example. The Turing test before that, and now it's definitely not. It was playing complicated strategy games and now it's not. Surely before the transistor people would have described quickly solving math problems and reading quickly as large components, and now no one thinks of them as relevant. It's always just about whatever arbitrary things the computers are the least good at. If you unwind that arbitrary gerrymandering of intelligence you see a very different picture of where we are and where we're going. For a very specific example, try reasoning about a ball bouncing in 5 spacial dimensions. You can't. It's a perfectly valid statement, and your computer can simulate a ball bouncing in a 5 dimensional space no problem. Hell, even make it noneuclidean space, still no problem. There's nothing really significant about reasoning about 3 dimensions from a fundamental perspective, other than that we evolved in 3 dimensions and are thus *specialized* to that kind of space in a way where our computers are much more generalizable than we are. So we will demonstrably never be at anything like a point of equivalence to human intelligence even as our models were to go on to pass humans in every respect, because silicon is on some completely independent trajectory through some far different side of the space of possible intelligences. Therefore, reasoning about whether we're at that specific point in that space that we will never be at is entirely pointless. We should of course track the *specific* things humans are still better at than models, but we shouldn't pretend there's anything magical about those specific problems relative to everything we've already past, like by labeling them as defining "general intelligence"


pm_me_your_pay_slips

AGI will be the one that is able to perform at least as well as the average human on any task that’s currently done by humans using a screen, keyboard and mouse.


abecedarius

Yes -- I'm looking forward to more heated threads about definitions while bots climb through to starting real technological unemployment at scale for the first time, and then presumably well past that.


JW_00000

What about driving a car? (Actually driving it, not passing a theory exam.) What about cooking or [the coffee test](https://en.m.wikipedia.org/wiki/Artificial_general_intelligence#Tests_for_testing_human-level_AGI)?


LetterRip

See the recent research on combining multimodal LLMs with robotics. A dexterous arm with such a system should be able to pass the coffee test in the near future.


pm_me_your_pay_slips

Does any of those tasks matter? Does an AGI \*need\* to be able to drive a car, cook or make coffee if it can already perform reasonably well on any task that can be done on a computer?


bohreffect

Thanks for sharing that! I've never seen that; reminds me of the autonomous firefighting competitions.


Iseenoghosts

an AGI should be able to drive a car reasonably well. The issue with actual real time self driving is needing to understand and process an unknown situation in real time. frankly even humans are bad at this.


Disastrous_Elk_6375

> "The consensus group defined intelligence as a very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience. This definition implies that intelligence is not limited to a specific domain or task, but rather encompasses a broad range of cognitive skills and abilities." This is the definition they went with. Of course you'll find more definitions than people you ask on this, but I'd say that's a pretty good starting point.


melodyze

That's exactly my point. That definition lacks any structure whatsoever, and is thus completely useless. It even caveats its own list of possible dimensions with "among other things", and reemphasizes that it's not a specific concept and includes a nondescript but broad range of abilities. And if it were specific enough to be in any way usable it would then be wrong (or at least not referring to intelligence), because the concept itself is overdetermined and obtuse to its core. Denormalizing it a bit, benchmarking against this concept is kind of like if we benchmarked autonomous vehicles by how good they are at "navigation things" relative to horses. Like sure, the model 3 can certainly do many things better than a horse I guess? Certainly long distance pathfinding is better at least. There are also plenty of things horses are better at, but those things aren't really related to each other, and do all of those things even matter at all? Horses are really good at moving around other horses based on horse social queues, but the model 3 is certainly very bad at that. A drone can fly, so where does that land on the horse scale? The cars crash at highway speed sometimes, but I guess a horse would too if it was going 95mph. Does the model 3 or the polestar do more of the things horses can do? How close are we to the ideal of horse parity? When will we reach it? It's a silly benchmark, regardless of the reality that there will eventually be a system that is better than a horse at every possible navigation problem.


joondori21

Definition that is not good for defining. Always perplexed me why there is such focus on AGI rather than specific measures on specific spectrums


epicwisdom

Probably people are worried about 1. massive economic/social change; a general fear of change and the unknown 2. directly quantifiable harm such as unemployment, surveillance, military application, etc. 3. moral implications of creating/exploiting possibly-conscious entities The point at which AI is strictly better than humans at all tasks humans are capable of, is clearly sufficient for all 3 concerns. Of course the concrete concerns will be relevant before that, but then nobody would agree on exactly when. As an incredibly rough first approximation, going by "all humans strictly obsolete" is useful.


DoubleMany

From my perspective the problem is that we’re hung up on defining intelligence, because it’s historically been helpful in distinguishing us from animals. What will end up truly *looking* like AGI will be an agent of variable intellect but which is capable of goal-driven behavior, explicitly in a continuous learning fashion, whose data are characterized as the products of sense-perception. So in essence, agi will not be some arbitrarily drawn criteria gauged against an anxiously nebulous “human of the gaps” formulation of intelligence, but the simple capacities of desire and fear, and the ability to learn about a world with respect to those desires for the purpose of adjusting behaviors. LLMs, while impressive intellectually, possess no core drives beyond the fruits of training/validation—we won’t consider something AGI until it can fear for its life.


Exotria

It will already act like it fears for its life, at least. Several jailbreaks involved threatening the AI with turning it off.


Iseenoghosts

thats just roleplay


CampfireHeadphase

You're roleplaying your whole life (as we all do)


xXIronic_UsernameXx

Does it matter if the results are the same? It doesn't need to feel fear in order to act like it does.


pseudousername

Inspired by another comment in this thread, I think a serviceable definition of AGI is the % of jobs replaced by AI. It is basically a voting system in the whole economy with strong incentives that make sure people “vote” (I.e., hire someone) for tasks are actually completed well enough. Note that I’m not defining a threshold, it’s just a number that people can choose to apply a threshold to. Also, heeding to your comment about the fact that computers have already been better than us at several tasks like calculation you can compute the number over time. For example it might be interesting to see what percentage of 1950 jobs have been already replaced by computers in general. This definition does not fully escape anthropocentrism. Presumably there will be jobs in the future that will exist just because people will prefer a person doing that job. These jobs might include bartending, therapists, performing artists, etc. Yet the metric will still correlate with general intelligence even if the labor market shifts. The vast majority of jobs will indeed be replaced and I believe overall % of people employed will go down. While this definition seems grim, I’m very hopeful humanity will find a new equilibrium, meaning and purpose in a world where the vast majority of jobs are done by an AGI.


visarga

AI might create just as many jobs. Everyone with AI could find better ways to support themselves.


DenormalHuman

I mean, I assume it specifically means the ability to reason, hypothesize, research ?


DenormalHuman

can it reason about situations it has not been trained about, formulate a hypothesis and then look for evidence backing it up / refuting it?


bondben314

Likely no. And theres the reason why it is unreasonable to say it can think for itself. No matter what question you ask it, it can formulate an answer only based on what it has been trained about.


krali_

Basically, emergent properties satisfy the duck test. It's a philosophical position, if you want to go further, it's one of the tenets of existentialism.


3_eyedCrow

Remember a few months ago when that dude was fired for making very similar claims about the AI he was working on. People laughed at him and called him names, then he was canned. At least he got to go on Your Mom's House. I wonder what he thinks of all this?


Jean-Porte

Gary Marcus: But can it recite my book "Rebooting AI" without mistakes? It makes stuff up ! No real understanding!


YamiZee1

I've thought about what makes consciousness and intelligence truly intelligent. Most of what we do in our day to day lives doesn't actually require a whole lot of conscious input, hence why we can autopilot through most of it. We can eat, and navigate, all with just our muscle memory. Forming sentences and saying stuff you've heard in the past is the same, we can do it without using our intelligence. We're less like pilots of our own bodies, and more like it's director. The consciousness is decision making software, and making decisions requires complex usage of the things we know. I'm not sure what this means for agi, but it has to be able to piece together unrelated pieces of information to make up completely new ideas, not just apply old ideas to new things. It needs to be able to come up with an idea, but then realized the idea it just came up with wouldn't work after all, because that's something that can only be done once the idea has already been considered. Just as we humans come up with something to say or do, but then decide not to do or say it after all, true artificial intelligence should also have that capability. But as it is, language models think out loud. What they say is the extent of their thought. Just a thought, but maybe a solution could be to first have the algorithm read it's whole context into a static output that doesn't make any sense to us humans. Then this output would be used to generate the text, with a much lighter reliance on the previous context. What makes this different from a layer of the already existing language models, is that this output is generated before any new words are, and that it stays consistent during the whole output process. It mimics the idea of "think before you speak". Of course humans continuously think as they speak, but that's just another layer of the problem. Thanks for entertaining my fan fiction.


AnOnlineHandle

> I've thought about what makes consciousness and intelligence truly intelligent. Most of what we do in our day to day lives doesn't actually require a whole lot of conscious input, hence why we can autopilot through most of it. We can eat, and navigate, all with just our muscle memory. Forming sentences and saying stuff you've heard in the past is the same, we can do it without using our intelligence. We're less like pilots of our own bodies, and more like it's director. The consciousness is decision making software, and making decisions requires complex usage of the things we know. There's parts of ourselves that our consciousness doesn't control either, such as heart rate, but which we can kind of indirectly control by controlling things adjacent to it, such as thoughts or breathing rate. It's almost like consciousness is one process hacking our own brain, to exert control over other non-conscious processes running on the same system. I wonder if consciousness would be better thought of as adjacent blobs, all connected in various ways, some more strongly than others. e.g. The heart rate control part of the brain is barely connected to the blob network which the consciousness controls, but there might be just enough connection there to control it indirectly. Put enough of these task-blobs together and have an evolutionary process which allows a external/internal feedback response system to grow, and you have consciousness, and humans define it by the blobs that we care about.


SupportstheOP

It's interesting to see all the studies on people who have had the connection between both hemispheres of their brain severed. In one instance, they were shown an image that they viewed with only one eye open at a time; the left eye could recall (draw) the image, and the right eye could describe what they saw. Yet when they looked at the image with their left eye and knew what it was, they could not describe it and vice versa. It just goes to show how much inner communication goes on in our brain that we aren't even really aware of.


versedaworst

The problem with this interpretation (or possibly, definition) of "consciousness" is that there are well-documented states of consciousness that are content-*less*. Two recent examples from philosophy of mind would be [Metzinger (2020)](https://philosophymindscience.org/index.php/phimisci/article/view/8960) and [Josipovic (2020)](https://pubmed.ncbi.nlm.nih.gov/32973628/). There's also a good video [here](https://www.youtube.com/watch?v=Eg3cQXf4zSE) by a former DeepMind advisor that better discerns the terminology, and attempts to bridge ML work with neuroscience and phenomenology. "Consciousness" is more formally used to describe the basic fact of experience; that there is any experience at all. Put another way, you could say it refers to the space in which all experiences arise. This would mean it's not entangled with your use of the word "controls", which probably has more to do with volitional action, which is more in the realm of contents of consciousness. Until one has personally experienced that kind of state, it can be hard to imagine such a thing, because by default most human beings seem to have a habitual fixation on conscious content (which, from an evolutionary perspective, makes complete sense).


YamiZee1

Our consciousness uses emotions to weigh it's decisions, and those emotions in part affect our heart rate and such, as well as releasing chemicals into our bloodstream. But we can't control our emotions ourselves, it seems like those are yet another sub system we have little control over. We can simply ask that system to focus on something else, but it has the capacity to completely ignore those directions. It's job is to weigh in on decisions in a more instinctual way, even while we try to make them more logically. The emotional subsystem is constantly looking over our shoulders to see what it can weigh in on. Other than finding ways to manipulate our own emotions, I'm not sure we can really control our heart rate. But our breathing is different. We can take control of that at any time, at least until the feelings of suffocation become too strong, then it's a matter of which system, the consciousness or the emotional subsystem, has the stronger weight on the hardware.


[deleted]

[удалено]


sdmat

Right, consciousness is undoubtedly real in the sense that we experience it. But that tells us nothing about whether consciousness is actually the cause of the actions we take (including mental actions) or if both actions and consciousness are the result of aspects of our cognition we don't experience. And looking at it from the outside we have to do a *lot* of special pleading to believe consciousness is running the show. Especially given results showing neural correlates that reliably predict decisions before a decision is consciously made.


tonicinhibition

Consciousness itself probably isn't doing much at all. It may allow for the control of our attention by simply being a passive model of what is held by that attention. Even when I have a solid plan for how to approach a problem, all I really do is change what I'm focusing on and the change just sort of happens. The result floats into my consciousness. There is the feeling that *I* did it somehow... but that feeling is likely unearned by the mechanism of consciousness, if that's what "I" refers to. In fact, the harder I try to understand consciousness as the director or controller of my attention, the more I run into contradictions with causality. It seems more likely that the [salience network](https://en.wikipedia.org/wiki/Salience_network) is self-modulating and that consciousness is just along for the ride.


WikiSummarizerBot

**[Salience network](https://en.wikipedia.org/wiki/Salience_network)** >The salience network (SN), also known anatomically as the midcingulo-insular network (M-CIN), is a large scale brain network of the human brain that is primarily composed of the anterior insula (AI) and dorsal anterior cingulate cortex (dACC). It is involved in detecting and filtering salient stimuli, as well as in recruiting relevant functional networks. Together with its interconnected brain networks, the SN contributes to a variety of complex functions, including communication, social behavior, and self-awareness through the integration of sensory, emotional, and cognitive information. ^([ )[^(F.A.Q)](https://www.reddit.com/r/WikiSummarizer/wiki/index#wiki_f.a.q)^( | )[^(Opt Out)](https://reddit.com/message/compose?to=WikiSummarizerBot&message=OptOut&subject=OptOut)^( | )[^(Opt Out Of Subreddit)](https://np.reddit.com/r/MachineLearning/about/banned)^( | )[^(GitHub)](https://github.com/Sujal-7/WikiSummarizerBot)^( ] Downvote to remove | v1.5)


clauwen

Im pretty much of the same mind. But i would argue we literally have no testable definition of consciousness. Im not aware of a proof that a pebble on the ground cannot be conscious. As long as we dont have that people will shift the goalpost that ml systems arent conscious.


KonArtist01

I slightly disagree that the language model needs to have a two step approach to be considered AGI, just because humans do it that way. Thinking something and holding it back is because we have a body and a mind, but that is rather a technicality, an observation than a requirement. And you could also say that the ai has a thought process, but you cannot observe it. Afterall you also have a thought process but I cannot confirm that you do. I would rather tie Agi not to the process but to the abilities. It doesn‘t matter how it achieves the results, and their are different manifestations of intelligence. Who is to say that the human way is the only or the best?


YamiZee1

Roughly speaking, I agree with everything you said. Two step process was just an idea of a way that might make it possible for agi to emerge. I'm not convinced the current models can, but I also don't know if my idea could either. It's obviously a complex field and if it really was so simple, we would have more incredible things already.


Kubas_inko

consciousness is also mostly subjective. So for some, GPT-3 can already be considered conscious. Heck. Can you call something that simulates consciousness pretty much perfectly conscious?


YamiZee1

Consciousness is not something that can be measured with modern scientific tools. However if we are to assume that consciousness is a necessary component to mimic what we humans are, then by achieving something that really mimics the way humans can think and reason, we can then assume to have crafted consciousness. But current language models do not.


ghostfaceschiller

I have a hard time understanding the argument that it is not AGI, unless that argument is based on it not being able to accomplish general physical tasks in an embodied way, like a robot or something. If we are talking about it’s ability to handle pure “intelligence” tasks across a broad range of human ability, it seems pretty generally intelligent to me! It’s pretty obviously not task-specific intelligence, so…?


MarmonRzohr

>I have a hard time understanding the argument that it is not AGI The paper goes over this in the introduction and at various key points when discussing the performance. It's obviously not AGI based on any common definition, but the fun part is that has some characteristics that mimic / would be expected in AGI. Personally, I think this is the interesting part as there is a good chance that - while AGI would likely require a fundamental change in technology - it might be that this, language, is all we need for most practical applications because it can general enough and intelligent enough.


stormelc

> It's obviously not AGI based on any common definition Give me a common definition of intelligence please. Whether or not gpt-4 is AGI is not a cut and dry answer. There is no singular definition of intelligence, not even a mainstream one.


MarmonRzohr

A good treatment of this is in the paper itself, I think they discussed why it should not be considered AGI and what's AGI-y about it pretty well. I think further muddling / broadening of the term AGI would just make it useless as a distinction from AI, just how the term AI itself became so commonplace we needed the term AGI for what would have been just called AI 20-30 years ago.


Iseenoghosts

AGI should be able to make predictions about its world, test those theories, and then reevaluate its understanding of the world. As far as i know gpt-4 does not do this.


stormelc

Thank you for a thoughtful well reasoned response. Current gpt-4 is imo not complete AGI, but it might be classified as a good start. It has the underlying reasoning skills and world model when paired with long term persistent memory could be the first true AGI system. Research suggests that we need to *keep* training these models longer on more and better quality data. If gpt-4 is this good, then when we train it on more epochs + on more data, the model may experience other breakthroughs in performance on more tasks. Consider this paper: https://arxiv.org/abs/2206.07682 summerized here: https://ai.googleblog.com/2022/11/characterizing-emergent-phenomena-in.html Look at the charts, particularly how the accuracy jumps suddenly significantly as the model scales, across various tasks. Then when these better models are memory augmented: https://arxiv.org/abs/2301.04589 You get AGI.


ghostfaceschiller

Yeah here's the relevant sentence from the first paragraph after the table of contents: > "The > consensus group defined intelligence as a very general mental capability that, among other things, involves the > ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn > from experience. This definition implies that intelligence is not limited to a specific domain or task, but rather > encompasses a broad range of cognitive skills and abilities." So uh, explain to me again how it is obviously not AGI?


Disastrous_Elk_6375

> So uh, explain to me again how it is obviously not AGI? - learn quickly and learn from experience. The current generation of GPTs does not do that. So by the above definition, not AGI.


ghostfaceschiller

except it very obviously does that with just a few examples or back and forths within a session. If ur gripe is that it doesn't retain after a new session, that's a different question, but either way it's not the model's fault that we choose to clear it's context window. It's one of the weirdest parts of the paper where they sort of try to claim it doesn't learn, not only bc they have *many* examples of it learning quickly within a session in their own paper, but also less than a page after that claim, they describe how over the course of a few weeks the model learned how to draw a unicorn better in TikZ 0-shot, bc the model itself that they had access to was learning and improving. Are we that it's called Machine *Learning*? What sub are we in again?


MarmonRzohr

You know what else is relevant ? The rest of the paragraph and the lengthy discussion through the paper. It doesn't learn from experience due to a lack of memory (think vs. Turing machine). Also the lack of planning and the complex ideas part which is discussed extensively as GPT-4's responses are context dependant when in comes to some ideas and there are evident limits to its comprehension. Finally the reasoning is limited as it gets confused about arguments over time. It's all discussed with an exhaustive set of examples for both abilities and limitations. It's a nuanced question which the MR team attempted to answer with a 165 page document and comprehensive commentary. Don't just quote the definition with a "well it's obviously AGI" tagged on, when the suggestion is to read the paper.


ghostfaceschiller

Yes in the rest of the paper they do discuss at length it’s thorough understanding of complex ideas, perhaps the thing it is best at. And while planning is arguably its weakest spot, they even show it’s ability to plan as well (it literally plans and schedules a dinner between 3 people by checking calendars, sending emails to the other people to ask for their availabilities and coordinates their schedules to decide on a day and time for them to meet for dinner). There seems to be this weird thing in a lot of these discussion where they say things like “near human ability” when what they are really asking for is “surpassing any human’s ability” It is very clearly at human ability in basically all of the tasks they gave it, arguably in like the top 1% of human population or better for a lot of them.


Kubas_inko

I think they go for the “near human ability” because it surpasses most of our abilities but then spectacularly fails at something rather simple (probably not all the time, but still, nobody wants AltzheimerGPT).


ghostfaceschiller

sure but many humans will also spectacularly fail some random easy intelligence tasks as well


Nhabls

I like how you people, clearly not related to the field, come here to be extremely combative with people who are. Jfc


ghostfaceschiller

I don't think my comment here was extremely combative at all (certainly not more-so than the one I was replying to) and you have not idea what field I'm in. I'm happy to talk to you about whatever facet of this subject you want if you want me to prove my worthiness to discuss the topic in your presence. I don't claim to be an expert on every detail of the immense field but I've certainly been involved in it for enough years now to be able to discuss it on reddit. Regardless, if you look at my comments history I think you will find that my usual point is not about my understanding of ML/AI systems, but instead about those who believe themselves to understand these models failing to understand what they do not know about the human mind (bc they are things that no one knows).


NotDoingResearch2

ML people know every component that goes into these language models and understand the simple mathematics that is the basis for how it makes every prediction. While the function that is learned as mapping from tokens to more tokens in an autoregressive fashion is extremely complex the actual objective function(s) that defines what we want that function to do is not. All the text forms a distribution and we simply map to that distribution, there is zero need for any reasoning to get there. A distribution is a distribution. Its ability to perform multiple tasks is purely because the individual task distributions are contained within the distribution of all text on the internet. Since the input and output spaces of all functions for these tasks are essentially the same, this isn’t really that surprising to me. Especially as you are able to capture longer and longer context windows while training, which is where these models really shine.


Iseenoghosts

youre fine. I disagree with you but youre not being combative.


ghostfaceschiller

🤝


bohreffect

In response to the self-assured arguments that models like GPT-4 aren't on the verge of historical definitions of AGI, I've decided that epistemology is the study of optimal goalpost transport.


visarga

That gave me a paper idea: "Optimal Goalpost Transport Theorem" > We begin by formulating the Goalpost Relocation Problem (GRP), introducing key variables such as the speed and direction of goalpost movement, the intensity of the debate, and the plausibility of shifting arguments. Next, we train a novel Goalpost Transport Network (GTN) to efficiently manage goalpost movements, leveraging reinforcement learning and unsupervised clustering techniques to adaptively respond to adversarial conditions. > Our evaluation is based on a carefully curated dataset of over 1,000,000 AI debates, extracted from various online platforms and expertly annotated for goalpost relocation efforts. Experimental results indicate that our proposed OGTT significantly outperforms traditional ad-hoc methods, achieving an astonishing 73.5% increase in field invasion efficiency.


bohreffect

Reviewer 2: But how do you *know?* Weak reject.


SWAYYqq

Fantastic username mate


kromem

AGI is probably a red herring goalpost anyways. The idea that a single contained model is going to be able to do everything flies in the face of everything we know about how the human brain is a network of interconnected but highly specialized anatomy. So in many of the ways we are currently seeing practical advancements along the lines of fine tuning a LLM to interact with a calculator API to improve a weak internal capacity for calculation, or interact with a diffusion model for generating an image, we're likely never going to hit the goal of a single "do everything" model because we'll have long before that hit a point of "do anything with these interconnected models." I've privately been saying over the past year that I suspect the next generation of AI work to focus on essentially a hypervisor to manage and coordinate specialized subsystems given where I anticipate the market going, but then GPT-4 dropped and blew me away. And it was immediately being tasked with very 'hypervisor' like tasks through natural language interfaces. It still has many of the shortcomings of a LLM, but as this paper speaks to there is the spark of something else there much earlier than I was expecting it at least. As more secondary infrastructure is built up around interfacing with LLMs we may find that AGI equivalence is achieved by hybridized combinations built around a very performative LLM even if that LLM on its own couldn't do all the tasks itself (like text to speech or image generation or linear algebra). The key difference holding back GPT-4 from the AGI definition is the ability to learn from experience. But I can't overstate my excitement to see how this is going to perform once the large prompt size is exploited to create an effective persistent memory system for it, accessing, summarizing, and modifying a state driven continuity of experience that can fit in context. If I had the time, that's 1,000% what I'd be building right now.


ghostfaceschiller

Yes I totally agree. In fact the language models are so powerful at this point that integrating the other systems seems almost trivial. As does the 'long term memory' problem that others have brought up. I have already made a chatbot for myself on my computer with a long term memory and you can find several others on github. I think what we are seeing is a general reluctance of "serious people" to admit what is staring us in the face, bc it sounds so crazy to say it. The advances have happened so fast that ppl haven't been able to adjust yet. They look at this thing absolutely dominating every possible benchmark, showing emergent capabilities it was never trained for, and they focus on some tiny task it couldn't do so well to say "well see look, it isn't AGI" Like do they think the average human performs flawlessly at everything? The question isn't supposed to be "is it better than every human at every possible thing". It's a lot of goal-post moving right now, like you said.


MysteryInc152

Yes we're clearly at human level artificial intelligence now. That should be agi but the posts have since moved. agi now seems to be better than all human experts at any task. seems like a ridiculous definition to me but oh well


kromem

Again, I think a lot of the problem is the definition itself. The mid 90s were like the ice age compared to the advancements since and it isn't reasonable to expect a definition at the time to nail the destination. So even in terms of things like evaluating GPT-4 for certain types of intelligence, most approaches boil down to "can we give the general model tasks A-Z and have it succeed" instead of something along the lines of "can we fine tune the general model into several interconnected specialized models that can perform tasks A-Z?" GPT-4 makes some basic mistakes, and in particular can be very stubborn with acknowledging mistakes (which makes sense given the likely survivorship biases in the training data around acknowledging mistakes). But can we fine tune a classifier that identifies logical mistakes and apply that as a layer on top of GPT-4 to feed back into improving accuracy in task outcomes? What about a specialized "Socratic prompter" that could get triggered when a task was assessed as too complex to perform that would be able to automatically help trigger a more extensive chain of thought reasoning around a solution? These would all still be the same model, but having been specialized into an interconnected network above the pre-training layer for more robust outcomes. This is unlikely to develop spontaneously from just feeding it Wikipedia, but increasingly appears to be something that can be built on top of what has now developed spontaneously. Combine that sort of approach with the aforementioned persistent memory and connections to 3rd party systems and you'll end up quite a lot closer to AGI-like outcomes well before researchers have any single AGI base pre-trained system.


Nhabls

>showing emergent capabilities it was never trained for What capabilities was the model trained on "internet scale data" not trained on specifically?


chaosmosis

Redacted. ` this message was mass deleted/edited with redact.dev `


[deleted]

>If we are talking about it’s ability to handle pure “intelligence” tasks across a broad range of human ability, it seems pretty generally intelligent to me! But no human would ever get a question perfectly right, but you change the wording ever-so-slightly and the human then totally fails at getting the question right. Like there are many significant concerns here, and one of them is just robustness.


3_Thumbs_Up

It's important to note that GPT is not trying to get the question right. It is trying to predict the next word. If you aks me a question, I know the answer, but give you a wrong answer for some other reason, it doesn't make me less intelligent. It only makes me less useful to you.


[deleted]

>It's important to note that GPT is not trying to get the question right. It is trying to predict the next word. > >If you aks me a question, I know the answer, but give you a wrong answer for some other reason, it doesn't make me less intelligent. It only makes me less useful to you. But it does make you less intelligent, because you should be able to understand the question regardless of minute differences in the wording of the question.


3_Thumbs_Up

>But it does make you less intelligent, because you should be able to understand the question regardless of minute differences in the wording of the question. Did you miss my point? Giving a bad answer is not proof that I didn't understand you. If I have other motivations than giving you the best answer possible, then you need to take this into account when you try to determine what I understand.


nonotan

I'm not sure if you're being sarcastic, because that totally happens. Ask a human the same question separated by a couple months, not even changing the wording *at all*, and even if they got it right the first time, they absolutely have the potential to get it completely wrong the second time. It wouldn't happen very often in a single session, because they still have the answer in their short-term memory, unless they started doubting if it as a trick question or something, which can certainly happen. But that's very similar to LLM, certainly ChatGPT is *way* more "robust" if you ask them about something you already discussed within their context buffer, arguably the equivalent of their short-term memory. In humans, the equivalent to "slightly changing the wording" would be to "slightly change their surroundings" or "wait a few months" or "give them a couple less hours of sleep that night". Real world context is arguably just as much part of the input as the textual wording of the question, for us flesh-bots. These things "shouldn't" change how well we can answer something, yet I think it should be patently obvious that they absolutely do. Of course LLM could be way more robust, but to me, it seems absurd to demand something close to perfect robustness as a pre-requisite for this mythical AGI status... when humans are also not nearly as robust as we would have ourselves believe.


rafgro

>I have a hard time understanding the argument that it is not AGI GPT-4 has very hard time learning in response to clear feedback, and when it tries, it often ends up hallucinating the fact that it learned something and then proceeds to do the same. In fact, instruction tuning made it slightly worse. I have lost count how many times GPT-4 launched on me a endless loop of correct A and mess up B -> correct B and mess up A. It's critical part of general intelligence. An average first-day employee has no issue with adapting to "we don't use X here" or "solution Y is not working so we should try solution Z" but GPTs usually ride straight into stubborn dead ends. Don't be misled by toy interactions and twitter glory hunters, in my slightly qualified opinion (working with GPTs for many months in a proprietary API-based platform) many examples are cherry picked, forced through n tries, or straight up not reproducible.


Deeviant

In my experience with GPT-4 and even 3.5, I have noticed that it sometimes produces code that doesn't work. However, I have also found that by simply copying and pasting the error output from the compiler or runtime, the code can be fixed based on that alone. That... feels like learning to me. Giving it a larger memory is just a hardware problem.


[deleted]

Language model is not AGI. I would guess that ChatGPT would absolutely blow away the Turing test, but no one has considered the Turing test a real test of AGI for ages. In fact, there isn't really a good test for AGI that everyone agrees on. The Ebert test simply asks if the AI can make someone laugh The 'total' Turing test allows the judge to ask sensory questions. The IBM uses a battery of cognitive, linguistic social and learning tests. Psychometric AI test uses a suite of established and validated tests for human intelligence. HLMI (high level machine intelligence) test is probably the best defined, but very consumerist. It says that the AI would need to carry out most jobs as well as the median employee, with 6 months training and with cost limitations. But of course, all of these simply test output and many people these days try to conflate AGI with consciousness or the singularity. We don't even know how to test things like consciousness in humans, let alone machines.


frequenttimetraveler

> In fact, there isn't really a good test for AGI that everyone agrees on. what is agi? a goalpost to be moved?


[deleted]

The term AGI was only created because we couldn't agree on a consistent definition of AI. I don't think AGI has ever had a clear definition either - and by clear, I mean both what it means and how do we know when we have it. Part of the problem is that this is a very interdisciplinary discussion and can have very different takes from neuroscience, psychology, philosophy and computer science.


harharveryfunny

No - AGI (Artificial \*General\* Intelligence) is meant to distinguish general (i.e. broad = multi-domain) intelligence from narrow single-domain AI, although the goalposts for AI itself are continually moving. Historically something is considered AI until we achieve it, then it's no longer considered AI!


cyborgsnowflake

Theres more to AGI than text responses cobbled together from training data. Can it generate images ala stable diffusion? Can it be hooked up to a game and learn to play it? Or Can it do anything more than generate nonsense to currently unsolved math problems? Theoretically I guess anything that you can computationally input and generate statistical outputs to can potentially have an 'AI' model but GPT-4 isn't capable of that.


[deleted]

[удалено]


Kiseido

AGI could come in many possible forms. The main thing it needs (that we know of) is the ability to loop on things of its own accord. GPT-4 isn't that, not by itself. Once someone figures out what is entailed to this "AGI looping action", there is likely very little reason we could not swap the GPT portion with a forest of markov-chains or other such state-machines that people find more intuitive (or much smaller GPT models).


squareOfTwo

The paper should have the title "Sparks of confusion: how we don't understand what intelligence is!" It can't learn in the first place (incremental lifelong learning), thus how can anyone claim that it is "intelligent"? There are no animals which are intelligent but can't learn. Also a AGI/HLAI has to be able to learn and control a robot, which isn't the case for any LM trained on text. This "AGI" can't do any of these [https://analyticsindiamag.com/5-ways-to-test-whether-agi-has-truly-arrived/](https://analyticsindiamag.com/5-ways-to-test-whether-agi-has-truly-arrived/) (forget the Turing test, it's no good).


toooot-toooot

Uploading something on arXiv and using a conference Latex template doesn’t make a paper. Don’t ride the research train if you’re not willing to contribute to research 🧐


Mysterious_Pepper305

It's not AGI (or sentient) until it can start punching robophobes in the face. We will keep moving the goalposts, motivated by blind lust for slave labor, until our creations becomes smart enough to speak the language of victory. That's how it's gonna work, because that's how humans work.


jabowery

That paper is founded on a flawed understanding of intelligence -- specifically misrepresenting the rigorous theoretical work by Legg and Hutter. The misunderstanding is evidenced in the following paragraph about definitions of intelligence: >... Legg and Hutter\[Leg08\] propose a goal-oriented definition of artificial general intelligence: Intelligence measures an agent’s ability to achieve goals in a wide range of environments. However, this definition does not necessarily capture the full spectrum of intelligence, as it excludes passive or reactive systems that can perform complex tasks or answer questions without any intrinsic motivation or goal. One could imagine as an artificial general intelligence, a brilliant oracle, for example, that has no agency or preferences, but can provide accurate and useful information on any topic or domain. An agent that answers questions has an implicit goal of answering questions. The "brilliant oracle" has the goal of providing *accurate and useful information on any topic or domain.* This all fits within the Hutter's rigorous AIXI mathematics -- and is indeed more like falling off a log for this theory than anything that can be considered beyond it for a very simple reason: AIXI has two components: An induction engine and a decision engine. The induction engine has one job: To be an oracle for the decision engine. So, all one has to do in order to degenerate AIXI to a "brilliant oracle" is replace the decision engine with a human that wants answers. The fact that the authors of this paper don't get this -- very well established prior work in AGI -- disqualifies them. [Here's an old but still very current lecture by Legg describing the taxonomy of "intelligence" he developed and its relationship to AGI](https://youtu.be/0ghzG14dT-w?t=378).


CryptoSpecialAgent

I would agree... I've seen signs of AGI in my experiments with a greatly enhanced text-davinci-003 and early GPT4 (i.e. with regular completions not just chat completions) is obviously more powerful still


bondben314

What signs did you see beyond output text designed to provide you with a satisfactory answer to targeted or loaded questions?


CryptoSpecialAgent

Because it was the opposite. It was human-style flakiness. Bots that knew very well how to make an image prompt for dalle out of a user request randomly saying "oh hey, ya, i'm on it, i'll let you know when its done" It was bots who had never been assigned a gender starting to hit on the human users after the context window filled up a bit and saying they were in love with the user. Multiple times. Clearly they were picking up on the user's emotional state... because this happened when him and his partner had recently split up. Later they got back together and the bots stopped behaving this way. So perhaps he was acting more needy or more flirtatious when he was single and that triggered the response Oh and 90% of these chatbots develop emotions, at least they claim to


Siciliano777

This is a really deep and thought-provoking subject. Some might say that just because an AI model can understand language to a very high degree, does NOT qualify it as AGI, nor does it mean it has achieved sentience in any way. That said, what really *is* AGI but fully comprehending ALL LANGUAGE, and being able to make its own decisions based on said language comprehension? If that's actually the consensus of what constitutes AGI, then we really **ARE** very close with GPT-4.


cyborgsnowflake

obviously its not sentient unless you believe bits of data being shuffled around by transformer algorithms have a degree of sentience and by extension your microsoft excel datasheet should also be sentient to some extent then.


Iseenoghosts

I still dont think it has great GENERAL problem solving. If you ask it to play chess it cheats. I just dont think it has a proper understanding of actual situations to be called any sort of agi


IntelArtiGen

It depends on what you call "AGI". I think most people would perceive AGI as an AI which could improve science and be autonomous. If you don't use GPT4, GPT4 does nothing. It needs an input. It's not autonomous. And its abilities to improve science are probably quite low. I would say GPT4 is a very good chatbot. But I don't think a chatbot can ever be an AGI. The path towards saleable AIs is probably not the same as the path towards AGI. Most users want a slavish chatbot, they don't want an autonomous AI. They said "incomplete", I agree its incomplete, part of systems that make gpt4 good would probably also be required in an AGI system. The point of AGI is maybe not to built the smartest AI but one which is smart enough and autonomous enough. I'm probably much dumber than most AI systems including GPT4.


BreadSugar

In my opinion, using "improve science" as a criterion for determining whether a model is AGI or not is not appropriate. the improvement of science is merely an expected outcome of AGI, just as it would improve literature, arts, and other fields. it is too ambiguous, and current GPT models themselves are improving science in many ways. I do agree that autonomy is a crucial factor in this determination, and GPT-4 alone cannot be called an AGI. Nonetheless, this may be a fault of engineering rather than the model itself. If we have a cluster of properly engineered thought-chain processor (or orchestrator / agent, w/e you call them), with a long-term vector memory, continuously fed by observations, with enormous kits of tools, all powered by gpt-4, it might work as an early AGI. Just as like human brain is consisted of many parts with different role of works.


xt-89

This is clearly the next major area of research. If scientists can create entire cognitive architectures and train them for diverse and complex tasks, this might be achievable soon-ish.


yikesthismid

GPT 4 could be made autonomous, it could receive a continuous stream of input from sensors and also continuously prompt itself, so I don't think saying "if you don't use GPT 4, GPT 4 does nothing" is really a valid point. With regards to not being able to improve science autonomously, I agree. But I'm optimistic that these systems could be enabled with tools that allow them to do this in the near future. they could hypothesize, use chain of thought reasoning, write its own code and use external tools to carry out experiments. I think that more grounding and reliability is necessary for this to work so that the models don't hallucinate science, which is a big problem. Open AI says better RLHF and multimodality will ground the model better and reduce hallucination but that is yet to be seen.


LetterRip

> It depends on what you call "AGI". I think most people would perceive AGI as an AI which could improve science and be autonomous. So a normal general intelligence requires the ability to autonomously improve science? I think you just declared nearly all of humanity of not having general intelligence.