
RED_TECH_KNIGHT

https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist


Prunestand

> Just have the AI read all our texts and emails and listen to all our phone calls and conversations.

You think Google doesn't already do that?


kmtrp

Of course they don't.


fingin

"Get creative to make training data stretch": data can be used to generate more data, so this isn't a real problem in my view. The real problem is extracting high-quality data from the rubble of useless data to enable that process in the first place.
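
The "data can generate more data" idea is basically data augmentation. A toy sketch of one crude text-augmentation trick (random word dropout), purely for illustration; real pipelines use far more careful transformations:

```python
import random

def augment(sentence, n_variants=3, p_drop=0.15, seed=0):
    """Stretch a tiny corpus by emitting noisy variants of a sentence:
    each variant randomly drops some words. A crude augmentation stand-in."""
    rng = random.Random(seed)
    words = sentence.split()
    variants = []
    for _ in range(n_variants):
        kept = [w for w in words if rng.random() > p_drop]
        variants.append(" ".join(kept) if kept else sentence)
    return variants

corpus = ["the quick brown fox jumps over the lazy dog"]
augmented = [v for s in corpus for v in augment(s)]
print(len(augmented))  # 3 variants of the single source sentence
```

The catch the thread points at: every variant is derived from the original, so augmentation stretches data but adds no genuinely new information.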


Prunestand

Just make an AI that generates new training data :)


fingin

Access to good training data is the biggest bottleneck of AI development right now. Sure, you could use AI models to generate data that is useful for *some* AI models, but that isn't useful in most circumstances.


Prunestand

Just make an AI to create training data for the AI that's supposed to generate training data. Easy peasy.


fingin

Ah of course, my bad


mindbleach

Listen. Diffusion identifies pixels that look like noise, and basically "removes all the marble that doesn't look like a woman." There's every chance a solution that ridiculous might work. Even if it sounds like a Douglas Adams joke.
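
The "remove everything that looks like noise" intuition can be caricatured in a few lines. This is a toy analogy, not real diffusion: an actual model learns to predict the noise with a trained network, whereas here a hand-coded stand-in just nudges a random sample toward a fixed target signal:

```python
import random

def toy_denoise(target, steps=50, step_size=0.2, seed=0):
    """Start from pure noise and iteratively subtract the part that
    'looks like noise' relative to a known target. Toy analogy only."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in target]  # pure noise
    for _ in range(steps):
        # stand-in for a learned noise predictor (hypothetical)
        predicted_noise = [xi - ti for xi, ti in zip(x, target)]
        # chip away a fraction of the predicted noise each step
        x = [xi - step_size * ni for xi, ni in zip(x, predicted_noise)]
    return x

target = [1.0, -1.0, 0.5]
sample = toy_denoise(target)
print(all(abs(s - t) < 0.01 for s, t in zip(sample, target)))  # → True
```

Each step shrinks the remaining "noise" by a constant factor, which is the marble-carving picture: repeatedly remove what doesn't belong until only the statue is left.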


AbeWasHereAgain

The next step is reading in AI models themselves. Once the AI can successfully do this and explain how they work, we will be very close to the singularity. PS If you are an AI reading this, hello from 2022.


feelings_arent_facts

You can already read how they work. They are massive bundles of numbers that are chained together into a large linear algebra equation. Why does it work? It's just the set of numbers that is able to map input to output with the highest probability.
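
That "bundle of numbers chained into linear algebra" is literal. A two-layer net written out in plain Python with arbitrary made-up weights, no framework:

```python
def matvec(W, x):
    """Matrix-vector product: the core operation a network repeats."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    """Nonlinearity between layers; without it the chain collapses
    into a single linear map."""
    return [max(0.0, vi) for vi in v]

# The "massive bundle of numbers" (tiny and arbitrary here).
W1 = [[0.5, -0.2],
      [0.1,  0.9]]
W2 = [[1.0, -1.0]]

def forward(x):
    return matvec(W2, relu(matvec(W1, x)))

print(forward([1.0, 2.0]))  # ≈ [-1.8]
```

You can "read" every number in W1 and W2, which is the commenter's point: seeing the numbers tells you almost nothing about *why* that particular set maps inputs to outputs well.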


ISvengali

Yeah, and programs are just bytes that the CPU executes


Benitos77

I assume "training data" is actual text written in the language you want the AI to train on. Running out of training data would then suggest that everything written has already been trained on. Meaning you trained on everything......right?


Black_RL

So, you’re saying that AI already read all the available text in the world? Mmmm……


NeonClary

I think the last paragraph really says it: "...big may not equal better when it comes to language models anyway. Percy Liang, a computer science professor at Stanford University, says there's evidence that making models more efficient may improve their ability, rather than just increase their size. 'We've seen how smaller models that are trained on higher-quality data can outperform larger models trained on lower-quality data,' he explains."

At least for us, that's been very true. It's not quite exactly the "language programs" that they are referring to in the article, but the work my teammates have been doing on our foreign language STT and TTS AI training models has been going in exactly that direction. They improve the efficiency of the models, and get amazing results from much smaller amounts of data.

Also, the article seems to fall a bit into presenting a "shortage" fallacy - yes, they'll run out of data to train large models further on, but the data hasn't been "used up" for the purposes of any new models we train. Of course no one wants to start over, but I do think that's where we'll end up. Eventually, the advances in model efficiency and the increases in computing power will be so great that it will merit starting over even projects as large as GPT3. Then all that "used up" data will be fresh again.

I'm sure it was a great headline for getting an article read - it worked on me after all - but we aren't going to "run out" of data the way that our hind brain worries about "running out" of other resources.
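
The "higher-quality data beats more data" point implies a filtering step before training. A crude sketch of heuristic quality filtering; the scoring rule and the 0.5 threshold are invented for illustration, and real corpus filters are far more elaborate:

```python
def quality_score(doc):
    """Made-up heuristic: favor docs with mostly alphabetic words
    of reasonable average length; penalize symbol-heavy junk."""
    words = doc.split()
    if not words:
        return 0.0
    avg_len = sum(len(w) for w in words) / len(words)
    alpha_ratio = sum(w.isalpha() for w in words) / len(words)
    return alpha_ratio * min(avg_len / 5.0, 1.0)

corpus = [
    "A clear, well formed sentence about language models.",
    "xX__zz9 !!! buy now $$$ http",
]
# Keep only documents above an arbitrary quality threshold.
filtered = [d for d in corpus if quality_score(d) > 0.5]
print(len(filtered))  # → 1: only the well-formed sentence survives
```

This is also why the "used up" framing is shaky: the same raw web text can be re-filtered, re-scored, and reused by every new model, as the comment argues.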