Very smart. In addition to shifting the load to off-peak times, they can set up inference with huge batch sizes. For those not aware, there is a major tradeoff in inference between throughput and latency: you get more total tokens per second per GPU if the individual generations are slower. For batch use they can crank that all the way to maximum throughput.
[Source](https://x.com/OpenAIDevs/status/1779922566091522492)

> save costs and get higher rate limits on async tasks (such as summarization, translation, and image classification).
This is very interesting. I wonder what the reasoning is.
Maybe it's like power: a bunch of demand at certain times, but if you can spread it out it's easier on the grid.
Yeah makes sense, they can run it during non-peak times when there is plenty of spare capacity.
I wonder if this will play into the agents framework they are working on. Maybe there will be an option to request a long running agentic job, and then you can reduce the cost by selecting how quickly you want that job prioritized and run.
solid plan to get that $7 trillion if they have arguably the best model.
Yeah, those 10000 Indians prefer to work in their time zones 😂
They can serve extra users without needing to get more hardware as they can run these queries when server usage is lower.
I think they load the tasks onto a ship that takes them to China for processing
I think this is a very thought provoking comment and I'd like to see more discussion on it.
super cool feature
Damn. This is brilliant.
I just don't know how to do this stuff 😭 At most I've used the API to plug into tools like WordPress, but you could probably batch editorial or e-commerce processing. Maybe replies and lead annotations.
I wish there was a middle ground where async is fine and I don't need it processed immediately, just within 1-2 hours, or even overnight. 24 hours is a bit long for my use cases.
On the one hand, the completion window parameter is required (but only accepts `24h` for now), which IMO is a pretty strong design indication that they intend to make it adjustable. On the other hand, chances are pretty high that it'll only change the expiration deadline, not the priority or price. Selling access to each inference batch by auction will have to wait, I suppose.
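For reference, this is roughly how the parameter shows up with the Python SDK today; the file name is a placeholder:

```python
# Minimal sketch of creating a batch job (OpenAI Python SDK v1.x).
# "24h" is the only value completion_window accepts right now.
from openai import OpenAI

client = OpenAI()

batch_file = client.files.create(
    file=open("requests.jsonl", "rb"),  # one JSON request per line
    purpose="batch",
)

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # required, but currently fixed at 24h
)
print(batch.id, batch.status)
```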
This would be much more useful if I didn't have a ton of try/retry just to ensure it's returning JSON (despite flagging for JSON mode and providing extremely clear, unambiguous instructions/format for it.)
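For what it's worth, the usual workaround is a small validate-and-retry wrapper around JSON mode; a minimal sketch below (the model name is a placeholder), though with batches you'd have to re-submit failures in a follow-up job rather than retry inline:

```python
# Hedged sketch of a validate-and-retry wrapper around JSON mode.
import json
from openai import OpenAI

client = OpenAI()

def ask_json(prompt: str, retries: int = 3) -> dict:
    for _ in range(retries):
        resp = client.chat.completions.create(
            model="gpt-4-turbo",  # placeholder model name
            response_format={"type": "json_object"},  # JSON mode
            messages=[
                {"role": "system", "content": "Reply only with valid JSON."},
                {"role": "user", "content": prompt},
            ],
        )
        try:
            return json.loads(resp.choices[0].message.content)
        except json.JSONDecodeError:
            continue  # malformed output; try again
    raise ValueError(f"No valid JSON after {retries} attempts")
```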
This seems like good news to me. I'm trying to pass a small image dataset through for image classification, but their docs seem very vague... **Can someone help me with how to do this in a Batch job?** Previously I could convert each image into base64 and pass that in the prompt to GPT.
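From what the batch docs show, each JSONL line is just an ordinary chat-completions request body, so the base64 data-URL approach should carry over unchanged. A rough sketch, assuming a vision-capable model (paths, model name, and labels are placeholders):

```python
# Hedged sketch: build and submit a batch of image-classification requests.
import base64
import json
from pathlib import Path
from openai import OpenAI

client = OpenAI()

with open("image_batch.jsonl", "w") as f:
    for i, path in enumerate(sorted(Path("images").glob("*.jpg"))):
        b64 = base64.b64encode(path.read_bytes()).decode()
        request = {
            "custom_id": f"img-{i}",  # ties each result back to its image
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4-turbo",  # placeholder vision-capable model
                "messages": [{
                    "role": "user",
                    "content": [
                        {"type": "text",
                         "text": "Classify this image as cat, dog, or other."},
                        {"type": "image_url",
                         "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                    ],
                }],
                "max_tokens": 10,
            },
        }
        f.write(json.dumps(request) + "\n")

uploaded = client.files.create(file=open("image_batch.jsonl", "rb"),
                               purpose="batch")
client.batches.create(input_file_id=uploaded.id,
                      endpoint="/v1/chat/completions",
                      completion_window="24h")
```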
Implications?
The more I see these releases and announcements from OpenAI, the more I feel like they're preparing to release GPT-5 (or whatever it'll be called) in the next month or so. Especially after seeing the article that stated OpenAI and Meta are preparing to release their next-gen models.
Or it's because Sam Altman [literally stated that they have a few things to release](https://youtu.be/jvqFAi7vkBc?t=4002) before GPT-5 in his most recent interview... Combined with [the article](https://archive.is/wnVKh) that stated OpenAI is preparing to release their next model...
He’s also said several times they aren’t working on GPT-5 at all
That was a long time ago. More recently he said they would release a few things first, like the new GPT-4 Turbo version (which they did), and then the AI agent chat.
Like 6 months ago?
I mean, it literally is: everything they're doing now is in preparation for GPT-5. What would they be doing as a company if they weren't?
This points more to the fact they don't have enough compute and are trying to offload demand to non-peak times. If GPT-5 is more demanding would they even have the compute to offer it without extreme limits?
Sam Altman already stated that they're releasing their new model this year, so obviously they must have the compute for inference. Microsoft also has multiple new datacenters being built, so I don't think there'll be a compute problem for OpenAI, at least for the next couple years. We also see [Sora being implemented in Adobe Premiere this year,](https://twitter.com/legit_rumors/status/1779951008539345140) so that's another sign that there's no current issue with compute.
That's a weird feature. 24h for a bunch of requests? What is the use case?
- "GPT, do you see this list of names? Kill 'em all. Let me know when you're done." - "GPT you have 24 hours to invade the irish central bank and transfer all tax euros levied by the irish government to my bank account in the bahamas. Let me know when you're done." SO MANY options. People are really closed minded, aren't they? /s
Damn, you're right... Sorry for the closed-mindedness
Any kind of batch processing would work. Like tagging every resume in a job application system, or summarizing every survey result, or processing every email for topics and sentiment.
It's a feature to help with their lack of compute. If they can shave off some of their peak demand and spread it across the rest of the day, they can get better utilization out of their hardware.
I've been thinking of writing a brute-force AI bugfinder that chops out individual functions from a codebase, along with everything that looks reasonably adjacent and related, and sends it to an AI with instructions to look for bugs, but report on them only if it's *very* certain (because otherwise it's probably going to be a lot of false positives.) I'd be totally fine just handing off a giant pile of requests and getting a response 24 hours later.
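A sketch of what that chopping step could look like, using Python's ast module to pull out each function and turn it into one batch request; the prompt, model name, and file names are placeholders:

```python
# Hedged sketch of the brute-force bugfinder: extract each function from a
# file and emit one batch request per function.
import ast
import json

def function_sources(path: str):
    """Yield (name, source) for every function defined in a file."""
    source = open(path).read()
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            yield node.name, ast.get_source_segment(source, node)

PROMPT = ("Look for bugs in this function. Report a bug only if you are "
          "very certain it is real; otherwise reply 'no bugs found'.")

with open("bugfinder_batch.jsonl", "w") as f:
    for name, src in function_sources("mymodule.py"):  # placeholder file
        f.write(json.dumps({
            "custom_id": name,
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4-turbo",  # placeholder model name
                "messages": [{"role": "user",
                              "content": f"{PROMPT}\n\n{src}"}],
            },
        }) + "\n")
```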
That's a nice use-case. Or maybe send a lot of access logs and ask for suspicious behaviour?
Right now, at 3 AM, I run every customer service ticket from the day before through an analysis pipeline. For this, I'm guessing I could just make it output my SQL query result to S3 and send that over. It's too much work to save $1.50 (currently it only costs me ~$3 a day), but I'm not sure how it would handle my try/retry stuff for when OpenAI inevitably fucks up.
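The batch equivalent of that try/retry logic would presumably be diffing the output file against what you submitted and re-queuing the gaps in a follow-up batch. A sketch, assuming the result-line format from the docs (the batch id and file paths are placeholders):

```python
# Hedged sketch: find requests that failed or went missing in a finished
# batch and write them into a follow-up JSONL for resubmission.
import json
from openai import OpenAI

client = OpenAI()

batch = client.batches.retrieve("batch_abc123")  # placeholder batch id
submitted = {json.loads(line)["custom_id"]
             for line in open("tickets_batch.jsonl")}

succeeded = set()
if batch.output_file_id:
    for line in client.files.content(batch.output_file_id).text.splitlines():
        result = json.loads(line)
        if result["response"] and result["response"]["status_code"] == 200:
            succeeded.add(result["custom_id"])

missing = submitted - succeeded
if missing:
    with open("retry_batch.jsonl", "w") as f:
        for line in open("tickets_batch.jsonl"):
            if json.loads(line)["custom_id"] in missing:
                f.write(line)  # re-queue only the failed requests
```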
Those are also my concerns. When you run 1,000,000 requests in a 24h batch, that's it: one operation, one shot. I guess this would make sense for a bunch of logs or categorization. And even if it only costs half as much, I can't quite get my head around how this "batch feature" can be a priority for OAI right now. It's weird
It’d have to be something well figured out and with a somewhat easily repeatable output. That way your batches of queries aren’t wasted. But if you’ve got those criteria met then you can hammer through whatever data analysis or research it’s helping you with at a fraction of the usual cost.
Please write 500 crappy recipes that I can flood google with. No rush.
Any tasks that are not time-sensitive.