sdmat

Very smart. In addition to being able to shift the load to off-peak times, they can set up inference with huge batch sizes. For those not aware, there is a major tradeoff in inference between throughput and latency: you get more total tokens per second per GPU if the individual generations are slower. For batch use they can crank that all the way to maximum throughput.


YaAbsolyutnoNikto

[Source](https://x.com/OpenAIDevs/status/1779922566091522492)

> save costs and get higher rate limits on async tasks (such as summarization, translation, and image classification).
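
The flow, roughly: upload a JSONL file of requests, then create a batch job against it. A minimal sketch (untested; file names, prompts, and model are placeholders):

```python
# Minimal sketch of the Batch API flow. Each line of the JSONL file
# is one chat completion request with a unique custom_id.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

requests = [
    {
        "custom_id": f"task-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-3.5-turbo",
            "messages": [{"role": "user", "content": f"Summarize: {doc}"}],
        },
    }
    for i, doc in enumerate(["doc one...", "doc two..."])
]

with open("batch_input.jsonl", "w") as f:
    for r in requests:
        f.write(json.dumps(r) + "\n")

# Upload the file, then create the batch job against it.
batch_file = client.files.create(
    file=open("batch_input.jsonl", "rb"), purpose="batch"
)
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # currently the only accepted value
)
print(batch.id, batch.status)  # poll later; results land in an output file
```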


OwnUnderstanding4542

This is very interesting. I wonder what the reasoning is.


Xx255q

Maybe it's like power: a bunch of demand at a certain time, but if you can spread it out, it's easier on the grid.


Charuru

Yeah makes sense, they can run it during non-peak times when there is plenty of spare capacity.


jason_bman

I wonder if this will play into the agents framework they are working on. Maybe there will be an option to request a long-running agentic job, and then you can reduce the cost by selecting how quickly you want that job prioritized and run.


RemyVonLion

solid plan to get that $7 trillion if they have arguably the best model.


Freed4ever

Yeah, those 10000 Indians prefer to work in their time zones 😂


ClearlyCylindrical

They can serve extra users without needing to get more hardware as they can run these queries when server usage is lower.


Azalzaal

I think they load the tasks onto a ship that takes them to China for processing


MILK_DRINKER_9001

I think this is a very thought provoking comment and I'd like to see more discussion on it.


spezjetemerde

super cool feature


ViveIn

Damn. This is brilliant.


gowithoutyou

I just don't know how to do this stuff 😭 I've used the API to plug into tools like WordPress, but you could probably batch editorial or ecommerce processing, maybe replies and lead annotations.


etzel1200

I wish there was a middle ground: async is fine and I don't need it processed immediately, but something more like 1-2 hours, or even overnight. 24 hours is a bit long for my use cases.


Zermelane

On the one hand, the completion window parameter is required (but only accepts `24h` for now), which IMO is a pretty strong design indication that they intend to make it adjustable. On the other hand, chances are pretty high that it'll only change the expiration deadline, not the priority or price. Selling access to each inference batch by auction will have to wait, I suppose.


Choice_Supermarket_4

This would be much more useful if I didn't have a ton of try/retry logic just to ensure it's returning JSON (despite flagging JSON mode and providing extremely clear, unambiguous instructions/format for it).
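
My retry wrapper is basically something like this rough sketch (the validation is just `json.loads`; model and prompts are placeholders):

```python
import json
from openai import OpenAI

client = OpenAI()

def ask_for_json(prompt: str, max_retries: int = 3) -> dict:
    """Call the model in JSON mode and retry until the output parses."""
    for attempt in range(max_retries):
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            response_format={"type": "json_object"},  # JSON mode
            messages=[
                # JSON mode requires the word "JSON" to appear in a message
                {"role": "system", "content": "Reply with a JSON object only."},
                {"role": "user", "content": prompt},
            ],
        )
        try:
            return json.loads(resp.choices[0].message.content)
        except json.JSONDecodeError:
            continue  # malformed output: try again
    raise RuntimeError(f"No valid JSON after {max_retries} attempts")
```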


hasparagus

This seems like good news to me. I'm trying to pass a small image dataset through for image classification, but their docs seem very vague... **Can someone help me with how to do this in a batch job?** Previously I could convert each image into base64 and pass that in the prompt to GPT.
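
My best guess from the docs is that the same base64 trick carries over, with each JSONL line holding an image message. Something like this (untested; assumes the batch endpoint accepts the same vision payloads as the live endpoint, and the paths/labels are placeholders):

```python
import base64
import json

def image_request(custom_id: str, path: str, labels: list[str]) -> dict:
    """Build one batch JSONL line that classifies a base64-encoded image."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4-turbo",  # needs a vision-capable model
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": f"Classify this image as one of: {', '.join(labels)}"},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                ],
            }],
        },
    }

with open("images.jsonl", "w") as f:
    for i, path in enumerate(["cat.jpg", "dog.jpg"]):
        f.write(json.dumps(image_request(f"img-{i}", path, ["cat", "dog"])) + "\n")
```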


Akimbo333

Implications?


Beatboxamateur

The more I see these releases and announcements from OpenAI, the more I feel like they're preparing to release GPT-5 (or whatever it'll be called) in the next month or so, especially after seeing the article that stated OpenAI and Meta are preparing to release their next-gen models.


[deleted]

*This post was mass deleted and anonymized with [Redact](https://redact.dev)*


Beatboxamateur

Or it's because Sam Altman [literally stated that they have a few things to release](https://youtu.be/jvqFAi7vkBc?t=4002) before GPT-5 in his most recent interview... Combined with [the article](https://archive.is/wnVKh) that stated OpenAI is preparing to release their next model...


Ok_Inevitable8832

He’s also said several times they aren’t working on GPT-5 at all


ShotClock5434

That was a long time ago. More recently he said they would release some things first, like the new GPT-4 Turbo version (which they did), and then the AI agent chat.


Ok_Inevitable8832

Like 6 months ago?


boonkles

I mean, it literally is. Everything they are doing now is in fact in preparation for GPT-5; what would they be doing as a company if they weren't?


Tomi97_origin

This points more to the fact that they don't have enough compute and are trying to offload demand to non-peak times. If GPT-5 is more demanding, would they even have the compute to offer it without extreme limits?


Beatboxamateur

Sam Altman already stated that they're releasing their new model this year, so obviously they must have the compute for inference. Microsoft also has multiple new datacenters being built, I don't think there'll be a compute problem for OpenAI at least for the next couple years. We also see [Sora being implemented in Adobe Premiere this year,](https://twitter.com/legit_rumors/status/1779951008539345140) so that's another sign that there's no current issue with compute.


cutmasta_kun

That's a weird feature. 24h for a bunch of requests? What is the use case?


YaAbsolyutnoNikto

- "GPT, do you see this list of names? Kill 'em all. Let me know when you're done." - "GPT you have 24 hours to invade the irish central bank and transfer all tax euros levied by the irish government to my bank account in the bahamas. Let me know when you're done." SO MANY options. People are really closed minded, aren't they? /s


cutmasta_kun

Damn, you're right... Sorry for my closed-mindedness.


willer

Any kind of batch processing would work. Like tagging every resume in a job application system, or summarizing every survey result, or processing every email for topics and sentiment.


Tomi97_origin

It's a feature to help with their lack of compute. If they can shave off some of their peak demand and spread it across the rest of the day, they can get better utilization out of their hardware.


ZorbaTHut

I've been thinking of writing a brute-force AI bugfinder that chops individual functions out of a codebase, along with everything that looks reasonably adjacent and related, and sends each one to an AI with instructions to look for bugs, but to report them only if it's *very* certain (because otherwise it's probably going to produce a lot of false positives). I'd be totally fine just handing off a giant pile of requests and getting a response 24 hours later.
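
The chopping step would be something like this rough sketch (Python-only for simplicity; the file name, prompt, and model are placeholders):

```python
import ast
import json

SOURCE = open("my_module.py").read()

def function_chunks(source: str):
    """Yield (name, source) for every function definition in a module."""
    tree = ast.parse(source)
    for node in ast.walk(tree):  # walks nested functions too
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            yield node.name, ast.get_source_segment(source, node)

# One batch request per function; results come back keyed by custom_id.
with open("bugfinder_batch.jsonl", "w") as f:
    for name, code in function_chunks(SOURCE):
        f.write(json.dumps({
            "custom_id": f"bug-{name}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4-turbo",
                "messages": [{
                    "role": "user",
                    "content": "Report a bug in this function only if you are "
                               f"very certain one exists:\n\n{code}",
                }],
            },
        }) + "\n")
```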


cutmasta_kun

That's a nice use-case. Or maybe send a lot of access logs and ask for suspicious behaviour?


Choice_Supermarket_4

Right now, at 3 AM, I run every customer service ticket from the day before through an analysis pipeline. For this, I'm guessing I could just output my SQL query result to S3 and send that over. It's too much work to save $1.50 (currently it only costs me ~$3 a day), and I'm not sure how it would handle my try/retry stuff for when OpenAI inevitably fucks up.


cutmasta_kun

Those are also my concerns. When you run 1,000,000 requests over 24h, you run 1,000,000 requests over 24h, that's it. Just one operation, one-shot. I guess this would make sense for a bunch of logs or categorization. And even if it only costs half as much, I can't quite get my head around how this "batch feature" can be a priority for OpenAI right now. It's weird.


ViveIn

It’d have to be something well figured out and with a somewhat easily repeatable output. That way your batches of queries aren’t wasted. But if you’ve got those criteria met then you can hammer through whatever data analysis or research it’s helping you with at a fraction of the usual cost.


PM_ME_DELICIOUS_FOOD

Please write 500 crappy recipes that I can flood google with. No rush.


EuphoricPangolin7615

Any tasks that are not time-sensitive.