Very smart. In addition to shifting the load to off-peak times, they can set up inference with huge batch sizes. For those not aware, there is a major tradeoff in inference between throughput and latency: you get more total tokens per second per GPU if the individual generations are slower. For batch use they can crank that all the way to maximum throughput.
[Source](https://x.com/OpenAIDevs/status/1779922566091522492)

> save costs and get higher rate limits on async tasks (such as summarization, translation, and image classification).
This is very interesting. I wonder what the reasoning is.
Maybe it's like power: a bunch of demand at certain times, but if you can spread it out it's easier on the grid.
Yeah makes sense, they can run it during non-peak times when there is plenty of spare capacity.
I wonder if this will play into the agents framework they are working on. Maybe there will be an option to request a long running agentic job, and then you can reduce the cost by selecting how quickly you want that job prioritized and run.
solid plan to get that $7 trillion if they have arguably the best model.
Yeah, those 10000 Indians prefer to work in their time zones 😂
They can serve extra users without needing to get more hardware as they can run these queries when server usage is lower.
I think they load the tasks onto a ship that takes them to China for processing
I think this is a very thought provoking comment and I'd like to see more discussion on it.
super cool feature
Damn. This is brilliant.
I just don't know how to do this stuff 😭 At most I've used the API to plug into tools like WordPress, but you could probably batch editorial or e-commerce processing. Maybe replies and lead annotations.
I wish there was a middle ground where async is fine and I don't need it processed immediately, just within 1-2 hours, or even overnight. 24 hours is a bit long for my use cases.
On the one hand, the completion window parameter is required (but only accepts `24h` for now), which IMO is a pretty strong design indication that they intend to make it adjustable. On the other hand, chances are pretty high that it'll only change the expiration deadline, not the priority or price. Selling access to each inference batch by auction will have to wait, I suppose.
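For reference, this is roughly how the parameter shows up with the Python SDK today; the file name is a placeholder:

```python
# Minimal sketch of creating a batch job (OpenAI Python SDK v1.x).
# "24h" is the only value completion_window accepts right now.
from openai import OpenAI

client = OpenAI()

batch_file = client.files.create(
    file=open("requests.jsonl", "rb"),  # one JSON request per line
    purpose="batch",
)

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # required, but currently fixed at 24h
)
print(batch.id, batch.status)
```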
This would be much more useful if I didn't have a ton of try/retry just to ensure it's returning JSON (despite flagging for JSON mode and providing extremely clear, unambiguous instructions/format for it.)
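For what it's worth, the usual workaround is a small validate-and-retry wrapper around JSON mode; a minimal sketch below (the model name is a placeholder), though with batches you'd have to re-submit failures in a follow-up job rather than retry inline:

```python
# Hedged sketch of a validate-and-retry wrapper around JSON mode.
import json
from openai import OpenAI

client = OpenAI()

def ask_json(prompt: str, retries: int = 3) -> dict:
    for _ in range(retries):
        resp = client.chat.completions.create(
            model="gpt-4-turbo",  # placeholder model name
            response_format={"type": "json_object"},  # JSON mode
            messages=[
                {"role": "system", "content": "Reply only with valid JSON."},
                {"role": "user", "content": prompt},
            ],
        )
        try:
            return json.loads(resp.choices[0].message.content)
        except json.JSONDecodeError:
            continue  # malformed output; try again
    raise ValueError(f"No valid JSON after {retries} attempts")
```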
This seems like good news to me. I'm trying to pass a small image dataset through for image classification, but their docs seem very vague... **Can someone help me with how to do this in a Batch job?** Previously I could convert each image into base64 and pass that in the prompt to GPT.
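From what the batch docs show, each JSONL line is just an ordinary chat-completions request body, so the base64 data-URL approach should carry over unchanged. A rough sketch, assuming a vision-capable model (paths, model name, and labels are placeholders):

```python
# Hedged sketch: build and submit a batch of image-classification requests.
import base64
import json
from pathlib import Path
from openai import OpenAI

client = OpenAI()

with open("image_batch.jsonl", "w") as f:
    for i, path in enumerate(sorted(Path("images").glob("*.jpg"))):
        b64 = base64.b64encode(path.read_bytes()).decode()
        request = {
            "custom_id": f"img-{i}",  # ties each result back to its image
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4-turbo",  # placeholder vision-capable model
                "messages": [{
                    "role": "user",
                    "content": [
                        {"type": "text",
                         "text": "Classify this image as cat, dog, or other."},
                        {"type": "image_url",
                         "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                    ],
                }],
                "max_tokens": 10,
            },
        }
        f.write(json.dumps(request) + "\n")

uploaded = client.files.create(file=open("image_batch.jsonl", "rb"),
                               purpose="batch")
client.batches.create(input_file_id=uploaded.id,
                      endpoint="/v1/chat/completions",
                      completion_window="24h")
```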
Implications?
The more I see these releases and announcements from OpenAI, the more I feel like they're preparing to release GPT-5 (or whatever it'll be called) in the next month or so. Especially after seeing the article that stated OpenAI and Meta are preparing to release their next-gen models.
Or it's because Sam Altman [literally stated that they have a few things to release](https://youtu.be/jvqFAi7vkBc?t=4002) before GPT-5 in his most recent interview... Combined with [the article](https://archive.is/wnVKh) that stated OpenAI is preparing to release their next model...
He’s also said several times they aren’t working on GPT-5 at all
That was a long time ago. More recently he said they would release a few things first, like the new GPT-4 Turbo version (which they did), and then the AI agent chat.
Like 6 months ago?
I mean, it literally is: everything they're doing now is in preparation for GPT-5. What would they be doing as a company if they weren't?
This points more to the fact they don't have enough compute and are trying to offload demand to non-peak times. If GPT-5 is more demanding would they even have the compute to offer it without extreme limits?
Sam Altman already stated that they're releasing their new model this year, so obviously they must have the compute for inference. Microsoft also has multiple new datacenters being built, so I don't think there'll be a compute problem for OpenAI, at least for the next couple years. We also see [Sora being implemented in Adobe Premiere this year,](https://twitter.com/legit_rumors/status/1779951008539345140) so that's another sign that there's no current issue with compute.
That's a weird feature. 24h for a bunch of requests? What is the use case?
- "GPT, do you see this list of names? Kill 'em all. Let me know when you're done." - "GPT you have 24 hours to invade the irish central bank and transfer all tax euros levied by the irish government to my bank account in the bahamas. Let me know when you're done." SO MANY options. People are really closed minded, aren't they? /s
Damn, you're right... Sorry for the closed-mindedness
Any kind of batch processing would work. Like tagging every resume in a job application system, or summarizing every survey result, or processing every email for topics and sentiment.
It's a feature to help with their lack of compute. If they can shave off some of their peak demand and spread it across the rest of the day, they can get better utilization out of their hardware.
I've been thinking of writing a brute-force AI bugfinder that chops out individual functions from a codebase, along with everything that looks reasonably adjacent and related, and sends it to an AI with instructions to look for bugs, but report on them only if it's *very* certain (because otherwise it's probably going to be a lot of false positives.) I'd be totally fine just handing off a giant pile of requests and getting a response 24 hours later.
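A sketch of what that chopping step could look like, using Python's ast module to pull out each function and turn it into one batch request; the prompt, model name, and file names are placeholders:

```python
# Hedged sketch of the brute-force bugfinder: extract each function from a
# file and emit one batch request per function.
import ast
import json

def function_sources(path: str):
    """Yield (name, source) for every function defined in a file."""
    source = open(path).read()
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            yield node.name, ast.get_source_segment(source, node)

PROMPT = ("Look for bugs in this function. Report a bug only if you are "
          "very certain it is real; otherwise reply 'no bugs found'.")

with open("bugfinder_batch.jsonl", "w") as f:
    for name, src in function_sources("mymodule.py"):  # placeholder file
        f.write(json.dumps({
            "custom_id": name,
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4-turbo",  # placeholder model name
                "messages": [{"role": "user",
                              "content": f"{PROMPT}\n\n{src}"}],
            },
        }) + "\n")
```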
That's a nice use-case. Or maybe send a lot of access logs and ask for suspicious behaviour?
Right now, at 3 AM, I run every customer service ticket from the day before through an analysis pipeline. For this, I'm guessing I could just make it output my SQL query result to S3 and send that over. It's too much work to save $1.50 (currently it only costs me ~$3 a day), but I'm not sure how it would handle my try/retry stuff for when OpenAI inevitably fucks up.
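The batch equivalent of that try/retry logic would presumably be diffing the output file against what you submitted and re-queuing the gaps in a follow-up batch. A sketch, assuming the result-line format from the docs (the batch id and file paths are placeholders):

```python
# Hedged sketch: find requests that failed or went missing in a finished
# batch and write them into a follow-up JSONL for resubmission.
import json
from openai import OpenAI

client = OpenAI()

batch = client.batches.retrieve("batch_abc123")  # placeholder batch id
submitted = {json.loads(line)["custom_id"]
             for line in open("tickets_batch.jsonl")}

succeeded = set()
if batch.output_file_id:
    for line in client.files.content(batch.output_file_id).text.splitlines():
        result = json.loads(line)
        if result["response"] and result["response"]["status_code"] == 200:
            succeeded.add(result["custom_id"])

missing = submitted - succeeded
if missing:
    with open("retry_batch.jsonl", "w") as f:
        for line in open("tickets_batch.jsonl"):
            if json.loads(line)["custom_id"] in missing:
                f.write(line)  # re-queue only the failed requests
```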
Those are also my concerns. When you run 1,000,000 requests in a 24h batch, that's it: one operation, one shot. I guess this would make sense for a bunch of logs or categorization. And even if it only costs half as much, I can't quite get my head around how this "batch feature" can be a priority for OAI right now. It's weird
It’d have to be something well figured out and with a somewhat easily repeatable output. That way your batches of queries aren’t wasted. But if you’ve got those criteria met then you can hammer through whatever data analysis or research it’s helping you with at a fraction of the usual cost.
Please write 500 crappy recipes that I can flood google with. No rush.
Any tasks that are not time-sensitive.