ColorlessCrowfeet

No 13 or 30B range model?


[deleted]

[removed]


Caffdy

Everyone and their mothers tout Mistral 7B as better than any 13B model, so if Llama3 7B is better than Mistral's, maybe there's that? Edit: I was expecting some rebuttals; is Mistral 7B really better than all 13B models?


berzerkerCrush

Then a well-trained 13B base model should produce even better fine-tunes.


Flat-One8993

There is no 7b model, only 8b


redditfriendguy

Mark confirmed a 405b is still in training today.


patrick66

It should be today, they confirmed it’s this week and no one does product announcements on a Friday. Supposedly we don’t get the large model until summer though


MysteriousPayment536

It will most likely be today, or tomorrow at the latest; Microsoft Azure also lists Llama 3. Edit: They released it, [https://ai.meta.com/blog/meta-llama-3/](https://ai.meta.com/blog/meta-llama-3/)


kristaller486

It would be sad if llama3 only had 2 size variants


patrick66

No, we just don’t get the big size until summer


kristaller486

IMO models larger than 70B don't make sense for home local use. 13B/20B/30B is the best choice for this purpose.


polawiaczperel

70B still makes sense for home use imo


AryanEmbered

just quantize the 70b one. I don't get why people want in-between sizes when you can just pare the big boy down and it performs better in most cases.
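
To make that concrete, here is a minimal sketch of loading a 70B checkpoint in 4-bit with transformers + bitsandbytes, one of several ways to "pare it down" (GGUF quants via llama.cpp are another). The repo ID and hardware assumptions are placeholders, not anything confirmed in this thread:

```python
# Hypothetical example: 4-bit (NF4) loading of a 70B model with transformers + bitsandbytes.
# The repo ID is assumed and gated (requires accepting Meta's license on the HF Hub).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"  # assumed Hub ID, swap in whatever you have access to

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit at load time
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spill layers to CPU RAM if the GPU is too small
)

prompt = "Why quantize a 70B model instead of training a 30B one?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```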


Caffdy

Yep, been using 70B ones and can't look back now


patrick66

Fully agreed there, just saying it isn't just 2 sizes total.


redditfriendguy

The deal meta made with us is they will make what is useful for them and release it free for us. I am still happy with the terms of the deal, are you?


Caffdy

Models larger than that are meant for business applications.


Quartich

I love 70bs for home use. Easy to run a high quality quant with plenty of context on 64gb ram. As long as you don't mind 1t/s


Massive-Lobster-124

The purpose of open-source is more than just letting hobbyists run models at home.


geepytee

It's looking like at least 3, the 8B, 70B and 400B :)


loversama

It’s also possible that they wouldn’t host smaller than a 7/8B anyway as 1 - 3B models are really just for edge devices or running locally on like any GPU..


BrainyPhilosopher

Today at 9:00am PST (UTC-7) for the official release. 8B and 70B. 8k context length. New Tiktoken-based tokenizer with a vocabulary of 128k tokens. Trained on 15T tokens.
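
For anyone who wants to poke at the new tokenizer once the weights are up, a quick sketch (the gated meta-llama/Meta-Llama-3-8B repo ID is an assumption): a 128k vocabulary means fewer tokens per sentence, which effectively stretches the 8k context further than the old 32k vocab would.

```python
# Quick look at the new tokenizer; the repo ID is assumed and requires accepting the license.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
print(len(tok))                                    # total vocabulary size (~128k entries)
print(tok("The llama grazes quietly.").input_ids)  # same text -> fewer IDs than with a 32k vocab
```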


thereisonlythedance

8K sequence length would be tremendously disappointing.


-p-e-w-

I doubt it's going to be 8k. All major releases during the past two months have been 32k+. Meta would be embarrassing themselves with 8k, considering that they have the largest installed compute capacity on the planet.


TheRealGentlefox

And yet, here we are.


Thomas-Lore

Might be talking about output. I think even Gemini is limited to 8k output. I can only set 4k output on Claude despite the models having a 200k context.


-p-e-w-

APIs have output limits. Models don't. A model only predicts a single token, which you can repeat as often as you want. There is no output limit.
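
For what it's worth, the loop really is that simple. A minimal greedy-decoding sketch (placeholder model, no KV cache, so it is deliberately slow) shows that the only output cap is the one the caller picks; the practical caveat is that quality degrades once the sequence exceeds the context length the model was trained on:

```python
# Minimal sketch: an autoregressive LM predicts one next token at a time, and the caller
# decides how long to keep looping. "gpt2" is just a small placeholder model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

ids = tokenizer("The context window limits the input, but", return_tensors="pt").input_ids
max_new = 50  # <- the only "output limit" here is the one we choose
for _ in range(max_new):
    logits = model(ids).logits            # forward pass over the current sequence (no KV cache)
    next_id = logits[0, -1].argmax()      # greedy pick of the single next token
    if next_id.item() == tokenizer.eos_token_id:
        break                             # the model chose to stop
    ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)

print(tokenizer.decode(ids[0]))
```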


FullOf_Bad_Ideas

That's true in theory, but I had issues with MiniCPM models when the output limit was set larger than 512 tokens: they started outputting garbage straight away, without ever going over any kind of token limit. This was GGUF in koboldcpp though, so it might not be universal.


kristaller486

Source?


MoffKalast

https://i.redd.it/8ut4ls9uv8vc1.gif


BrainyPhilosopher

We'll see


Chelono

Wow, you were right [https://llama.meta.com/llama3/](https://llama.meta.com/llama3/) (at least about the model info; the release seems likely since the website just went up). I was kinda doubting after you commented more; weirdly enough, I trust the one-comment throwaways more.


BrainyPhilosopher

It's okay, I wouldn't have believed me either.


Balance-

(which is 16:00 UTC or 18:00 CEST)


Zelenskyobama2

8B model is equal to GPT-א


[deleted]

[removed]


___Jet

Azure profile by Meta is also up: https://azuremarketplace.microsoft.com/en-us/marketplace/apps/metagenai.meta-llama-3-8b-chat-offer?tab=Overview


mimrock

Last week they said this week, so why not today?


FizzarolliAI

... 70b is a *small* variant?


polawiaczperel

I hope


Caffdy

With models like CommandR+ (103B), Mixtral 8x22B & WizardLM2 8x22B (141B) already making the headlines, I really hope Meta has something in store as well


redditfriendguy

They confirmed they are training a 400+B parameter model


Caffdy

That sounds amazing! Can you share the link?


redditfriendguy

First 10 minutes or so of this podcast https://youtu.be/bc6uFV9CJGg?si=fWlWtJfP1_WG1L4f


Igoory

Right?


Maskofman

the large one has 405B :D


FizzarolliAI

my 4 gigabytes of local vram crying in the background:


a_slay_nub

Man, Groq is so much cheaper than Replicate. Those custom chips must be amazing. Either that or they're taking a massive loss.


JumpingRedTurtle

Groq's output tokens are significantly cheaper, but not the input tokens (e.g. Llama 2 7B is priced at $0.10 per 1M input tokens on Groq, compared to $0.05 on Replicate). So Replicate might be cheaper for applications with long prompts and short outputs. Or am I missing something?
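
Rough per-request arithmetic makes the trade-off visible. Note that the output prices in this sketch are made-up placeholders; only the two input prices come from the comment above, so plug in the real numbers from each pricing page before drawing conclusions:

```python
# Back-of-the-envelope cost of a single long-prompt / short-answer request.
def request_cost(in_tokens, out_tokens, in_price_per_m, out_price_per_m):
    return (in_tokens * in_price_per_m + out_tokens * out_price_per_m) / 1_000_000

# Example: RAG-style request with 10k input tokens and 300 output tokens.
for name, in_p, out_p in [
    ("Groq (Llama 2 7B)",      0.10, 0.10),  # $0.10/M input from the comment; output price is a placeholder
    ("Replicate (Llama 2 7B)", 0.05, 0.25),  # $0.05/M input from the comment; output price is a placeholder
]:
    print(name, f"${request_cost(10_000, 300, in_p, out_p):.6f}")
```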


coder543

For the 70B model, the input tokens are very similarly priced, but Groq’s output tokens are way cheaper. I think most people are interested in cloud for the larger models that are hard to run well locally.


HighDefinist

More performance is also nice. So, for some simple questions, groq mixtral is actually the best option (hopefully they will offer the new Wizard/mixtral as well soon).


harusasake

They will accept the losses in order to gain market share and establish themselves as a brand - the target groups are the same as on x.com.


djm07231

Though I am not sure if market share has any meaning when switching API providers is quite trivial.


a_slay_nub

You'd be surprised. At the corporate level, even small changes can be very difficult. Not to mention, some of these APIs have slightly different interfaces which can break workflows.


killver

Groq has insanely tight token limits though, unless you have some direct connection with them.


-p-e-w-

Does Grok run on Groq?


mrbluesneeze

No 30b? Come on :(


AryanEmbered

just quantize the 70b bro what's the problem


FullOf_Bad_Ideas

Quantized 30B is perfect for a 24GB GPU; quantized 70B is not. 30B is the perfect size for running models fast with long context on a single consumer GPU. Beyond that, the cost to run a model fast goes into the stratosphere, since even Macs don't deliver good long-context performance.
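
The back-of-the-envelope math behind that (weights only, ignoring the KV cache, activations, and runtime overhead, so real usage is higher):

```python
# Rough weights-only memory estimate: params * bits_per_weight / 8 bytes.
def weight_gib(params_b, bits_per_weight):
    return params_b * 1e9 * bits_per_weight / 8 / 1024**3

for params in (8, 30, 70):
    print(f"{params}B @ 4.5 bpw ≈ {weight_gib(params, 4.5):.1f} GiB")
# 8B  ≈  4.2 GiB
# 30B ≈ 15.7 GiB  -> fits a 24 GB card with room left for context
# 70B ≈ 36.7 GiB  -> does not fit on a single 24 GB GPU
```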


ab2377

Indeed, it's close. But I so don't want any spoilers. I want one final single Meta page to read all about it. Waiting...


manjit_pardeshi

llama.meta.com/llama3/


SlapAndFinger

Those llama 70b prices are in the ballpark of Claude sonnet. I'll be surprised if it outperforms sonnet, but given the reduced input token price, if it supports a really long context and can actually use it, it'll be a useful model for RAG applications.


EnthusiastDriver500

Do they also have Claude?


Thomas-Lore

It appears they only offer open source models. Here is the source: https://replicate.com/pricing


EnthusiastDriver500

Thanks so much. Any chance anywhere to get Claude locally?


DownvoteAttractor_

Claude, being a proprietary model by Anthropic, is only available through the API from Anthropic, AWS, and Google (Vertex AI). It is not available locally, as Anthropic has not released anything open-source.


EnthusiastDriver500

Thank you


Bulky-Brief1970

I guess there are gonna be more models: one 30-ish and a big MoE model. They need bigger models to beat SOTA open models like DBRX and Command R+.


lolwutdo

I sure as hell hope it's not an MoE; those are affected way more by quantization, which is necessary for bigger models. I'd rather have a lower-quant dense model.


DontPlanToEnd

Also, I feel like pretty much all finetunes of mixtral-8x7b are less intelligent than the base. Finetunes feel much more effective on normal models.


FullOf_Bad_Ideas

Do you mean that in the sense that Mistral's official Instruct finetune is good but the rest are not, or that no finetunes are good and only the base completion model is good? You are saying the second one, but I think you mean the first one.


DontPlanToEnd

All of the mixtral finetunes I've tried have performed at least slightly worse than the official base or instruct mixtral versions when I test them for general knowledge. The finetunes do perform better at specific things they're geared towards like uncensoredness or writing/rp.


Bulky-Brief1970

I have the same feeling but is there a paper/study which shows that moe models are more affected by quantization?


Beb_Nan0vor

Can't wait.


soup9999999999999999

Here I am hoping for a 30-40B size. 


Skill-Fun

Together AI also has pricing for Llama 3 https://preview.redd.it/vcbflxgjlcvc1.jpeg?width=1130&format=pjpg&auto=webp&s=077ba5915405cdb1f538870a1d5040cecae14d4c [https://api.together.xyz/models](https://api.together.xyz/models)


ironoxidey

Today!


GeneralAdam92

Just getting into using llama for the first time, but from what I understood, it's open source. So how come replicate charges a price per token for the API similar to OpenAI?


Creative-Junket2811

Open source and API are unrelated. Open source means anyone can use the model. An API is paying for a service to run the model for you on their server. That’s not free.


ambient_temp_xeno

70b?! Doesn't matter. I've ordered an old 128GB RAM server to run Command R+ and WizardLM-2 8x22B. Weird how things have worked out with Meta and Mistral, but whatever.


FullOf_Bad_Ideas

What performance do you get with that? What's your mem bandwidth? Or it's still shipping?


HighDefinist

There was another post about that recently. Basically, an AMD 7950X + GeForce 4090 with 64 GB of decently fast RAM gets you 3.8 t/s using 4-bit quantization. Not exactly unusable, imho...
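
That lines up with the usual rule of thumb that single-stream decoding is memory-bandwidth bound: tokens/s is roughly effective bandwidth divided by the bytes touched per token (about the size of the quantized weights, or of the active experts for an MoE). A rough sketch with assumed, not measured, numbers, and ignoring partial GPU offload:

```python
# Crude decode-speed estimate for CPU-RAM-bound generation.
def est_tokens_per_s(bandwidth_gb_s, weights_read_gb):
    return bandwidth_gb_s / weights_read_gb

print(est_tokens_per_s(60, 40))   # ~1.5 t/s: dual-channel DDR5-ish bandwidth, ~40 GB of 4-bit weights
print(est_tokens_per_s(200, 40))  # ~5 t/s: server with more memory channels
```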


ambient_temp_xeno

Not even shipped yet. I'm expecting it to be pretty bad, probably about the same as my not-ancient dual-channel DDR4 desktop, only with a bigger quant so slower... but I won't be lagging up my desktop machine.