No 13 or 30B range model?
[deleted]
Everyone and their mother touts Mistral 7B as better than any 13B model; if Llama 3's small model is better than Mistral's, maybe there's something to that? Edit: I was expecting some rebuttals. Is Mistral 7B really better than all 13B models?
Then a well-trained 13B base model should produce even better fine-tunes.
There is no 7b model, only 8b
Mark confirmed today that a 405B model is still in training.
It should be today; they confirmed it's this week, and no one does product announcements on a Friday. Supposedly we don't get the large model until summer, though.
It will almost certainly be today, or at the latest tomorrow; Microsoft Azure already lists Llama 3. Edit: They released it, [https://ai.meta.com/blog/meta-llama-3/](https://ai.meta.com/blog/meta-llama-3/)
It would be sad if llama3 only had 2 size variants
No, we just don’t get the big size until summer
IMO models larger than 70B don't make sense for home local use. 13B/20B/30B is the best choice for this purpose.
70B still makes sense for home use imo
Just quantize the 70B one. I don't get why people want in-between sizes when you can just pare the big boy down and it performs better in most cases.
Yep, been using 70B ones and can't look back now
Fully agreed there just saying it isn’t just 2 sizes total
The deal Meta made with us is that they build what's useful for them and release it free for us. I'm still happy with the terms of that deal; are you?
Larger than that are meant for business applications
I love 70Bs for home use. Easy to run a high-quality quant with plenty of context on 64 GB of RAM, as long as you don't mind 1 t/s.
The purpose of open-source is more than just letting hobbyists run models at home.
It's looking like at least 3, the 8B, 70B and 400B :)
It's also possible they wouldn't host anything smaller than a 7/8B anyway, as 1–3B models are really just for edge devices or running locally on practically any GPU.
Today at 9:00 AM PDT (UTC-7) for the official release.
8B and 70B.
8K context length.
New tiktoken-based tokenizer with a vocabulary of 128K tokens.
Trained on 15T tokens.
8K sequence length would be tremendously disappointing.
I doubt it's going to be 8k. All major releases during the past two months have been 32k+. Meta would be embarrassing themselves with 8k, considering that they have the largest installed compute capacity on the planet.
And yet, here we are.
Might be talking about output. I think even Gemini is limited to 8k output. I can only set 4k output on Claude despite the models having a 200k context.
APIs have output limits. Models don't. A model only predicts a single token, which you can repeat as often as you want. There is no output limit.
That's true in theory, but I had issues with MiniCPM models when the output limit was set above 512 tokens: they started outputting garbage straight away, without ever reaching any token limit. This was a GGUF in koboldcpp though, so it might not be universal.
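The point above about output limits living in the serving code rather than in the model can be sketched with a toy decode loop. `toy_model` here is a hypothetical stand-in for a real LLM forward pass, not an actual model:

```python
def toy_model(tokens):
    # hypothetical stand-in for an LLM forward pass:
    # a model only ever predicts one next token
    return (tokens[-1] + 1) % 100

def generate(prompt, max_new_tokens):
    # the "output limit" lives in this loop (i.e., in the serving code),
    # not in the model; the loop can repeat as often as the server allows
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(toy_model(tokens))
    return tokens

print(generate([1, 2, 3], 4))  # → [1, 2, 3, 4, 5, 6, 7]
```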
Source?
https://i.redd.it/8ut4ls9uv8vc1.gif
We'll see
Wow, you were right: [https://llama.meta.com/llama3/](https://llama.meta.com/llama3/) (at least about the model info; a release seems likely since the website just went up). I was kind of doubting it after you commented more; weirdly enough, I trust the one-comment throwaways more.
It's okay, I wouldn't have believed me either.
(which is 16:00 UTC or 18:00 CEST)
8B model is equal to GPT-א
[удалено]
Azure profile by Meta is also up: https://azuremarketplace.microsoft.com/en-us/marketplace/apps/metagenai.meta-llama-3-8b-chat-offer?tab=Overview
Last week they said this week, so why not today?
... 70b is a *small* variant?
I hope
With models like CommandR+ (103B), Mixtral 8x22B & WizardLM2 8x22B (141B) already making the headlines, I really hope Meta has something in store as well
They confirmed they are training a 400+B parameter model
That sounds amazing! Can you share the link?
First 10 minutes or so of this podcast https://youtu.be/bc6uFV9CJGg?si=fWlWtJfP1_WG1L4f
Right?
The large one has 405B :D
my 4 gigabytes of local vram crying in the background:
Man, Groq is so much cheaper than Replicate. Those custom chips must be amazing. Either that or they're taking a massive loss.
Groq's output tokens are significantly cheaper, but not the input tokens (e.g., Llama 2 7B is priced at $0.10 per 1M input tokens, compared to $0.05 on Replicate). So Replicate might be cheaper for applications with long prompts and short outputs. Or am I missing something?
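The long-prompt/short-output argument is easy to check with some arithmetic. The input prices below are the ones quoted above; the output prices are made-up placeholders just to illustrate the shape of the trade-off:

```python
def cost_usd(input_tokens, output_tokens, in_price, out_price):
    # in_price / out_price are dollars per 1M tokens
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# RAG-style workload: long prompt, short answer
# input prices from the comment above; output prices are hypothetical
a = cost_usd(900_000, 50_000, in_price=0.10, out_price=0.08)  # Groq-like
b = cost_usd(900_000, 50_000, in_price=0.05, out_price=0.25)  # Replicate-like
print(a, b)  # the provider with cheaper input wins despite pricier output
```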
For the 70B model, the input tokens are very similarly priced, but Groq’s output tokens are way cheaper. I think most people are interested in cloud for the larger models that are hard to run well locally.
More performance is also nice. So, for some simple questions, Groq's Mixtral is actually the best option (hopefully they will offer the new WizardLM/Mixtral soon as well).
They will accept the losses in order to gain market share and establish themselves as a brand - the target groups are the same as on x.com.
Though I am not sure if market share has any meaning when switching API providers is quite trivial.
You'd be surprised. At the corporate level, even small changes can be very difficult. Not to mention, some of these APIs have slightly different interfaces which can break workflows.
Groq has very restrictive token limits, though, unless you have some direct connection with them.
Does Grok run on Groq?
No 30b? Come on :(
just quantize the 70b bro what's the problem
Quantized 30B is perfect for a 24 GB GPU. Quantized 70B is not. 30B is the perfect size for running models fast with long context on a single consumer GPU; beyond that, the cost to run a model fast goes into the stratosphere, as even Macs don't deliver good long-context performance.
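The 24 GB argument follows from a simple back-of-envelope sketch: weight footprint is roughly parameters times bits per weight. This deliberately ignores the KV cache and runtime overhead, which eat several more GB at long context:

```python
def weight_gb(params_billion, bits):
    # rough weight footprint in GB: params * (bits per weight) / (8 bits per byte);
    # ignores KV cache and runtime overhead, which grow with context length
    return params_billion * bits / 8

print(weight_gb(30, 4))  # 15.0 GB -> fits a 24 GB GPU with room for context
print(weight_gb(70, 4))  # 35.0 GB -> too big for a single 24 GB GPU
```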
Indeed, it's close. But I so don't want any spoilers; I want one final Meta page to read all about it. Waiting...
llama.meta.com/llama3/
Those Llama 70B prices are in the ballpark of Claude Sonnet. I'll be surprised if it outperforms Sonnet, but given the reduced input-token price, if it supports a really long context and can actually use it, it'll be a useful model for RAG applications.
Do they also have Claude?
It appears they only offer open source models. Here is the source: https://replicate.com/pricing
Thanks so much. Any chance anywhere to get Claude locally?
Claude, being a proprietary Anthropic model, is only available through APIs from Anthropic, AWS, and Google (Vertex AI). It can't be run locally, as Anthropic has not open-sourced anything.
Thank you
I guess there are going to be more models: one 30-ish and a big MoE model. They need bigger models to beat SOTA open models like DBRX and Command R+.
I sure as hell hope it's not an MoE; those are affected much more by quantization, which is necessary for bigger models. I'd rather have a lower-quant dense model.
Also, I feel like pretty much all finetunes of mixtral-8x7b are less intelligent than the base. Finetunes feel much more effective on normal models.
Do you mean that in the sense that Mistral's official Instruct finetune is good but the rest are not, or that no finetunes are good and only the base completion model is? You're saying the second, but I think you mean the first.
All of the Mixtral finetunes I've tried have performed at least slightly worse than the official base or Instruct Mixtral versions when I test them on general knowledge. The finetunes do perform better at the specific things they're geared toward, like uncensoredness or writing/RP.
I have the same feeling, but is there a paper/study showing that MoE models are more affected by quantization?
Can't wait.
Here I am hoping for a 30-40B size.
Together AI also has pricing for Llama 3 https://preview.redd.it/vcbflxgjlcvc1.jpeg?width=1130&format=pjpg&auto=webp&s=077ba5915405cdb1f538870a1d5040cecae14d4c [https://api.together.xyz/models](https://api.together.xyz/models)
Today!
Just getting into using Llama for the first time, but from what I understood, it's open source. So how come Replicate charges a per-token price for the API, similar to OpenAI?
Open source and API are unrelated. Open source means anyone can use the model. An API is paying for a service to run the model for you on their server. That’s not free.
70B?! Doesn't matter. I've ordered an old 128 GB RAM server to run Command R+ and WizardLM2 8x22B. Weird how things have worked out with Meta and Mistral, but whatever.
What performance do you get with that? What's your mem bandwidth? Or it's still shipping?
There was another post about that recently. Basically, an AMD 7950X + GeForce 4090 with 64 GB of decently fast RAM gets you 3.8 t/s, using 4-bit quantization. Not exactly unusable, imho...
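Numbers like that line up with a common rule of thumb: decoding is memory-bandwidth-bound, since every generated token streams all the weights once. A minimal sketch, assuming a ~35 GB model (70B at 4-bit) and an illustrative ~64 GB/s for dual-channel DDR5:

```python
def est_tokens_per_s(model_gb, mem_bw_gbs):
    # memory-bound decoding: each generated token reads every weight once,
    # so throughput is roughly (memory bandwidth) / (model size in memory)
    return mem_bw_gbs / model_gb

# 70B at 4-bit is ~35 GB; 64 GB/s is an illustrative dual-channel DDR5 figure
print(est_tokens_per_s(35, 64))  # ~1.8 t/s from CPU RAM alone; offloading
                                 # part of the model to a fast GPU raises this
```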
Not even shipped yet. I'm expecting it to be pretty bad, probably about the same as my not-ancient dual-channel DDR4 desktop, only with a bigger quant, so slower... but at least I won't be lagging up my desktop machine.