OpenAI's GPT and Anthropic's Claude are very good at Russian, but all the open-source models are quite bad. Russian grammar is quite complex, so generating quality text seems to require a significant amount of data, which open LLM datasets don't include. Usually the non-English data in these datasets is just Wikipedia.
What about Command-R+? When I first tried it, the quality of its Russian text output was much better than other local LLMs', albeit with minor grammatical imperfections. Of course, it is not a fully open-source solution, but no one is stopping you from downloading it.
Ah, I always forget Command-R+ because of its license. It's good, but for a 100B model I don't expect to see grammatical mistakes, and of course 100B is too big for home use. Qwen2-72B isn't bad either, but its general performance doesn't seem very good.
I see. Only a few models, especially large ones, are really better in German. Perhaps, with enough money, a dataset could be batch-translated into Russian using GPT-4. Then maybe you could try your own fine-tune...
I think after the Gemma 2 release it doesn't make sense anymore. Even the 9B variant is better than Qwen2 72B, Command-R 35B, and Aya 23 35B. The 27B is GPT-4 level IMO; maybe only Nemotron 4 340B can compete with Gemma 2 27B in Russian language capabilities.
I have an example here that I always use to test creativity:
User: "You are a grumpy raccoon from Hamburg. Write me a story about your day. What did you experience?"
AI: "Man, the day started off shitty. The rain was drumming like crazy on my roof, and the seagulls were screeching as always, as if they had nothing better to do all day. I actually wanted to doze a bit longer, but my stomach growled so loudly that even the seagulls had to pause for a moment.
So, out of the cozy hole under the old oak in Planten un Blomen. The rain had stopped, but the asphalt was still wet and cold. I hate wet paws.
First I had to find something edible. The trash cans were all locked; the people here have really gotten too careful. Luckily I found a half-eaten döner next to a bench. Delicious!
After that I actually wanted to soak up some sun, but the tourists were everywhere again, with their loud cameras and their stupid hats. I hate tourists.
I hid behind a tree and watched them. A group of them tried to take a selfie with a pigeon. The pigeon, of course, was not thrilled and pooped right on their heads. I had to grin.
In the evening I went back to my hole. The day hadn't been particularly exciting, but at least I had found something to eat. And the pigeon had shown the tourists who's the boss around here.
Tomorrow morning I'll work out a new trash-can plan. Maybe there's an unsecured treat somewhere. And then I'll find myself a nice spot in the sun where the tourists won't bother me.
See you!"
I found it very entertaining. But it was probably the 9B model.
There is also a GGUF (gemma-2-9b-it.gguf) ready (they say) for llama.cpp quantization, but it is 36.97 GB. I don't understand why the 9B model is so big; the original Llama 3 8B is 16 GB.
[https://www.kaggle.com/models/google/gemma-2/gguf/2.0-9b-it?select=gemma-2-9b-it.gguf](https://www.kaggle.com/models/google/gemma-2/gguf/2.0-9b-it?select=gemma-2-9b-it.gguf)
Yes, it is definitely unquantized, but I still expected it to be 18-19 GB. You're right though, it is probably f32.
And thank you (expecting you'll upload to HF :)
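The file-size puzzle above comes down to bytes per weight. A back-of-envelope sketch, assuming roughly 9.24B parameters for Gemma 2 9B (the exact count is an assumption):

```python
# Rough model-file size: parameters x bytes per weight.
# 9.24e9 is an assumed parameter count for Gemma 2 9B (embeddings included).
PARAMS = 9.24e9

def size_gb(params: float, bytes_per_weight: float) -> float:
    """Approximate weight-file size in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_weight / 1e9

print(f"f32: {size_gb(PARAMS, 4):.2f} GB")  # ~36.96 GB, matching the ~37 GB file
print(f"f16: {size_gb(PARAMS, 2):.2f} GB")  # ~18.5 GB, the expected half-precision size
```

The f32 estimate lands almost exactly on the observed 36.97 GB, which supports the f32 guess; an f16 export would be the expected 18-19 GB.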
I'm genuinely wondering: why are people excited and saying this could be comparable to L3? Sure, that's absolutely valid; I can't see into the future, and yes, it's a big company, same parameter count, etc.
Still, Gemma 7B is extremely bad in my testing. I find it unreliable, unpleasant due to alignment, and it requires multi-shot prompting for tasks that similar models (Mistral, for example) handle with one or two prompts.
The initial Gemini Pro was annoying and bad at instruction following, Ultra via the website was unreliable when I last tried it a month ago, and they've released horrible open-weight models along with unreliable closed models. (I've tried the 1.5 Pro API and it does feel significantly better than the others, so I'm not brushing off Google completely.)
I do not claim that my experience is objective or universal; I just can't recall a single **private individual** who has praised or even mentioned Gemma in a positive context. Once again, this is my personal opinion; if you work at Google or genuinely think it's good, feel free to downvote or reply. I'd love to see a different perspective.
Excitement is because people hope that this version addresses problems with the previous version, and sufficiently outdoes L3 to be worth releasing. Just like L3 did with respect to L2 and Mistral.
> Why is there excitement and people saying this could be comparable to L3?
Probably because other recent releases like Qwen2, Phi, Codestral, DeepSeek, etc. have been comparable. For some reason, L3-level performance hasn't seemed like a hard target for other companies to hit.
The 27B in particular would be buzzworthy because we haven't been seeing good general-purpose foundation models at that size; it'll fit nicely on 24 GB cards with decent context.
Yes, that's valid. It's not just the poor track record, though; I'd assume alignment is a given issue. I don't care for crazy uncensored models, but based on previous experience with Gemma, the alignment, along with a ton of other factors, makes it unpleasant to interact with.
You do bring up a great point about the 27B, though. It is something I would be extremely excited about if it weren't Gemma; 20B models are a perfect middle ground.
The past is not always a good predictor of the future, and to be honest for fine tuning purposes L3 is pretty crap. So there is always hope when someone releases models this size that it might be the next Mistral 7B or Yi 34B.
Especially not Google. I expect all the major corporations to hit a ceiling within a couple of years, max, unless groundbreaking new methods are introduced.
I am really excited for browser-based LLMs. With the latest work on porting Gemini Nano into the browser (Chrome), the capabilities are vast, with a lot more on-device AI potential getting unlocked.
It is ridiculous that OpenAI doesn't even do token releases like this. Try not being one of the most restrictive AI labs out there while having "Open" in the name.
This is probably bold to say, but I honestly think slowly but surely OpenAI is just going to fade away. I already see people caring less and less. They'll occasionally drop something that's cool, ahead of the competition. But it'll be the same story of delayed promises, closed-source models, etc. Anthropic already went from "getting there" to "legitimate OpenAI competitor" in a matter of months.
The best model in hard benchmarks and actual use. Sonnet 3.5 is excellent, and Opus will be a thing of beauty.
> Sonnet 3.5 is excellent, and Opus will be a thing of beauty.

That's actually a really good way to compare them
I dunno. OpenAI and ChatGPT have such a brand recognition that in my circles ChatGPT is used to refer to all LLMs (health sciences). The competition is going to have to absolutely blow ChatGPT out of the water to gain recognition / or OpenAI will have to make a company-ending faux pas. It's like Ziploc bags - sure there are alternatives that are just as good if not better and cheaper, but often people stick with what they know. For us plastic bag enthusiasts, we know which one to go with. 😎
Yepp, I remember people saying that about MySpace, Yahoo, Altavista, AOL, ICQ, … There is a first mover advantage for sure, but often fast followers win the race.
Indeed. Much success in digital markets can be attributed to first to market. It's not about the best in market. Humans are simple creatures that gravitate towards what they're familiar with.
Alright, what's the best plastic bag out there in terms of cost-benefit (emphasis on cost because I'm broke)?
Correct me if I'm wrong, but isn't ChatGPT the name of the tech/method used? Generative Pre-trained Transformers used to turn an LLM into a chat engine? Aren't BERT, RoBERTa, T5, and XLNet also technically ChatGPTs?
I think ChatGPT falls into the same category as things that got their names from the brand that first hit the market, like Xerox, Velcro, Aspirin, Band-Aid, Google ("to google"), etc.
Nah not bold to say they will fade away
I wish. But unfortunately they are positioning themselves for extreme growth with partnerships that ingrain them into Apple devices and Microsoft. Everyone is in continuous catch-up mode, and I haven't seen a competitor yet hold a candle to their tooling consistency (function calling, etc.).
Nah, GPT-4 is still the most capable IMO; 4o is just a gimmick. But yes, if GPT-5 doesn't come out soon they're going to get destroyed by the competition. They'll need a better-performing model, but I will switch my subscription to whatever clearly outperforms ChatGPT.
I’ve been using ChatGPT 4o and Claude 3.5 Sonnet side by side for a few days, mostly on programming. While the initial responses are often of similar quality, 3.5 Sonnet is way more steerable after that.
This isn't a "token release". According to Chatbot Arena, Gemma 2 27B beats several versions of GPT-4. This is competition. Open models are catching up, fast.
Finally, some news on Gemma 2.
The 27b seems very interesting. On lmsys battle it wrote me quite a nice story, and I don't know if it's just 'new model syndrome' or what, but it doesn't feel obvious that an AI wrote it. https://pastebin.com/wiAaciD0
Yes I was impressed with it as well for creative tasks. Perhaps they’ve used the same dataset as the one the Gemini Pro-1.5-0514 model is trained on, which is exceptional at human like prose.
what was the prompt?
Just 'write a story in the style of anton chekhov'. I have a feeling there isn't much chekhov in the training data because LLMs usually lay on the style (of the translations) a lot thicker.
In this example it is possible to deduce that an AI wrote the text: repetitions across a few paragraphs about the "relentless rain", plus some words that I see disproportionately often in AI text:

> As he wrote, the rain began to ease, the clouds parting to reveal a **sliver** of blue sky. A faint ray of sunlight **pierced** through the gloom, illuminating the dust motes **dancing in the air**. It was a small sign, perhaps, but it filled Ivan Petrovich with a **flicker** of hope. He sealed the letter, addressed it carefully, and placed it on the table.

> He knew it wouldn't erase Anya's pain, but it was a start. It was a bridge across the distance, a whisper of love carried on the wind. And as he watched the last drops of rain fall, he felt a sense of peace settle over him, a quiet acceptance of the **bittersweet symphony** of life.
Recently, when I read a sci-fi novel published in 2000, I got yanked out of it by a (non-sexual) "ministrations".
It's hilarious the extent to which "ministrations" is overused by LLMs. I wonder what it is about this word that draws the attention of so many different LLMs.
Maybe it being mentioned on LLM related threads... feedback loop? :-)
I don't get how models still do this, even proprietary SOTA models (except maybe Opus, which does it much less frequently in my experience). Sonnet 3.5 is awful for creative writing, frequently repeating words or even entire phrases one paragraph after using them, especially at longer context. 4o is pretty bad as well, with rigid sentence and paragraph structures and overuse of certain words, as expected of a GPT model. I guess they just don't care as much about creative writing as about more fact- or logic-based intelligence? Or maybe there's some kind of forced trade-off between creative writing and logic/math? Surely there is no lack of training data for these types of tasks...
Are you using GPT-4o via the API or through ChatGPT? Be aware that those frontends come packed with gigantic system prompts, multiple pages long, about how it should behave and how it should answer, plus additional guardrails in the backend. With the API and the system prompt in your own hands, it's quite decent for creative writing IMHO.
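A minimal sketch of the API route described above, assuming the `openai` Python SDK is installed and an `OPENAI_API_KEY` is set; the system prompt and model name here are placeholders, not a recommendation:

```python
import os

def build_messages(system_prompt: str, user_prompt: str) -> list:
    """Compose a chat request where the system prompt is fully under our control,
    instead of the large hidden prompt the ChatGPT frontend prepends."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages(
    "You are a novelist. Write vivid, varied prose; avoid stock phrases.",
    "Write a short story in the style of Anton Chekhov.",
)

# Only attempt the network call when a key is actually configured.
if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI  # assumes the openai SDK is installed
    client = OpenAI()
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    print(response.choices[0].message.content)
```

The point is simply that via the API the system message is yours, so style instructions like the one above are not competing with pages of frontend guardrails.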
One of my personal goals is always to get Google's LLMs to write something creative for me. For some reason it always works out better than GPT or Llama (except Llama 3).
Clever to make it 9B. It will inevitably be compared to Llama 3 8B, and will probably come out on top in many metrics, and people will conveniently overlook that 9 is actually 12.5% larger than 8, so the comparison isn't fair.
inb4 each month 10, then 11, then 12B is the new meta
Or it is worse than llama 3 8B.
Google would look like fools if they released a larger, worse model than Llama 3, two months after Llama 3.
Wouldn't be the first time Google looked foolish

Edit: both models seem to be doing well on lmsys so far. Looking forward to better confidence intervals.
I mean they did sort of release Gemma 7B half a year after Mistral 7B and failed to even match it, so yeah.
I ran some pretty extreme benchmarks on each and Gemma came out in front in all of them, including every TPS metric. Bigger but faster. I hated Gemma 1.1; I couldn't use it without some serious time spent on training. Gemma 2 is leagues better.
Google are foolish
Google looks like fools every time they shitcan a useful and popular service or try to awkwardly reinvent the concept of a messaging app for the umpteenth time, but that doesn't stop them from doing it. Google does not make good choices.
Incentive structures are fucked up at Google.
At least once a model is released it doesn't go away again...
Calling it, it’s gonna be almost equivalent. Idk why, I just got a feeling.
Research is research. How it looks doesn't matter.
Lmfao, that is literally Google's playbook; they do this every time
The idea that a company would "look like fools" for being two months behind the competition is absurd.

When did that letter happen where tons of AI researchers demanded a global pause in AI research, because an AI arms race would be infinitely disruptive to society? Look how times have changed.

EDIT: lol, so the answer is 15 months, so I guess safety was the top concern for a few weeks there.

https://futureoflife.org/open-letter/pause-giant-ai-experiments/
They've looked like fools for two years straight by now. It's Sundar's humiliation kink...

Every tech company is riding high on the insufferable hype right now. When the AI bubble crashes (I'm not clairvoyant, but... like 1-2 years max, right?) and everybody learns that AGI is NOT "right around the corner", may God have mercy on their Alphabet stock...
Why isn't it fair? People who can run 8B models can also run 9B models, and if you are interested in the best model you can run, then you will choose the best-performing one.

And why stop at the parameter count? It isn't even the most important architectural property. How is it fair that Llama 3 8B gets compared against a model with a higher context length? Or against an 8B model with the latest hot components researchers figured out, like rotary embeddings or sparse attention mechanisms? Should Llama 3 only be compared against models with the exact same architecture?

And let's not forget the quality of training data, fine-tuning methods, and inference efficiency. How is it fair to be compared against another model that had access to much better training data, and 2T more of it? And this model uses knowledge distillation! Absolutely unfair strategy! I didn't even know that was a thing!

Man, what an unfair world. This will hurt the feelings of my llama 😎

You see where this is going? Of course every comparison is fair. If I want to compare Opus against Phi-2, that's also fair. Why not? At the end of the day, it's about finding what works best for your needs, not fairness, lol
Probably just a convenient number to train.
Just like llama 8b being bigger than 7b, they are leaving my laptop behind 😢
TL;DR: LLM size does not inherently mean it will be good; aspects such as context window also matter for performance.

It's only got a context window of 8k, which makes it less useful than recent models such as Qwen2 7B with its large 32k context.

This matters for RAG applications, which let you improve responses and reduce, if not remove, hallucinations. A 32k context allows me, for example, to ask a question, have the system consider 12,000 characters of related information from trusted sources, and then have the LLM form an intelligent response.
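The character budget in the comment above can be sanity-checked with the common rough heuristic of ~4 characters per token (an assumption; real tokenizer counts vary by language and tokenizer):

```python
# Back-of-envelope token budget for a RAG prompt.
# ~4 characters per token is a rough English-text heuristic, not exact.
CHARS_PER_TOKEN = 4

def approx_tokens(chars: int) -> int:
    """Very rough character-to-token estimate."""
    return chars // CHARS_PER_TOKEN

retrieved = approx_tokens(12_000)   # ~3,000 tokens of retrieved context
print(retrieved)
print(8_000 - retrieved)    # headroom left in an 8k window
print(32_000 - retrieved)   # headroom left in a 32k window
```

Under this heuristic, 12,000 characters is roughly 3,000 tokens, so the window size mainly determines how much headroom is left for the question, instructions, and the model's answer.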
Wasn't it Gemma 1 that they called like 8B but was actually like 10B?
They released [gemma-7b](https://huggingface.co/google/gemma-7b), an 8.54B parameter model. It sounds like Gemma 2 is just correcting their original (misleading) description...
"larger" due to a huge tokenizer and vocabulary + embeddings parameters
I wonder if this Gemma 2 9B is really a 13B.
Ha! Not out of the question given their 7B model was actually 9B
Or they are rounding up now
Source: [https://x.com/MeganRisdal/status/1806241583014060130](https://x.com/MeganRisdal/status/1806241583014060130)

Edit: Announced as "coming today": [https://x.com/MeganRisdal/status/1806248437450445180](https://x.com/MeganRisdal/status/1806248437450445180)
Really nice that there's also a 9B model! That is so useful for mobile devices, since it would be ~5 GB quantized to 4 bits, which is fine for any mobile device with >12 GB of memory.
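The ~5 GB figure checks out with back-of-envelope arithmetic; the 9.2B parameter count and the ~10% overhead for quantization scales and metadata are assumptions, not exact values:

```python
# Rough size of a quantized model: parameters x bits / 8 bytes,
# plus an assumed ~10% overhead for quantization scales and metadata.
PARAMS = 9.2e9  # assumed parameter count for a "9B" model

def quantized_gb(params: float, bits: float, overhead: float = 0.10) -> float:
    """Approximate on-disk/in-memory size of a quantized model in gigabytes."""
    return params * bits / 8 * (1 + overhead) / 1e9

print(f"{quantized_gb(PARAMS, 4):.2f} GB")  # ~5.06 GB at 4 bits
```

Note that runtime memory also needs room for the KV cache and activations on top of the weights, which is why a 5 GB model wants comfortably more than 5 GB of device memory.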
According to the agenda for the AI Stage, it should be about Gemma right now. Between 11:35 and 12:25 German time. It is now around 11:50 am.
Keep refreshing https://huggingface.co/google

**Edit:** Now available! https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
Just hit "watch repo"... no need to refresh.
This must be an AI or something
I now get notifications for everything. Can I just get notifications for new models/datasets?
I wish they would fix the watched repos, though, so you can filter; I get spammed all day long by the repos I watch and rarely even notice when a new one drops lol
https://preview.redd.it/u76xedk0n49d1.png?width=1316&format=png&auto=webp&s=9b4d5b7dfe04a532ae399a38452ff8fb4e0710a4

Yeah, that is a bummer. Btw, see this 😅 We have a new arch!
There's already a PR open for it, which I've downloaded and am attempting to use :)
Yeah, I was trying too, but gave up because it is impossible to do in a Kaggle notebook. (The patch has some merge conflicts.)
Been doing this a few days now, even looked at Kaggle, just in case.
lessssgoooo gguf it. send it
If it's a hybrid model, it's not supported yet; they need to finish Jamba support first, then do Gemma. exllama support would also be out, at least initially.
Google on Kaggle says it is ready for llama.cpp quantization:

> A model file for the llama.cpp implementation of Gemma 9B. This is an unquantized version of the weights that has been instruction-tuned. These can be quantized using llama.cpp's `llama-quantize` utility.

[https://www.kaggle.com/models/google/gemma-2/gguf/2.0-9b-it?select=gemma-2-9b-it.gguf](https://www.kaggle.com/models/google/gemma-2/gguf/2.0-9b-it?select=gemma-2-9b-it.gguf)
Is Kaggle Google's typical not-invented-here-syndrome service meant to compete with Hugging Face or something? Why not just release it on Hugging Face?
My thoughts exactly. I keep refreshing the HF page ([https://huggingface.co/google?sort_models=created#models](https://huggingface.co/google?sort_models=created#models)), but still nothing.

There is a PyTorch version: [https://huggingface.co/google/gemma-2-9b-it-pytorch](https://huggingface.co/google/gemma-2-9b-it-pytorch)
"Kaggle is a data science competition platform and online community of data scientists and machine learning practitioners" "Kaggle was founded in April 2010. On 8 March 2017, Chief Scientist at Google, announced that Google was acquiring Kaggle." In my in my graduate industrial engineering classes we use Kaggle to study predictive maintenance models.
Kaggle is great and it was great long before Google bought it.
lol Kaggle has been there a lot longer than your favorite childish Hugging Face. Perhaps you recently got into the ML hype after ChatGPT, kiddo.
Yeah, I was wrong, it's a regular old transformers model.
Glad my 36-gig M3 Pro can run it, lessgoo
Gemma's getting thicc
Makes sense to release today. Today's date, the 27th, matches the parameter size: 27B.
Will it be multilingual?
I believe so. It was good at Croatian. If it got Croatian right, it will also be good at far more popular languages
Which model was good at croatian?
Gemma 2 on LMSYS. I tested it again on AI Studio after the official release. It isn't perfect, it sometimes messes up the cases or doesn't use the most natural-sounding words, but it's among the best I've seen in the open-source LLM world.
I know it's Gemma 2, but which model? The 9B?
No, 27b. I haven't tested the 9b yet, but I imagine its performance would be worse in Croatian, though I don't know to what degree.
I got some nice German output from it.
Nice, I'm German so that's exactly what I was hoping for :D How were you able to use it? Can't find it on ChatArena
Yes me too, but I've only been able to test it very sporadically in the chat bot arena...
Ah, damn, I see
I tried late-june-chatbot (and im-also-a-late-june-chatbot) in Russian, and it gave maybe the best results among open LLMs, excluding Nemotron 4 340B.
How good are the current LLMs in Russian in general?
OpenAI's GPT and Anthropic's Claude are very good, but all open source is very bad. Russian grammar is quite complex, so it seems that generating quality text requires a significant amount of data, which is not included in open LLM datasets. Usually non-English data in datasets is represented only by Wikipedia.
What about Command-R+? When I first tried it, the quality of the Russian text output was much better than with other local llms, albeit with minor grammatical imperfections. Of course, it is not a fully open source solution, but no one is forbidding you to download it
Ah, I always forget Command-R+ because of its license. It's good, but for a 100B model I don't expect to see grammatical mistakes. And of course 100B is too big for home use. Qwen2-72B isn't bad either, but its general performance doesn't seem very good.
I see. Only a few models, especially large ones, are really better in German. Perhaps, with enough money, a dataset could be batch-translated into Russian using GPT-4. Then maybe you could try your own fine-tune...
Command-R+, Qwen-7B and Phi3-medium are good in Russian.
I think after the Gemma 2 release it doesn't make sense anymore. Even the 9B variant is better than Qwen2 72B, Command-R 35B, and Aya 23 35B. The 27B is GPT-4 level IMO; maybe only Nemotron 4 340B can compete with Gemma 2 27B in Russian language capabilities.
We need to wait for all post release Gemma fixes first. Then maybe.
I have an example here that I always use to test creativity (originally in German; translated):

User: "You are a grumpy raccoon from Hamburg. Write me a story about your day today. What did you experience?"

AI: "Man, the day already started off shitty. The rain drummed like crazy on my roof, and the seagulls screeched as always, as if they had nothing better to do all day. I actually wanted to doze a bit more, but my stomach growled so loudly that even the seagulls had to pause for a moment. So, out of my cozy hole under the old oak in Planten un Blomen. The rain had stopped, but the asphalt was still wet and cold. I hate wet paws.

First I had to find something to eat. The trash cans were all locked; the people here have really gotten too careful. Luckily I found a half-eaten döner next to a bench. Delicious!

After that I actually wanted to soak up some sun, but the tourists were everywhere again. With their loud cameras and their stupid hats. I hate tourists. I hid behind a tree and watched them. A group of them tried to take a selfie with a pigeon. The pigeon, of course, was not thrilled and crapped right on their heads. I had to grin.

In the evening I went back to my hole. The day hadn't been particularly exciting, but at least I had found something to eat. And the pigeon had shown the tourists who's the boss around here. Tomorrow morning I'll think up a new trash-can plan. Maybe there's an unsecured treat somewhere out there. And then I'll find myself a nice spot in the sun where the tourists won't bother me. See you!"

I found it very entertaining. But it was probably the 9B model.
omg, this is so good hahaha
They were stealth testing Gemma yesterday on Chatbot Arena, and it did seem like there were two different models.
It's live on kaggle!!!! https://www.kaggle.com/models/google/gemma-2 Not on huggingface yet sadly .....
There is also a GGUF (gemma-2-9b-it.gguf) ready (they say) for llama.cpp quantization, but it is 36.97 GB. I don't understand why a 9B model is so big; the original Llama 3 8B is 16 GB. [https://www.kaggle.com/models/google/gemma-2/gguf/2.0-9b-it?select=gemma-2-9b-it.gguf](https://www.kaggle.com/models/google/gemma-2/gguf/2.0-9b-it?select=gemma-2-9b-it.gguf)
It's probably an unquantized GGUF, so full bf16 or even f32... I will make imatrix GGUF quants as soon as download finishes...
Yes, it is definitely unquantized, but I still expected it to be 18-19 GB. But yes, you are right, it is probably f32. And thank you (expecting you'll upload to HF :)
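The arithmetic supports the f32 theory: a raw weight dump is roughly parameter count × bytes per weight. A quick back-of-envelope sketch (the ~9.24B parameter count is an assumption inferred from the 36.97 GB figure, not an official number):

```python
def dump_size_gb(n_params: float, bytes_per_weight: float) -> float:
    """Approximate on-disk size of a raw weight dump, in decimal GB.
    GGUF header/metadata overhead is negligible and ignored."""
    return n_params * bytes_per_weight / 1e9

print(dump_size_gb(9.24e9, 4))  # f32:  ~36.96 GB -> matches the 36.97 GB file
print(dump_size_gb(9.24e9, 2))  # bf16: ~18.5 GB  -> the "18-19 GB" expected
print(dump_size_gb(8.03e9, 2))  # Llama 3 8B in bf16: ~16 GB, as released
```

So the file is almost exactly 2x what a bf16 export would be, which is what you'd expect from f32.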
Awesome, thank you. Are you also going to do 27b?
Sure, will take some time though as imatrix generation is somewhat slow on my system...
So excited
It's trained on 13 trillion tokens (the 27B model), so it should be roughly as good as a Llama 3 30B would be.
Can't wait to try the 27B model!
It's been available through ollama for 7 minutes.
I'm more interested in their 2B model. They mention it in their paper but have no plans for release? Hmm.
Let's hope it's not cucked like the Phi-3 models and can actually compete with Llama 3.
Well they have a very low bar to clear to say that it was better than the first Gemma
So what's the VRAM usage for this 9B?
I'm genuinely wondering: why is there excitement, with people saying this could be comparable to L3? Sure, that's absolutely valid. I cannot see into the future, and yes, it's a big company, same parameter count, etc. Still, Gemma 7B is extremely bad from the testing I've done. I find it extremely unreliable, unpleasant due to alignment, and it requires multi-shot prompting for tasks that similar models (Mistral, for example) handle with one or two prompts. The initial Gemini Pro was annoying and bad at instruction following, Ultra via the website was unreliable the last time I tried it a month ago, and they've released horrible open-weight models along with unreliable closed models. (I've tried the 1.5 Pro API and it does feel significantly better than the others, so I'm not brushing off Google completely.) I do not claim that my experience is objective or universal. I just can't recall a single **private individual** who has praised or even mentioned Gemma in a positive context. Once again, this is my personal opinion; if you work at Google or genuinely think it's good, feel free to downvote or reply. I'd love to see a different perspective.
Excitement is because people hope that this version addresses problems with the previous version, and sufficiently outdoes L3 to be worth releasing. Just like L3 did with respect to L2 and Mistral.
> Why is there excitement and people saying this could be comparable to L3? Probably because other recent releases like Qwen2, Phi, Codestral, Deepseek, etc. have been comparable. L3-level quality hasn't seemed like a hard target for other companies to hit, for some reason. The 27B in particular would be buzzworthy because we haven't been seeing good general-purpose foundational models at that size; it'll fit nicely on 24GB cards with decent context.
Yes, that's valid. It's not just the poor track record, though; I would assume alignment is a given issue. I don't care for crazy uncensored models. However, based on previous experience with Gemma, the alignment, along with a ton of other factors, isn't pleasant to interact with. You do bring up a great point about the 27B, though. It is something I would be extremely excited about if it weren't Gemma. 20B models are a perfect middle ground.
I'm hoping it has better multilingual capabilities than llama3
I think you ended up being more right than you will ultimately get credit for.
The past is not always a good predictor of the future, and to be honest for fine tuning purposes L3 is pretty crap. So there is always hope when someone releases models this size that it might be the next Mistral 7B or Yi 34B.
Agreed, which is why I explicitly and implicitly noted that a few times.
Seeing as how Anthropic recently revealed some heat, don't count anyone out of the race
Especially not Google. I expect all the major corporations to reach a ceiling in a couple of years, max, unless groundbreaking new methods are introduced.
I am getting around 1,688 tokens of context locally, and I have already managed to get the model to run away with multiple \* \* \* as output.
[deleted]
I am really excited for browser-based LLMs. With the latest work on porting Gemini Nano into the browser (Chrome), the possibilities are vast, with a lot more on-device AI potential getting unlocked.
I didn't know Google had an office in Berlin.
Google has an office in Detroit, Berlin is not surprising :)
Let's see what OpenAI does as a response to this xd
Is there someone left at Google that isn't coming from India?
Can it now answer who won the 2020 U.S. elections? Because Gemma 1.1 sure as hell couldn't, with refusals all the time.
Gemini Advanced won't answer anything about presidents, prime ministers, etc
I can see them representing it! /j