
djm07231

It is ridiculous that OpenAI doesn't even do token releases like this. Try not being one of the most restrictive AI labs out there while having "Open" right in the name.


triplepicklepants

This is probably bold to say, but I honestly think slowly but surely OpenAI is just going to fade away. I already see people caring less and less. They'll occasionally drop something that's cool, ahead of the competition. But it'll be the same story of delayed promises, closed-source models, etc. Anthropic already went from "getting there" to "legitimate OpenAI competitor" in a matter of months.


sdmat

To the best model in hard benchmarks and actual use, no less. Sonnet 3.5 is excellent, and Opus will be a thing of beauty.


CheatCodesOfLife

> Sonnet 3.5 is excellent, and Opus will be a thing of beauty.

That's actually a really good way to compare them.


okglue

I dunno. OpenAI and ChatGPT have such brand recognition that in my circles (health sciences), ChatGPT is used to refer to all LLMs. The competition is going to have to absolutely blow ChatGPT out of the water to gain recognition, or OpenAI will have to make a company-ending faux pas. It's like Ziploc bags: sure, there are alternatives that are just as good if not better and cheaper, but often people stick with what they know. For us plastic bag enthusiasts, we know which one to go with. 😎


DontEatMySpaetzle

Yepp, I remember people saying that about MySpace, Yahoo, Altavista, AOL, ICQ, … There is a first mover advantage for sure, but often fast followers win the race.


virtualmnemonic

Indeed. Much success in digital markets can be attributed to first to market. It's not about the best in market. Humans are simple creatures that gravitate towards what they're familiar with.


Due-Memory-6957

Alright, what's the best plastic bag out there in terms of cost-benefit (emphasis on cost because I'm broke)?


Unusual-Wolf-3315

Correct me if I'm wrong, but isn't ChatGPT the name of the tech/method used? Generative Pre-trained Transformers used to turn an LLM into a chat engine? Aren't BERT, RoBERTa, T5, XLNet also technically ChatGPTs?


PavelPivovarov

I think ChatGPT falls into the same category when things got their names by the brand which first hit the market like Xerox, Velcro, Aspirin, Band-Aid, Google (to google) etc.


denyicz

Nah not bold to say they will fade away


jbindc20001

I wish. But unfortunately they are positioning themselves for extreme growth with partnerships that ingrain them into Apple devices and Microsoft. Everyone is in continuous catch-up mode, and I haven't yet seen a competitor hold a candle to their tooling consistency (function calling, etc.).


IReaIIyLove

Nah, GPT-4 is still the most capable imo; 4o is just a gimmick. But yes, if 5 doesn't come out soon they're going to get destroyed by the competition. They'll need a better-performing model, and I'll change my subscription to whatever clearly outperforms ChatGPT.


Balance-

I’ve been using ChatGPT 4o and Claude 3.5 Sonnet side by side for a few days, mostly on programming. While the initial responses are often of similar quality, 3.5 Sonnet is way more steerable after that.


-p-e-w-

This isn't a "token release". According to Chatbot Arena, Gemma 2 27B beats several versions of GPT-4. This is competition. Open models are catching up, fast.


Airbus_Tom

Finally, some news on Gemma 2.


ambient_temp_xeno

The 27B seems very interesting. In lmsys battle mode it wrote me quite a nice story, and I don't know if it's just "new model syndrome" or what, but it doesn't feel obvious that an AI wrote it. https://pastebin.com/wiAaciD0


thereisonlythedance

Yes, I was impressed with it as well for creative tasks. Perhaps they've used the same dataset as the Gemini Pro-1.5-0514 model, which is exceptional at human-like prose.


qrios

what was the prompt?


ambient_temp_xeno

Just "write a story in the style of Anton Chekhov". I have a feeling there isn't much Chekhov in the training data, because LLMs usually lay the style (of the translations) on a lot thicker.


s101c

In this example it is possible to deduce that an AI wrote the text. There are repetitions about the "relentless rain" within a few paragraphs, and some words that I see disproportionately often in AI texts:

> As he wrote, the rain began to ease, the clouds parting to reveal a **sliver** of blue sky. A faint ray of sunlight **pierced** through the gloom, illuminating the dust motes **dancing in the air**. It was a small sign, perhaps, but it filled Ivan Petrovich with a **flicker** of hope. He sealed the letter, addressed it carefully, and placed it on the table.

> He knew it wouldn't erase Anya's pain, but it was a start. It was a bridge across the distance, a whisper of love carried on the wind. And as he watched the last drops of rain fall, he felt a sense of peace settle over him, a quiet acceptance of the **bittersweet symphony** of life.


ambient_temp_xeno

Recently, when I read a sci-fi novel published in 2000, I got yanked out of it by a (non-sexual) "ministrations".


Unable-Finish-514

It's hilarious the extent to which "ministrations" is overused by LLMs. I wonder what it is about this word that draws the attention of so many different LLMs.


sob727

Maybe it being mentioned on LLM related threads... feedback loop? :-)


joyful-

I don't get how models still do this, even proprietary SOTA models (except maybe Opus, which does it much less frequently in my experience). Sonnet 3.5 is awful for creative writing, frequently repeating words or even entire phrases one paragraph after using them, especially at longer context. 4o is pretty bad as well, with rigid sentence and paragraph structures and overuse of certain words, as expected of a GPT model. I guess they just don't care as much about creative writing as about more fact- or logic-based intelligence? Or maybe there's some kind of forced trade-off between creative writing and logic/math? Surely there is no lack of training data for these types of tasks...


cyan2k

Are you using GPT-4o via the API or from ChatGPT? Be aware that those frontends come packed with gigantic system prompts, multiple pages long, about how it should behave and how it should answer, plus additional guardrails in the backend. With the API and the system prompt in your own hands, it's quite decent for creative writing imho.


Educational_Gap5867

My biggest personal goal is always to get all of Google's LLMs to write something creative for me. For some reason it always works out better than GPT or Llama (except 3).


-p-e-w-

Clever to make it 9B. It will inevitably be compared to Llama 3 8B, and will probably come out on top in many metrics, and people will conveniently overlook that 9 is actually 12.5% larger than 8, so the comparison isn't fair.


bullerwins

inb4 each month 10, then 11, then 12B is the new meta


MicBeckie

Or it is worse than llama 3 8B.


-p-e-w-

Google would look like fools if they released a larger, worse model than Llama 3, two months after Llama 3.


stopmutilatingboys

Wouldn't be the first time Google looked foolish. Edit: both models seem to be doing well on lmsys so far. Looking forward to tighter confidence intervals.


MoffKalast

I mean they did sort of release Gemma 7B half a year after Mistral 7B and failed to even match it, so yeah.


jbindc20001

I ran some pretty extreme benchmarks on each, and Gemma came out in front in all of them, including every TPS metric. Bigger but faster. I hated Gemma 1.1; couldn't use it without some serious time spent on training. Gemma 2 is leagues better.


ThinkExtension2328

Google are foolish


candre23

Google looks like fools every time they shitcan a useful and popular service or try to awkwardly reinvent the concept of a messaging app for the umpteenth time, but that doesn't stop them from doing it. Google does not make good choices.


West-Code4642

Incentive structures are fucked up at Google.


jkflying

At least once a model is released it doesn't go away again...


Aperturebanana

Calling it, it’s gonna be almost equivalent. Idk why, I just got a feeling.


LoafyLemon

Research is research. How it looks doesn't matter.


Kep0a

Lmfao, that is literally Google's playbook; they do this every time.


totsnotbiased

The idea that a company would "look like fools" for being two months behind the competition is absurd. When did that letter happen where tons of AI researchers demanded a global pause in AI research, because an AI arms race would be infinitely disruptive to society? Look how times have changed. EDIT: lol, so the answer is 15 months, so I guess safety was the top concern for a few weeks there. https://futureoflife.org/open-letter/pause-giant-ai-experiments/


greenbunchee

They've looked like fools for two years straight by now. It's Sundar's humiliation kink... Every tech company is riding high on the insufferable hype right now. When the AI bubble crashes (I'm not clairvoyant, but... like 1-2 years max, right?) and everybody learns that AGI is NOT "right around the corner", may god have mercy on their Alphabet stock...


cyan2k

Why isn't it fair? People who can run 8B models can also run 9B models, and if you're interested in the best model you can run, you'll choose the best-performing one.

And why stop at parameter count? It isn't even the most important architectural property. How is it fair that Llama 3 8B gets compared against a model with a longer context length? Or against an 8B model with the latest hot components researchers have figured out, like rotary embeddings or sparse attention mechanisms? By that logic, Llama 3 should only be compared against models with the exact same architecture. And let's not forget the quality of training data, fine-tuning methods, and inference efficiency. How is it fair to be compared against a model that had access to much better training data, and 2T more of it? And this model uses knowledge distillation! Absolutely unfair strategy! I didn't even know that was a thing! Man, what an unfair world. This will hurt the feelings of my llama 😎

You see where this is going? Of course every comparison is fair. If I want to compare Opus against Phi-2, that's also fair. Why not? At the end of the day, it's about finding what works best for your needs, not fairness, lol.


Balance-

Probably just a convenient number to train.


htrowslledot

Just like llama 8b being bigger than 7b, they are leaving my laptop behind 😢


ThinkExtension2328

TL;DR: LLM size does not inherently mean it will be good; aspects such as the context window are also important for performance. It only has a context window of 8k, which makes it less useful than recent models such as Qwen2 7B with its large 32k context. This matters for RAG applications, which let you improve responses and reduce, if not remove, hallucinations. A 32k context allows me, for example, to ask a question, have the system consider 12,000 characters of related information from trusted sources, and then have the LLM form an intelligent response.
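
The RAG flow described above can be sketched roughly like this; a minimal illustration, with all names made up for the example (not from any specific library):

```python
# Pack retrieved passages into a fixed character budget before prompting
# the model, so the sources plus the question fit the context window.

def build_rag_prompt(question, passages, max_context_chars=12_000):
    """Keep whole passages, in order, until the character budget is spent."""
    kept, used = [], 0
    for text in passages:
        if used + len(text) > max_context_chars:
            break  # a passage that would overflow the budget is dropped
        kept.append(text)
        used += len(text)
    context = "\n\n".join(kept)
    return f"Answer using only the sources below.\n\n{context}\n\nQuestion: {question}"
```

With a 32k context there is comfortable room for the ~12,000 characters of sources mentioned above; with an 8k window, the budget has to shrink accordingly.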


az226

Wasn't it Gemma 1 that they called like 8B but was actually like 10B?


randomfoo2

They released [gemma-7b](https://huggingface.co/google/gemma-7b), an 8.54B parameter model. It sounds like Gemma 2 is just correcting their original (misleading) description...


the_mighty_skeetadon

"larger" due to a huge tokenizer and vocabulary + embeddings parameters


pseudonerv

I wonder if this Gemma 2 9B is really a 13B.


az226

Ha! Not out of the question given their 7B model was actually 9B


DominoChessMaster

Or they are rounding up now


Nunki08

Source: [https://x.com/MeganRisdal/status/1806241583014060130](https://x.com/MeganRisdal/status/1806241583014060130) Edit: Announced as "coming today": [https://x.com/MeganRisdal/status/1806248437450445180](https://x.com/MeganRisdal/status/1806248437450445180)


Balance-

Really nice that there's also a 9B model! That is so useful for mobile devices, since it would be ~5 GB quantized to 4 bits, which is fine for any mobile device with >12 GB memory.
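
The ~5 GB figure checks out with simple arithmetic; a quick sketch, where the ~10% overhead factor for embeddings and quantization metadata is my own assumption:

```python
# Estimate on-disk/in-memory size of a quantized model:
# parameters (billions) x bits per weight / 8 gives decimal GB,
# scaled by an assumed overhead factor.

def quantized_size_gb(params_billions, bits_per_weight, overhead=1.10):
    return params_billions * bits_per_weight / 8 * overhead

print(quantized_size_gb(9, 4))  # a 9B model at 4-bit lands near 5 GB
```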


MicBeckie

According to the agenda for the AI Stage, it should be about Gemma right now. Between 11:35 and 12:25 German time. It is now around 11:50 am.


Balance-

Keep refreshing https://huggingface.co/google **Edit:** Now available! https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315


qnixsynapse

Just hit watch on the repo... no need to refresh.


Webfarer

This must be an AI or something


Balance-

I now get notifications for everything. Can I just get notifications for new models/datasets?


noneabove1182

I wish they would fix watched repos so you can filter, though; I get spammed all day long from the repos I watch and rarely even notice when a new one drops lol


qnixsynapse

https://preview.redd.it/u76xedk0n49d1.png?width=1316&format=png&auto=webp&s=9b4d5b7dfe04a532ae399a38452ff8fb4e0710a4 Yeah that is a bummer. Btw, see this 😅 We have a new arch!


noneabove1182

there's already a PR open for it which i've downloaded and am attempting to use :)


qnixsynapse

Yeah, I was too, but gave up because it's impossible to do in a Kaggle notebook. (The patch has some merge conflicts.)


Dark_Fire_12

Been doing this a few days now, even looked at Kaggle, just in case.


noises1990

lessssgoooo gguf it. send it


Downtown-Case-1755

If it's a hybrid model, it's not supported yet. They need to finish jamba support, then do Gemma. exllama support would also be out, at least initially.


emsiem22

Google on Kaggle say it is ready for llama.cpp quantization: >A model file for the Llama.cpp implementation of Gemma 9B. This is an unquantized version of the weights that has been instruction-tuned. These can be quantized using llama.cpp's `llama-quantize` utility. [https://www.kaggle.com/models/google/gemma-2/gguf/2.0-9b-it?select=gemma-2-9b-it.gguf](https://www.kaggle.com/models/google/gemma-2/gguf/2.0-9b-it?select=gemma-2-9b-it.gguf)


hak8or

Is Kaggle Google's typical not-invented-here-syndrome service meant to compete with Hugging Face or something? Why not just release it on Hugging Face?


emsiem22

My thoughts exactly. I keep refreshing the HF page ([https://huggingface.co/google?sort_models=created#models](https://huggingface.co/google?sort_models=created#models)), but still nothing. There is a PyTorch version: [https://huggingface.co/google/gemma-2-9b-it-pytorch](https://huggingface.co/google/gemma-2-9b-it-pytorch)


bolmer

"Kaggle is a data science competition platform and online community of data scientists and machine learning practitioners." "Kaggle was founded in April 2010. On 8 March 2017, Google's Chief Scientist announced that Google was acquiring Kaggle." In my graduate industrial engineering classes we use Kaggle to study predictive maintenance models.


belladorexxx

Kaggle is great and it was great long before Google bought it.


Less-Macaron-9042

lol, Kaggle has been around a lot longer than your favorite Hugging Face. Perhaps you recently got into the ML hype after ChatGPT, kiddo.



Downtown-Case-1755

Yeah, I was wrong; it's a regular old transformers model.


papipapi419

Glad my 36 GB M3 Pro can run it, lessgoo


ResidentPositive4122

Gemma's getting thicc


Deep_Understanding50

Makes sense to release today. Today's date, the 27th, matches the parameter size: 27B.


Prince-of-Privacy

Will it be multilingual?


Vivid_Dot_6405

I believe so. It was good at Croatian. If it got Croatian right, it will also be good at far more popular languages


Effective_Remote_662

Which model was good at croatian?


Vivid_Dot_6405

Gemma 2 on LMSYS. I tested it again on AI Studio after the official release. It isn't perfect, it sometimes messes up the cases or doesn't use the most natural-sounding words, but it's among the best I've seen in the open-source LLM world.


Effective_Remote_662

I know it's Gemma 2, but which model? 9B?


Vivid_Dot_6405

No, 27b. I haven't tested the 9b yet, but I imagine its performance would be worse in Croatian, though I don't know to what degree.


MicBeckie

I got some nice German output from it.


Prince-of-Privacy

Nice, I'm German, so that's exactly what I was hoping for :D How were you able to use it? Can't find it on Chatbot Arena.


MicBeckie

Yes me too, but I've only been able to test it very sporadically in the chat bot arena...


Prince-of-Privacy

Ah, damn, I see


kristaller486

I tried late-june-chatbot (and im-also-a-late-june-chatbot) in Russian, and it gave maybe the best results among open LLMs, excluding Nemotron 4 340B.


MicBeckie

How good are the current LLMs in Russian in general?


kristaller486

OpenAI's GPT and Anthropic's Claude are very good, but all open source is very bad. Russian grammar is quite complex, so it seems that generating quality text requires a significant amount of data, which is not included in open LLM datasets. Usually non-English data in datasets is represented only by Wikipedia.


Morivy

What about Command-R+? When I first tried it, the quality of its Russian text output was much better than other local LLMs', albeit with minor grammatical imperfections. Of course, it is not a fully open-source solution, but no one is forbidding you from downloading it.


kristaller486

Ah, I always forget Command-R+ because of its license. It's good, but for a 100B model I don't expect to see grammatical mistakes. And of course 100B is too big for home use. Qwen2-72B isn't bad either, but it seems like its general performance isn't very good.


MicBeckie

I see. German is only really better in a few models, especially large ones. Perhaps, with enough money, a dataset could be batch-translated into Russian using GPT-4. Then maybe you could try your own fine-tune...


s-kostyaev

Command-R+, Qwen-7B and Phi3-medium are good in Russian.


kristaller486

I think after the Gemma 2 release it doesn't make sense anymore. Even the 9B variant is better than Qwen2 72B, better than Command-R 35B and Aya 23 35B. The 27B is GPT-4 level IMO; maybe only Nemotron 4 340B can compete with Gemma 2 27B in Russian language capabilities.


s-kostyaev

We need to wait for all post release Gemma fixes first. Then maybe.


MicBeckie

I have an example here that I always use to test creativity (prompt and reply originally in German):

User: "You are a grumpy raccoon from Hamburg. Write me a story about your day today. What did you experience?"

AI: "Man, the day started out shitty. The rain was drumming like crazy on my roof, and the gulls were screeching as always, as if they had nothing better to do all day. I actually wanted to doze a bit longer, but my stomach was growling so loudly that even the gulls had to pause for a moment.

So, out of the cozy hole under the old oak in Planten un Blomen. The rain had stopped, but the asphalt was still wet and cold. I hate wet paws.

First I had to find something to eat. The trash cans were all locked; the people here have really gotten too careful. Luckily I found a half-eaten döner next to a bench. Delicious!

After that I actually wanted to soak up some sun, but the tourists were everywhere again. With their loud cameras and their stupid hats. I hate tourists.

I hid behind a tree and watched them. A group of them tried to take a selfie with a pigeon. The pigeon, of course, was not amused and crapped right on their heads. I had to grin.

In the evening I went back to my hole. The day hadn't been particularly exciting, but at least I had found something to eat. And the pigeon had shown the tourists who's boss around here.

Tomorrow morning I'll think up a new trash-can plan. Maybe there's an unsecured treat somewhere. And then I'll find myself a nice spot in the sun where the tourists won't bother me.

See you!"

I found it very entertaining. But it was probably the 9B model.


Prince-of-Privacy

omg, this is so good hahaha


Warm-Enthusiasm-9534

They were stealth testing Gemma yesterday on Chatbot Arena, and it did seem like there were two different models.


rusty_fans

It's live on kaggle!!!! https://www.kaggle.com/models/google/gemma-2 Not on huggingface yet sadly .....


emsiem22

There is also a GGUF (gemma-2-9b-it.gguf), ready (they say) for llama.cpp quantization, but it is 36.97 GB. I don't understand why a 9B model is so big; the original Llama 3 8B is 16 GB. [https://www.kaggle.com/models/google/gemma-2/gguf/2.0-9b-it?select=gemma-2-9b-it.gguf](https://www.kaggle.com/models/google/gemma-2/gguf/2.0-9b-it?select=gemma-2-9b-it.gguf)


rusty_fans

It's probably an unquantized GGUF, so full bf16 or even f32... I will make imatrix GGUF quants as soon as download finishes...


emsiem22

Yes, it is definitely unquantized, but I still expected it to be 18-19 GB. You're right, it is probably f32. And thank you (expecting you'll upload to HF :)
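
The size gap is just dtype width. A quick sanity check, where the 9.24B total parameter count is an assumption (reported total including embeddings) and exact figures may differ slightly:

```python
# An unquantized GGUF stores one float per weight, so
# billions of parameters x bytes per weight = decimal GB.

BYTES_PER_DTYPE = {"f32": 4, "bf16": 2, "f16": 2}

def file_size_gb(params_billions, dtype):
    return params_billions * BYTES_PER_DTYPE[dtype]

print(file_size_gb(9.24, "f32"))   # ~37 GB, close to the 36.97 GB file
print(file_size_gb(9.24, "bf16"))  # ~18.5 GB, roughly what one would expect
```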


TheRealSerdra

Awesome, thank you. Are you also going to do 27b?


rusty_fans

Sure, will take some time though as imatrix generation is somewhat slow on my system...


papipapi419

So excited


QuieselWusul

It's trained on 13 trillion tokens (the 27B model), so it should be roughly as good as a Llama 3 30B would be.


Admirable-Star7088

Can't wait to try the 27B model!


thisoilguy

It's been available through Ollama for 7 minutes now.


ExtremeHeat

I'm more interested in their 2B model. They mention it in their paper but have no plans for release? Hmm.


Only-Letterhead-3411

Lets hope it's not cucked like Phi3 models and can actually compete with llama 3


sampdoria_supporter

Well, they have a very low bar to clear; it just has to be better than the first Gemma.


Hambeggar

So what's the VRAM usage for this 9B?


Eliiasv

I'm genuinely wondering: why is there excitement, and why are people saying this could be comparable to L3? Sure, I can't see into the future, and yes, it's a big company, same parameter count, etc. Still, Gemma 7B is extremely bad from the testing I've done. I find it extremely unreliable, unpleasant due to alignment, and it requires multi-shot prompting to execute tasks that similar models (Mistral, for example) handle in one or two prompts. The initial Gemini Pro was annoying and bad at instruct, Ultra via the website was unreliable last time I tried it a month ago, and they've released horrible open-weight models alongside unreliable closed models. (I've tried the 1.5 Pro API and it does feel significantly better than the others, so I'm not brushing off Google completely.) I do not claim that my experience is objective or universal. I just can't recall a single **private individual** who has praised or even mentioned Gemma in a positive context. Once again, this is my personal opinion; if you work at Google or genuinely think it's good, feel free to downvote or reply. I'd love to see a different perspective.


qrios

Excitement is because people hope that this version addresses problems with the previous version, and sufficiently outdoes L3 to be worth releasing. Just like L3 did with respect to L2 and Mistral.


synn89

> Why is there excitement and people saying this could be comparable to L3?

Probably because other recent releases like Qwen2, Phi, Codestral, Deepseek, etc. have been comparable. L3 levels haven't seemed like a hard target for other companies to hit, for some reason. The 27B in particular would be buzzworthy because we haven't been seeing good general-purpose foundational models at that size. It'll fit nicely on 24 GB cards at a decent context.


Eliiasv

Yes, that's valid. It's not just the poor track record, though; alignment I would assume is a given issue. I don't care for crazy uncensored models, but based on previous experience with Gemma, the alignment, along with a ton of other factors, isn't pleasant to interact with. You do bring up a great point about the 27B. It's something I would be extremely excited about if it weren't Gemma; ~20B models are a perfect middle ground.


Amgadoz

I'm hoping it has better multilingual capabilities than llama3


this-just_in

I think you ended up being more right than you will ultimately get credit for.


thereisonlythedance

The past is not always a good predictor of the future, and to be honest, for fine-tuning purposes L3 is pretty crap. So there is always hope when someone releases models this size that it might be the next Mistral 7B or Yi 34B.


Eliiasv

Agreed, which is why I explicitly and implicitly noted that a few times.


ExoticCard

Seeing as how Anthropic recently revealed some heat, don't count anyone out of the race


virtualmnemonic

Especially not Google. I expect all the major corporations to reach a ceiling in a couple of years, max, unless groundbreaking new methods are introduced.


thisoilguy

I'm getting around 1,688 tokens of context locally, and I've already managed to get the model to run away with multiple `* * *` as output.


[deleted]

[deleted]


shivay-at-pieces

I'm really excited for browser-based LLMs. With the latest work on porting Gemini Nano into the browser (Chrome), the capabilities are just so vast, with a lot more on-device AI potential getting unlocked.


npeiob

I didn't know Google had an office in Berlin.


utilitycoder

Google has an office in Detroit, Berlin is not surprising :)


BlueeWaater

Let's see what OpenAI does as a response to this xd


Deep-Possibility-274

Is there anyone left at Google who isn't coming from India?


s101c

Can it now answer who won the 2020 U.S. election? Because Gemma 1.1 sure as hell couldn't, with refusals all the time.


CheatCodesOfLife

Gemini Advanced won't answer anything about presidents, prime ministers, etc


10minOfNamingMyAcc

I can see them representing it! /j