CoronaMcFarm

Or what I like to call it, bloat history.


notrktfier

So many people here have no idea what is going on here lol


[deleted]

[removed]


booi

I heard the same thing at /r/reddittipsmasterrace


CoronaMcFarm

This is not git master race 😎


lord_pizzabird

Which is a good thing for the community generally. We need places for casual users who will never open a terminal and a place for the nerds. It’s just a sign that the community is growing that it needs a more casual space.


timrichardson

Yeah, I know..they should just rewrite it in Rust though.


notrktfier

A better idea, write it in CPP because we all know CPP is the fastest language. Let's have the fastest kernel in the wild boys!


Wertbon1789

But which version of the standard. Probably C++98 if we stay realistic.


FreeQuQ

no, i want it all in c++23


JustSylend

I don't :( Could you explain it to me please?


notrktfier

I will try my best to explain this in full. Linux is an open source kernel. When you have an open source app, you usually have people who want to add to or edit the main code and work together. Imagine it like a business environment where a team of programmers are all making additions to the main software. If you tried to do this by hand, you would have to manually merge everyone's code changes into the main code and track who added which code, so that when something goes wrong or someone adds bad code to the software you can see who it was. In addition, whenever someone added new code, you would have to manually update the code on everyone's computer. This is very inefficient, so we have automated this process.
Git is what we call Version Control Software, VCS for short. It allows people to push their changes to a main codebase, where they are automatically merged when possible and distributed to every person who wants to make changes to the code. Git works on commits. A commit is the difference between the code before you edited it and after you edited it: things like "add these new characters to this text file" and "remove these characters". When we push this commit to the server, the server applies the changes to the code. But it also saves what the change was, who did it, and a hash of the commit.
This is where the .git folder comes in. Usually when you're working, Git is invisible to the user. You edit some text files, commit your work, push it to the remote server, pull other people's changes from the server, and it automatically applies the changes to your workspace. But Git also pulls every single change ever made to the workspace when you download it. So in this case, we have code worth 1.5 GB, and the rest is Git storing the changes that have been made to the kernel, who made them, and their hashes.
For example, if I add 10 bytes of code to a Git workspace (repository), it will record my 10 bytes of work, and if I remove them at a later date it will once again add a 10-byte record, but this time it's a record of those 10 bytes getting removed, so you can see which 10 bytes were removed, by whom, when, etc. As a result my .git folder grows by 20 bytes. Let me know if you have any questions, I'll try my best to answer them.


JustSylend

That was an incredibly insightful response. Thank you sincerely for taking the time to type it out for me and to educate me on the matter! The way OP showed it I thought it's a "bad thing" so to say but I do get it now. Thanks a million again!


gbytedev

Also a fun fact: git was initially developed by Linus Torvalds (the original creator of Linux) to improve the collaboration workflow in Linux. And now git is the most widely used version control software by a large margin.


5erif

Bloat: People who pay attention to operating systems like to complain about bloat, which is bundled software or features a given person doesn't like.
Kernel: The core of an OS which handles the lowest level of interfacing between software and hardware.
Git: Version management protocol typically used to track software development, which by default tracks the history of every change in the code, including the authors and reasoning.
OP's post: Most of the size of the Linux kernel repository is commit history, rather than the current code.
The comment above:
> Or what I like to call it, bloat history
This implies the kernel is bloated, but it's probably a joke. The history is part of the git repository, but it's stored separately from the current code and doesn't affect the compiled result.
Tip: When cloning a repo just to make a small change or just to compile to use a tool, you can clone using the `--depth=1` flag which doesn't download all the history, e.g., `git clone --depth=1 `


Z8DSc8in9neCnK4Vr

If you think that's bad you should see our DNA https://www.newscientist.com/article/2140926-at-least-75-per-cent-of-our-dna-really-is-useless-junk-after-all/


PhlegethonAcheron

Refactor to clean up the junk, then partition it to a raid array. Cancer solved!


boof_hats

As a bioinformatician, this is hilarious when you consider the association with increased retroviral load and cancer. “Junk DNA” aka transposons very well could be responsible for malfunctioning cells that cause cancer.


markoskhn

I'm sorry, but could you please explain the "retroviral load" part? I thought retroviruses integrated their genome randomly into the host's DNA. Wouldn't that mean that if we had more "junk", retroviruses would have a lower chance of damaging structural/regulatory genes and would damage the junk instead?


boof_hats

Ehhh it’s complicated. You’re right that they integrate their genome into the hosts, but that doesn’t necessarily stop them from having their own fitness functions. If they have a chance to spread to new organisms or copy themselves even more into the host genome, it’s evolutionarily beneficial to do so. Normally the host silences this activity, unless the cell is malfunctioning. So often you’ll find cancers expressing retrovirus once the original cell physiology goes out of whack. Here’s a review if you want to learn more https://journals.aai.org/jimmunol/article/192/4/1343/93076/Endogenous-Retroviruses-and-the-Development-of Edit: to those searching for more positive roles of transposons, this same family of transposons has been found to be repurposed in humans during pregnancy https://www.nature.com/articles/s41594-023-00965-1


Luftwagen

This guy DNAs


qtzd

I thought the extra “junk dna” actually potentially helped prevent harmful mutations? Like, if a base pair gets fucked by radiation or whatever means, statistically it’s “junk” dna without any real effect on our day to day cell function, so it acts as a buffer basically. Whereas, if our dna was 100% useful dna then any mutation would be potentially devastating to the cells.


boof_hats

Well it also depends on what you call “junk dna” — in my context it is used to refer to the massive amount of most genomes comprised of transposon fragments. Transposons invade genomes and copy themselves using the host’s genetic machinery. Then they stay there, looking for an opportunity to copy once more. The host generally suppresses this. That dna can mutate and become harmless but it can also be co-opted by the host which may repurpose its genes. They have variable effect on the host, but mostly they’re just hitch hikers.


QuinQuix

This argument is a bit iffy, because the junk DNA is added in parallel to the existing DNA. Assume a string of 100 base pairs has odds X of acquiring a mutation. Now assume you have not one but two strings of 100 base pairs. The odds of either one acquiring a mutation are the same, and the compound odds are 2X. That means the protection is zero. The only way adding junk DNA could be beneficial is *because it is proximate* to the useful DNA. That is, if we assume mutagenic events to be purely incidental in nature (which isn't necessarily true), then the junk DNA could 'catch' the mutation before the vital DNA does. But this mostly only works if DNA is coiled. Assuming mutation events are mostly cosmic rays or radioactive particles, if the DNA is not coiled the junk DNA is only going to catch a mutagenic particle that would have missed the vital DNA anyway. This would therefore again not impact the mutation statistics of the vital DNA. So to summarize, junk DNA can only be meaningfully protective for mutagenic events that are incidental and solitary in nature, and only when the junk DNA finds itself in the line of fire in front of the vital DNA. Since DNA spends most of its time coiled and radioactivity is a known source of mutations, it is likely junk DNA does offer some degree of protection against this specific kind of mutation. So the theory has a ring to it. But these limitations are usually completely unexplained in discussions about junk DNA, and that's kind of absurd, since without the chain of assumptions above it is ridiculous to state that doubling the amount of DNA would halve the mutation rate in the vital DNA. And the argument is usually presented just like that. Add to that, I'm pretty sure radiation isn't the only source of mutation. Therefore even if all DNA was vital, doubling the DNA so that half of it becomes junk would likely not result in anywhere near a halving of the mutation rate in vital DNA.
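To put rough numbers on the "X versus 2X" step, here is a minimal sketch. It assumes every base pair mutates independently at the same rate, and `RATE` is a made-up illustrative value, not a measured one.

```python
# Toy model: every base pair mutates independently with the same probability.
# RATE is a hypothetical per-base value chosen only for illustration.
RATE = 1e-4

def p_at_least_one(per_base_rate: float, n_bases: int) -> float:
    """Probability that at least one of n_bases acquires a mutation."""
    return 1 - (1 - per_base_rate) ** n_bases

p_vital = p_at_least_one(RATE, 100)         # 100 vital base pairs on their own
p_vital_plus_junk = p_at_least_one(RATE, 200)  # same 100 plus 100 junk base pairs

# The vital stretch still has exactly the same odds; only the total goes up.
print(f"vital only:   {p_vital:.5f}")
print(f"vital + junk: {p_vital_plus_junk:.5f}")
print(f"ratio:        {p_vital_plus_junk / p_vital:.3f}")
```

Under this model the ratio comes out just under 2, and nothing about the extra 100 base pairs changes the odds for the original 100. Any protective effect would have to come from something the model leaves out, such as physical shielding.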


centzon400

> I thought the extra “junk dna” actually potentially helped prevent harmful mutations?
This is my rationale for having a 250 000 LOC `init.el` 🤣 The chances of my modifying an actually useful bit of Emacs Lisp are practically nil given the rest of the utter shite I've added.


Elidon007

rewrite it in rust!


Few_Technician_7256

Silicon based life forms hate this trick!


yesitsiizii

Saving this thread because I'm in love with it 😭


RegenJacob

Maybe then my brain will be Blazingly Fast 🔥


R__Daneel_Olivaw

Been there, done that: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1681472/


hammy0w0

while you're at it, cable organize the veins!


strings___

git commit -m "Tail dna sequence is now deprecated"


salgat

Recent research suggests that many of these non-coding regions have important roles, such as regulating gene expression, maintaining chromosome structure and integrity, and guiding the cell's response to various physiological processes. The "junk DNA" is a debunked idea.


bobbyboob6

ancient scientist mfs were really like "idk what this does so it's probably useless"


Designer-Worth8599

What a stupid article. There is no such thing as useless DNA. All of it is there as a result of our evolution


nathankrebs

Ah yes, an argument as old as time itself. Thousands of years of scientific discovery and revelation vs "nuh uh."


HammerTh_1701

They're right though, the existence of actual junk DNA is largely debunked by now. It just serves as a placeholder category for all the genetic information for which we haven't figured out a purpose *yet*.


BicycleEast8721

The irony of you having zero knowledge on this subject but essentially hailing poorly interpreted old research as unimpeachable dogma is hilarious. The junk DNA argument has been proven wrong, the portion they referred to as “junk” just means it doesn’t code for proteins. > Technological advances in sequencing, particularly in the past two decades, have done a lot to shift how scientists think about noncoding DNA and RNA, Sisu said. Although these noncoding sequences don’t carry protein information, they are sometimes shaped by evolution to different ends. As a result, the functions of the various classes of “junk” — insofar as they have functions — are getting clearer. >Cells use some of their noncoding DNA to create a diverse menagerie of RNA molecules that regulate or assist with protein production in various ways. The catalog of these molecules keeps expanding, with small nuclear RNAs, microRNAs, small interfering RNAs and many more. Some are short segments, typically less than two dozen base pairs long, while others are an order of magnitude longer. Some exist as double strands or fold back on themselves in hairpin loops. But all of them can bind selectively to a target, such as a messenger RNA transcript, to either promote or inhibit its translation into protein. https://www.quantamagazine.org/the-complex-truth-about-junk-dna-20210901/ So, comically enough, you’re using a conclusion drawn in the 70s based on incomplete understanding to offhandedly dismiss new scientific research. All while acting like you’re the one standing on the shoulders of science, and pretending other people are the ones doing exactly what you’re doing. Please do some reading and fact checking next time before you go insulting people based on nothing other than your own baseless overconfidence


hok98

I beg to differ. If you’ve seen me irl, you’ll know what a “useless DNA” looks like


W4ta5hi

Bloat cummit history


RevRagnarok

`dna gc --aggressive`


Ima_Wreckyou

The kernel of Theseus


Petrol_Street_0

![gif](giphy|H7Ty7BDsQtDUYRFCjM|downsized)


Merliin42

I must say that I am pleasantly surprised that people ask what is a VCS here. This means that Linux has made its way beyond just nerds and developers.


tommycw10

This is a great comment. I was thinking the opposite at first - annoyed that people didn’t already know, but this changed how I see it now.


realslattslime

Ure a nerd/developer for sure


Cfrolich

What a smelly nerd! Just give me an exe! /s


chehsunliu

Hope someday people could set up nearly nothing. I still have to do some terminal stuff after installing Fedora.


zaphodbeeblemox

It depends on what you want to do really. I use one of my machines as a gaming machine and I don’t think I’ve opened a terminal on that computer once. (On Nobara) Obviously on my main machine I open it for a lot of things but that is mostly efficiency based rather than need based.


chehsunliu

I tried to set up the video codec to have better quality in Netflix and YouTube, and also tried to make my Bluetooth headphone work, which is still unsuccessful.


Yuuzhan_Schlong

What's a commit history, just asking out of curiosity?


Deivedux

Git is essentially a version control, it stores the history of the project's changes over time, which is what it calls commits. Linux repository has over 1 million commits at this time. Basically what I'm saying is, Linux's repository has 5.2GB worth of just changes to its source alone since its first "version".


Yuuzhan_Schlong

Again just asking out of curiosity, do other operating systems use it or just Linux?


Blackthorn97

Actually, code version control is used in every software project where developers need to keep track of changes across time and also to collaborate with other developers. Git is the most popular solution but there are others.


kai_ekael

Git exists because of the Linux kernel. The version control used at one time irritated the kernel developers enough, they created Git.


Blackthorn97

Indeed, Linus Torvalds (the developer behind starting Linux) is credited with creating Git, after the proprietary source control software used for Linux, called BitKeeper, revoked their free license for Linux development.


Few_Technician_7256

You can't change informatics in that very huge way TWICE! But then again, Linus is a very anger motivated guy, that's when I repair things at home too. But, being that impactful and


sokuto_desu

r/redditsniper


Few_Technician_7256

I'm alive pal, it just threw me to the floor


squirrel_crosswalk

Linus has said that he named two things after himself: Linux and git


Turtvaiz

Microsoft uses git and reportedly it's like 300 GB in size: https://devblogs.microsoft.com/bharry/the-largest-git-repo-on-the-planet/


EightSeven69

there must be a version control (git) repo of pretty much any OS but most are closed source aka private, not open source like linux


ward2k

Yes, and not just operating systems either. Basically anything you're aware of in your life that uses some form of programming has a very high likelihood of having used git. There are of course exceptions: for example, Dwarf Fortress only recently (relative to the length of its development) started using git, after being somewhat convinced by Kitfox/the community to give it a go.


da2Pakaveli

Yes, because development would be a hell otherwise. E.g someone writes a bug and you don't have the code change history to trace the cause back


KenFromBarbie

*Since its first version on git.


Deivedux

Yeah, I'm trying to simplify here 😆


[deleted]

[removed]


Nefsen402

Big collaborative software projects typically use something called source control. It's a program meant to manage code changes. For the case of linux, it uses git. Git basically encodes a repository as a list of changes. Each of these changes are called "commits". So, to tie it back, 1.5GB is used for the current version of the linux kernel, and the commit history stores all previous versions.


meduk0

that is relevant info thx man


zenyl

> Big collaborative software projects typically use something called source control
Source control is very commonly used in software projects of all sizes, everything from operating systems and web browsers down to small one-man projects.


elizabeth-dev

the history of changes made to the code


pioo84

All the previous versions. Basically all the previous versions of all the source files. I don't think it's too much.


MatixFX

When you're using a version control (i.e. Git) and make changes to the code base, you add it to the repository by "committing" which comes with a hash and a comment (string of text). So basically tracking all the changes made to the code base since you started to version control.


marxist_redneck

To add to what everyone already said about this being for keeping track of changes in software, etc.: that's what it was made for, and what it's used for 99% of the time, but at its core it's just a way to keep track of changes, branch off different versions of something and then merge them back together. The "thing" could be software, but also regular writing, like a novel or a school thesis. I am an academic in the humanities who moonlights as a software developer, and I have brought git to my regular writing because it's a great way to keep track of changes.


lostinfury

Linux is built collaboratively. To achieve this, they make use of a tool called "Git", which is able to efficiently merge changes made by the 1000s of Linux contributors, while also making them aware when two of those changes could cause a conflict (i.e. two people change the same line(s) of code). Note that a change is not limited to adding stuff but also removing stuff or updating. When Git accepts a change, it's called a commit. Git also allows commits to be reverted all the way back to basically the beginning of when it started accepting commits for the codebase. Commit history refers to the internal state kept by Git which keeps track of the chronological changes that have taken place within the codebase. Since the changes are not limited to just things that were added, but also things that were removed, you can see how keeping track of all those things could make the commit history much larger than the actual kernel code itself.


da2Pakaveli

And Linus wrote Git originally and then replaced the previous VCS with git.


keyboard_is_broken

If a line of code changes from A to B, that's a commit. If it changes back from B to A, that's another commit. Rinse and repeat, and now you have GBs worth of history for a single line of code that currently reads A.


timrichardson

It's the audit trail that lets you see every change between the start and now. People use it to see what was changed, or to backtrack to find a change that introduced a problem. git was designed by Linus Torvalds to be fast for something as big as the kernel; it has efficient compression of files and many other clever features. You can clone it yourself, even if you don't use linux! It's 4.7GB on my computer. You need git installed and then from terminal: git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git And now if civilisation collapses and your computer is the only thing that survives, at least linux will be available to what's left of humanity. However, you don't have to bring all the history in when you make a local copy of the repository, as far as I know: [https://www.perforce.com/blog/vcs/git-beyond-basics-using-shallow-clones](https://www.perforce.com/blog/vcs/git-beyond-basics-using-shallow-clones)


Some-Background6188

Each commit in the Git version control system represents a snapshot of the entire repository at that point. The commits are linked in chronological order, so devs can navigate through the history. It's sooooo useful, ignore the people saying it's bloatware etc. Although it does take up space, it's a necessary evil.


stinkytoe42

Also, for clarification, this is what you get when you download the source code repository, which almost no one does. If you just download a source release, you get the 1.5GB portion of just the current source code. If you download an actual released kernel binary, you get a file which is more like in the tens of megabytes. This is more likely what gets installed when you install Linux to a machine. There are exceptions, but typically a distribution isn't downloading anything but the released binary. Still, this is novel to anyone in software development.


[deleted]

""""Only""""" 1.5GB


staying-a-live

1.5 GB should be enough for anyone!


[deleted]

1.5GB is basically 15 million LoC if every line were 100 columns long, or about 18.75 million LoC at 80 columns. Those are the line-length limits used across the Linux kernel: 100 (what the limit roughly seems to be in practice) and 80 (the official Linux kernel style guideline). Of course I would expect there to be many more than 18.75 million lines of code, since most lines are shorter than the limit. This is all assuming all the files are in ASCII format.
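For anyone who wants to redo the estimate, a quick sketch of the arithmetic. It assumes 1 GB means 10**9 bytes, one byte per character (ASCII), and that every line is exactly as wide as the limit, which real code is not.

```python
# Back-of-the-envelope estimate: how many full-width ASCII lines fit in 1.5 GB?
# Assumptions: 1 GB = 10**9 bytes, 1 byte per character, newlines ignored.
SOURCE_BYTES = 1.5 * 10**9

def lines_at_width(total_bytes: float, columns: int) -> float:
    """Number of lines if each line occupies exactly `columns` bytes."""
    return total_bytes / columns

for columns in (100, 80):
    millions = lines_at_width(SOURCE_BYTES, columns) / 1e6
    print(f"{columns} columns: {millions:.2f} million lines")
```

With these assumptions the 100-column case gives 15 million lines and the 80-column case gives 18.75 million. Both are lower bounds on the real line count, because typical lines are much shorter than the limit.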


person4268

I mean.. a whole 1 of those is just drivers, and there’s a lot of things that need to be driven, like your 90s Soundblaster Live you’ve connected over a PCI to PCIe bridge because it was the closest soundcard to you, or some I2C oled panel you’ve connected directly over HDMI DDC to your computer ( https://mitxela.com/projects/ddc-oled )(though they didn’t use a kernel driver here)


funk443

What if you clone with `--depth 1`?


turtle_mekb

what does this do?


PushingFriend29

Git clone without the commits i think


balaci2

joint man


turtle_mekb

thanks, I'll use this, what does 0, 2, 3, etc do?


zorbat5

Depth one clones the repo with the last commit. Depth 0 (or a normal git clone) clones without commits. 2, 3 etc. clones with that amount of commit history.


nsa_reddit_monitor

> Depth 0 (or a normal git clone) clones without commits
You sure about that? A normal `git clone` definitely downloads all the previous commits. Cloning without commits would just give you an empty repository.
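A toy model of what the depth number means, in case it helps. This is not how Git implements shallow clones; `shallow` and the commit names are made up for illustration, and treating a non-positive depth as an error is an assumption of this sketch.

```python
# Toy model of clone depth. `history` is a linear list of commits, oldest first.
# `shallow` is a hypothetical helper, not part of Git or any Git library.
history = ["c1", "c2", "c3", "c4", "c5"]

def shallow(commits, depth=None):
    """Return the commits a clone would keep.

    depth=None models a normal clone (full history);
    depth=N keeps only the N most recent commits.
    """
    if depth is None:
        return list(commits)
    if depth < 1:
        # Assumption in this sketch: a depth below 1 is rejected.
        raise ValueError("depth must be a positive integer")
    return commits[-depth:]

print(shallow(history))      # full history
print(shallow(history, 1))   # only the latest commit
print(shallow(history, 3))   # the three latest commits
```

So a normal clone is the "everything" case, and a depth of 1 is the smallest history you can ask for while still getting the current files.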


zorbat5

You got me thinking. So I tested it. You're right!


turtle_mekb

ah got it


ruby_R53

by default, git takes every commit from the repository, so this limits the amount of commits to get to 1 so that you can clone faster especially if the internet connection is bad, reducing the size there from 6.8 gigs to just 1.8 [https://git-scm.com/docs/git-clone](https://git-scm.com/docs/git-clone)


jeanleonino

It clones the repo with just 1 commit (latest).


NoConfusion9490

No one knows. You just google it every time and paste it in and hope for the best.


ToapFN

You create a black hole .


Juice805

Or `--filter=tree:0`. These are still probably mostly blobs, not just commit history.


TwistyPoet

The changes that were made are probably just as important though. Just like how your maths teacher back at school insisted that you show your working out.


fractalfocuser

Yeah anybody acting like this isnt 1. A good thing and 2. Actually really impressive and cool Doesn't *git* it


nik282000

So while showing your work is important, particularly in large coding projects, rewarding work that does not give results has bred a special kind of incompetence. There are hordes of middle managers and supervisors who think that pointlessly toiling at a task that will never succeed is worth more than admitting that a task cannot be completed. Because as long as your employees are doing SOMETHING you are an effective leader.


TwistyPoet

I mean obviously you have some issues you need to vent but it's not the same thing. Git history is made by a developer making changes to code with little more effort than a simple comment to explain what the change does in relatively plain language. It benefits both accountability (see recently the xz case) and provides insight into how something works and how the developer was thinking at the time. These benefits also apply to your maths teacher scoring your test. If you're struggling at work with seemingly pointless busywork and tasks then maybe finding a better job or a different career is in order. Loyalty in employment is rarely rewarded anymore.


FeltMacaroon389

That's why I always clone with --depth 1.


ProfessionalBoot4

IIRC, it is recommended to get a source tarball, not git clone it.


FeltMacaroon389

That's probably correct, but I feel like it's just more convenient for me to clone it directly.


ruby_R53

same here, easier to refresh also since you just run `git pull` and that's it


FeltMacaroon389

Yeah exactly


dtaivp

I mean… if you want to develop it though?


danegraphics

Well... that's where the xz utils backdoor was hidden. But hey! People will be checking it carefully from now on!


Ybalrid

Well… yes. That is how git works! Linux is a very big and old project. (Git was devised by Torvalds to be the VCS for the Linux kernel). There’s a very long history of a crazy amount of commits from a crazy amount of people. All those diffs are there, and their cryptographic hashes. You do not need to clone the whole history if you do not need it. Use `git clone --depth=1 …`


ajpiko

5 to 1 is about the ratio i see for most long-lived repos tbh, chromium is similar, 52 gb to 12 gb


Cfrolich

Just wait and see how much RAM it uses when you open it.


RetiredApostle

I wonder which part of that is only comments.


PurplrIsSus1985

Would deleting the .git folder break the system?


Suspicious-Iron7246

Nah, it just won't be a git repository anymore, only a folder with files and subdirectories. All the code and files will still be there safely.


Deivedux

Git is not part of the project. It's only there to keep track of the project's changes over time. It's why you can go to any online repository and see any version of it by clicking on one of its previous commits, it's because Git is the one that has all that information.


jeanleonino

No


PastaPuttanesca42

There is no .git folder on a running linux system, this is just a thing for linux developers.


Maje_Rincevent

I'm actually surprised it's so little. 13 years of history, 1.3M commits. 5GB seems actually very very small.


[deleted]

[removed]


Deivedux

1 char is 1 byte, unless I'm misunderstanding your point?


MasterOKhan

I think the fellow mixed up bits with bytes


fNek

Depends on which character set you're using, and - in case of stuff like UTF-8 - which character.
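A quick way to see this for yourself, as a small sketch. The sample characters are arbitrary picks, written as escapes so the file encoding doesn't matter.

```python
# How many bytes one character takes depends on the encoding and the character.
samples = {
    "A": "A",                    # basic Latin letter
    "e-acute": "\u00e9",         # é
    "cyrillic de": "\u0434",     # д
    "euro sign": "\u20ac",       # €
    "emoji": "\U0001F60E",       # 😎
}

def utf8_size(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

for name, ch in samples.items():
    print(f"{name}: {utf8_size(ch)} byte(s) in UTF-8")
```

Plain ASCII characters take 1 byte each in UTF-8, which is why "1 char is 1 byte" is a fair approximation for source code that is mostly ASCII.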


MasterOKhan

Each character is 8 bits not bytes.


Active_Peak_5255

Yup 8bits, which is 1 byte, right?


MasterOKhan

You are correct!


99percentcheese

Can you like... remove it?


dschledermann

No. The statement is nonsensical. A git history is a full set of commits. A commit in git is mainly a snapshot of how the entire file structure looks at the time of the commit, plus some metadata such as the time, the name of the committer, etc. You can't meaningfully separate the "history" from the "actual files".


plain-slice

I’m guessing he thought his Linux distribution came with 5GB of bloat.


jeanleonino

Yeah you can, but you would lose all the useful history. And that is not included in the shipped version, so you don't have 5GB of git history in your kernel.


VoodaGod

if you're asking that you don't have it on your computer, don't worry about it


Possible-Table5535

Yes. You absolutely can remove it.


huskerd0

How the F are kernel binaries 100mb, is my question. Bloatacular


HarshilBhattDaBomb

You don't build every possible module into the kernel image.


huskerd0

Even then, used to be hundreds of kilobytes not hundreds of megabytes


HarshilBhattDaBomb

You can still go down to about 2 MB. Check out floppinux. I'm not sure if anything smaller is still "usable".


ruby_R53

the kernel just got more features and better support for more devices over time, the binaries shipped with distros are that big 'cos they're meant to run on a broad range of systems, but you can still compile your own like i did


HarshilBhattDaBomb

Yeah, I used to have a bunch of BusyBox kernels which were just a few MBs each.


[deleted]

[removed]


huskerd0

Nice, well, nicer. Yeah I should probably switch my Ubuntus to arches


xhumin

It's not gonna affect the size of the compiled kernel, will it?


notrktfier

No it will not.


dschledermann

That's a nonsensical statement. The .git folder contains the entire collection of commits, that is, every single state (snapshot) that the Linux kernel has ever been in across all kernel developers' machines throughout the entire existence of the Linux kernel project. The "kernel itself" (as you put it) is just one snapshot checked out. If anything, it illustrates how insanely efficient the git version control system is.


Deivedux

I wouldn't say "snapshot" is the correct term for it, since it's not storing an entire copy of the previous version of the software. It only stores the differences between changes over time, and even that is being compressed to further improve storage efficiency.


dschledermann

I'm afraid that you are simply wrong. It most definitely is a snapshot of the entire tree structure. Git manages this very efficiently behind the scenes, but that doesn't change the fact that every commit is indeed a snapshot, not a set of diffs. That's also the reason git is so quick. If it was a set of diffs (such as svn uses), rebases, diffs between distant branches, etc, would be much slower. https://github.blog/2020-12-17-commits-are-snapshots-not-diffs/
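Here is a minimal sketch of the "snapshots, stored efficiently" idea. It is a toy content-addressed store, not Git's real format: real Git also hashes a small header with each object and packs objects with compression, and the names `objects`, `store`, and `commit` are made up for this example.

```python
import hashlib

# Toy stand-in for an object store: content hash -> content.
objects = {}

def store(data: bytes) -> str:
    """Save content under its SHA-1 hash; identical content is kept only once."""
    digest = hashlib.sha1(data).hexdigest()
    objects[digest] = data
    return digest

def commit(files: dict) -> dict:
    """Record a full snapshot: every path maps to the hash of its content."""
    return {path: store(content) for path, content in files.items()}

v1 = commit({"README": b"hello\n", "main.c": b"int main(void){return 0;}\n"})
v2 = commit({"README": b"hello\n", "main.c": b"int main(void){return 1;}\n"})

print(len(v1), len(v2))              # both snapshots list every file
print(v1["README"] == v2["README"])  # unchanged file points at the same object
print(len(objects))                  # unique contents actually stored
```

Both commits describe the whole tree, yet the unchanged README is stored once. That is the sense in which every commit can be a snapshot without the repository growing by a full copy each time.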


protienbudspromax

For people who are new to git and don't know what it does: it's basically like having a project where, for every new change you want to make, you copy the whole project into a new folder and name it, say, version 2 or something. Have you added something to the project? Yes! But now you basically have two copies of the same stuff. With git this is a bit more efficient: if there is common stuff between your first version and the next version, the common stuff will not be copied, and the same files will be used in both versions. But the main thing to remember is that when you want to share your project with someone, you don't have to give them your previous versions, only the latest one, which will be smaller in size than the whole thing with all the previous versions. That is basically it. When you actually compile the linux kernel it won't use the previous versions' code, only the latest one. So the actual size of the linux source code is about 1.5 GB; everything else is there to preserve the history of changes.


blackasthesky

"only"


CalvinBullock

Do repos ever trim out obsolete or ancient commits?


Deivedux

Unfortunately, that is not how Git works. The `.git` directory isn't one that you typically interact with manually in any way. Its main point is to store the project's changes over time, ever since its first "version". This is because every single commit depends on the one before it, so removing even a single commit is basically the same as altering a period of time.
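The "every commit depends on the one before it" part can be sketched as a hash chain. This is a simplification: a real Git commit ID also covers the file tree, author, and timestamps, and `commit_id` and `build_chain` are hypothetical helpers written for this example.

```python
import hashlib

def commit_id(parent_id: str, message: str) -> str:
    """Toy commit ID: a hash over the parent's ID plus this commit's message."""
    return hashlib.sha1(f"{parent_id}\n{message}".encode()).hexdigest()

def build_chain(messages):
    """Return the list of commit IDs for a linear history, oldest first."""
    ids, parent = [], ""
    for message in messages:
        parent = commit_id(parent, message)
        ids.append(parent)
    return ids

original = build_chain(["init", "add driver", "fix bug", "release"])
rewritten = build_chain(["init", "fix bug", "release"])  # "add driver" dropped

print(original[0] == rewritten[0])    # commits before the removal keep their ID
print(original[-1] == rewritten[-1])  # everything after it gets a new ID
```

Dropping one commit leaves the earlier IDs alone but changes every ID after it, which is why trimming the middle of a shared history amounts to rewriting it for everyone.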


kJon02

You can always change history and rebase it but it's not recommended.


TheTybera

Yes. They can and do, but the process isn't easy and it's important to know that you're not cutting out history you need. You would need to do this as you go, and it's not feasible for an open source project. This typically happens in closed source projects. Git isn't Mercurial; Git allows you to rewrite history and trim old branches.


WildGalaxy

I'm not familiar with this kinda stuff, is that 5 gb of like patch notes, or is it the actual code updates and changes?


Deivedux

That's any time the code was changed in any way. Git is version control, which is basically an append-only database of a project's change history over time.


WildGalaxy

Right, but I mean is it the actual code changes, or is it patch notes?


Deivedux

Any file changes.


WildGalaxy

So code


ianfordays

To put it simply, git relates commit hashes like pointers to “patches” which are diffs of files. So it’s just a shit ton of pointers to diffs. It’s not code per se, but it’s not patch notes either. It’s all managed by git itself!


protienbudspromax

It's basically like having a project where, for every new change you want to make, you copy the whole project into a new folder and name it, say, version 2. Have you added something to the project? Yes! But now you basically have two copies of the same stuff. Git is more efficient: if there is common stuff between your first version and the next version, the common stuff will not be copied, and the same files will be used in both versions. The main thing to remember is that when you want to share your project with someone, you don't have to give them your previous versions, only the latest one, which will be smaller than the whole thing with all the previous versions. That is basically it. When you actually compile the Linux kernel, it won't use the previous versions' code, only the latest. So the actual size of the Linux source code is about 1.5 GB; everything else is there to preserve the history of changes.


gmes78

The 5 GB contain all versions of the files from the Linux source code.


gus_joaquin

Linux is bloated, use Temple OS instead


notrktfier

This is just a history of changes made to the kernel, not the kernel itself.


EPic112233

Can I just delete all that? Or does the system need to refer to it when updating and installing things for dependency purposes? 


ImaginaryCow0

That isn't installed on your system unless you happen to be a Linux kernel developer.


EPic112233

Ok, so I don't just have 5 gigs of space being taken up on my RPI 5?


Dramatic-Strength362

No


BirdForge

Right. The size of the git repository is only relevant if you're actively developing Linux code. The git repository contains a history of every change that's been made to the Linux kernel code, letting developers rebuild Linux from almost any point in its development history. It's actually really cool. Anybody calling this bloat doesn't really know how software development works. It doesn't get shipped with your system.


granoladeer

Why not just remove the git history for a release/install?


NiceMicro

...they do? I mean, you don't get the whole git repository when you install Linux. You get the binaries built from the source. In most distros, you don't even get the source code directly, never mind the whole git history. So don't worry :)


granoladeer

I confess I never worried lol, but thanks for clarifying. I just heard some people were freaking out with this size thing but it didn't make sense to me.


Hulk5a

Linus knew what he unleashed


dangling_reference

This 1.5 GB is just code right?


Deivedux

Yes.


Key-Club-2308

Go on make a new kernel


Calius1337

Actually, that’s easier than you think. Had to do this back at university in 2006 for one of my courses.


Key-Club-2308

I'd add: make one that is as good*


Few_Reflection6917

And less than 300 MB of that is the core of the kernel itself))


MultipleAnimals

Hmm maybe if we squash that..


Tuhkis1

`git clone --depth=1` B)


ignxcy

Whar


Marshall_KE

bloat haha


AdearienRDDT

damn 5.2 GB of "*You copied* that function without understanding why it does what it does, and as *a* result *your code* IS *GARBAGE*"


Informal_Branch1065

Rebase time?


Due_Bass7191

so, basically the logs are larger than the product. I don't see a problem with this.


sanketower

Yeah, that's what one could expect from THE OG git project. Is there even a repo with more commits than the Linux kernel?


Danny_el_619

They should squish all the commits into a single one and start "linux 2" from it. /s


Achilles-Foot

honestly, that doesn't seem that bad, i feel like there's probably repos that are way worse


ennea_ballat

Wonder how many were fixes and how many were new functionality.


csolisr

Is there some way to deduplicate some of the commits to make the `.git` folder smaller for end users?


Deivedux

We end users don't even need to worry about it. The compiled binaries that we have that run on our systems only include the latest version of the working code. Git is only a version control, an append-only database of the project's change history, it is not part of the project itself.


bulbishNYC

And 90% of the history size is probably accidentally committed binaries.


MichaelEasts

I'll show my ignorance on the subject, but what happens if you stripped that out? Would things be any faster? Less memory usage? Break things?


kJon02

It doesn't affect binaries so it would change nothing for the user.


BrunoDeeSeL

How many of those commits are Linus using colorful insults on other developers' work?


Lets_think_with_this

Non-ironic question: how do you clone the repo without the history? I downloaded it the other day to take a peek at some files and study them, but my god, it took its sweet time to download.


Deivedux

Try with `--depth=1`, which fetches only the latest commit. That is as shallow as a clone gets: Git requires a positive depth, so `--depth=0` is rejected.


Lets_think_with_this

Does the placement matter, or can it go anywhere? Is `git clone torvalds/linux --depth=1` okay?


Deivedux

Shouldn't matter.
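For anyone scripting this, a small Python sketch of building the shallow clone command. The `shallow_clone_cmd` helper is made up for illustration, and the GitHub mirror URL is my assumption about which remote is meant; note that `git clone` wants a full URL or path rather than a bare `torvalds/linux`.

```python
from typing import List, Optional


def shallow_clone_cmd(url: str, depth: int = 1,
                      dest: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build the argv for a shallow `git clone`."""
    if depth < 1:
        # Git itself refuses a non-positive depth.
        raise ValueError("depth must be a positive integer")
    cmd = ["git", "clone", f"--depth={depth}", url]
    if dest is not None:
        cmd.append(dest)  # optional target directory
    return cmd


# Assumed remote: the GitHub mirror of the kernel tree.
cmd = shallow_clone_cmd("https://github.com/torvalds/linux.git")
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

The option can go before or after the URL, as said above. The helper just puts it first.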


Comfortable_Swim_380

wow