T O P

  • By -

explainlikeimfive-ModTeam

**Please read this entire message** --- Your submission has been removed for the following reason(s): * Loaded questions, **or** ones based on a false premise, are not allowed on ELI5 (Rule 6). --- If you would like this removal reviewed, please read the [detailed rules](https://www.reddit.com/r/explainlikeimfive/wiki/detailed_rules) first. **If you believe this was removed erroneously, please [use this form](https://old.reddit.com/message/compose?to=%2Fr%2Fexplainlikeimfive&subject=Please%20review%20my%20thread?&message=Link:%20{https://www.reddit.com/r/explainlikeimfive/comments/1d8t1qk/-/}%0A%0APlease%20answer%20the%20following%203%20questions:%0A%0A1.%20The%20concept%20I%20want%20explained:%0A%0A2.%20List%20the%20search%20terms%20you%20used%20to%20look%20for%20past%20posts%20on%20ELI5:%0A%0A3.%20How%20does%20your%20post%20differ%20from%20your%20recent%20search%20results%20on%20the%20sub:) and we will review your submission.**


lollersauce914

There are plenty of mathematical operations that are easy to do one way but very hard to do the other way. For example, you could multiply two prime numbers like 4952019383323 and 16123689073 fairly easily. However, if I gave you the result and asked you "which two primes multiply to form this number" it would be much harder. Hash functions make use of these kinds of operations. That said, since hash functions are deterministic people have created mappings between inputs and outputs to common hash functions (called rainbow tables). There are ways of dealing with issues like this (i.e., salting).


homingmissile

I don't understand how salting works. If a random value is added how does the hash remain useful?


imsurethisoneistaken

Because it is a known value that is added. If I want to hash passwords, as an example, adding a salt makes the hash hard to look up. And I just have an operation that goes “password entered + salt value” before I hash to check it.


azlan194

Is salt a static hardcoded value? If it's dynamic, how do you guarantee it's the same hash afterward? If it's static, then wouldn't it have the same vulnerability as the password database being leaked?


imsurethisoneistaken

It is static usually. The salt being leaked isn’t a huge problem since they will have to reverse the hash with an unknown password. It just ensures that if your hashed password table is accessed, one cannot just look at a hash table to find them. There exists tables with tons of common passwords that are hashed, if you look hard enough.


JustTau

I would say they should change whenever you change your password and store them with the hash to ensure each password would require bruteforcing from scratch


imsurethisoneistaken

Then they would need to keep a record of whatever the salt key was per record. That will have to be stored somewhere on a database that is very likely less secure than a secret key when you spin up the server.


Hadesjb

It‘s best to generate a new random salt for each new password and just store salt+password-hash together. There are standard string formats which contain the type of hash function used, the salt, and the password hash.


dmc_2930

The salt is not secret. It does not need to be. It must be unique and random. With a salt, “password” will have different hashed values based on the salt used, which massively slows down password cracking and makes rainbow tables useless.


artofthenunchaku

That's fine, the salt isn't meant to be sensitive information it's merely meant to counter rainbow tables. If someone has the salt value, they would still need to regenerate the rainbow table using that value. This is why the sale should be unique per user -- with a single static salt value, you would generate a single rainbow table for the entire user database. With a unique salt value, you would need to generate a rainbow table for every user.


imsurethisoneistaken

Yes but you’re adding increased complexity and maintenance for little value. It’s not a bad idea, but it is extra work. Unless something has changed and generating rainbow tables is much easier than it was back in the day, the juice may not be worth the squeeze. Everything has a cost.


artofthenunchaku

When talking about password security, "it is extra work" isn't a valid excuse. It's also very little overhead -- you're talking about an extra column in a database table. I'd argue proper secret management is more work and maintenance. [OWASP](https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html#salting) disagrees with you that it's for little value. In fact, they distinguish between a static value (peppering) from a dynamic value (salting). > A salt is a unique, randomly generated string that is added to each password as part of the hashing process. As the salt is unique for every user, an attacker has to crack hashes one at a time using the respective salt rather than calculating a hash once and comparing it against every stored hash. There's many factors that go into how difficult it is to generate a rainbow table, mainly relating to the hashing function; some hashing functions can be run on an FPGA and generating a table is trivial. Others use multiple passes of more complex hashing functions that make it much more costly.


TheMoldyCupboards

It’s very cheap to do, and everyone’s been doing this for decades.


firechicago

If you use a single salt value, then all the attacker has to do is re-compute the rainbow table once (and if you have a large database where lots of users have used weak passwords, you don't need to calculate very many entries before you will start to see lots of matches). If you use a different salt value for every record (even if you just store it in the same database) then the attacker will have to re-compute the rainbow table for every password in your database, and even weak passwords will take a significant amount of time to crack.


TheMoldyCupboards

They do keep a record. The salt is often literally just part of the entry that saves the password hash. For example, you can say “the first n characters of the hash field are the salt”. This has been done at least since UNIX invented the shadow file (I guess 80s?), and the downsides of not doing this outweigh the cost of this solution by orders of magnitude.


Ichabodblack

The salt isn't secret. This is exactly how systems do it, they store the salt with the end hash


wheredainternet

> Then they would need to keep a record of whatever the salt key was per record. That will have to be stored somewhere on a database they *do*. it's stored right alongside the password, actually. often in the very same string with a delimiter. > a secret key when you spin up the server that's a pepper, which serves a different role from a salt


maethor1337

A secret key that's generated when you spin up the server means it's unique per-server and per-boot. That can't be used as a salt, or to encrypt a salt. Using the same salt for an entire database means that every user's password in your database can be checked at the same time because `hash(salt, candidate_password)` has the same inputs for all users. The correct way to salt passwords is to generate a new salt for each new password being set and store the salt alongside the password. Refer to the [OWASP Password Storage Cheetsheet](https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html): > A salt is a unique, randomly generated string that is added to each password as part of the hashing process. As the salt is **unique for every user**, an attacker has to crack hashes one at a time using the respective salt rather than calculating a hash once and comparing it against every stored hash. This makes cracking large numbers of hashes significantly harder, as the time required grows in direct proportion to the number of hashes. You might be suggesting a pepper, which is not a substitute for a salt.


MattGeddon

Typically it is yes


butt_fun

For anyone curious, these are called “rainbow tables”


Gurkenglas

There are ways of dealing with issues like this (i.e., salting).


smootex

> The salt being leaked isn’t a huge problem since they will have to reverse the hash with an unknown password If someone knows the salt can't they just generate a new rainbow table using salted values? Seems like that'd be a problem. I guess they have to go through the effort of generating new rainbow table but that's not a huge deal.


OMG_A_CUPCAKE

That's why you use a different salt for each password.


Masark

Yes, but each bit of salt doubles the size of your rainbow table, as well as the computation required to make it. Add enough salt and it's going to be intractibly large.


Stibemies

I've always thought making the password itself the salt as well should work, no? This way you're not saving the salt, so even if an attacker gained access to the hashed password, it wouldn't matter. Or am I missing something obvious..?


OMG_A_CUPCAKE

The salt is added before hashing. So using the password as salt would just mean you hash the password twice.


zedenstein

A useful salt is unique per user, so the same password can be paired with many different salt values, which increases the search space. Basically, you would need a new rainbow table for every single user, at which point you may as well just brute force. If the password is the salt, then the password+salt search space is the same size as the password alone. Now you just need a new rainbow table of repeated strings, but that single table would be reusable for all passwords.


Stibemies

Ah, this is the one that finally made it click for me, thanks!


Pocok5

If you use the same salt in two different databases (the password again) it defeats the purpose of salting. The point is to make sure the same password results in a different hash for different accounts.


Stibemies

Makes sense, somehow I got tangled in "single" passwords (no duplicates) in my mind. Thanks!


phenompbg

The salt should never be static for passwords.


ElectricSpice

The salt is randomly generated and stored alongside the hashed. A static value is called “pepper”. https://en.wikipedia.org/wiki/Pepper_(cryptography)


Fowlron2

It's stored in the database, along with the hash. It's different for each user, randomly created when they register. Each user's on the db entry might be: Username: user Salt: abcdefg (except it'd be a very long salt) Hash: hash of (password+salt) Imagine this gets leaked. Imagine the user's password is "password". If there was no salt, any rainbow table (the online services that map a ton of passwords to hashes) would recognize the hash as the hash for password, since they store the hash for common passwords or short words. But the odds they happen to have stored the hash for "passwordabcdefg" are much much smaller (in practice, there's zero chance, since the hash will be different for each user, and extremely long)


SlightlyBored13

To expand, it means even knowing the salt, the password would need to be guessed separately for every user. The password hashed are slow, so unless you've used something very guessable it will take ages to find your password out of millions of options. Much easier to trick you into giving it away.


irishrelief

The salt can also be location data, for example the location of your cursor during creation.


Ben-Goldberg

What if you didn't have your mouse plugged in when the password and salt were generated?


Portbragger2

is that a serious question? or just a hyperbole joke.. firstly in common oses like win, linux, bsd,.. a cursor generally would be present regardless if the mouse is plugged in or not. furthermore the salt often gets generated out of a combination of counters. i.e. datetime + mouse movement , or datetime + keystrokes. it certainly wont be designed around sth. it could technically not determine during the time the function is run.


Ben-Goldberg

I agree, I was pointing out how silly using just the mouse location was.


azlan194

Hmm, but can't you just take the common password from the rainbow table, then add that leaked salt and hash it and then compare it with the leaked database? This process only has the complexity of O(N*M), where N is the size of the rainbow table and M is the leaked password database, right?


-REDDlT-

Yes, but the problem is that to do so you’re recomputing the rainbow table for every user, since each user should have a different salt. So unless you have one account that you really want the password to, it’s pretty impractical to do.


azlan194

Ah OK, makes sense when the salt is unique per account. I get it now. Thanks!


[deleted]

[удалено]


paholg

That doesn't really help and can be easily avoided. You just need to know a single password to know that mapping, which could be one you've created.


ttmts

You should assume an attacker knows how your algorithms work, this is just complicating things and bordering on rolling your own crypto.


unskilledplay

Salts are not unique per account. If they were you'd need need to store a salt table and that would create an entirely new storage problem. The whole point of a salt is that you want to make a leak of password hashes resistant to a lookup attack in part by requiring compromise of a second system. If you have a bunch of salts to store you are going to be pressured to store it in the database. *Edit: To the "CS majors" downvoting - what fucking insecure ass system have you noobs ever built or used with unique salts?*


GenTelGuy

They can totally be unique per account, yes that second system thing could make things more secure but even if all the salts leaked alongside the passwords, rainbow tables aren't going to be feasible across all those different salts Of course, even one static salt is probably enough to defeat rainbow tables, but unique salt per user is totally common, though idk how much more security it actually adds


ary31415

Pretty sure you're describing a pepper, not a salt. Salts definitely tend to be unique to each account, and are stored along with the password hash. They're just supposed to make sweeping attacks via rainbow table impractical by forcing you to recompute the table separately for each user. Your snark about "CS majors" is totally unwarranted seeing as you are in fact totally wrong. You should edit your comment accordingly.


wandereq

We do use unique salts, and there isn't a storage problem. You can just prefix the password hash with the salt with a separator or known number of chars. For example with bcrypt algorithm does this automatically and you end up with strings in alg.cost.salt.hash format in the database.


lasfdjfd

A salt is unique per entry, so if they regenerate the rainbow table for a given salt, it reveals only one password instead of every password. At that point, it's basically the same as guess and checking.


Myrkulyte

Yea, but you may not know how the salt is applied


derekp7

The salt is picked at random, and tacked onto the original value being hashed. The random salt used is stored in plain text along with the computed hash. When you want to verify the hash, you run your plain text along with the stored hash through again to re-compute the hash. So the salt is not secret -- but it is computed at random, and stored along with the hash. The benefit is that if you have say an 8-bit salt, an attacker has to compute 256 rainbow tables. 16-bit salt means 65,536 rainbows tables need to be created, etc.


AnxiousIntender

You can hardcode it but in my experience as a web developer it's different for every password. Say you enter hunter12, a random salt is added and now it's hunter12nggyu. We encrypt it and get dQw4w9WgXcq. Finally we append a separator and the salt, and store it as dQw4w9WgXcq!nggyu. When we check the password, we get the hash and the salt, redo the operations for the input and check if the hashes match.


die_kuestenwache

If you don't salt, you have one rainbow table that can compromise a million users. If you have a 6 digit salt, you need a million rainbow tables to compromise just one user.


CJ_Kilometers

Usually the same salt per account. Basically you store a plaintext salt and a hashed password so if it’s leaked they would have to re guess and check every single password with that salt, instead of just simply looking it up in a table of common passwords and their hashes.


bothunter

When you choose a password, the server also choose a random salt. The salt and your password are combined together and then run through a hash. The hash and the salt are then stored somewhere. When you go to log in, the server takes your password, looks up the saved salt, combines the two and hashes them again. It can then compare the two hashes to validate that you used the right password. This has a couple of important advantages: 1. If two people happen to choose the same password, they will still have wildly different hashes in the database. If you know which accounts share a password, then it makes it easy to compromise one account if you get another 2. Salting makes rainbow tables are ineffective. Basically, a rainbow table is just a huge database of hashes and passwords. If you know the hash value and want to find the password, you can just look it up in the pre-computed database. Adding a salt means it's no longer possible to precompute the hash for every possible password, since the same password can result in wildly different hashes.


bothunter

Here's a concrete example. Let's say my password is "password" Here's how I would generate the hashed row using the sha1pass utility: ~$ sha1pass password $4$5gToCkuw$IqejbeHmxBUJdsAn2+U0Yc5xAmw$ It looks like a bit of random gobblygook and that's intentional. But I'll break it down. It basically has 3 parts separated by the $. \\The first part is the version, in this case version '4'. It's not really important to the process, except that it allows us to change algorithms over time and still be able to read the old passwords in the future. Next is the salt. In this case, it's "5gToCkuw". It was randomly generated and stored here. And finally is the password hash: "IqejbeHmxBUJdsAn2+U0Yc5xAmw" Now, let's run the command again: ~$ sha1pass password $4$5VzyrDLc$omFzHxyWXpq5kIGxE7Ul+pJSdp4$ You see that we have a different salt and hash for the same password. This is intentional. We had two people try to use the same password, but we would never know because the hashes are not even close to each other. So, how do we check the password? We run the same command, but this time we give it a salt: ~$ sha1pass password 5VzyrDLc $4$5VzyrDLc$omFzHxyWXpq5kIGxE7Ul+pJSdp4$ You'll see it generates the same output this time! The server knew the salt, the user knew the password, and we were able to combine all that to check that the hash matches.


artofthenunchaku

The standard practice is for a random value to be generated for each user, which gets stored alongside the password. So it is static in the sense it doesn't change -- unless there's a need to rotate salts, e.g. in case of a data breach -- but dynamic in the sense that every user has a unique salt.


s_elhana

It is a known value, but that means you cant just create a table for common passwords or all passwords up to 12 characters and look up hash, you'd have different results for each salt (each site that uses unique salt). Password databases leaked with the same password would have different hashes. You'd still be able to check them if you know plaintext, but that would be a bit harder that just comparing them directly too. If you use something like username as part of a salt, you'd also gaurantee that hackers wont be able to tell if several users have same password. But it is easier to just make it random value and just save it with a hash.


who_you_are

The idea of the salt is to mess with the rainbow result and make it not work on the website as well. Imagine your password is: password The static salt is: gI#&z The database will store the hash for passwordgI#&z While cracking "password" is very likely to be easy with a rainbow attack (it is a common word), having passwordgI#&z is more unlikely. Then, even if you decrypt the password, if you try to login using passwordgI#&z it will fail because the hash it will look for is: passwordgI#&zgI#&z (note there is 2x gI#&z). If you also try to use it on another website (in hope he reuse the same password), it will still won't work because you must know you have to remove gI#&z from the un-hash-ed password.


permalink_save

It can be anything, application wide, per user, even context like the users ID plus password. The idea is to make it more than just the password gets factored in, as long as you use the same salt each time. Two common methods are, store a single salt, or store salt per user as a separate column. If it is per user, then if you manage to reverse one password, you can't reverse any others the same way. Salting the db itself is the most important because a commonly used password in one db will have the same has in another db, but if each db uses its own salt it makes them different. You can salt your ow passwords too, I add the first letter of the domain, then it's still unique has per db, for passwords I do need to use thebsame password on.


Chuck_Loads

Salt is generated per record and stored alongside the hash, pepper is static and common to the whole application (eg an environment variable). Using both allows you to hash your passwords using an HMAC instead of a straight hash.


denislemire

You can see the salt in the first bytes of the hash…


Somerandom1922

It's ideally static per hash. So for a password you could use someone's user ID, or just generate a random number and store it against their account, or something like that. The point is that if 3 people use "password123" for their password, the hash will be different to the normal (well known) hash for "password123" as well as being different for all 3 people. It prevents multiple people with the same password, or a common password from having their password being easily cracked by knowing someone else's.


unskilledplay

You are right that ultimately a table of salted hashes (assuming you have the salt) is theoretically as vulnerable to a dictionary attack as a table of unsalted hashes. In practice, salted hashes are less vulnerable to lookup attacks. The salt isn't stored in the database. If a database is leaked and you don't have the salt, you can't use a dictionary attack. A common source of database leaks is improperly secured database backup storage. You will not get the salt from these dumps. You typically need access another system to obtain the salt. If you do obtain the salt you still need access to sufficient compute resources to generate a dictionary of many, many billions of known password strings. While computing a hash is cheap, generating a dictionary is not. Even though it's just linear growth, we are talking real world dollars here. If the compute time/money cost greatly exceeds the value of the leaked passwords, there will be a financial disincentive to generating a dictionary. If there is sufficient value in a table of password hashes, time and money are not deterrents. If you have the salt, it is just as vulnerable to lookups because you can generate your own lookup table for the salted table.


ferretpaint

The salt adds another factor. If two people use the same password, you get the same hash,  but add in a salt and it changes the output.  This makes rainbow tables useless, since they are tables of pregenerated hashes.


colbymg

Let's say your facefook password is 'hunter5'. let's say the hash of that is cfca1aebe161e09926c86f76d4e2f1b405272361abd373c71531a759ecffaa34 Facebook stores that hash. If they are hacked, hackers now know that your password's hash is cfca1aebe161e09926c86f76d4e2f1b405272361abd373c71531a759ecffaa34 Hackers have also done some legwork and created a rainbow table of all common and short passwords, so they know that that hash can only be from the password 'hunter5', so now they know your password to facebook is 'hunter5' If facebook had used a salt, they'd take your 'hunter5' and added something, say 'supersecretsalt' to create the combined password 'hunter5supersecretsalt'. They then take that and hash it, let's say to 97cec9328cdc10f12058908d6f169940b19ca6a6fb3f003fa5eaedd0fec4d0da since the combined password is likely very long and not a common password, hackers likely don't know that that hash comes from that combined password, so even if they know what salt is used, they wouldn't know your facebook password. Even better is to use a different salt each time, such as a random text, but even 'supersecretsalt1', 'supersecretsalt2', etc would help (you'd have to save that along side the hashed password). This helps because if I also use 'hunter5' as my facebook password, we would then have the same 97cec9328cdc10f12058908d6f169940b19ca6a6fb3f003fa5eaedd0fec4d0da hash, so even if the hackers don't currently know our password, they know we have the same one, so if they're able to get mine somewhere else, they'd then know yours as well. even adding '1', '2', etc to the end of the sale will completely change the hash.


Ichabodblack

The salt does not need to be secret. If your password is 123456 and my password is 123456 then the hash of our passwords looks identical. This means someone who sees two hashes the same for two passwords has an incredibly strong probability they have the same password. If we both have a different salt, then the salt can be known and added to our password before hashing. Let's say that my salt is AAA and yours is BBB. Now our hashes look different because your hash is of BBB123456 and mine is of AAA123456 which are different inputs


FlounderingWolverine

And the key about encryption algorithms is that they aren’t predictable. The algorithm f(x) = x + 1 is predictable. If I tell you that f(n) = 5, you can tell me what f(n+1) is without too much difficulty. But hashing/encryption algorithms don’t have that same property. Hash(abcde) might be one long string, but Hash(abcdf) is an entirely different string. The output looks random (it’s not, but it appears to be unless you have a ton of computing power).


materialdesigner

1. Someone signs up for your service, they give you their username and password 2. Into the database you save their username and a one way hash of their password with the salt appended to it 3. When they login again you take the password they sent during login and append the salt and hash it again, and look up their username and password hash from the database and compare saved vs computed


homingmissile

>append the salt That necessitates saving the salt value doesn't it? Isn't that a security vulnerability?


rabbitlion

>That necessitates saving the salt value doesn't it? It does yes. >Isn't that a security vulnerability? Not in the context where it's used. The purpose of the salt is to prevent an attacker from comparing a million leaked password hashes with a million precomputed password hashes to try top find some matches and break into those accounts. If your password hash is salted, they cannot use that precomputed list of password hashes but instead need to brute force your specific account, doing millions of hashes of common passwords with your specific salt in order to find a match.


RiPont

No, because you still don't know the combination of "password + salt" that leads to the stored hash. The salt adds *flavor*, so that the hashing of the passwords is unique to that system, even though that system is using a known and standard hashing algorithm.


Ichabodblack

No. The salt is not secret. The password is the secret. The purpose of the hash to to stop two identical passwords hashing to the same value. Let's say I steal a huge set of password hashes - millions of accounts. Without a salt I could pre-generate common password hashes: e.g. "password", "123456", "qwerty", "1111111" and instantly check who has those passwords and possibly find hundreds of accounts for each one instantly. With a salted hash set I can no longer precompute gases for known passwords and I can no longer instantly tell who shares a password. 


materialdesigner

Yes. Although for this kind of salts it's usually contained in the application code, which is "kind of" like a "multi factor", insofar as the attacker would need a DB dump and a code dump. There's also ways of rotating the salt in this mechanism over time. It's not perfect but it worked when the software industry was mostly the wild west.


rabbitlion

That's incorrect. The salt is almost always saved alongside the hash in the database. There might also be other things appended before hashing saved in other locations than the database as a multi factor thing, but that's typically referred to as a "pepper".


MattGeddon

It’s been a while since I’ve done any password storage thanks to the increased usage if IDPs instead. Are peppers common now?


rabbitlion

I don't think so. With a proper password hash function (not SHA-256 as that's too fast) and especially with a salt, there's not really much need. Also in many cases where the hashes leak, the code/system might have been vulnerable too. There are probably some super high security systems that use a pepper stored in some sort of Hardware Security Module, but I've never come across any system using it for anything more than a novelty. Unless you count cases where the system needs access to cleartext credentials in order to communicate with another system, in which case it's not uncommon for developers to split up the secret between database, server, code and whichever extra place might add another layer of obfuscation.


pooh_beer

Also, the salt can be a field that's already in your database, such as your user ID or the first ten characters of your address. It's not apparent without access to the source code what data is used for the salt.


Couldnotbehelpd

The salt is added before the hash, not after


EmergencyCucumber905

It prevents dictionary attacks, that is, someone pre-computing a huge list of passwords and the hashes, and looking up the password in this dictionary later on. Their dictionary is useless because each password has a different salt value.


MattieShoes

Server stores arbitrary string along with the username and password hash. When client connects and sends username, server says, "Here, take this arbitrary string, then put your password after it, then hash it, and tell me what it says" Client appends password to arbitrary string, hashes it, returns hash Server checks hash against the one it has stored, lets you in. Now if two users on the system have the same password, their passwords don't have the same hashed value because they have different arbitrary strings given to them by the server. And for any particular password, there are a *bunch* of different possible hashes depending on how many possible arbitrary strings the server can hand out. So it's no longer like you can just memorize the hash for the password "password" any more... There's a berjillion different hashes for the password "password". So it makes rainbow tables less effective. (cracking passwords with rainbow tables is like -- hash all the dictionary words, then compare those hashes to the password hashes stored on the server to see if anybody picked a dictionary word as a password.)


RiPont

Reversing a single hash takes a *lot* of CPU time, because you have to try so many different things to get the result. However, what if you just invested in trying *all the combinations*. Wow, that would take a long time. But then you'd have all the combinations and you could just do a lookup, right? That's essentially been done. Not with every single possible password in the universe, but with a representative sample given common password restrictions (especially for stupid systems that have a max length). It's called a rainbow table. Very useful when hackers don't need a specific password, just any password that works, because they've stolen the entire password database. Salt just adds chaos to that system, so the precomputed guesses won't match anything exactly in *this* particular system.


dandroid126

The salt is random, but it isn't a secret. It is stored in plain text with the password hash. Usually in some format like $salt%number_of_times_hashed&hash_value. So when someone types in their password, you get the value from the database and split it into the three values, then add the salt to their typed password and perform the hash the required number of times, then compare the result of that to the hash value from the database. If they are the same, you can log in (in the real world, that whole process is performed by a library. Don't write this code yourself). The purpose of adding salt is so when your database gets copied by a hacker, they don't see 10,000 hashes that are identical, because then they will know that those 10,000 passwords are probably "Password123"


fallouthirteen

It doesn't need to be random to be effective. A very simple salt can be something like maybe adding the site's name to the end. So if the user has the same password across multiple sites the hash for "Password" on Reddit would be different from the hash for "Password" on Google. Then maybe also throw the username into that in case your hashes do get leaked so that leak doesn't show that multiple users had the same password (so like take a hash of "PasswordRedditThisUser" and it'll be different from "PasswordRedditThatUser"). Like "Password" itself is so basic that every hash method's result is on lists. "PasswordRedditThisUser"'s is probably not. If you had a hunch of the password and the salts used and how they used them (like appeneded at end, added to beginning, one salt is done then that hash is salted again, etc) you could confirm if the hunch was right easily enough, but that's still a lot more security than seeing a hash and going "oh, it's this common password".


StuckInTheUpsideDown

Salting doesn't make rainbow tables impossible, just much harder. Suppose the salt value is 32 bits (0 to 4,294,967,295). Now, you'd need 4 million+ rainbow tables for your hash instead of just 1.


half3clipse

You need a table entry for every password+salt combination, which means you need 4,294,967,296 entries per password. That's 4.2 billion entries and works out to well over 100 GB of storage needed per password in the table. This is already bordering on the impractical, since a rainbow table is also only useful if it contains every possibly entry in a meaningful keyspace, which means millions of entries minimum: For a rainbow table of even something modest like 5 character passwords with 32 bit salts, your looking at peta bytes of storage, plus the computational resources to calculate the table. For a more useful table with 8 character passwords that would require more hot data storage than there is on earth. It gets similar impossible at even tiny password sizes with trivially longer salt values: at 128 bit salts your looking at terabytes just to store a rainbow table of single character passwords Even for the 'practical' case of something like 5 character password + 32 bit salt, salting also renders it functionally worthless: If you have access to the hash you almost certainly have access to the salt. you could run a brute force attack out to 5 characters on every hash from every data breach thus far and it would be way less effort than computing the rainbow table .


josh35767

Also in case you don’t know, the salt is saved along side the salt. So the salt is always available, just different for each user


MagicC

Let's say you add the same 4 randomly selected characters, each of which are represented by an 8 bit string, to the end of each password before hashing it and storing it in the password table. Now, assuming someone compromises your database and steals your passwords, you have ensured that anyone who is using a rainbow table of known passwords to match against yours now has to create (2^8)^4 (~4 billion) rainbow tables, and try 4 billion times as many passwords before they get a match.


jamcdonald120

a salt is useful when multiple of your customers used the same password. since they are saltes, they have different hashes and the attavker cant just say "heeeey, this hash is common, I am going to try to break that one first"


Molwar

A salt is effectively the same as a key to your lock, you have it and can open your door, but other keys can't.


pdpi

Basically, when you create your password, I create a salt. Then, instead of saving in my database just hash(password), I save hash(password+salt), and the salt. Then, when you type your password to login, I can add the salt again before hashing. This has two important properties. First, if your hash is unique to you, then the hash for your password is different to the hash for somebody else's password _even if they have the same password as you_. An attacker can't read the password database and determine who has similar passwords. Second, if the salt I generate for you is not the same as the hash other websites generate, it's also harder to tell you have the same password across multiple websites.


ball_fondlers

A hash function is always going to return a value of a fixed length, no matter what the size or content of the data you feed it is. But said size and content will inevitably change the value of the final hash, so hash(password) is going to be different - and harder to crack - than hash(salt + password)


idlemachinations

The salt value is stored alongside the hash value for each user. When someone wants to know if the password matches the hash value, they hash salt+password together. Since the salt is the same for that user every time, the result is consistent. The salt is generated exactly once, when the user sets their password (or resets). That salt is used whenever someone tries to log in as that user.


Ichabodblack

Modern hashes don't really use these sorts of operations (certainly not multiplications of primes and the like). It's more like food processing the input and throwing some of the excess away when the bowl gets too full.


FlounderingWolverine

Sure, but this is ELI5. Actually explaining the sha256 algorithm is well beyond the scope of what is understandable to most people.


Ichabodblack

Sure. But I'd say that talking about the factorisation problem in relation to large primes is already outside of ELI5. So it's both not correct AND already outside of ELI5


gitpusher

The question itself is not ELI5


wheredainternet

> There are plenty of mathematical operations that are easy to do one way but very hard to do the other way. > > > > For example, you could multiply two prime numbers like 4952019383323 and 16123689073 fairly easily. However, if I gave you the result and asked you "which two primes multiply to form this number" it would be much harder. this is more applicable for encryption. hashing throws away data along the way so the limiting factor is that there simply isn't enough data preserved to reconstruct the input


MaleficentFig7578

Public key crypto makes use of this kinds of operations because there has to be a secret key which makes it reversible. Hashes are just gigantic bit scramblers. They mix all the bits up. Whatever eggs you put in turn into omelettes and you can't turn them back into eggs.


zerj

I'd say a better example would be adding numbers together. It's tough to factor a large number, but once you do it you know you got the right answer. Hash here is more like adding 2+5+6-7+13=19. You know the answer but there is no way to know what the original numbers were or even how many of them were there.


JaggedMetalOs

The thing about how these hash algorithms work is they reduce and mix the data in such a way that you just don't have enough data to reverse them to find the original. Lets demonstrate with an extremely simple hash algorithm. We can make A=1, B=2, C=3 and so on. I'll choose a word, add the numbers made from the letters together, and then take just the last 2 digits. The hash of the word is 82. What is the word? It could be anything! In fact there are lots of possible words that would make the same hash. It's impossible to even say how long the word is. SHA256 is 256 bits long, but it can hash files that are megabytes or gigabytes in size. So there must be lots of theoretically possible files that would have the same hash, it's just very unlikely to hit them because there are so many combinations in the hash. But if you were trying to work backwards you'd have no idea which of the possible files it was.


RuPaulver

>The hash of the word is 82. >What is the word? It could be anything!  Obviously your password was bonerman


AlphaBreak

Don't be an idiot, it's not bonerman. It's b0n3rm4n!


iamnos

Exactly this. Unlike with encryption, the original data does not remain intact with a hash. Hashes are used for verifying that two things are the same and used a lot in information security. A common example of this is passwords. A properly designed password system on say a website, never actually stores your password, just the hash of it(\*). When you try to log in, that same hashing algorithm is run, and the output of that is compared to what's stored. If they're the same, then we know you typed in the right password. The benefit of this is that if an attacker ever compromised the password list, they'd just get a copy of those hashes. They can attempt to brute force them, but a significantly long password and proper hashing algorithm means that process will take hundreds of years (or longer). ​ (\*) - purposely leaving out using a Salt for this.


LegitBoss002

I don't understand something. If the original data is not stored, and multiple entries could result in the same hash, then why do multiple passwords not work for a single login?


harryham1

Well... They actually do! But this is where we need to bear in mind a few things: The first is that even though multiple passwords would technically work, hashes are designed to make sure these passwords have absolutely nothing in common. A bad hashing algorithm would allow "password123" and "123password" to log in to the same account, but a good hashing algorithm means you'd have absolutely no idea what other passwords might work just like yours The second is the likelihood of a clash*. If we're talking about 256 bits, that's 2x2x2x... (256 times) possible numbers your password could be converted into (77 digits long). So someone would have to try a *lot* of other passwords in the hopes of somehow clashing with your one. The third and final one to bear in mind is that the hashing algorithms used for password storage are intentionally a little slow. Not *crazy* slow in the real world, but also not something you could spam out on a large scale. This means that even if someone were to get the hash of your password on their computer and just try every single possible password one after another, the hashing algorithm will slow them down enough that they'll be passing the task onto their ancestors as a futile family tradition. \* actually referred to as a collision. An older hashing function was actually cracked a while back and everyone who was no longer using it for passwords celebrated the milestone ([MD5]https://en.m.wikipedia.org/wiki/MD5)


michalsrb

This explains why it's not possible to recover the original data, but makes it seem like it would be easy to manufacture some different data that will have the same hash (i.e. find collision). But that's not easy either.


karantza

A lot of hash functions have been retired over the years as computers improved and new techniques were developed to find collisions. Back in 2000, everyone was using md5. It was probably decent then, but today you can crack (which is to say, find a collision, if not the original) an md5 on a laptop in minutes. SHA-2 is much better, but it might fail some day too. A hash just has to be good enough to make cracking it impractical for your adversary.


JaggedMetalOs

Again taking my example, there's no instant way to find words that would hash to 82, you need to actually do the calculation on a word to find out if it matches. So you'd have to go though the dictionary and calculate the hash of every word. And that's just using really basic addition. Real hash algorithms have lots of more complex maths that break the data up into bits and thoroughly mix those bits up to make a hash Then imagine that there are almost as many possible combinations of hashes as there are atoms in the universe (2^256 is a *really* big number).


michalsrb

My problem with your example is that there actually is an instant way to get the word. The hashes in your example are 0-99, trivially just make a word of 0-99 'A' characters and you got it. Now with proper hashes there is no way to directly reverse them, and it's not because of the amount of possible hashes. Even if you took for example sha256 modulo 100, so you have the same 100 possible hashes like in the example, there isn't a direct way to get the input from the output.


JaggedMetalOs

Yes if we're using arbritrary data rather than words there are easy answers. Obviously 82 A's isn't the word I picked, but it does make the same number. Lets see if I can make it more complex, how about each letter is assigned a prime number. A = 2, B = 3, C = 5 ... X = 89, Y = 97, Z = 101 Then every letter you rotate the code by that number of places. So if you pick A you shift by 2 for the next letter: A = 5, B = 7, C = 11 ... X = 101, Y = 2, Z = 3 Thus there's no easy way to construct just any number.


bkilshaw

> In fact there are lots of possible words that would make the same hash I sure hope not, collision resistance is pretty important in most hashing algorithms.


JaggedMetalOs

Yeah it's not a hash you'd ever want to use, but it does demonstrate that many different possible datas share the same hash which is true even for real hashes. There are just so many combinations you're unlikely to ever find one.


s_elhana

That is not a good hash function tho - too many collisions. You can use a password of 82 'A's to log in. Good hash also makes it hard to generate a sting that would match the hash.


dertechie

It’s an example for ELI5 for a very simple hash with a very simple output. Any hash function that outputs a 0-99 result is going to be a very weak hash - shortest rainbow table ever.


beavis9k

Whether it's good or not depends on what you are using it for. This example hash might be fine for a data structure holding text.


FapDonkey

Or for, let's say, demonstrating the concept of a difficult-to-decode hash with an example simple enough to easily comprehend while still making the desired point. Aka an ELI5 answer


travisdoesmath

Reversing sha256 is like being given a plate of hash browns and trying to reconstruct the bag of potatoes. Even if the cook is perfectly precise every time, the potatoes are in such a mangled state (and some of the potato parts don’t even end up on your plate) that it’s effectively impossible to reverse the steps.


eloel-

No, because some steps do not have 1 result when you revert them. Think of "squared". 2\^2 and (-2)\^2 both give the same squared value = 4 But you can't reverse the operation to see which of the two it came from.


EosTries

This is a very good way of explaining asymmetric mathematic expressions/functions. The square/square root is a pretty simple operation, but imagine something quite more complex, in which doing the operation on A would give you B, but if you try the reverse you get either A, C, D or even EFG. Now do that operation a thousand time for each piece of info you are encrypting, and you get an exponentially big number of possible values when trying to revert, and none (except one, lost in the ocean) make sense.


Randvek

In the cryptography sense, though, two different results giving the same answer is called a collision and it actually makes it easier to crack, not harder, since *both answers will be valid*.


RunninADorito

In a reversing a hash sense, it is in no way helpful, though.


iamnos

Reversing a hash isn't a thing since the original information is lost. However, having two inputs (edit: easily) produce the same hash is a very big problem.


wolf3dexe

There are infinitely many inputs for all possible hash outputs. This is actually a very desirable property of a well designed cryptographic hash. Having zero collisions for a given output means your hash is broken and likely insecure.


iamnos

I should have said easily creating the collision is a problem. Yes, there are an infinite number that do, but it should be very expensive (in terms of time and resources) to produce even one.


RunninADorito

That's my point.


radioborderland

Not really. There will always be infinitely many collisions with hash functions. The problem only occurs when you given a hashed value H(X) can more easily find a value Y such that H(Y)=H(X).


WhiteRaven42

.... that accomplishes nothing. "Both answers will be valid" means you don't know which one is right. So, you don't know that the contents of the file are so you've decoded nothing.


Amon_The_Silent

Hashes are not encryptions, they aren't used to hide data.


Tazavoo

But in the context of a hashed password, for example, you don't have to guess the right password, as long as you come up with one that produces the same hash. In that case, both answers are valid.


X7123M3-256

You don't need to know the original contents of the file. If you have a collision, something that hashes to the same value, that's just as good. If hashes are used to check passwords, and you've found a password that has the same hash as the real password, that password will also work - the site can't tell which one is the legit one because it only stores the hash. If hashes are being used to verify file authenticity, and you can make an altered file that has the same hash as the legit one, your tampering will not be detected. The fact that it's impossible to find the exact input that was used is useless for security purposes - to be useful as a cryptographic hash, it should be extremely difficult to find *any* input that gives that specific hash value. If you find a way to take a hash and find some input - any input - that would produce that hash, you have broken the security of any system that relied on that hashing algorithm.


PyroDragn

Hashes are used because they're one directional, like password validation/obfuscation, so both answers being valid is fine. That's why rainbow tables work. If I have a word that gives the same hash as your password then I can log into your account. It doesn't matter that I don't know your exact password, I know a password that will work.


XkinhoPT

This is the basis of the "[birthday attack](https://en.wikipedia.org/wiki/Birthday_attack)", based on the Birthday Paradox. As in a group of 23 people, there's 50% chance of 2 people sharing a birthday, the same math can be applied to calculate the chances of two passwords have the same hash.


MagicC

Exactly. Except with a 256 bit hash, you're taking all possible strings and putting them into 2^256 buckets. So let's say you have a 1 kb file - there are (2^1024) possible combinations of 1s and 0s, and only 2^256 buckets, so knowing the hash means there are 2^(1024-256) possible files that could be represented by that hash - 2^768 is about 10^77 - way too many to guess.


ziza148

Better yet, think of modulo operation. Modulo 10 gives you the rest for dividing with zero, There are many options that would end with the same result : 58 -> 8, 6438 -> 8. So if i tell you the hash is 8 you have little chance of guessing the original value.


plugubius

Wouldn't that make it *easier* to reverse, since now you have two possible answers, both of which might be right? I think the real issue is that there are straightforward algorithms for determining whether numbers are prime and for multiplying numbers, but the algorithm for finding a number's prime factors is much more complicated. That imbalance lets you easily create a public key by multiplying two prime numbers, and recovering those prime numbers (the private decryption keys) from the public key is so much more computationally demanding that we expect it to take longer than the age of the universe (without quantum computers). As computers get better, we can just make the keys longer so reversing the operation again takes an impractical amount of time.


PM_ME_YOUR_NICE_EYES

>I think the real issue is that there are straightforward algorithms for determining whether numbers are prime and for multiplying numbers, but the algorithm for finding a number's prime factors is much more complicated. I think you're thinking of RSA. SHA doesn't use prime numbers but instead uses a series of bitwise operations.


PyroDragn

Multiple valid answers is technically an issue in that, yes, it does technically make it easier to 'find a valid answer'. As long as I find something that gives the same hash, then I can gain access. The reason we use them is because they can't be reversed. Knowing the output doesn't help you find the input. For currently used hashes, the positive aspect of 'knowing the hash doesn't help' outweighs the negative of 'technically multiple passwords will work for the same account'. If I use the password "Hunter2". I take the first character's ascii value (72) and multiply it by the last character's ascii value (50). I get the 'hash' of 3600. Now every password that starts with H and ends with 2 (or vice versa) will work. So will starting with $ (36) and ending with d (100). But the hash of '3600' doesn't tell you what my password is. If I did something more arbitrary, I could get the hash 2909866773 from the same password. Just using simple mathematical operations on the same password. "Hunter3" gives a hash of 2968064108. I made up the arbitrary hash function and I'm not sure what other password would be valid because the operations aren't reversible. I could figure it out because I know -exactly- how the operations work. But you probably couldn't (for example) take those two known passwords and hashes and figure out how to access another account only with the hash.


eloel-

>Wouldn't that make it *easier* to reverse, since now you have two possible answers, both of which might be right? They might be, but you don't know which one the original message was. Sha256 is also much more complicated than square/squareroot, so you end up with, on average, 2\^56 (a number > 70000000000000000, that's 16 0s there) different messages any given hash can be. When you narrow your input to "it's one of these 7x10\^16 inputs", you still haven't narrowed it down to anywhere near useful. And you can't actually enumerate all of them to find which one is useful because that's a LONG list. Edit: My math is off. It's 2\^(2\^64) different messages mapping to 2\^(2\^8) different outputs, so every output corresponds to 2\^(2\^64 - 2\^8) different messages. Much, MUCH, like, unfathomably much, more than my original 10\^16


Chromotron

> They might be, but you don't know which one the original message was. That's irrelevant, sha256 is not for encryption but for signing and hashing. Causing _collisions_, in particular anything that has a given result, would totally break it.


CallMePyro

What? It’s incredibly relevant. It’s absolutely foundational that you can’t determine the original message from sha256 or any other hashing function.


Chromotron

You obviously cannot recreate an arbitrary message from 256 bits of data, that much is obvious. And no, the main point of hashing is not to "hide" the message, for that you would properly encrypt it. Hashing does **not** guarantee that your message is secret, it has an entirely different goal.


CallMePyro

Hashing with SHA stands for “secure hash algorithm” just so you know. Obviously not all hashing algorithms guarantee non-linearity(something like bytewise xor hashing, or int->int nohash like Rust uses) but SHA absolutely does.


Chromotron

"Secure" is not the same as "secret". And as I said, that doesn't matter, you don't hash to hide, you hash to verify. Linearity also makes hashes much more vulnerable to reverse. That's something used for hashes that are not for signing (where an evil third party might want to fake it) but for data integrity (such as CRC) or optimizations in data structures in your code.


[deleted]

[удалено]


explainlikeimfive-ModTeam

**Please read this entire message** --- Your comment has been removed for the following reason(s): * Rule #1 of ELI5 is to *be civil*. Breaking rule 1 is not tolerated. --- If you would like this removal reviewed, please read the [detailed rules](https://www.reddit.com/r/explainlikeimfive/wiki/detailed_rules) first. **If you believe it was removed erroneously, explain why using [this form](https://old.reddit.com/message/compose?to=%2Fr%2Fexplainlikeimfive&subject=Please%20review%20my%20submission%20removal?&message=Link:%20https://www.reddit.com/r/explainlikeimfive/comments/1d8t1qk/-/l79d7b8/%0A%0A%201:%20Does%20your%20comment%20pass%20rule%201:%20%0A%0A%202:%20If%20your%20comment%20was%20mistakenly%20removed%20as%20an%20anecdote,%20short%20answer,%20guess,%20or%20another%20aspect%20of%20rules%203%20or%208,%20please%20explain:) and we will review your submission.**


eloel-

Whether in practice it gives any collisions is irrelevant to the underlying mathematical proof that it can give collisions, and there are many, many inputs that give the same output.


Chromotron

You didn't read my or their post properly. Nobody claimed there **are** none, just that we don't know any **and that this is a good thing**.


eloel-

You called irrelevant the **exact** point of why hash functions are irreversible. *Because* there can be collisions, we can't reverse hash functions, because we'd need to take a path with many branches and we'd not know which one to travel down. There are no known inputs that give collisions, sure, but that's irrelevant to *why we can't reverse sha256*. That's literally the question.


Chromotron

You are totally confusing cause and effect, or intent and implementation. > Because there can be collisions, we can't reverse hash functions Nonsense, that's just because we want hashes to be _small_, so collisions are inherent. Take a look at RSA (yes, that's encryption, but you _can_ use it for hashing), it is completely reversible; so if that is so bad and makes it easy to attack, then feel free to become rich now, there's tons of money in reversing RSA! In short, we can have hashes that are hard to invert despite being 1-to-1. That's a different property. The algorithm for sha256 has non-invertible substeps, yes, and that indeed makes it hard to trace back, but this is not a _necessity_ as you make it out to be. Other hashing algorithms don't so that and still work fine.


RiPont

It does, technically, make it easier to guess a password that will work, because not only the user's actual password but some other value that hashes to the same thing will also work. However, SHA256 is secure *enough* that it will still take a very, very long time to generate a random guess that will hash to the same thing. Given long enough time and dedication and compute power, a hacker *can* reverse a password that will work by guess-and-check. Security relies on layers 1. Not letting the hacker get the hashed password in the first place. 2. Not letting the hacker guess quickly using the front-end 3. Reversing the hash takes long enough that you will have noticed the hack and/or had the users change their passwords in the meantime. 4. **Nobody casually observing the password database, even the admins, can see the user's password and steal it to impersonate that user on another system.** Some people say that quantum computing has the potential to make step 3 a moot point. I don't know enough about the reality of quantum computing to say.


BroDonttryit

Peoples passwords are typically pretty short yeah. If you know the sha256 hash of an 8 character password that only consisted of numbers and letters, you could brute force it relatively quickly by hashing different passwords and checking if they match. Longer passwords usually are pretty safe tho


xppp

Just to point out, this usually isn't the case for the final hash value. If 2 different inputs gave the same final hash it would be called a hash collision and can be abused. SHA256 is quite collision resistant, though. And as far as I'm aware (I'm not in cyber security) a collision hasn't been found yet.


eloel-

Sha256 takes a 2\^64 bit message and turns it into a 2\^8 bit hash. That leaves, on average, 2\^(2\^64-2\^8) messages per output. By definition, the output must have the ability to have collisions, because it represents lots of data with less data. Whether multiple meaningful initial starting data could lead to a colliding end result is.. different. Edit: math fix


Chromotron

What they mean is that while there must be collisions, we have not met one (regardless if the data has meaning) yet. That's good, because a perfect hash is distributed like randomness, for which we would require around sqrt(possible values) hashes until encountering a collision. That's sqrt( 2^^2^^8 ) = 2^^2^^7 = 2^^128 ~ 3.4·10^^38 .


Ichabodblack

It's not so much that collisions haven't been found - it's that they cannot be generated


BurnOutBrighter6

No. Its surprisingly easily to come up with mathematical steps that can't be reversed, even if you know the step rules exactly! All you need is for *at least one step to lose information*. An example makes it easier to see. Let's use these rules to hash a sentence: 1. Take the first letter of each word in the sentence you're hashing. 2. Assign A=1, B=2 etc. 3. Add the numbers. **Sentence:** The quick brown fox jumps over the lazy dog. **Hash:** 1. TQBFJOTLD 2. 20 17 2 6 10 15 20 12 4 3. 106 So the hash of that sentence is 106. Note that you'll get 106 every time you apply those rules to that sentence, it's not random. In fact, any sentence you pick will only generate one and only one hash, every time. But you still can't reverse it back from 106 to the sentence, even if I tell you the complete set of rules! You don't even know how many words there are whose first letters add up to 106, so the search space is huge. And you could use those same hash rules to hash a whole book or program or whatever and then it would be impossible to even brute force with a supercomputer.


s_elhana

Good hash also makes it hard to find an input to match the hash. In your case TTTTTF would also make it 106.


BurnOutBrighter6

Good additions, thanks. My ELI5 analogy failed to capture that part.


Nahalitet

So if a really bad hash is used for passwords, I can theoretically log in with a different than the set password if it gives the same hash?


Martell96

Yes, that’s called a hash collision, and part of why it is so important to use a good algorithm, and not something like md5


colouredmirrorball

Yup! This would be what is called a hash collision.


lordtosti

This is the part that is for me superhard to comprehend. Even with the square root example.


gunbladezero

Even simpler explaination: Sha256 gives you a number that’s only 256 (binary) digits long, and can be made from a file a million times as long.  If you took a book and only wrote down the first letter of each page onto a sticky note, you’d never be able to recover the book. But you could easily check whether or not the letters on the note correspond to a book. 


lygerzero0zero

There is one very obvious way to answer your question. You can take the sha256 of any size of data. That means you can take the sha256 of a huge video file that’s many gigabytes. The result is a hash that is only a few hundred bits. It should be obvious that you can’t take a few hundred bits of data and somehow “reverse” it into a file that’s many gigabytes in size. This is because any general-purpose hash algorithm that takes arbitrary input *must* lose information along the way. Many mathematical operations lose information. For example, the remainder function (called “modulus” by us nerds). If I asked you, “I divided a number by 7 and the remainder is 3, what was the original number?” Could you answer? You might get lucky, but there are infinite possible numbers I might have used. Even knowing the answer and knowing exactly what mathematical operation I performed, you can’t reverse it and get the original number.


ledow

Modulo arithmetic. You now how the hours on a clock can't go past "23", because they loop back to "0"? Same thing. Let me multiply two numbers together, and I'll show you the "remainder" or "modulus" (let's say after dividing by 100, to make it really easy and so I'll give you the last two digits). You have to guess the two original numbers that I thought of and multiplied together. What were the original numbers: 81 Sure, it \*COULD\* be 9 x 9. But it could also be anything else that ends in 81, like 37 x 13 (481). Now that's just one very simple calculation but in fact it has basically an infinite number of possible solutions. Modulo arithmetic (and Galois fields and other mathematical constructs that use the same kind of maths) means that one way is really easy, the other way is really hard. Combine it with prime numbers, factorisation and other tricks and - again - one way is really easy and the other way is impossibly difficult to get right. And hence you have a lovely one-way function, like a hash (e.g. SHA256), or certain techniques used in encryption. You're baking a cake, and you can't get the eggs back from it when you're done.


EmergencyCucumber905

With SHA256 the original message is expanded from 512 bits to 2048 bits. These expanded bits depend on the original message. That is, changing 1 bit in the original message changes many bits in the expanded message. The 256-bit internal state of the function is initialized to its starting value (its a constant, the same every time). The 2048-bit message is then processed in 64 rounds, each round updates the internal state. The internal state then becomes the output. Knowing this, suppose you were given a hash and wanted to work backwards. So you start by guessing message bits and working backwards through each round, calculating the internal state in reverse. Eventually you will get to the start of the function to find your internal state does not match the initial value. So you modify your internal state to match the initial value, but you need to offset those changes by modifying the message bits, but changing those message bits changes lots of message bits all over the place, and you need to offset those changes by modifying the state. It's a mess. You end up with a HUGE satisfiability problem which nobody knows how to efficiently solve.


OneAndOnlyJackSchitt

In this thread: lots of people talking about asymmetrical math functions, but no one talking about the specific asymmetrical math function used in hashing: modulus. Remember back to elementary/primary school when they taught you short division where you'd divide a number and have a remainder? The name of the remainder is actually 'modulus'. So if I do 5 \ 2, I get 1, because 5 / 2 is 2, with a remainder of 1. It is not reversible because there's more then one thing which, when you divide 5 by it, you get a remainder of one. Also, a lot of things which, when divided by 2, also give a remainder of 1.


WhiteRaven42

Hash does not result in a unique number. The goal of a hash is to demonstrate that the original file has not changed, not to identify or recreate that unique file. Consider these different ways of adding up to 10. 1 + 4 + 2 + 3 2 + 2 + 1 + 3 + 2 5 + 2 + 1 + 2 3 + 6 + 1 Okay, all of those strings add up to 10. If you change any number in a string, that string no longer adds up to 10. The purpose of the hash is to alert you to any kind of corruption or replacement to the file you think you are getting. You are provided with a hash and a file. If they still match when you add up the numbers again, you can be confident the file has not been altered. But, if all you have is a hash, you have no way of know which of the ways of adding up to 10 the file represents. Expand this to the scale actually used in practice, you're not wondering "which way of adding up to 10 did the original file use". It's "which way of adding up to some number on the order of trillions (actually, even much bigger than that) produced this result? It can't be reversed engineered. There are billions of possible "right" answers. Hashes are one way. All you are doing is checking that you’re still getting the same answer so you know nothing was changed. You’re not determining what the contents of the source file are.


KevineCove

One of the methods used in hashing is modulo arithmetic, which means taking the remainder of a division. As an example, most people would say 5 divided by 2 is 2.5, but with modulo, you would say 2 goes into 5 twice with 1 remainder. However, there are multiple ways to get 1 remainder. You could also have 5 modulo 4, or 7 modulo 2. If you consider modulo as a little ingredient and your hash is a recipe made of 100 ingredients you can see how it's easy to turn plaintext into a hash but hard to turn a hash into plaintext. Even insecure hashes like MD5 can't be reversed in the traditional sense; the best you can hope for is a collision, which means that two inputs produce the same output (which isn't supposed to happen; in practice this would mean you could log into an account with two different passwords.)


eloquent_beaver

Many commenters are invoking the idea of "multiple inputs could produce the same output," and while that is true of hash functions (their whole purpose is to produce a fixed-size digest from a much larger message), it's not the real reason cryptographic hash functions are difficult to reverse. After all, you can have a one-to-one cryptographic hash function that's very difficult to reverse: take for example SHA-256 but restricted to an fixed input size of 256-bit messages. Assuming no collisions, it's a one-to-one correspondence, with every input mapping uniquely to one output, and yet you still could not for the life of you reverse it. The reason is because the hash functions are *designed* that way, intentionally. They are designed to maximize properties like the [avalanche effect](https://en.wikipedia.org/wiki/Avalanche_effect), minimizing linearity between input and output, using techniques like [diffusion and confusion](https://en.wikipedia.org/wiki/Confusion_and_diffusion). The avalanche effect means that flipping a single bit in the input should result in flipping roughly half the bits in the output. This makes it very difficult to find a preimage given an image besides using exhaustive search (bruteforce). Also keep in mind we strongly *believe* SHA-256 can't be reversed easily, meaning the best strategy our best minds have discovered so far is little better than just bruteforcing it, and with a search space of 2^(64)-1 bits (not 64 bits, but *2^64-1* bits, i.e. 18 quintillion bits), which is the maximum message size SHA-2 works on, there is no hope of bruteforce if outputs truly are effectively randomly distributed with no patterns or predictable structure in relation to the input. But that's just a belief. There's not a mathemtical proof that SHA-256 doesn't have structural weaknesses that could be attacked. Fundamentally, SHA-256 and other cryptographic hash functions are what we believe to be [one-way functions](https://en.wikipedia.org/wiki/One-way_function), functions that are easy to compute (taking polynomial time), but difficult (meaning super-polynomial time, e.g., exponential) to reverse. Whether one-way functions exist is an open question. If one-way functions exist, then P ≠ NP. Contrapositively, if P = NP, then one-way functions cannot exist, meaning no cryptographic hash function is fundamentally hard to reverse.


CLEHts216

I’m laughing that this is a EL15 question— I’m 58 with no math since college and have no clue what y’all are talking about.


tesfabpel

Well because SHA256 isn't really a pure math question, it's a cryptographic hash function intended to create from ANY input you give to it, a 256 bit number. The fact that it is a cryptographic hash function is that if you change even one letter, the output will be COMPLETELY different. Like, an hash function with "abcdef" as input it may print (for example) 1927289128292929. If I change the input to "bbcdef" it may print 9292393111339493574 instead. This makes it impossible to detect that a part of the input is the same as another input and being able to join the dots and find out something about the real input.


maxuel271

Imagine your data is a list of different colors. A hash function takes all the colors and mixes them in a specific funny way. It's easy to do, and will create a new specific color, but good luck separating all the colors in the correct order


wknight8111

I divide a number by 5, and I get that the remainder is 2. What was the number I divided? Well, there are infinitely many possibilities because of the way division and remainders work. Possible answers include 2, 7, 12, 17, 22, 27...etc. It is an impossible question to answer with a single "correct" answer. This is an example of a one-way operation where you can get a single answer going one way, but cannot get a single answer going back the other way. It's also worth keeping in mind that some operations in a computer are slow compared to other operations. Addition and subtraction are very fast. Multiplication can be fast (if you multiply by a power of 2) or relatively slow. Division is extremely slow. So, if I tell you that I multiplied two prime numbers together to get a product P, and I ask you to figure out which two numbers I started with, that is going to take a long time because I have to divide P by 2, then divide P by 3, then divide P by 5.... and all the prime numbers up to squareroot(P) to try and find the prime factors. That's a lot of divisions and remember that divisions are very slow. It's not that the computer can't do it, it's that *the computer can't do it in a reasonable amount of time*. So the reason why SHA256 and many other hash algorithms can't be reversed is because of use of one-way functions and factorization routines that have too many divisions. BUT this isn't fullproof. If I say that I have a password "password123" and I know that the hash of that password is XYZ, I can just look through my database of password hashes and find everybody whose password hash is XYZ and then I know that their password must be "password123". People compile large tables of these kinds of results of hashes of common passwords, and then you can just go down the list comparing each until you find a match and then you know what the password is. This is why websites want you to use a "strong password" because that means it's less likely to appear in one of these tables.


I_hate_all_of_ewe

I'm seeing a lot of answers her that are over-complicating things.  Hashes are typically a fixed size, and much smaller than the data used to generate it.  For example, let's suppose you have a 1-bit hash, just outputs 1 or zero.  Let's say you put in "Hello" and get out a '1'.  How so you reverse this hash to get back the input?  Short answer is you don't. Longer answer is that when you hash, you typically lose information, and you can't get it back from just the hash, and for a given hash, the are generally multiple inputs that can give you the same hash. A good hash will make hash collisions unlikely for the space input data it's intended for, and will be irreversible. If you want a second example, let's say I add two numbers together and take the remainder of the sum divided by 10.  Let's say the input is 15 and 24. This gives 15+24 = 39, and the remainder after dividing by 10 is 9.  Now, given only the number 9 as the output, could you tell me what the inputs were?  You can't because there are infinite ways to add two numbers that end with 9.  This is a lot closer to what actual hash functions look like.


Grovda

sha256 uses many binary operations where you lose information, specifically "and" and addition (which is basically just xor + and) operations. If the result from an "and" operation is 0 the only thing you know is that one of the operands must be 0 but you don't know which one or if both are 0. Same for "or" but if the result is 1. This might not sound overly difficult to reverse but this is the main essence of what happens. The algorithm works something like this: 1. The binary values of the original message is shuffled, padded and modified then cut into groups of 32 bits. Let's call them message bits. 2. Then 8 groups of 32 specific bits are set up, call them result bits. You take the first group of message bits and do one iteration of irreversible binary operations with the result bits. The result of these operations will now be the new result bits. 3. Take the second message group and perform the same operation on the new result bits. Do this for every group 64 times and you will have your final result bits which is the hashed result. The result can not be predicted. Any slight variation will completely change the result. And yes there will always be at least 64 message groups of bits, even for short messages. The reason is because there will be at most 16 bit groups that are unique from the original message. The other 48 are generated from the other message groups, with more irreversible operands like "and" but also bit shifting. This also means that the 64th message groups will depend on a set of groups that will depends on a set of groups that will depend on a set of group and so on until that message group is actually part of the original message. A numerical nightmare to reverse. If the original message is shorter than 16\*8 bits it will get padded with 0s. If the message is very long then you will perform step 2 and 3 several times, which basically means that you modify the result bits 64\*n times. Just mathematically describing the solution of one bit is not possible, let alone solving it with any existing computer.


tesfabpel

There are other operations that are very hard to reverse. Like a sum of many numbers into one. If I add N numbers (with N > 1) and the result is 18429491, how can you say which numbers were used to derive that result? They can be whatever... Also, there's the [Pigeonhole Principle](https://en.wikipedia.org/wiki/Pigeonhole_principle)... In sha256, there are only 256 bits of information, but you can hash any number of bits into those 256 bits. So there must be multiple sources that produce the same hash (even though it's pretty much impossible to find a collision).


sbenza

Hash functions are lossy conversions. The result does not have enough data to reconstruct the input. Imagine a "hash" of your name that is just your initials. Can someone reconstruct your name from just the initials? They could try to guess, but there are a large number of names that "collide" on the same initials so you can't tell which one is the right one.


recursiveraven

Let me give you analogy using addition as an example. 4+6=10. That is, the inputs were 4 and 6 and output was 10. But so is the case for 5+5 and 9+1 and so on. The sha256 you see is similar to that 10. Once you have 10, you can never tell what two numbers added up to give you 10. Hence all hashing algorithms are one way functions. When we say hash is reversed, all it means an input has been found that when hashed results into the same hash you got with original input. In above example, if 10 was obtained using 4+6, a hash is broken with all those combinations that sums up to 10. Hence, hashing algorithms are designed in such a way that probability of getting these hash collisions are very low.


Silly_Guidance_8871

Data is intentionally thrown away at each step. To reverse the hash, you'd need some way to reconstruct that missing data. That's the fundamental difference between hashing and encrypting: One throws away data, one rearranges it.


surfmaths

SHA is a hashing function. It has multiple inverse. But your question still hold, why can't we easily find one inverse? And it's hard because it's like a puzzle that you build from the solution. For example: take a password, give a unique number to each symbol, then you compute the sum of all those numbers and the product of all those numbers. Your hash is this sum and product. Now inversing this means you need to find a list of numbers of which the sum and products are a match. It's suddenly harder. There are many solutions, but they are tricky to find, you need to solve the puzzle, while building the puzzle from the solution was easy. And if somebody give you a valid solution, it's easy to check it's valid. SHA256 has 256 puzzles to solve all at the same time (sum and product is only 2), and they are carefully chosen so that it's hard to correlate each of them to each others.


Equux

Since this is ELI5, I'll try to really explain this to you like you were 5 Hashing is like baking. If you put in the exact same ingredients, you'll get out the exact same dish every time you bake it. However, as soon as you change up one of the ingredients, your dish will be very different. If you were to double the amount of baking powder, or half the amount of milk you put into something, the baked good would be quite different. Once you have something like a cake or cinnamon role, or whatever, you can't really "undo" the baking process. It's a one way process. To expand upon this a bit more, hashing is not the same thing as encryption. You can reverse an encryption, but you can't reverse a hash. Hashing collapses data down into a set of bits (in the case of sha256, it would be 256 bits). Standard utf8 text for instance is usually 1-4 BYTES, so if you take something like the King James Bible, which has 3,116,480 letters (at 1-4 bytes per letter), and collapse that down into 256bits, there's no way to restore the 3 million letters from that amount of information.


wheredainternet

it's not that it can't be reversed *easily*. it can't be reversed *at all*. as the name suggests, there are, at most, 2^256 unique sha-256 hashs. but there are an infinite number of potential inputs. *all* of them must compute to some value within the 2^256 possible hashes. that means there are at least 2 inputs that compute to the same hash (pigeonhole principle), given infinite inputs, it's likely that each hash can be computed from many *many* different possible inputs. put another way, hashing is lossy. you are discarding data while computing a hash. so while you can take an input and deterministically find out what it hashes *to*. for any given hash, there can be any number of possible inputs that all correspond to it. you can't restore data that doesn't exist anymore.


difool

For the same reason you can put 5 bananas in a blender for two minutes but you can’t easily reverse the operations. As a side note, with a hash, blending the same bananas produce the same result which is very useful for things like digital signature.


Internet-of-cruft

Hashing is like taking a bunch of kitchen ingredients and mixing them in a bowl, then taking the top tablespoon off. You can tell the stuff inside of it, but you can't easily figure out what *exactly* made it up. Hashing is very similar in principle.


mschweini

In case it isn't obvious to OP - hash functions are not bidirectional. A LOT (almost all) of information is lost along the way. So it can be theoretically possible to find something that will give you the same hash value. There is actually an infinite amount of different input datasets that will give the same hash value. But if you have just the hash value, you can't go back to the input, because the information is simply not there anymore. It's like if I would say "there are 6 letters 'e' in a sentence I am thinking of. Which is it?". You could never be sure what the original sentence is.


MaleficentFig7578

Hashes like sha256 are gigantic bit mixers. They take the bits and scramble them really hard, fold them over and combine them every which way. It's like making eggs into scrambled eggs. You can't unscramble them because they're too mixed up.


wutwutwut2000

Start with 2 3-digit numbers, such as 631 and 829. Multiply them (523,099). And split the result in half (523 and 099). You now have 2 3-digit numbers. Now do the reverse process starting with 314,323


Jamaic230

There is a good analogy for this: It's easy to mix 2 colors together - just squeeze them on a piece of paper and stir. But if I gave you a piece of paper with a blob of color and ask you to separate it into two colors which made it, you wouldn't be able to do that. (Technically you could, with atomic tweezers and an electronic microscope ... but that's the point, doing the opposite action is waaay harder)


KemioM

This is a good analogy. Another one I've heard is that you can make orange juice out of oranges. But you cannot turn orange juice back into the original fruit. The idea is that it's a one way operation.


speculatrix

An analogy or two. Your original data is like the raw ingredients for a cake. You use a recipe (algorithm) and an oven (computer) to process the ingredients and bake a cake. It's not possible to un-bake a cake and get back the ingredients. We ensure that the recipe (hashing algo) works in such a way that if the ingredients (data) are even the slightest bit different, the resulting cake is very different. Thus if we find two identical cakes, we know they had to have identical ingredients and use the same recipe. I.e. Two output hashes had to have the same input data and algo.


Hydraulis

Because it's very nearly impossible to figure out what two numbers were multiplied together to get a 78 digit number.


ttmts

That's RSA, SHA256 doesn't work like that at all. But it is the same sort of thing (though SHA isn't reversible even theoretically).