What is the MD5 algorithm
The MD5 Message-Digest Algorithm is a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value, used to check that information arrives complete and unaltered.
MD5 function
It accepts input of any length and, after processing, outputs a 128-bit value (a digital fingerprint).
Different inputs almost always produce different outputs, so the digest is effectively unique, although not strictly so.
MD5 is not an encryption algorithm
Opinions differ on this. Those who say MD5 is not an encryption algorithm argue that the original text cannot be recovered from the ciphertext (the hash value), that is, there is no decryption algorithm, so MD5 can only be called an algorithm and not an encryption algorithm. Those who say it is one argue that the original text can no longer be read after MD5 processing, that is, the original text has been encrypted. I personally support the former view, in the same way that BASE64 can only be regarded as an encoding.
Is the MD5 algorithm reversible?
MD5 is irreversible because it is a hash function: part of the information in the original text is lost during the calculation.
It is worth pointing out that, in theory, one MD5 value corresponds to infinitely many original texts, because the number of MD5 values is finite while the number of possible original texts is infinite. The MD5 in mainstream use maps a byte string of any length to a 128-bit integer, so there are 2^128 possible values in total, which is about 3.4*10^38. That number is finite, but the number of original texts that could be hashed is unlimited.
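The size of the output space is easy to check with exact arithmetic. The class name `DigestSpace` below is only an illustrative choice:

```java
import java.math.BigInteger;

class DigestSpace {
    // Number of distinct 128-bit values: 2^128.
    static BigInteger count() {
        return BigInteger.ONE.shiftLeft(128);
    }

    public static void main(String[] args) {
        // Prints 340282366920938463463374607431768211456, roughly 3.4*10^38.
        System.out.println(count());
    }
}
```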
Note, however, that this finite-versus-infinite argument does not fully hold in practice. On the one hand, real original texts are usually of limited length (commonly used passwords, for example, are mostly within 20 characters). On the other hand, two such texts are extremely unlikely to share the same MD5 value by chance (professionals call this a hash collision; collisions can be constructed deliberately, but they do not arise by accident among short passwords). So, within a limited range, it is entirely possible to build a one-to-one mapping between MD5 values and original texts. This is why the most effective way to attack MD5-hashed passwords is a precomputed lookup table such as a rainbow table. For details, you can learn about it through Google.
MD5 is equivalent to lossy compression.
MD5 purpose
1. To prevent tampering:
1) For example, before sending an electronic document I first compute its MD5 output a. After the other party receives the document, they compute its MD5 output b. If a is the same as b, the document has not been tampered with along the way.
2) For example, if I provide file downloads, then to prevent criminals from planting Trojan horses in the installer I can publish the MD5 of the installation file on the website, so that users can compute the MD5 of what they downloaded and compare the two.
3) SVN also uses MD5 to detect whether a file has been modified after checkout.
2. Prevent direct viewing of plaintext:
Many websites now store the MD5 value of a user's password in the database instead of the password itself. In this way, even if criminals obtain the MD5 values from the database, they cannot directly read the users' passwords. (For example, in UNIX systems the user's password is processed with MD5 (or a similar algorithm) and stored in the file system. When the user logs in, the system computes the MD5 value of the password the user entered and compares it with the MD5 value saved in the file system to decide whether the password is correct. Through these steps the system can verify the login without knowing the plaintext of the user's password. This not only keeps the password from being known by users with system administrator privileges, it also makes password cracking somewhat harder.)
3. Non-repudiation (digital signature):
This requires a third-party certification authority. For example, A writes a file, and the certification authority uses the MD5 algorithm to generate a digest of the file and records it. If A later claims that he did not write the file, the authority only needs to regenerate the digest of the file and compare it with the recorded one. If they are the same, that proves the file was written by A. This is called a "digital signature".
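The tamper check described in point 1 can be sketched with the JDK's `MessageDigest` class. The class name `TamperCheck` and the method names are illustrative assumptions, and the data is held in memory for brevity rather than read from a file:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class TamperCheck {
    // Computes the 16-byte MD5 digest of the given bytes.
    static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // True when the received bytes hash to the digest the sender published.
    static boolean unchanged(byte[] received, byte[] publishedDigest) {
        return MessageDigest.isEqual(md5(received), publishedDigest);
    }
}
```

The sender publishes `md5(document)` as the value a; the receiver calls `unchanged(receivedDocument, a)`, which recomputes b and compares the two.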
MD5 security
MD5 is generally believed to be very safe, because brute-force cracking takes longer than most people can afford. In fact, storing the plain MD5 of a user's password in the database is very insecure, because passwords are relatively short and many users choose birthdays, mobile phone numbers, ID card numbers, phone numbers, commonly used lucky numbers, or a single English word. If I compute the MD5 of commonly used passwords in advance, store the results, and then look up your MD5 value among them, I may well recover the plaintext. This is why most websites now require passwords that combine digits with uppercase and lowercase letters, which makes it less likely that such a table contains the user's password.
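The lookup described above is easy to sketch. The class name `DictionaryLookup`, its method names, and the candidate list are illustrative assumptions; a real attack would use a far larger list or a rainbow table:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DictionaryLookup {
    // Lowercase hex MD5 of a UTF-8 string.
    static String md5Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Precomputes hash -> plaintext for every candidate password.
    static Map<String, String> buildTable(List<String> candidates) {
        Map<String, String> table = new HashMap<>();
        for (String candidate : candidates) {
            table.put(md5Hex(candidate), candidate);
        }
        return table;
    }
}
```

Given a stolen hash, `buildTable(candidates).get(stolenHash)` returns the plaintext if it was among the candidates and `null` otherwise.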
MD5 algorithm process
Briefly, MD5 processes the input in 512-bit blocks, and each block is divided into sixteen 32-bit sub-blocks. After a series of operations, the algorithm outputs four 32-bit words, which are concatenated into the 128-bit hash value.
The first step is padding: the message is always padded so that its length in bits leaves a remainder of 448 when divided by 512, even if it already does. The padding is a single 1 bit followed by as many 0 bits as needed (between 0 and 511 of them). After padding, the length of the message is N*512+448 bits.
The second step appends the length: the length of the message before padding is stored in 64 bits and appended to the result of the first step, so the total length becomes N*512+448+64=(N+1)*512 bits.
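The first two steps can be sketched as follows. The class name `Md5Padding` is an illustrative assumption, the sketch works on whole bytes only, and it writes the 64-bit length in little-endian order as RFC 1321 specifies:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class Md5Padding {
    // Returns the message followed by the padding and the 64-bit length.
    static byte[] pad(byte[] message) {
        // Padding bytes (1 to 64) so that message + padding is 56 mod 64 bytes,
        // which is 448 mod 512 bits.
        int padLength = ((55 - message.length % 64) + 64) % 64 + 1;
        ByteBuffer buffer = ByteBuffer.allocate(message.length + padLength + 8)
                .order(ByteOrder.LITTLE_ENDIAN);
        buffer.put(message);
        buffer.put((byte) 0x80);                        // a 1 bit followed by seven 0 bits
        buffer.position(message.length + padLength);    // remaining padding stays zero
        buffer.putLong((long) message.length * 8);      // original length in bits
        return buffer.array();
    }
}
```

A 3-byte message pads to 64 bytes, while a 56-byte message already sits at 448 bits and therefore grows to 128 bytes.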
The third step loads the standard magic numbers (four 32-bit integers). Written as bytes in memory order they are A=(01234567)16, B=(89ABCDEF)16, C=(FEDCBA98)16, D=(76543210)16. MD5 reads each word in little-endian byte order, so in a program they should be defined as:
(A=0X67452301L, B=0XEFCDAB89L, C=0X98BADCFEL, D=0X10325476L). This looks confusing at first, but it makes sense once you reverse the bytes of each word.
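The relationship between the two notations can be checked directly. The class and method names below are illustrative assumptions:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class MagicNumbers {
    // Interprets four bytes, given in memory order, as a little-endian 32-bit word.
    static int littleEndianWord(int b0, int b1, int b2, int b3) {
        byte[] bytes = {(byte) b0, (byte) b1, (byte) b2, (byte) b3};
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        // Bytes 01 23 45 67 in memory become the word 0x67452301.
        System.out.printf("%08X%n", littleEndianWord(0x01, 0x23, 0x45, 0x67));
    }
}
```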
The fourth step is the main loop, which applies four rounds of operations to each block; the number of iterations is the number of 512-bit blocks (N+1).
1) Divide each 512-bit block into 16 sub-blocks of 32 bits (4 bytes) each.
2) First, meet the four nonlinear functions (& is and, | is or, ~ is not, ^ is exclusive or):
F(X,Y,Z)=(X&Y)|((~X)&Z)
G(X,Y,Z)=(X&Z)|(Y&(~Z))
H(X,Y,Z)=X^Y^Z
I(X,Y,Z)=Y^(X|(~Z))
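3) Let Mj represent the jth sub-block of the message (j from 0 to 15), and let <<<s represent a circular left shift by s bits. The four operations are then:
FF(a,b,c,d,Mj,s,ti) means a=b+((a+F(b,c,d)+Mj+ti)<<<s)
GG(a,b,c,d,Mj,s,ti) means a=b+((a+G(b,c,d)+Mj+ti)<<<s)
HH(a,b,c,d,Mj,s,ti) means a=b+((a+H(b,c,d)+Mj+ti)<<<s)
II(a,b,c,d,Mj,s,ti) means a=b+((a+I(b,c,d)+Mj+ti)<<<s)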
4) The four rounds of operations (16 steps each, 64 steps in total)
// Round one
5) After each cycle, add a, b, c, d to A, B, C, D respectively, and then enter the next cycle.
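The auxiliary functions and the round-one operation translate almost directly into Java, where `int` arithmetic wraps modulo 2^32 and `Integer.rotateLeft` performs the circular shift. The class name `Md5Steps` is an illustrative assumption, and G, H and I follow the standard definitions from RFC 1321:

```java
class Md5Steps {
    static int f(int x, int y, int z) { return (x & y) | (~x & z); }
    static int g(int x, int y, int z) { return (x & z) | (y & ~z); }
    static int h(int x, int y, int z) { return x ^ y ^ z; }
    static int i(int x, int y, int z) { return y ^ (x | ~z); }

    // One round-one step: returns the new value of a.
    // GG, HH and II are identical except that they call g, h and i.
    static int ff(int a, int b, int c, int d, int mj, int s, int ti) {
        return b + Integer.rotateLeft(a + f(b, c, d) + mj + ti, s);
    }
}
```

F acts as a bitwise selector: where a bit of X is 1 the result takes the bit from Y, otherwise it takes the bit from Z.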
If the above process is implemented in Java, the code is as follows:
public class MD5 {
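A compact, self-contained sketch of the four steps might look like the following. It is not the original listing: the class name `Md5Sketch`, the helper `hex`, and the table-driven loop are choices made here for brevity. In particular, instead of writing out all 64 steps, it computes the constants ti as floor(2^32 * |sin(i+1)|) and picks the sub-block index and shift amount per step, as defined in RFC 1321.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class Md5Sketch {
    // Left-rotation amounts: four per round, reused cyclically within the round.
    private static final int[] S = {
        7, 12, 17, 22,   5, 9, 14, 20,   4, 11, 16, 23,   6, 10, 15, 21
    };
    // Constants t_i = floor(2^32 * |sin(i + 1)|), computed once.
    private static final int[] T = new int[64];
    static {
        for (int i = 0; i < 64; i++) {
            T[i] = (int) (long) (Math.abs(Math.sin(i + 1)) * 4294967296.0);
        }
    }

    static byte[] digest(byte[] message) {
        // Steps 1 and 2: append 0x80, zero bytes, then the 64-bit little-endian bit length.
        int numBlocks = ((message.length + 8) >>> 6) + 1;
        ByteBuffer padded = ByteBuffer.allocate(numBlocks * 64).order(ByteOrder.LITTLE_ENDIAN);
        padded.put(message).put((byte) 0x80);
        padded.putLong(numBlocks * 64 - 8, (long) message.length * 8);

        // Step 3: the magic numbers A, B, C, D.
        int h0 = 0x67452301, h1 = 0xEFCDAB89, h2 = 0x98BADCFE, h3 = 0x10325476;

        // Step 4: four rounds of 16 steps for every 512-bit block.
        int[] m = new int[16];
        for (int block = 0; block < numBlocks; block++) {
            for (int j = 0; j < 16; j++) {
                m[j] = padded.getInt(block * 64 + j * 4);
            }
            int a = h0, b = h1, c = h2, d = h3;
            for (int i = 0; i < 64; i++) {
                int round = i >>> 4;
                int fn, j;
                switch (round) {
                    case 0:  fn = (b & c) | (~b & d); j = i;                break; // F
                    case 1:  fn = (b & d) | (c & ~d); j = (5 * i + 1) & 15; break; // G
                    case 2:  fn = b ^ c ^ d;          j = (3 * i + 5) & 15; break; // H
                    default: fn = c ^ (b | ~d);       j = (7 * i) & 15;     break; // I
                }
                int next = b + Integer.rotateLeft(a + fn + m[j] + T[i], S[(round << 2) | (i & 3)]);
                a = d; d = c; c = b; b = next;   // rotate the roles of a, b, c, d
            }
            // Add this block's result to the running values.
            h0 += a; h1 += b; h2 += c; h3 += d;
        }

        // Concatenate the four words, each in little-endian byte order.
        return ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN)
                .putInt(h0).putInt(h1).putInt(h2).putInt(h3).array();
    }

    // Formats bytes as lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte x : bytes) {
            sb.append(String.format("%02x", x & 0xff));
        }
        return sb.toString();
    }
}
```

For example, `Md5Sketch.hex(Md5Sketch.digest("abc".getBytes(java.nio.charset.StandardCharsets.UTF_8)))` yields `900150983cd24fb0d6963f7d28e17f72`, the test vector given in RFC 1321.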
Implementing MD5 in Java
Implementing MD5 in Java is very simple: the java.security package provides a MessageDigest class. The official documentation describes it as follows.
The MessageDigest class provides applications with the functionality of a message digest algorithm, such as MD5 or SHA. A message digest is a secure one-way hash function that accepts data of any size and outputs a fixed-length hash value.
A MessageDigest object starts out initialized. Data is fed to it through the update method, and the digest can be reset at any time by calling the reset method. Once all the data has been passed to update, one of the digest methods should be called to complete the hash calculation.
The digest method can be called only once for a given set of updates. After digest is called, the MessageDigest object is reset to its initial state.
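A small sketch of that update/digest life cycle, assuming the input arrives in two pieces (the class and method names are illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class IncrementalDigest {
    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Feeds the data in two update calls, then completes the hash.
    static byte[] inPieces(String first, String second) {
        MessageDigest md = newMd5();
        md.update(first.getBytes(StandardCharsets.UTF_8));
        md.update(second.getBytes(StandardCharsets.UTF_8));
        return md.digest(); // completes the hash and resets md
    }

    // Hashes the whole input with the single-call digest(byte[]) overload.
    static byte[] oneShot(String text) {
        return newMd5().digest(text.getBytes(StandardCharsets.UTF_8));
    }
}
```

`inPieces("hello ", "world")` and `oneShot("hello world")` return the same 16 bytes, because the digest depends only on the concatenated data and not on how it was split across update calls.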
The Java code is as follows:
import java.security.MessageDigest;
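A minimal helper built on `MessageDigest` could look like this. It is a sketch rather than the original listing: the class name `Md5Util`, the method name `md5Hex`, and the choice of UTF-8 for converting the string to bytes are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5Util {
    // Returns the MD5 of the text as a 32-character lowercase hex string.
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff)); // two hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every standard Java platform is required to support MD5.
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("abc")); // 900150983cd24fb0d6963f7d28e17f72
    }
}
```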