Cracking PHP's lcg_value()


Cryptographers say it often: It is not sufficient for a pseudo random number generator (PRNG) to have good statistical properties when used for cryptographic applications: it also needs to be unpredictable. This little tutorial illustrates exactly why PHP's lcg_value( ) fails this requirement. In fact, it requires little cryptographic expertise to learn how to break it.

Often when people want to get introduced to cryptography, they spend a lot of time learning to break basic ciphers before they get their hands on anything that is really used today. For example, they'll learn to break Caesar ciphers, substitution ciphers, Vigenere ciphers, transposition ciphers, and many more -- but these are practically never used. In contrast, PHP's lcg_value( ) is sometimes used for security by web developers who just don't know better. Hopefully we can open some eyes to why this should not be done.

As far as we know, this attack has not been published before. However, Samy Kamkar published attacks based upon poor seeding. Our attack is unique in that it does not take advantage of bad seeding/ low entropy: i.e. even if PHP did everything else properly, we would still be able to break it fairly easily.

The Problem

Rather than working in PHP, we prefer to use C. Samy Kamar provides a C equivalent to PHP's lcg_value( ) function. Our goal is to write a function that takes a few of the floating point outputs of Samy's function, and computes the initial internal state of the PRNG that produced those outputs. Before going forward, the reader should have an excellent understanding of the problem and the function we are breaking. For that, we refer the reader to Norman Yue's writeup.

Notice that the function has an internal state consisting of variables s1 and s2. Each of these can be at most 2^31 (actually a bit less: 2147483563L or 2147483399L). So brute forcing it would take on the order of 2^62 operations. That's too long.

Sketching Out a Solution

Being a little bit clever, it is not too difficult to write a program that will take as input a few consecutive outputs from lcg_value( ) and return the internal state in 2^31 effort, which takes on the order of 5 minutes on a typical computer. That is not too long.

The strategy for attacking it actually is a generic approach that works for a lot of ciphers. Essentially, you start with the first lcg_value( ) output, make a guess for the internal s1 (2^31 possibilities for that guess), and from that compute the s2 that produces the known output. The trick is that computing the s2 can be done in constant time! This part takes a bit of thinking, but people will be able to figure it out if they think hard about how to "go backwards" through the PRNG. This gives you a candidate for the entire internal state (s1, s2). With this candidate, you compute the next few outputs to see if they match the real observed outputs. If they do, your candidate is the right one. If not, keep on trying the remaining guesses for the s1 internal variable.

Turning it into a C function is just a matter of writing a for loop that tries all 2^31 possible guesses for the initial s1. The inner part of the loop computes the s2 (using s1 and the known output). It is known that this (s1, s2) combination produces the first output of the PRNG, so then the software needs to compute the next few outputs with this candidate to see if it matches the remaining real observed outputs. If they do match, then you almost certainly have found the real internal state. Otherwise, continue searching. There are lots of little optimisations you can do to make your code fast.

Devilish Details

Conceptually, it is that simple. However there are a few tiny devilish details that you need to deal with.

The first is that we are dealing with floating point numbers, so you can't do exact comparisons. You have to allow for imprecision, so assume that values match if they are "close enough". This is not going to produce any real slow down in your implementation. Also, perform rounding when converting floats to integers.

The second gotcha is that the part about stepping backwards to compute s2 will produce two possibilities. This happens because of the "if (z < 1)" condition. How to know which is the right s2? Answer: take the positive one! (The PRNG does not have negative numbers in it).

Anyone who wants to really learn a little bit about crypto and have fun doing it should take a crack at this. Plan to spend at least a half a day on it, possibly a full day. It is best not to look up the answer or else you are just ruining it for yourself. But if you just can't get it, or if you do get it but want to compare your solution to mine, then here is mine. Have fun!

As stated above, lcg_value() fails the unpredictability requirement which is needed for cryptographic use. With only 2^31 effort, one can compute the internal state and then predict all future outputs. This is despite the fact that the statistical properties of the PRNG are fine.

What if You Don't Get the Raw Output?

In practice, an attacker might not get the full floating point values from the lcg_value() function. Instead, it may be the case that only some number of the most significant bits are available. This will prevent the attack as described from working, but it is possible to extend the attack to cover these cases as well. We will not get into the details of how that works here, but do present a solution for those who want to look into it more. This solution is a lot trickier than the basic attack above.

As an example of how a real attack might work with this function, imagine that somebody has built a web application that sends a code to a user's email whenever they request a password reset. The user must enter the code in the web application in order to reset his password.

Suppose also that that code comes from calling lcg_value() and mapping the result to a 32-bit integer. Note that this mapping reduces the precision from a 53-bits down to 32-bits.

An attacker can request a few password resets from a legitimate account, hopefully getting a few consecutive outputs. He can then map those values back to floats, and use the second solution provided to find the internal seed of lcg_value(). With this, he can then compute future outputs, allowing him to do password resets on other users and then take over their accounts.

Acknowledgments: This is joint work with Blair Strang. Additionally, Norman Yue at Securus Global did an independent implementation to verify the attack.