Section12.6A Taste of Modernity

Now, these methods are the beginnings of how people really factor big numbers. Typically, one does trial division up to a certain size (maybe the first few hundred or thousand primes), then perhaps some modification of Fermat to make sure that there aren't any factors close to the square root if you are attacking something like RSA where that would otherwise be advantageous.

Then what?

There are many answers, some of which involve cool things called continued fractions or number fields. We won't really touch on those topics, but the previous section did talk about something very useful last time - namely, probabilistic/random methods! That's right, we are going to try to find a factor randomly!

Subsection12.6.1The Pollard Rho algorithm

Here is the essence of this approach.

Pick some polynomial that will be easy to compute mod \((n)\).
Plug in an essentially random seed value. Often the seed is \(2\) or \(3\).
Compute the polynomial's value at the seed, and check if that has a non-one gcd with \(n\).
If not, plug the value back into the polynomial, and repeat.

There is a modification I will discuss below that this algorithm uses.

The essence of the method is that by plugging in the values of the polynomial modulo \(n\), we are generating a 'pseudo-random' sequence of numbers. And a “pseudo-random” sequence might be better than the sequences we used for trial division or Fermat factorization, precisely because it will hit some small factors and some other large random factors. It might also be good that it could give us numbers which, although not a factor of \(n\), might at least share a factor with \(n\).

Example12.6.1

Here are some details. Let \(x_0=2\) and \(f(x)=x^2+1\) (these could be different, but they are a typical first choice). We create the following sequence of \(x_i\):

\(x_0\equiv 2\)
\(x_1\equiv f(x_0)\)
\(x_2\equiv f(x_1)\)
etc. - all mod (\(n\)).

For instance, doing it with \(n=8051\), we get

Now, these might be kind of random, but we will not actually try to find common divisors of these numbers with \(n\).

Instead, we will try to see if all the differences \(x_i-x_j\) share a common factor with \(n\), using the (highly efficient to compute) gcd. That is a lot more opportunities! And hopefully it's just as (or more) “random”, and just as effective at finding factors.

However, that is too many to check quickly. So instead one modifies this one last time. Warning; the following is a little dense, though definitely worth while.

Remember, polynomials are well-defined in modular arithmetic, and the sequence will eventually repeat modulo any particular modulus \(d\), since there are finitely many possible \(x_i\).
In particular, if \(d\) is a divisor of \(n\), then if \(gcd(x_i-x_j,n)=d\) is a shared divisor found by the pair \(x_i\) and \(x_j\), then not only will it be the case that \[x_i\equiv x_j\text{ mod }(d)\] but it will also be the case for any number of iterations of \(f\) that \[f^n(x_i)\equiv f^n(x_j)\text{ mod }(d)\] which means the sequence (modulo \(d\), the common divisor) repeats itself every \(j-i\) terms.
So if we choose \(k=j-i\) (or the first multiple of this which is greater than \(i\)), then \[x_k\equiv x_{2k}\text{ mod }(d)\] will have to be true as well, so all the \(x_i,x_j\) pairs which come from the first one to share a divisor can be checked by checking just this one \(gcd(x_{2k}-x_k,n)\).
So for each \(k\) we check whether \[1<gcd(x_{2k}-x_k,n)=d<n\, .\]

It turns out that (probabilistically, just like with Miller-Rabin) it should succeed for \(k\) in the neighborhood of the size of the square root of the smallest factor of \(n\). So if \(n\) has a big, but not too big, divisor, this test should help us find that divisor.

Example12.6.2

In the example above, the numbers we would get for the first three rounds are

\(x_2-x_1=26-5=21\)
\(x_4-x_2=7474-26=7448\)
\(x_6-x_3=871-677=194\)

These factor as

and have the following gcds with 8051:

So this would catch the four factors \(3,7,19\), and \(97\), which is not bad, and indeed the final one caught a common divisor with \(8051\).

This method is usually called the Pollard rho method, because it is due to John Pollard and the fact that the \(x_i\) eventually repeat mod (\(d\)) (in this case, \(d=97\)) means you can draw a tail and then a loop, which to a very imaginative eye might look like a Greek \(\rho\).

Notice that sometimes the rho method doesn't come up with an answer quickly, or at all (it took 50 rounds without success here). I could up this to a lot more. So what do you do then – bring out the big guns? Not at all – just as with primality testing, you just change your starting point to try again!

In general, there are other such probabilistic algorithms, and they are quite successful with factoring numbers which might have reasonably sized but not gigantic factors. According to Wikipedia,

The rho algorithm's most remarkable success has been the factorization of the eighth Fermat number by Pollard and Brent. They used Brent's variant of the algorithm, which found a previously unknown prime factor. The complete factorization of F8 took, in total, 2 hours on a UNIVAC 1100/42.

Well, this was in the 70s. But still, it's not bad! Things don't automatically work quickly even now.

Hmm, what now? Let's change the seed.

Luckily, we have bigger guns at our disposal in Sage (especially in the component program Pari), that polish thing off rather more quickly.

A little better than two hours on a mainframe, or even on this computer, I hope you'll agree.

If you think this sort of thing is cool, the Cunningham Project is a place to explore. I particularly like their Most Wanted lists. The idea is that "The Cunningham Project seeks to factor the numbers \(b^n \pm 1\) for \(b=2, 3, 5, 6, 7, 10, 11, 12\), up to high powers \(n\). Another interesting resource is Sage developer Paul Zimmermann's Integer Factoring Records.