Subsection12.6.1The Pollard Rho algorithm
Here is the essence of this approach.
- Pick some polynomial that will be easy to compute mod \((n)\).
- Plug in an essentially random seed value. Often the seed is \(2\) or \(3\).
- Compute the polynomial's value at the seed, and check if that has a non-one gcd with \(n\).
- If not, plug the value back into the polynomial, and repeat.
There is a modification I will discuss below that this algorithm uses.
The essence of the method is that by plugging in the values of the polynomial modulo \(n\), we are generating a 'pseudo-random' sequence of numbers. And a “pseudo-random” sequence might be better than the sequences we used for trial division or Fermat factorization, precisely because it will hit some small factors and some other large random factors. It might also be good that it could give us numbers which, although not a factor of \(n\), might at least share a factor with \(n\).
Example12.6.1
Here are some details. Let \(x_0=2\) and \(f(x)=x^2+1\) (these could be different, but they are a typical first choice). We create the following sequence of \(x_i\):
- \(x_0\equiv 2\)
- \(x_1\equiv f(x_0)\)
- \(x_2\equiv f(x_1)\)
- etc. - all mod (\(n\)).
For instance, doing it with \(n=8051\), we get
Now, these might be kind of random, but we will not actually try to find common divisors of these numbers with \(n\).
Instead, we will try to see if all the differences \(x_i-x_j\) share a common factor with \(n\), using the (highly efficient to compute) gcd. That is a lot more opportunities! And hopefully it's just as (or more) “random”, and just as effective at finding factors.
However, that is too many to check quickly. So instead one modifies this one last time. Warning; the following is a little dense, though definitely worth while.
- Remember, polynomials are well-defined in modular arithmetic, and the sequence will eventually repeat modulo any particular modulus \(d\), since there are finitely many possible \(x_i\).
- In particular, if \(d\) is a divisor of \(n\), then if \(gcd(x_i-x_j,n)=d\) is a shared divisor found by the pair \(x_i\) and \(x_j\), then not only will it be the case that \[x_i\equiv x_j\text{ mod }(d)\] but it will also be the case for any number of iterations of \(f\) that \[f^n(x_i)\equiv f^n(x_j)\text{ mod }(d)\] which means the sequence (modulo \(d\), the common divisor) repeats itself every \(j-i\) terms.
- So if we choose \(k=j-i\) (or the first multiple of this which is greater than \(i\)), then \[x_k\equiv x_{2k}\text{ mod }(d)\] will have to be true as well, so all the \(x_i,x_j\) pairs which come from the first one to share a divisor can be checked by checking just this one \(gcd(x_{2k}-x_k,n)\).
- So for each \(k\) we check whether \[1<gcd(x_{2k}-x_k,n)=d<n\, .\]
It turns out that (probabilistically, just like with Miller-Rabin) it should succeed for \(k\) in the neighborhood of the size of the square root of the smallest factor of \(n\). So if \(n\) has a big, but not too big, divisor, this test should help us find that divisor.
Example12.6.2
In the example above, the numbers we would get for the first three rounds are
- \(x_2-x_1=26-5=21\)
- \(x_4-x_2=7474-26=7448\)
- \(x_6-x_3=871-677=194\)
These factor as
and have the following gcds with 8051:
So this would catch the four factors \(3,7,19\), and \(97\), which is not bad, and indeed the final one caught a common divisor with \(8051\).
This method is usually called the Pollard rho method, because it is due to John Pollard and the fact that the \(x_i\) eventually repeat mod (\(d\)) (in this case, \(d=97\)) means you can draw a tail and then a loop, which to a very imaginative eye might look like a Greek \(\rho\).
Notice that sometimes the rho method doesn't come up with an answer quickly, or at all (it took 50 rounds without success here). I could up this to a lot more. So what do you do then – bring out the big guns? Not at all – just as with primality testing, you just change your starting point to try again!
In general, there are other such probabilistic algorithms, and they are quite successful with factoring numbers which might have reasonably sized but not gigantic factors. According to Wikipedia,
The rho algorithm's most remarkable success has been the factorization of the eighth Fermat number by Pollard and Brent. They used Brent's variant of the algorithm, which found a previously unknown prime factor. The complete factorization of F8 took, in total, 2 hours on a UNIVAC 1100/42.
Well, this was in the 70s. But still, it's not bad! Things don't automatically work quickly even now.
Hmm, what now? Let's change the seed.
Luckily, we have bigger guns at our disposal in Sage (especially in the component program Pari), that polish thing off rather more quickly.
A little better than two hours on a mainframe, or even on this computer, I hope you'll agree.