NTIC Introduction to Factorization

Section 12.5 Introduction to Factorization

Let's take a last crack at issues directly related to cryptography. (That doesn't mean that other stuff we do in this text is unrelated – oh no! Especially the geometry is connected. But we will not make direct connections.)

🔗

We will focus on the main attack on the RSA algorithm, namely finding nontrivial factorizations, or factorization.

🔗

Subsection 12.5.1 Factorization and the RSA

🔗

Let's look at another toy RSA problem to get a sense of what is going on. First, I choose a modulus

$n=899\text{.}$ I will also use Sage to verify it has two prime factors, without telling you what they are.

xxxxxxxxxx
 
n=899
print("There are %s prime factors and their powers are %s and %s."%(len(n.factor()), n.factor()[0][1], n.factor()[1][1]))

🔗

Then I choose an exponent to raise my secret message by …

xxxxxxxxxx
 
e=13
print("We choose n=%s and exponent e=%s, and verify that gcd(e,phi(n))=1: %s"%(n,e,1==gcd(e,euler_phi(n))))

🔗

I haven't told you

$\phi(n)\text{,}$ but this guarantees it is coprime to my (public) encryption key, which I have chosen to be

$e=13\text{.}$ Now we can encode our message,

$x=11\text{.}$

xxxxxxxxxx
 
x=11
message=mod(x,n)^e
message

🔗

Now, how could we hope to crack this sinister message? (Assume that Sage doesn't have enough power to compute euler_phi(899) directly.) Well, we do know

$n=899$ and that

$e=13\text{.}$ That could help. Remember, if we knew

$p$ and

$q\text{,}$ we could easily calculate

$\phi(n)$ without even using Sage, which should be enough.

🔗

Question 12.5.1.

Can you quickly now factor $n=899$ without using Sage?

Solution

Hint: be smart about it. Think strategically; how should I have chosen a public modulus $n$ to make this hard to do? How should $p$ and $q$ relate?

🔗

Hopefully you figured out

$p$ and

$q\text{.}$ Then we just need to find an inverse modulo

$\phi(n)=(p-1)(q-1)$ to get our decryption key.

🔗

Sage note 12.5.2. Trying your primes yourself.

You can fill in the values you got for p and q here to make things work. Try it!

xxxxxxxxxx
 
p=
q=
f=inverse_mod(e,(p-1)*(q-1))
f

When we decrypt, we should get the original message $x=11$ again.

xxxxxxxxxx
 
message^f

🔗

This simple example makes it clear why factorization, not just looking for primes, might be important. To be truthful, many researchers in factorization simply do it to stay one step ahead of the other side, who is presumably also researching factorization – so to some extent it is an arms race.

🔗

But factorization is also inherently interesting mathematically! Here is an interesting statement, as an example.

🔗

Fact 12.5.3.

If I know $\phi(n)$ and $n\text{,}$ and know that $n$ is a product of exactly two distinct primes, I can easily compute them both.

🔗

Proof.

Of course, if we know $\phi(n)\text{,}$ we already can crack the code, but who cares; maybe we are given $\phi(n)$ and $n$ and want the factorization. Here is the short proof.

Suppose the (as yet unknown) primes are $p$ and $q\text{.}$ Then expand our formula to

\begin{equation*} \phi(n)=(p-1)(q-1)=pq-p-q+1=n-(p+q)+1 \end{equation*}

We now can represent both $p+q$ and $pq$ as formulas in $n$ and $\phi(n)\text{:}$

$p+q=n-\phi(n)+1$
$pq=n$

Where might we have a formula with $p+q$ and $pq\text{?}$ That should seem familiar …

\begin{equation*} (x-p)(x-q)=x^2-(p+q)x+pq \end{equation*}

So we can simply use the quadratic formula on this expression to get the values for $p$ and $q\text{!}$

\begin{equation*} p,q=\frac{(p+q)\pm\sqrt{(p+q)^2-4pq}}{2}=\frac{n-\phi(n)+1}{2}\pm\frac{\sqrt{(n-\phi(n)+1)^2-4n}}{2} \end{equation*}

🔗

Example 12.5.4.

Continuing the example above,

$\begin{equation*} x^2-(899-840+1)x+899=x^2-60x+899=0 \end{equation*}$

gives

$\begin{equation*} x=\frac{60\pm \sqrt{60^2-4(1)(899)}}{2(1)}=30\pm\frac{\sqrt{3600-3596}}{2}=30\pm 1=29,31\text{.} \end{equation*}$

🔗

Subsection 12.5.2 Trial division

🔗

The first, and oldest, method of factoring is one you already know, and maybe used a few minutes ago – trial factorization, or trial division. It is the method we used with the Sieve of Eratosthenes; you just try each prime number, one by one.

🔗

In Algorithm 6.2.2, do you remember what the highest number you would have to try is in order to factor a given

$n$ by trial division? (Can you prove it?)

🔗

The following algorithm does this very naively (and slowly, even for trial division). Let's try to talk through what each step does.

🔗

Sage note 12.5.5. Code for trial division.

This is one of the few places where it really is important to follow the code. That said, the details of the syntax are not as important as the algorithm – unless you want to harness the power of computers more effectively!

xxxxxxxxxx
 
def TrialDivFactor(n):  #  We define the function
    p = next_prime(1)     #  We start off by testing the next prime after 1
    top = ceil(math.sqrt(n))  # This was proved to be the biggest number we need
    while p < top:        #  As long as the prime is less than that bound, we keep going
        if mod(n,p)==0:   #  In this case, p divides n and we're done!
            break         #  This is Python's way of saying we are done searching
        p=next_prime(p)   #  Otherwise, we try the next prime until we're done looking
    if n==1:              #  We probably could have checked for this right away
        print("1 is not prime")   #  Well, 1 is not a prime!
    elif p==n:            #  If we get all the way through and end with a prime...
        print(n,"is prime")  #   Then our number was prime
    elif mod(n,p)==0:     #  But otherwise... (!)
        print(n,"factors as",p,"times",n/p)  #  We have a factorization!
    else:                 #  And finally...
        print(n,"is prime")      #  We must have gotten lucky.

🔗

Algorithm 12.5.6. Trial Factorization.

To factor $n\text{,}$ first enumerate the primes in ascending order $p_1,p_2,\cdots p_k\text{,}$ where $p_k$ is the largest prime less than or equal to $\sqrt{n}\text{.}$ For each prime in order, check whether $p_i \mid n\text{.}$ If it does, terminate by returning $p_i$ and $n/p_i\text{;}$ otherwise $n$ must be prime.

🔗

Now let me verify it works on easy examples. Remember, we are just looking for factors at this point, not complete factorizations.

xxxxxxxxxx
 
for z in range(1,18):
    TrialDivFactor(z)

🔗

Okay, so this seems reasonable. But it's a little more problematic when you try to do large numbers, where large means “bigger than you can do by hand, but nowhere close to the size we looked at in general.” I'll actually time how long it takes.

xxxxxxxxxx
 
@interact
def _(n=6739815371):
    TrialDivFactor(n)
    timeit('TrialDivFactor(%s)'%n)

🔗

Sage actually implements this in a much faster way, primarily by using optimized integers and a special version of Python that allows turning it into muchfaster code in the C language (Cython). Notice that the command returns just a single factor – giving another slight speedup.

xxxxxxxxxx
 
@interact
def _(n=6739815371):
    print(n.trial_division())
    timeit('%s.trial_division()'%n)

🔗

That's roughly one thousand times faster for the initial example! Naturally, it's possible to speed up even more. But note that getting the full factorization slows us back down; after all, one has to check that the remaining factor is prime (or factor it, if it isn't).

xxxxxxxxxx
 
@interact
def _(n=6739815371):
    print(n.factor())
    timeit('%s.factor()'%n)

🔗

Even for this smaller number it takes some actual time – here is where one sees the difference between different implementations of the same algorithm.

xxxxxxxxxx
 
timeit('TrialDivFactor(997*991)')

xxxxxxxxxx
 
timeit('(997*991).trial_division()')

xxxxxxxxxx
 
timeit('(997*991).factor()')

🔗

Subsection 12.5.3 Starting in the middle

🔗

So much for trial division! But we have other tools at our disposal.

🔗

Some of you might have tried something other than straight trial factorization when attacking

$n=899$ from our earlier problem. Reason this way; since we know that someone is trying to protect a secret, they probably are not going to pick a number with primes like

$3$ and

$5$ in it. After all, that would be too easy to factor.

🔗

In fact, it stands to reason that the primes

$p$ and

$q$ should be relatively large compared to

$n$ – so why not start in the middle?

🔗

This was Fermat's idea for factoring larger numbers. However, he didn't just start with primes in the middle; for one thing, if your number is even somewhat big and you don't have a computer or huge list of primes, how would you know where to start? So Fermat became clever, as always, and used an algebraic identity to help himself along.

🔗

Fact 12.5.7.

Write $n=ab\text{,}$ with $a>b\text{,}$ and assume $n$ is odd. Then we can write $n$ as a difference of two square numbers.

🔗

Proof.

Namely, $n$ is the difference of the squares of $s=\frac{a+b}{2}$ and $t=\frac{a-b}{2}\text{:}$

\begin{equation*} s^2-t^2=\left(\frac{a+b}{2}\right)^2-\left(\frac{a-b}{2}\right)^2=\frac{a^2}{2}-\frac{a^2}{2}+\frac{b^2}{2}-\frac{b^2}{2}+\frac{2ab}{4}+\frac{2ab}{4}=ab=n\text{.} \end{equation*}

🔗

Remark 12.5.8.

Why is it fine to assume $n$ is odd in these circumstances?

🔗

This may seem like an obscure identity to us, but at the time (and even well into the last century) such identities were the bread and butter of algebra, before we had tools like computers to help us along.

🔗

So what Fermat did was try this identity backwards. Here is his strategy.

🔗

Algorithm 12.5.9. The Fermat factorization algorithm.

To find a factor for a number $n\text{,}$ begin by seeking a perfect square $s^2$ bigger than $n\text{,}$ but still as close as possible. Now, do the following until you succeed, increasing $s$ by one each time.

Check whether $s^2-n$ is itself a perfect square $t^2\text{.}$
That means we essentially turned

$\begin{equation*} s^2-t^2=n\text{ around into }s^2-n=t^2\text{.} \end{equation*}$

Once you succeed, then $s$ and $t$ are not the factors of $n\text{;}$ rather, they are

$\begin{equation*} a=s+t\text{ and }b=s-t\text{.} \end{equation*}$

🔗

Proof.

It should be clear why $a$ and $b$ are the factors. But how do we know this algorithm terminates?

Assuming you started with $s$ as instructed, eventually you will reach $s=(n+1)/2\text{,}$ which is much larger than $\sqrt{n}\text{.}$ But then $((n+1)/2)^2-n=\frac{n^2+2n+1-4n}{4}=((n-1)/2)^2\text{.}$ You should check that this gives us the trivial factorization $n=n\cdot 1\text{,}$ though! (See Exercise 12.7.11)

🔗

Here is an implementation – again, assuredly slow, but at least verbose in its explanation – of this strategy. We simply start with the next

$s$ above the square root of

$n\text{,}$ and just keep trying

$s^2-n$ again and again for bigger and bigger

$s\text{.}$

xxxxxxxxxx
 
def FermatFactor(n,verbose=False):
    if n%2==0:
        raise TypeError("Input must be odd!")
    s=ceil(math.sqrt(n))
    top=(n+1)/2
    while is_square(s^2-n)==0:
        if verbose:
            print(s,"squared minus",n,"is",s^2-n,"which is not a perfect square")
        s=s+1
    t=sqrt(s^2-n)
    print("Fermat found that",s,"squared minus",t,"squared equals",n)
    if s^2==n:
        print("So",n,"was already a perfect square,",s,"times",s)
    elif s<top:
        print("So",s+t,"times",s-t,"equals",(s-t)*(s+t),"which is",n)
    elif s==top:
        print("So Fermat did not find a factor, which means",n,"is prime!")

🔗

Example 12.5.10.

Before we move on, let's try to factor $143$ and $93$ using this algorithm. Remember, we start with $s^2-n\text{,}$ where $s$ is the next integer above $\sqrt{n}\text{,}$ and see if it is a perfect square; then we increase $s$ by one each time.

After you attempt this by hand, you can see what Sage does with them to check.

xxxxxxxxxx
 
FermatFactor(143,verbose=True)

Well, we struck gold on the first try here! That happens if your number is the product of two primes which are two apart. (Such primes are known as twin primes, and have some interesting stories. Among other things, calculating with them helped find a bug in the Pentium computer chip in 1995; see Subsection 22.3.2.)

xxxxxxxxxx
 
FermatFactor(93,verbose=True)

As you can see, we probably would have been better off with trial division for $n=93\text{.}$ It's obvious that it's divisible by 3, but that takes a long time to reach from the middle.