LLL reduction

Introduction

In this section, we hope to give some intuitive understanding of the LLL algorithm and how it works. The LLL algorithm is a lattice reduction algorithm, meaning it takes in a basis for some lattice and hopefully returns another basis for the same lattice with shorter basis vectors. Before introducing LLL reduction, we'll introduce 2 key algorithms that LLL is built from: Gram-Schmidt orthogonalization and Gaussian reduction (also known as Lagrange's algorithm). We give a brief overview of why these are used to build LLL.

The volume of a lattice is fixed and is given by the determinant of the basis vectors. So whenever our basis vectors get shorter, they must, in some intuitive sense, become more orthogonal to each other for the determinant to remain the same. Hence, Gram-Schmidt orthogonalization is used to get an approximate picture of the shortest basis vectors. However, the vectors that we get are in general not in the lattice, so we only use them as a rough idea of what the shortest vectors would look like.

Lagrange's algorithm can be thought of as the GCD algorithm for 2 numbers, generalized to lattices. It iteratively reduces the length of each vector by subtracting an integer multiple of one vector from the other, until this no longer makes the vectors shorter. Such an algorithm actually gives the shortest possible vectors in 2 dimensions! Unfortunately, a naive generalization of this algorithm to higher dimensions is not guaranteed to halt in polynomial time. Hence, it needs to be modified a bit to make sure the algorithm halts efficiently.
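Here is a minimal sketch of Lagrange's algorithm. It is written in plain Python rather than Sage so that it runs anywhere, and Sage itself is built on Python. The helper names `dot` and `lagrange_reduce` are our own. The nearest-integer quotient plays the role of the quotient in the Euclidean algorithm:

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lagrange_reduce(u, v):
    """Lagrange (Gauss) reduction of a 2-dimensional lattice basis (u, v).

    Like the Euclidean GCD algorithm: subtract the nearest-integer multiple
    of the shorter vector from the longer one, swap, and repeat until the
    subtraction no longer produces a shorter vector.
    """
    u, v = list(u), list(v)
    if dot(u, u) > dot(v, v):
        u, v = v, u  # keep u as the shorter vector
    while True:
        # nearest integer to <u, v> / <u, u>, computed exactly
        m = round(Fraction(dot(u, v), dot(u, u)))
        v = [b - m * a for a, b in zip(u, v)]
        if dot(v, v) >= dot(u, u):
            return u, v
        u, v = v, u
```

For example, `lagrange_reduce([5, 3], [8, 5])` turns this skewed basis of $\mathbb Z^2$ into `([-1, 0], [0, -1])`, a basis of two unit vectors.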

Gram-Schmidt Orthogonalization

Overview

Gram-Schmidt orthogonalization is an algorithm that takes a basis $\left\{b_i\right\}_{i=1}^n$ as input and returns a basis $\left\{b_i^*\right\}_{i=1}^n$ whose vectors are pairwise orthogonal, i.e. at right angles to each other. This new basis is defined as

b_i^*=b_i-\sum_{j=1}^{i-1}\mu_{i,j}b_j^*\quad\mu_{i,j}=\frac{\langle b_i,b_j^*\rangle}{\langle b_j^*,b_j^*\rangle}

where the $\mu_{i,j}$ are the Gram-Schmidt coefficients.

One can immediately check that this new basis is orthogonal, meaning

\langle b_i^*,b_j^*\rangle=\begin{cases}0&i\neq j\\\left\lVert b_i^*\right\rVert^2&i=j\end{cases}
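The definition translates almost line for line into code. The following plain-Python sketch uses our own helper `gram_schmidt`, not Sage's method, with `fractions.Fraction` so that the coefficients stay exact. It returns $\mathcal B^*$ and $\mu$ as lists of rows, in the same order as the Sage output shown below:

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(B):
    """Exact Gram-Schmidt on the rows of B; returns (Bs, mu) with B = mu * Bs.

    Bs[i] is b_i^*, and mu[i][j] = <b_i, b_j^*> / <b_j^*, b_j^*> for j < i
    (mu is lower triangular with ones on the diagonal).
    """
    n = len(B)
    Bs, mu = [], [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        v = [Fraction(x) for x in B[i]]
        for j in range(i):
            mu[i][j] = dot(B[i], Bs[j]) / dot(Bs[j], Bs[j])
            v = [a - mu[i][j] * b for a, b in zip(v, Bs[j])]
        mu[i][i] = Fraction(1)
        Bs.append(v)
    return Bs, mu
```

On the example basis below, it reproduces the rows $(-1,-2,3,1)$, $(-4,0,-1,-1)$, $(0,3,3,-3)$ and the coefficients $\mu_{2,1}=2$, $\mu_{3,1}=\mu_{3,2}=-1$.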

Let $\mathcal B$ be the matrix whose $i$-th row is $b_i$ and $\mathcal B^*$ the matrix whose $i$-th row is $b_i^*$. Then the Gram-Schmidt orthogonalization gives us $\mathcal B=\mu\mathcal B^*$. Here $\mu$ is lower triangular with $\mu_{i,i}=1$ and $\mu_{j,i}=0$ for $j<i$, and for $j<i$ the entry $\mu_{i,j}$ is the Gram-Schmidt coefficient. As an example, consider the following basis of a subspace of $\mathbb R^4$:

\begin{matrix} b_1 &= & (&-1&-2&3&1&)\\ b_2 &= & (&-6&-4&5&1&)\\ b_3 &= & (&5&5&1&-3&) \end{matrix}

Instead of doing the Gram-Schmidt orthogonalization by hand, we can get Sage to do it for us:

B = Matrix([
[-1, -2, 3, 1],
[-6, -4, 5, 1],
[5, 5, 1, -3]])

B.gram_schmidt()

This outputs two matrices, $\mathcal B^*$ and $\mu$ :

(
[-1 -2  3  1]  [ 1  0  0]
[-4  0 -1 -1]  [ 2  1  0]
[ 0  3  3 -3], [-1 -1  1]
)

One can quickly verify that $\mathcal B=\mu\mathcal B^*$ and that the rows of $\mathcal B^*$ are orthogonal to each other.

A useful result is that

\det\left(\mathcal B\mathcal B^T\right)=\det\left(\mathcal B^*\mathcal B^{*T}\right)=\prod_i\left\lVert b_i^*\right\rVert^2

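Intuitively, this tells us that the more orthogonal a basis of a lattice is, the shorter its vectors are, since the volume $\sqrt{\det\left(\mathcal B\mathcal B^T\right)}=\prod_i\left\lVert b_i^*\right\rVert$ is fixed. Because $\left\lVert b_i\right\rVert\geq\left\lVert b_i^*\right\rVert$, the product $\prod_i\left\lVert b_i\right\rVert$ is at least the volume, with equality exactly when the basis is orthogonal.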
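For the example above, the identity can be checked by hand using the Sage output (this is our own arithmetic). It is the squared norms of the rows of $\mathcal B^*$, namely $15$, $18$ and $27$, that multiply to the Gram determinant:

```latex
\mathcal B\mathcal B^T=\begin{pmatrix}15&30&-15\\30&78&-48\\-15&-48&60\end{pmatrix},\qquad
\det\left(\mathcal B\mathcal B^T\right)=7290=15\cdot18\cdot27=\left\lVert b_1^*\right\rVert^2\left\lVert b_2^*\right\rVert^2\left\lVert b_3^*\right\rVert^2
```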

Exercises

1) Show that the basis $b_i^*$ is orthogonal.

2) Verify that the output of Sage is indeed correct.

3) Show that $\det\mu=1$ and that $\mathcal B^*\mathcal B^{*T}$ is a diagonal matrix whose entries are $\left\lVert b_i^*\right\rVert^2$. Conclude that $\det\left(\mathcal B\mathcal B^T\right)=\det\left(\mathcal B^*\mathcal B^{*T}\right)=\prod_i\left\lVert b_i^*\right\rVert^2$.

4*) Given the Iwasawa decomposition $\mathcal B=LDO$, where $L$ is a lower triangular matrix with $1$ on its diagonal, $D$ is a diagonal matrix with positive diagonal entries and $O$ is an orthogonal matrix, meaning $OO^T=1$, show that $\mathcal B^*=DO$ and $\mu=L$. Furthermore, prove that such a decomposition is unique.


LLL reduction

Overview

There are a few issues one may encounter when attempting to generalize Lagrange's algorithm to higher dimensions. Most importantly, one needs to figure out the proper way to swap the vectors around and when to terminate, ideally in polynomial time. A rough sketch of the algorithm is:

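def LLL(B):
    # Rough sketch in Sage: the rows of the integer matrix B form the basis.
    # swap_condition(B, i) is still to be determined; it should hold when
    # b_{i-1} and b_i are already in a good order (no swap needed).
    d = B.nrows()
    i = 1
    while i < d:
        size_reduce(B)
        if swap_condition(B, i):
            i += 1
        else:
            B.swap_rows(i, i-1)  # swap b_{i-1} and b_i, then step back
            i = max(i-1, 1)
    return B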

There are two things we need to figure out: in what order we should reduce the basis vectors, and how we know when to swap. Ideally, we also want the basis to be ordered so that the smallest basis vectors come first. Intuitively, it would also be better to reduce a vector by the larger vectors first before reducing it by the smaller ones. This is a very vague analogy to filling a jar with big stones first before pouring in the sand. This leads us to the following size reduction algorithm:

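def size_reduce(B):
    # Size-reduce the rows of the Sage integer matrix B in place.
    d = B.nrows()
    for i in range(1, d):
        Bs, M = B.gram_schmidt()
        # reduce b_i by b_{i-1} first, then b_{i-2}, ..., down to b_0
        for j in reversed(range(i)):
            B[i] -= round(M[i,j])*B[j]
            # B* is unchanged by this step, but mu has to be recomputed
            Bs, M = B.gram_schmidt()
    return B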

We can further improve this by optimizing the Gram-Schmidt computation, since size reduction does not modify $\mathcal B^*$ at all. Furthermore, $\mu$ changes in a very predictable fashion, and when vectors are swapped, one can write explicit formulas for how $\mathcal B^*$ changes as well.
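The predictable change in $\mu$ can be made concrete. Write $b_i=\sum_{k\leq i}\mu_{i,k}b_k^*$. Replacing $b_i$ by $b_i-rb_j$ (with $j<i$) leaves every $b_k^*$ unchanged and changes only row $i$ of $\mu$, via $\mu_{i,k}\leftarrow\mu_{i,k}-r\mu_{j,k}$ for $k\leq j$. Below is a minimal plain-Python sketch of size reduction that uses this rule with a single Gram-Schmidt computation. The names `gram_schmidt_mu` and `size_reduce_fast` are our own, and rows are numbered from $0$ as in the Sage code:

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt_mu(B):
    """Exact Gram-Schmidt on the rows of B; returns (Bs, mu) with B = mu * Bs."""
    n = len(B)
    Bs, mu = [], [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        v = [Fraction(x) for x in B[i]]
        for j in range(i):
            mu[i][j] = dot(B[i], Bs[j]) / dot(Bs[j], Bs[j])
            v = [a - mu[i][j] * b for a, b in zip(v, Bs[j])]
        mu[i][i] = Fraction(1)
        Bs.append(v)
    return Bs, mu


def size_reduce_fast(B):
    """Size-reduce the rows of B in place, computing Gram-Schmidt only once.

    Replacing b_i by b_i - r*b_j (j < i) leaves every b_k^* unchanged and
    only changes row i of mu: mu[i][k] -= r * mu[j][k] for k <= j.
    """
    _, mu = gram_schmidt_mu(B)
    for i in range(1, len(B)):
        for j in reversed(range(i)):  # larger index first, as in the text
            r = round(mu[i][j])
            if r:
                B[i] = [a - r * b for a, b in zip(B[i], B[j])]
                for k in range(j + 1):
                    mu[i][k] -= r * mu[j][k]
    return B, mu
```

Comparing the updated `mu` with a fresh call to `gram_schmidt_mu` on the reduced basis is a quick way to convince yourself that the update rule is exact.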

Next, we need to figure out a swapping condition. Naively, we want

\left\lVert b_i\right\rVert\leq\left\lVert b_{i+1}\right\rVert

for all $i$. However, such a condition does not guarantee termination in polynomial time. As short basis vectors should be almost orthogonal, we may also want to incorporate this notion. Concretely, we want $\left|\mu_{i,j}\right|$ to be somewhat small for all pairs $j<i$, i.e. we may want something like

|\mu_{i,j}|\leq c

However, since $\mu_{i,j}=\frac{\langle b_i,b_j^*\rangle}{\langle b_j^*,b_j^*\rangle}$, this condition is easily satisfied when $b_j^*$ is sufficiently long, which is not what we want. The key idea, first noticed by Lovász, is to merge these two conditions into one, now called the Lovász condition:

\delta\left\lVert b_i^*\right\rVert^2\leq\left\lVert b_{i+1}^*+\mu_{i+1,i}b_i^*\right\rVert^2\quad\delta\in\left(\frac14,1\right)
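Since $b_i^*$ and $b_{i+1}^*$ are orthogonal, the right-hand side expands to $\left\lVert b_{i+1}^*\right\rVert^2+\mu_{i+1,i}^2\left\lVert b_i^*\right\rVert^2$. The rounding in size reduction guarantees $\left|\mu_{i+1,i}\right|\leq\frac12$. Combined with that, the condition implies a relaxed version of the naive ordering, stated for the Gram-Schmidt vectors:

```latex
\left\lVert b_{i+1}^*+\mu_{i+1,i}b_i^*\right\rVert^2=\left\lVert b_{i+1}^*\right\rVert^2+\mu_{i+1,i}^2\left\lVert b_i^*\right\rVert^2
\quad\Longrightarrow\quad
\left\lVert b_{i+1}^*\right\rVert^2\geq\left(\delta-\mu_{i+1,i}^2\right)\left\lVert b_i^*\right\rVert^2\geq\left(\delta-\frac14\right)\left\lVert b_i^*\right\rVert^2
```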

It turns out that with this condition, the algorithm above terminates in polynomial time! More specifically, it has a time complexity of $O\left(d^5n\log^3B\right)$. Here the basis consists of $d$ vectors in $\mathbb R^n$, and $B$ is a bound on the largest norm of the $b_i$. The requirement $\frac14<\delta$ ensures that the basis vectors are ordered roughly by size, and $\delta<1$ ensures that the algorithm terminates in polynomial time.
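Putting the pieces together, the following is a minimal textbook-style sketch of LLL in plain Python. It uses exact `Fraction` arithmetic and $\delta=\frac34$, the value used in the original LLL paper. It is our own illustration, not Sage's built-in `LLL` method. Unlike the rough sketch above, it size-reduces only $b_i$ in each iteration. For readability, it also recomputes the Gram-Schmidt orthogonalization from scratch after every change instead of updating it incrementally, so it is slower than an optimized implementation. Rows are numbered from $0$, so the pair tested is $(b_{i-1},b_i)$, and the Lovász condition is expanded as $\delta\lVert b_{i-1}^*\rVert^2\leq\lVert b_i^*\rVert^2+\mu_{i,i-1}^2\lVert b_{i-1}^*\rVert^2$:

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(B):
    """Exact Gram-Schmidt on the rows of B; returns (Bs, mu) with B = mu * Bs."""
    n = len(B)
    Bs, mu = [], [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        v = [Fraction(x) for x in B[i]]
        for j in range(i):
            mu[i][j] = dot(B[i], Bs[j]) / dot(Bs[j], Bs[j])
            v = [a - mu[i][j] * b for a, b in zip(v, Bs[j])]
        mu[i][i] = Fraction(1)
        Bs.append(v)
    return Bs, mu


def lll(B, delta=Fraction(3, 4)):
    """Textbook LLL reduction of the rows of B (a list of integer rows)."""
    B = [list(row) for row in B]  # work on a copy
    d = len(B)
    i = 1
    while i < d:
        # size-reduce b_i by b_{i-1}, ..., b_0 (larger index first)
        for j in reversed(range(i)):
            _, mu = gram_schmidt(B)
            r = round(mu[i][j])
            if r:
                B[i] = [a - r * b for a, b in zip(B[i], B[j])]
        Bs, mu = gram_schmidt(B)
        # Lovasz condition for the pair (b_{i-1}, b_i)
        prev = dot(Bs[i - 1], Bs[i - 1])
        if delta * prev <= dot(Bs[i], Bs[i]) + mu[i][i - 1] ** 2 * prev:
            i += 1  # pair is in a good order: move on
        else:
            B[i - 1], B[i] = B[i], B[i - 1]  # swap and step back
            i = max(i - 1, 1)
    return B


example = [[-1, -2, 3, 1], [-6, -4, 5, 1], [5, 5, 1, -3]]
print(lll(example))
```

On the example basis from the Gram-Schmidt section, this prints `[[-1, -2, 3, 1], [-4, 0, -1, -1], [0, 3, 3, -3]]`. Size reduction alone already makes that basis orthogonal, so no swaps are needed.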

Polynomial time proof

This follows the proof given by the authors of the LLL paper. We first prove that the algorithm terminates by showing that it swaps vectors only finitely many times. Let the basis consist of $d$ vectors in $\mathbb R^n$. Let $d_i$ be the volume of the lattice generated by $\left\{b_j\right\}_{j=1}^i$ at a given step of the algorithm, so that $d_i=\prod_{j=1}^i\left\lVert b_j^*\right\rVert$. Now consider the quantity

D=\prod_{i=1}^dd_i

This quantity only changes when some $b_i^*$ changes. Size reduction leaves every $b_i^*$ unchanged, so this only happens when swaps occur. Let's consider what happens when we swap $b_i$ and $b_{i+1}$. Recall the Gram-Schmidt formulas:

b_i^*=b_i-\sum_{j=1}^{i-1}\mu_{i,j}b_j^*\quad\mu_{i,j}=\frac{\langle b_i,b_j^*\rangle}{\langle b_j^*,b_j^*\rangle}

From this, we see that when we swap $b_i$ and $b_{i+1}$, the new $b_i^*$ is $b_{i+1}^*+\mu_{i+1,i}b_i^*$, the component of $b_{i+1}$ orthogonal to $b_1,\dots,b_{i-1}$. A swap only happens when the Lovász condition fails, that is, when $\left\lVert b_{i+1}^*+\mu_{i+1,i}b_i^*\right\rVert^2<\delta\left\lVert b_i^*\right\rVert^2$. So the new $\left\lVert b_i^*\right\rVert$ is less than $\sqrt\delta$ times the old one, and the new $d_i$ is less than $\sqrt\delta\,d_i$. All other $d_j$, $j\neq i$, remain the same. For $j<i$, the vectors $b_1,\dots,b_j$ are untouched. For $j>i$, the set $\left\{b_1,\dots,b_j\right\}$ is unchanged, so the lattice it generates, and hence its volume, stays the same. Therefore each swap multiplies $D$ by a factor less than $\sqrt\delta$, which is why we need $\delta<1$. It remains to show that each $d_i$ is bounded from below.
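The decrease of $D$ can be checked numerically. To stay with exact rationals, the sketch below computes $D^2=\prod_id_i^2$ rather than $D$, using our own helper names `gs_sq_norms` and `potential_sq`. A swap multiplies $D^2$ by the ratio of the new and old values of $\left\lVert b_i^*\right\rVert^2$, which is less than $\delta$:

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gs_sq_norms(B):
    """Squared Gram-Schmidt norms ||b_i^*||^2 of the rows of B, computed exactly."""
    Bs = []
    for b in B:
        v = [Fraction(x) for x in b]
        for s in Bs:
            m = dot(b, s) / dot(s, s)  # Gram-Schmidt coefficient against s = b_j^*
            v = [a - m * c for a, c in zip(v, s)]
        Bs.append(v)
    return [dot(v, v) for v in Bs]


def potential_sq(B):
    """D^2 = prod_i d_i^2, where d_i^2 = prod_{j <= i} ||b_j^*||^2."""
    D2, di2 = Fraction(1), Fraction(1)
    for n2 in gs_sq_norms(B):
        di2 *= n2
        D2 *= di2
    return D2


B = [[3, 0], [1, 1]]  # fails the Lovasz condition for delta = 3/4
swapped = [B[1], B[0]]
print(potential_sq(swapped) / potential_sq(B))  # 2/9, which is below 3/4
```

Here $\left\lVert b_1^*\right\rVert^2=9$, $\left\lVert b_2^*\right\rVert^2=1$ and $\mu_{2,1}=\frac13$, so the Lovász condition fails ($\frac34\cdot9>1+\frac19\cdot9$). The swap replaces $\left\lVert b_1^*\right\rVert^2=9$ by $2$, multiplying $D^2$ by $\frac29$.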

Let $\lambda_1(L)$ be the length of a shortest nonzero vector in the lattice $L$. We can treat $d_i$ as the volume of the lattice $L_i$ generated by $\left\{b_j\right\}_{j=1}^i$. Let $x_i$ be a shortest nonzero vector of $L_i$. Since $L_i\subseteq L$, Minkowski's lattice point theorem gives

\begin{align*} \lambda_1(L)\leq\left\lVert x_i\right\rVert&\leq\underbrace{\frac2{\sqrt\pi}\Gamma\left(\frac i2+1\right)^{\frac1i}}_{C_i}d_i^\frac1i\\ d_i&\geq\frac{\lambda_1(L)^i}{C_i^i}=d_{i,\min} \end{align*}

(Note that the value of $C_i$ isn't particularly important; one can use a simpler value like $\sqrt i$.)
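A quick numerical comparison of $C_i$ with $\sqrt i$ is below. It uses plain Python and `math.gamma`, and the name `minkowski_constant` is our own:

```python
import math


def minkowski_constant(i):
    """C_i = (2 / sqrt(pi)) * Gamma(i/2 + 1)^(1/i), as in the bound above."""
    return 2 / math.sqrt(math.pi) * math.gamma(i / 2 + 1) ** (1 / i)


for i in range(1, 9):
    print(i, round(minkowski_constant(i), 4), round(math.sqrt(i), 4))
```

The two agree at $i=1$ ($C_1=1$), and $C_i$ stays below $\sqrt i$ for the larger $i$ printed. So replacing $C_i$ by $\sqrt i$ only loosens the lower bound on $d_i$.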

Hence $d_i$, and therefore $D$, has a (loose) lower bound $D_{\min}=\prod_{i=1}^dd_{i,\min}$. Each swap multiplies $D$ by a factor less than $\sqrt\delta$, so there are at most $N=\frac{\log\left(D/D_{\min}\right)}{\log\left(1/\sqrt\delta\right)}$ swaps, where $D$ is the value for the input basis. Each iteration either advances $i$ by $1$ (no swap) or moves it back by at most $1$ (a swap), and $i$ takes only $d-1$ values. So the loop runs at most $2N+d$ times, and the algorithm terminates.
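Spelled out, let $D_0$ be the value of $D$ for the input basis and $D_k$ its value after $k$ swaps. Each swap multiplies $D$ by less than $\sqrt\delta$, and $D$ never drops below $D_{\min}$, so:

```latex
D_{\min}\leq D_k<\left(\sqrt\delta\right)^kD_0
\quad\Longrightarrow\quad
k<\frac{\log\left(D_0/D_{\min}\right)}{\log\left(1/\sqrt\delta\right)}=\frac{2\log\left(D_0/D_{\min}\right)}{\log\left(1/\delta\right)}
```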

This proof also gives us a handle on the time complexity of the algorithm. Let $B$ be the length of the longest input basis vector. Since $d_i\leq B^i$, we have $D\leq B^{\frac{d^2+d}2}$. For an integer lattice, each $d_i^2$ is a positive integer, so $d_i\geq1$, and the algorithm loops $O\left(d^2\log B\right)$ times. Recomputing the Gram-Schmidt orthogonalization from scratch takes $O\left(d^2n\right)$ arithmetic operations. However, by updating $\mu$ and $\mathcal B^*$ incrementally, each iteration needs only $O\left(dn\right)$ of them. These operations act on rational numbers of $O\left(d\log B\right)$ bits, so with classical (schoolbook) arithmetic each one takes $O\left(d^2\log^2B\right)$ time. Putting these together, the time complexity of the LLL algorithm is $O\left(d^5n\log^3B\right)$, a somewhat reasonable polynomial time algorithm.

Let $b_i$ be the output of the LLL algorithm. It turns out that we then have the bound

\left\lVert b_1\right\rVert\leq\left(\frac4{4\delta-1}\right)^{\frac{d-1}4}\text{vol}(L)^\frac1d

which requires $\delta>\frac14$ . Such bounds for the shortest vector will be elaborated in more detail in the section on reduced basis.

Exercises

1) Implement LLL in Sage and experimentally verify that each swap multiplies $D$ by a factor less than $\sqrt\delta$.

2) Show that the time complexity analysis is correct, and indeed each loop takes at most $O\left(d^2n\right)$ operations.