Just as electrical engineers need to use multimeters and oscilliscopes to verify that their circuit components function as expected, quantum physicists need to verify that their quantum components work. In quantum computing, it is necessary to verify that the input qubits are actually what you think they are, and to verify that quantum gates act the way you expect them to act. Verification is an essential tool for both engineering these systems, as well as for understanding the underlying physics.

The method of verifying a quantum state is known as **quantum state tomography**. The experimentalist collects a series of measurements of the system, and then tries to infer what the system is. Because these measurements are linear, this is an example of a *linear inverse problem*.

In quantum information, the basic quantum state is a collection of qubits. For a system of qubits, if the state is *separable*, then we can mathematically represent it with about complex numbers, in analogy with how we store bits in a normal coputer. However, quantum states can also be entangled, in which case we require about complex numbers to represent the state. This exponential growth in data is one of the reasons quantum computing has advantages over classical computing (e.g. the famous Shor's algorithm).

In general, we represent our quantum state as a density matrix which is a Hermitian matrix of size . The quantum tomography measurement process collects measurements

where is a term that represents noise, and is a Hermitian matrix that represents an observable. For experimental reasons, is the tensor product of Pauli matrices.

The main challenge of tomography is easy to state: it is hard and slow to collect each measurement . After a long period of time (about half a day), the measurements are no longer reliable, so there is a fixed time budget. But the quantum state requires about numbers, so for large qubit systems this is an unreasonable number of measurements. Conventional approaches have been stuck at about 8 qubit systems, but physicists would like to study larger systems for a variety of reasons.

This page discusses our solutions to the problem. To learn more about other uses of quantum state tomography as well as the related technique of quantum process tomography, see this paper by Scott Aaronson, which gives a few references for uses of tomography (e.g. observing chemical reactions, confirm preparation of photon or ion entangle states, testing quantum gates, and characterizing optical devices). For more information on quantum information, we recommend Quantum Computation and Quantum Information by Nielsen and Chuang, Cambridge University Press 2011.

This approach is described in my paper Quantum state tomography via compressed sensing (Gross, Liu, Flammia, Becker and Eisert, in Physical Review Letters, 2010), and it is based on the new paradigm of *matrix completion*.

This section is still under construction, Jan 2013.

Can we find an algorithm that is both *faster* and more *accurate* than the convex approach? In 2013, Volkan Cevher and I show that, surprisingly, we can.

This section is still under construction, Jan 2013.

In 2005, S. Burer and R.D.C. Monteiro showed two things: first, it is possible to re-write the non-convex algorithm in what I call a “splitting” approach, and secondly, that if is large enough (about where is the number of constraints), then the non-convex algorithm has the same global minimum as the convex problem and no other local minima (see also Moscato, Norman and Pataki 1998).

The advantage of the splitting approach is cheaper computational costs, since it avoids eigenvalue decompositions. However, despite the formal equivalence of the two low-rank problems in terms of their local minima, they two formulations need not have the same stationary points. Furthermore, they lead to different computer implementations, so each algorithm may convege to a different local minima.

This section is still under construction, Jan 2013.

For a rank matrix, our algorithm requires memory and each step takes computation.

The problem is much more challenging than matrix completion because the measurement operator is much larger and denser. In tomography, each row of this matrix has nonzero entries (“nnz”), as opposed to entry in matrix completion. Since we have rows, this means requires storage if handled naively.

The following table compares the basic statistics of the quantum tomography problem with the well-known Netflix prize data set, which was considered a huge data set when released in 2007.

problem | matrix size | nnz() | storage (naive) | storage (clever) | range() |

Netflix | 17,770 x 480,189 | ~.9 GB | depends | sparse | |

Quantum tomography, 16 qubits | 65,536 x 65,536 | ~400 GB | 100 MB | dense |

Another complicating factor is that the gradient, which is in the range of , is not sparse, and therefore it is slow to multiply. For this reason, Lanczos methods to compute the eigenvalue decomposition are slow.

We tackle these numerical difficulties with the following tools:

The matrices in each step of the algorithm are never explicitly formed, so we can take advantage of low-rank factors

Eigenvalue decompositions are approximated using randomized linear algebra; see the SIAM Review paper Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions by Halko, Martinsson and Tropp (2010).

We create a custom routine to apply

We implicitly store each row in numbers rather than numbers, taking advantage of the tensor factorization.

We do coarse-grained parallelism using MatlabMPI. The overhead of using Matlab and MatlabMPI is not significant for large problems.

The multiplies are written in C using the mex interface to Matlab

The C files use pthreads for parallel computation on the local cores of each computer, using shared memory. There is extremely low overhead using this approach.

The resulting problem is solved on the Electra hybrid CPU/GPU cluster at EPFL.

Below, we show our basic implementation for small problems that uses pure Matlab, and our production-scale implementation that we ran on the cluster. The time per iteration grows approximately with the dimension squared.