Papers
For the very latest, see autogenerated lists, like:
- Stephen’s arXiv papers
- google scholar
- DBLP computer science papers
- ResearchGate and academia.edu
- ORCID
- Semantic Scholar
Click the title or triangle to show abstracts
Preprints
- NeurIPS 2024 ("test of time award" 1987) preprint
- [arXiv]
- ICML 2025, to appear (as spotlight poster, top 2.6% of submissions; and as oral presentation, top 1%) [arXiv] Supersedes our tech report: Variational Entropy Search for Adjusting Expected Improvement
- [arXiv]
- [arXiv]
2024 Global solutions in deep learning in sub-linear time
Abstract:2025 Stochastic Subspace Descent Accelerated via Bi-fidelity Line Search
Abstract: Efficient optimization remains a fundamental challenge across numerous scientific and engineering domains, especially when objective function and gradient evaluations are computationally expensive. While zeroth-order optimization methods offer effective approaches when gradients are inaccessible, their practical performance can be limited by the high cost associated with function queries. This work introduces the bi-fidelity stochastic subspace descent (BF-SSD) algorithm, a novel zeroth-order optimization method designed to reduce this computational burden. BF-SSD leverages a bi-fidelity framework, constructing a surrogate model from a combination of computationally inexpensive low-fidelity (LF) and accurate high-fidelity (HF) function evaluations. This surrogate model facilitates an efficient backtracking line search for step size selection, for which we provide theoretical convergence guarantees under standard assumptions. We perform a comprehensive empirical evaluation of BF-SSD across four distinct problems; a synthetic optimization benchmark, dual-form kernel ridge regression, black-box adversarial attacks on machine learning models, and transformer-based black-box language model fine-tuning. Numerical results demonstrate that BF-SSD consistently achieves superior optimization performance while requiring significantly fewer HF function evaluations compared to relevant baseline methods. This study highlights the efficacy of integrating bi-fidelity strategies within zeroth-order optimization, positioning BF-SSD as a promising and computationally efficient approach for tackling large-scale, high-dimensional problems encountered in various real-world applications.2025 A Relaxed Primal-Dual Hybrid Gradient Method with Line Search
Abstract: The primal-dual hybrid gradient method (PDHG) is useful for optimization problems that commonly appear in image reconstruction. A downside of PDHG is that there are typically three user-set parameters and performance of the algorithm is sensitive to their values. Toward a parameter-free algorithm, we combine two existing line searches. The first, by Malitsky et al., is over two of the step sizes in the PDHG iterations. We then use the connection between PDHG and the primal-dual form of Douglas-Rachford splitting to construct a line search over the relaxation parameter. We demonstrate the efficacy of the combined line search on multiple problems, including a novel inverse problem in magnetic resonance image reconstruction. The method presented in this manuscript is the first parameter-free variant of PDHG (across all numerical experiments, there were no changes to line search hyperparameters).2025 WENDy for Nonlinear-in-Parameters ODEs
Abstract: The Weak-form Estimation of Non-linear Dynamics (WENDy) algorithm is extended to accommodate systems of ordinary differential equations that are nonlinear-in-parameters. The extension rests on derived analytic expressions for a likelihood function, its gradient and its Hessian matrix. WENDy makes use of these to approximate a maximum likelihood estimator based on optimization routines suited for non-convex optimization problems. The resulting parameter estimation algorithm has better accuracy, a substantially larger domain of convergence, and is often orders of magnitude faster than the conventional output error least squares method (based on forward solvers). The algorithm is efficiently implemented in Julia. We demonstrate the algorithm's ability to accommodate the weak form optimization for both additive normal and multiplicative log-normal noise, and present results on a suite of benchmark systems of ordinary differential equations. In order to demonstrate the practical benefits of our approach, we present extensive comparisons between our method and output error methods in terms of accuracy, precision, bias, and coverage.2025 Exploring Exploration in Bayesian Optimization
Abstract: A well-balanced exploration-exploitation trade-off is crucial for successful acquisition functions in Bayesian optimization. However, there is a lack of quantitative measures for exploration, making it difficult to analyze and compare different acquisition functions. This work introduces two novel approaches - observation traveling salesman distance and observation entropy - to quantify the exploration characteristics of acquisition functions based on their selected observations. Using these measures, we examine the explorative nature of several well-known acquisition functions across a diverse set of black-box problems, uncover links between exploration and empirical performance, and reveal new relationships among existing acquisition functions. Beyond enabling a deeper understanding of acquisition functions, these measures also provide a foundation for guiding their design in a more principled and systematic manner.2025 A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization
Abstract: Bayesian optimization is a widely used method for optimizing expensive black-box functions, with Expected Improvement being one of the most commonly used acquisition functions. In contrast, information-theoretic acquisition functions aim to reduce uncertainty about the function's optimum and are often considered fundamentally distinct from EI. In this work, we challenge this prevailing perspective by introducing a unified theoretical framework, Variational Entropy Search, which reveals that EI and information-theoretic acquisition functions are more closely related than previously recognized. We demonstrate that EI can be interpreted as a variational inference approximation of the popular information-theoretic acquisition function, named Max-value Entropy Search. Building on this insight, we propose VES-Gamma, a novel acquisition function that balances the strengths of EI and MES. Extensive empirical evaluations across both low- and high-dimensional synthetic and real-world benchmarks demonstrate that VES-Gamma is competitive with state-of-the-art acquisition functions and in many cases outperforms EI and MES.2025 Evaluation of data driven low-rank matrix factorization for accelerated solutions of the Vlasov equation
Abstract: Low-rank methods have shown success in accelerating simulations of a collisionless plasma described by the Vlasov equation, but still rely on computationally costly linear algebra every time step. We propose a data-driven factorization method using artificial neural networks, specifically with convolutional layer architecture, that trains on existing simulation data. At inference time, the model outputs a low-rank decomposition of the distribution field of the charged particles, and we demonstrate that this step is faster than the standard linear algebra technique. Numerical experiments show that the method effectively interpolates time-series data, generalizing to unseen test data in a manner beyond just memorizing training data; patterns in factorization also inherently followed the same numerical trend as those within algebraic methods (e.g., truncated singular-value decomposition). However, when training on the first 70% of a time-series data and testing on the remaining 30%, the method fails to meaningfully extrapolate. Despite this limiting result, the technique may have benefits for simulations in a statistical steady-state or otherwise showing temporal stability2023 In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD
Abstract: Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations. As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks. Additionally, performing inference at runtime requires non-trivial coupling of ML framework libraries with simulation codes. This work offers a solution to both limitations by simplifying this coupling and enabling in situ training and inference workflows on heterogeneous clusters. Leveraging SmartSim, the presented framework deploys a database to store data and ML models in memory, thus circumventing the file system. On the Polaris supercomputer, we demonstrate perfect scaling efficiency to the full machine size of the data transfer and inference costs thanks to a novel co-located deployment of the database. Moreover, we train an autoencoder in situ from a turbulent flow simulation, showing that the framework overhead is negligible relative to a solver time step and training epoch.2022 Fast Algorithms for Monotone Lower Subsets of Kronecker Least Squares Problems
Abstract: Approximate solutions to large least squares problems can be computed efficiently using leverage score-based row-sketches, but directly computing the leverage scores, or sampling according to them with naive methods, still requires an expensive manipulation and processing of the design matrix. In this paper we develop efficient leverage score-based sampling methods for matrices with certain Kronecker product-type structure; in particular we consider matrices that are monotone lower column subsets of Kronecker product matrices. Our discussion is general, encompassing least squares problems on infinite domains, in which case matrices formally have infinitely many rows. We briefly survey leverage score-based sampling guarantees from the numerical linear algebra and approximation theory communities, and follow this with efficient algorithms for sampling when the design matrix has Kronecker-type structure. Our numerical examples confirm that sketches based on exact leverage score sampling for our class of structured matrices achieve superior residual compared to approximate leverage score sampling methods.2021 On the computation of a non-parametric estimator by convex optimization
Abstract: Estimation of linear functionals from observed data is an important task in many subjects. Juditsky & Nemirovski [The Annals of Statistics 37.5A (2009): 2278-2300] propose a framework for non-parametric estimation of linear functionals in a very general setting, with nearly minimax optimal confidence intervals. They compute this estimator and the associated confidence interval by approximating the saddle-point of a function. While this optimization problem is convex, it is rather difficult to solve using existing off-the-shelf optimization software. Furthermore, this computation can be expensive when the estimators live in a high-dimensional space. We propose a different algorithm to construct this estimator. Our algorithm can be used with existing optimization software and is much cheaper to implement even when the estimators are in a high-dimensional space, as long as the Hellinger affinity (or the Bhattacharyya coefficient) for the chosen parametric distribution can be efficiently computed given the parameters. We hope that our algorithm will foster the adoption of this estimation technique to a wider variety of problems with relative ease.Journal papers
- Physical Review Letters. [DOI] [arXiv] [code] See also the companion paper (an extended version): Theory of versatile fidelity estimation with confidence
- Physical Review A. [DOI] [arXiv] [code] See also the companion paper (an extended version): Versatile fidelity estimation with confidence
- IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 70(1) pp 52--63. [DOI] Cover artwork for this journal issue features our article
- Computational Optimization and Applications. [DOI] [arXiv] This is an updated version of Stochastic Subspace Descent
- Journal of Optimization Theory and Applications (JOTA). [DOI] [arXiv] [code] Free final version
- Optics Express, 28(26) pp 39413-39429. [DOI]
- Advances in Computational Mathematics, vol 46 (76). [DOI] [arXiv] [code] Free final version
- J. of Biomedical Optics, 25(5), 056501. [DOI]
- J. Optimization Theory and Applications (JOTA). [DOI] [arXiv] [code] Free final version, or locally hosted
- . [code] BiorXiv; or final (open access) version
- SIAM J. Optimization, 29(4), 2445-2482. [DOI] [arXiv] Optimization Online
- Formal Methods and Design. [DOI]
- Optics Express, vol 26(8) pp. 9850-9865. [DOI] Researchgate full text
- IEEE Trans. Info Theory. [DOI] [arXiv] [code] CU Scholar website
- IEEE J. Selected Topics in Signal Processing. [DOI] Final version
- Journal of Nonlinear and Convex Analysis, vol. 15, no. 1, pp. 137--159. [arXiv]
- Medical Physics, vol. 40, 071719. [DOI]
- Journal of Physical Chemistry B, vol. 114, no. 48, pp. 14176--14183. [DOI]
- Mathematical Programming Computation, vol 3 no 3 165--218. [DOI] [code] local version. An extended version appears as chapter 4 in my thesis. Typos: eq. 4.6a should have 1, not \sigma_k^{-1}t_k^{(1)}, in the second parameter of Trunc, and the two equations above should have z^{(1)} not z^{(2)} in the l-infinity norm constraint. Listing 6 should have the adjoint of W in line 4.
- Proceedings of the National Academy of Science, Vol. 104, 9575-9579. [DOI]
- Physical Review Letters, vol. 97 no. 5. [DOI] self-hosted version
- UMAP Journal, (26)1 pp. 37-48.
[J50] 2024 Mesh-Float-Zip (MFZ), Manifold harmonic bases for unstructured spatial data compression
Abstract: Block-coder style data compression schemes, such as JPEG, ZFP and SPERR, rely on de-correlating transformations in order to efficiently store data with entropy encoding schemes. These de-correlating transformations are typically reliant on a uniform spatiotemporal sampling of the transformations and/or the data. Thus, for nonuniform data which may include spatial (and temporal) nonuniformity, such as those arising from partial differential equations solveing over complex geometries and unstructured grids, there is not an efficient de-correlating transformation nor a data compression pipeline that sufficiently adapts to this nonuniformity. We presented the manifold harmonic basis (MHB) as a de-correlating transformation, and prod d a collocation method over the nonuniform domain for solving the eigen problem that defines the MHB. Additionally, we demonstrated a modified version of ZFP's compression pipeline that adapted to the problems presented by nonuniformity in a block-coder style compression scheme. This resulted in a state-of-the-art spatial data compression technique for nonuniform data which we call Mesh-Float-Zip (MFZ). We illustrated the performance of the proposed compression strategy and compared its accuracy against ZFP and SZ compressors (applied to both original and reshuffled data) on three scientific computing datasets.[J49] 2024 Online randomized interpolative decomposition with a posteriori error estimator for temporal PDE data reduction
Abstract: Traditional low-rank approximation is a powerful tool to compress the huge data matrices that arise in simulations of partial differential equations (PDE), but suffers from high computational cost and requires several passes over the PDE data. The compressed data may also lack interpretability thus making it difficult to identify feature patterns from the original data. To address this issue, we present an online randomized algorithm to compute the interpolative decomposition (ID) of large-scale data matrices in situ. Compared to previous randomized IDs that used the QR decomposition to determine the column basis, we adopt a streaming ridge leverage score-based column subset selection algorithm that dynamically selects proper basis columns from the data and thus avoids an extra pass over the data to compute the coefficient matrix of the ID. In particular, we adopt a single-pass error estimator based on the non-adaptive Hutch++ algorithm to provide real-time error approximation for determining the best coefficients. As a result, our approach only needs a single pass over the original data and thus is suitable for large and high-dimensional matrices stored outside of core memory or generated in PDE simulations. We also provide numerical experiments on turbulent channel flow and ignition simulations, and on the NSTX Gas Puff Image dataset, comparing our algorithm with the offline ID algorithm to demonstrate its utility in real-world applications.[J48] 2024 Optimization of Array Encoding for Ultrasound Imaging
Abstract: Objective; The transmit encoding model for synthetic aperture imaging is a robust and flexible framework for understanding the effect of acoustic transmission on ultrasound image reconstruction. Our objective is to use machine learning (ML) to construct scanning sequences, parameterized by time delays and apodization weights, that produce high quality B-mode images. Approach; We use an ML model in PyTorch and simulated RF data from Field II to probe the space of possible encoding sequences for those that minimize a loss function that describes image quality. This approach is made computationally feasible by a novel formulation of the derivative for delay-and-sum beamforming. We demonstrate these results experimentally on wire targets and a tissue-mimicking phantom. Main Results; When trained according to a given set of imaging parameters (imaging domain, hardware restrictions), our ML imaging model produces optimized encoding sequences that improve a number of standard quality metrics including resolution, field of view, and contrast, over conventional sequences. Significance; This work demonstrates that the set of encoding schemes that are commonly used represent only a narrow subset of those available. Additionally, it demonstrates the value for ML tasks in synthetic transmit aperture imaging to consider the beamformer within the model, instead of as purely post-processing.[J47] 2024 Bi-fidelity Variational Auto-encoder for Uncertainty Quantification
Abstract: Quantifying the uncertainty of quantities of interest (QoIs) from physical systems is a primary objective in model validation. However, achieving this goal entails balancing the need for computational efficiency with the requirement for numerical accuracy. To address this trade-off, we propose a novel bi-fidelity formulation of variational auto-encoders (BF-VAE) designed to estimate the uncertainty associated with a QoI from low-fidelity (LF) and high-fidelity (HF) samples of the QoI. This model allows for the approximation of the statistics of the HF QoI by leveraging information derived from its LF counterpart. Specifically, we design a bi-fidelity auto-regressive model in the latent space that is integrated within the VAE's probabilistic encoder-decoder structure. An effective algorithm is proposed to maximize the variational lower bound of the HF log-likelihood in the presence of limited HF data, resulting in the synthesis of HF realizations with a reduced computational cost. Additionally, we introduce the concept of the bi-fidelity information bottleneck (BF-IB) to provide an information-theoretic interpretation of the proposed BF-VAE model. Our numerical results demonstrate that BF-VAE leads to considerably improved accuracy, as compared to a VAE trained using only HF data when limited HF data is available.[J46] 2025 The Dependence of Parallel Imaging with Linear Predictability on the Undersampling Direction
Abstract: Parallel imaging with linear predictability takes advantage of information present in multiple receive coils to accurately reconstruct the image with fewer samples. Commonly used algorithms based on linear predictability include GRAPPA and SPIRiT. We present a sufficient condition for reconstruction based on the direction of undersampling and the arrangement of the sensing coils. This condition is justified theoretically and examples are shown using real data. We also propose a metric based on the fully-sampled auto-calibration region which can show which direction(s) of undersampling will allow for a good quality image reconstruction.[J45] 2023 QuadConv: Quadrature-Based Convolutions with Applications to Non-Uniform PDE Data Compression
Abstract: We present a new convolution layer for deep learning architectures which we call QuadConv -- an approximation to continuous convolution via quadrature. Our operator is developed explicitly for use on unstructured data, and accomplishes this by learning a continuous kernel that can be sampled at arbitrary locations. In the setting of neural compression, we show that a QuadConv-based autoencoder, resulting in a Quadrature Convolutional Neural Network (QCNN), can match the performance of standard discrete convolutions on structured uniform data, as in CNNs, and maintain this accuracy on unstructured data.[J44] 2023 Subsampling of Parametric Models with Bi-fidelity Boosting
Abstract: Least squares regression is a ubiquitous tool for building emulators (a.k.a. surrogate models) of problems across science and engineering for purposes such as design space exploration and uncertainty quantification. When the regression data are generated using an experimental design process (e.g., a quadrature grid) involving computationally expensive models, or when the data size is large, sketching techniques have shown promise to reduce the cost of the construction of the regression model while ensuring accuracy comparable to that of the full data. However, random sketching strategies, such as those based on leverage scores, lead to regression errors that are random and may exhibit large variability. To mitigate this issue, we present a novel boosting approach that leverages cheaper, lower-fidelity data of the problem at hand to identify the best sketch among a set of candidate sketches. This in turn specifies the sketch of the intended high-fidelity model and the associated data. We provide theoretical analyses of this bi-fidelity boosting (BFB) approach and discuss the conditions the low- and high-fidelity data must satisfy for a successful boosting. In doing so, we derive a bound on the residual norm of the BFB sketched solution relating it to its ideal, but computationally expensive, high-fidelity boosted counterpart. Empirical results on both manufactured and PDE data corroborate the theoretical analyses and illustrate the efficacy of the BFB solution in reducing the regression error, as compared to the non-boosted solution[J43] 2024 Versatile fidelity estimation with confidence
Abstract: As quantum devices become more complex and the requirements on these devices become more demanding, it is crucial to be able to verify the performance of such devices in a scalable and reliable fashion. A cornerstone task in this challenge is quantifying how close an experimentally prepared quantum state is to the desired one. Here we present a method to construct an estimator for the quantum state fidelity that is compatible with any measurement protocol. Our method provides a confidence interval on this estimator that is guaranteed to be nearly minimax optimal for the specified measurement protocol. For a well-chosen measurement scheme, our method is competitive in the number of measurement outcomes required for estimation. We demonstrate our method using simulations and experimental data from a trapped-ion quantum computer and compare the results to state-of-the-art techniques. Our method can be easily extended to estimate the expectation value of any observable, such as entanglement witnesses.[J42] 2024 Theory of Versatile fidelity estimation with confidence
Abstract: Estimating the fidelity with a target state is important in quantum information tasks. Many fidelity estimation techniques present a suitable measurement scheme to perform the estimation. In contrast, we present techniques that allow the experimentalist to choose a convenient measurement setting. Our primary focus lies on a method that constructs an estimator with nearly minimax optimal confidence interval for any specified measurement setting. We demonstrate, through a combination of theoretical and numerical results, various desirable properties for the method -- robustness against experimental imperfections, competitive sample complexity, and accurate estimates in practice. We compare this method with Maximum Likelihood Estimation and the associated Profile Likelihood method, a Semi-Definite Programming based approach, as well as a popular direct fidelity estimation technique.[J41] 2023 Modeling massive multivariate spatial data with the basis graphical lasso
Abstract: We propose a new modeling framework for highly-multivariate spatial processes that synthesizes ideas from recent multiscale and spectral approaches with graphical models. The basis graphical lasso writes a univariate Gaussian process as a linear combination of basis functions weighted with entries of a Gaussian graphical vector whose graph is estimated from optimizing an ℓ1 penalized likelihood. This paper extends the setting to a multivariate Gaussian process where the basis functions are weighted with Gaussian graphical vectors. We motivate a model where the basis functions represent different levels of resolution and the graphical vectors for each level are assumed to be independent. Using an orthogonal basis grants linear complexity and memory usage in the number of spatial locations, the number of basis functions, and the number of realizations. An additional fusion penalty encourages a parsimonious conditional independence structure in the multilevel graphical model. We illustrate our method on a large climate ensemble from the National Center for Atmospheric Research's Community Atmosphere Model that involves 40 spatial processes.[J40] 2024 Approximate maximum likelihood estimators for linear regression with design matrix uncertainty
Abstract: In this paper we consider regression problems subject to arbitrary noise in the operator or design matrix. This characterization appropriately models many physical phenomena with uncertainty in the regressors. Although the problem has been studied extensively for ordinary/total least squares, and via models that implicitly or explicitly assume Gaussianity, less attention has been paid to improving estimation for regression problems under general uncertainty in the design matrix. To address difficulties encountered when dealing with distributions of sums of random variables, we rely on the saddle point method to estimate densities and form an approximate log-likelihood to maximize. We show that the proposed method performs favorably against other classical methods.[J39] 2024 High probability convergence and uniform stability bounds for nonconvex stochastic gradient descent
Abstract: Stochastic gradient descent is one of the most common iterative algorithms used in machine learning. While being computationally cheap to implement, recent literature suggests it may have implicit regularization properties that prevent over-fitting. This paper analyzes the properties of stochastic gradient descent from a theoretical standpoint to help bridge the gap between theoretical and empirical results. Most theoretical results either assume convexity or only provide convergence results in mean, while this paper proves convergence bounds in high probability without assuming convexity. Assuming strong smoothness, we prove high probability convergence bounds in two settings: (1) assuming the Polyak-Łojasiewicz inequality and norm sub-Gaussian gradient noise and (2) assuming norm sub-Weibull gradient noise. In the first setting, we combine our convergence bounds with existing generalization bounds in order to bound the true risk and show that for a certain number of epochs, convergence and generalization balance in such a way that the true risk goes to the empirical minimum as the number of samples goes to infinity. In the second setting, as an intermediate step to proving convergence, we prove a probability result of independent interest. The probability result extends Freedman-type concentration beyond the sub-exponential threshold to heavier-tailed martingale difference sequences.[J38] 2023 Comparison of spatial encodings for ultrasound imaging
Abstract: Ultrasound pulse sequencing and receive signal focusing work hand-in-hand to determine image quality. These are commonly linked by geometry, for example, using focused beams or plane waves in transmission paired with appropriate time-of-flight calculations for focusing. Spatial encoding allows a broader class of array transmissions but requires decoding of the recorded echoes before geometric focusing can be applied. Recent work has expanded spatial encoding to include not only element apodizations, but also element time delays. This powerful technique allows for a unified beamforming strategy across different pulse sequences and increased flexibility in array signal processing giving access to estimates of individual transmit element signals, but tradeoffs in image quality between these encodings have not been previously studied. We evaluate in simulation several commonly used time delay and amplitude encodings and investigate the optimization of the parameter space for each. Using the signal-to-noise ratio (SNR), point resolution, and lesion detectability, we found tradeoffs between focused beams, plane waves, and Hadamard weight encodings. Beams with broader geometries maintained a wider field of view after decoding at the cost of the SNR and lesion detectability. Focused beams and plane waves showed slightly reduced resolution compared to Hadamard weights in some cases, especially close to the array. We also found overall degraded image quality using random weight or random delay encodings. We validate these findings with experimental phantom imaging for select cases. We believe that these findings provide a starting point for sequence optimization and improved image quality using the spatial encoding approach for imaging.[J37] 2021 A study of scalar optically-pumped magnetometers for use in magnetoencephalography without shielding
Abstract: Scalar optically-pumped magnetometers (OPMs) are being developed in small packages with high sensitivities. The high common-mode rejection ratio of these sensors allows for detection of very small signals in the presence of large background fields making them ideally suited for brain imaging applications in unshielded environments. Despite a flurry of activity around the topic, questions remain concerning how well a dipolar source can be localized under such conditions, especially when using few sensors. In this paper, we investigate the source localization capabilities using an array of scalar OPMs in the presence of a large background field while varying dipole strength, sensor count, and forward model accuracy. We also consider localization performance as the orientation angle of the background field changes. Our results are validated experimentally through accurate localization using a phantom virtual array mimicking a current dipole in a conducting sphere in a large background field. Our results are intended to give researchers a general sense of the capabilities and limitations of scalar OPMs for magnetoencephalography systems.[J36] 2021 Spectral estimation from simulations via sketching
Abstract: Sketching is a stochastic dimension reduction method that preserves geometric structures of data and has applications in high-dimensional regression, low rank approximation and graph sparsification. In this work, we show that sketching can be used to compress simulation data and still accurately estimate time autocorrelation and power spectral density. For a given compression ratio, the accuracy is much higher than using previously known methods. In addition to providing theoretical guarantees, we apply sketching to a molecular dynamics simulation of methanol and find that the estimate of spectral density is 90% accurate using only 10% of the data.[J35] 2021 A stochastic subspace approach to gradient-free optimization in high dimensions
Abstract: We present a stochastic descent algorithm for unconstrained optimization that is particularly efficient when the objective function is slow to evaluate and gradients are not easily obtained, as in some PDE-constrained optimization and machine learning problems. The algorithm maps the gradient onto a low-dimensional random subspace of dimension l at each iteration, similar to coordinate descent but without restricting directional derivatives to be along the axes. Without requiring a full gradient, this mapping can be performed by computing l directional derivatives (e.g., via forward-mode automatic differentiation). We give proofs for convergence in expectation under various convexity assumptions as well as probabilistic convergence results under strong-convexity. Our method extends the well-known Gaussian smoothing technique to descent in subspaces of dimension greater than one, opening the doors to new analysis of Gaussian smoothing when more than one directional derivative is used at each iteration. We also provide a finite-dimensional variant of a special case of the Johnson-Lindenstrauss lemma. Experimentally, we show that our method compares favorably to coordinate descent, Gaussian smoothing, gradient descent and BFGS (when gradients are calculated via forward-mode automatic differentiation) on problems from the machine learning and shape optimization literature.[J34] 2021 Bounds for the tracking error of first-order online optimization methods
Abstract: This paper investigates online algorithms for smooth time-varying optimization problems, focusing first on methods with constant step-size, momentum, and extrapolation-length. Assuming strong convexity, precise results for the tracking iterate error (the limit supremum of the norm of the difference between the optimal solution and the iterates) for online gradient descent are derived. The paper then considers a general first-order framework, where a universal lower bound on the tracking iterate error is established. Furthermore, a method using "long-steps" is proposed and shown to achieve the lower bound up to a fixed constant. This method is then compared with online gradient descent for specific examples. Finally, the paper analyzes the effect of regularization when the cost is not strongly convex. With regularization, it is possible to achieve a non-regret bound. The paper ends by testing the accelerated and regularized methods on synthetic time-varying least-squares and logistic regression problems, respectively.[J33] 2020 l_1-regularized maximum likelihood estimation with focused-spot illumination quadruples the diffraction-limited resolution in fluorescence microscopy
Abstract: Super-resolution fluorescence microscopy has proven to be a useful tool in biological studies. To achieve more than two-fold resolution improvement over the diffraction limit, existing methods require exploitation of the physical properties of the fluorophores. Recently, it has been demonstrated that achieving more than two-fold resolution improvement without such exploitation is possible using only a focused illumination spot and numerical post-processing. However, how the achievable resolution is affected by the processing step has not been thoroughly investigated. In this paper, we focus on the processing aspect of this emerging super-resolution microscopy technique. Based on a careful examination of the dominant noise source and the available prior information in the image, we find that if a processing scheme is appropriate for the dominant noise model in the image and can utilize the prior information in the form of sparsity, improved accuracy can be expected. Based on simulation results, we identify an improved processing scheme and apply it in a real-world experiment to super-resolve a known calibration sample. We show an improved super-resolution of 60nm, approximately four times beyond the conventional diffraction-limited resolution.[J32] 2020 Robust Least Squares for Quantized Data
Abstract: In this paper we formulate and solve a robust least squares problem for a system of linear equations subject to quantization error in the data matrix. Ordinary least squares fails to consider uncertainty in the operator, modeling all noise in the observed signal. Total least squares accounts for uncertainty in the data matrix, but necessarily increases the condition number of the operator compared to ordinary least squares. Tikhonov regularization or ridge regression is frequently employed to combat ill-conditioning, but requires parameter tuning which presents a host of challenges and places strong assumptions on parameter prior distributions. The proposed method also requires selection of a parameter, but it can be chosen in a natural way, e.g., a matrix rounded to the 4th digit uses an uncertainty bounding parameter of 0.5e-4. We show here that our robust method is theoretically appropriate, tractable, and performs favorably against ordinary and total least squares.[J31] 2021 Randomization of Approximate Bilinear Computation for Matrix Multiplication
Abstract: We present a method for randomizing formulas for bilinear computation of matrix products. We consider the implications of such randomization when there are two sources of error: One due to the formula itself only being approximately correct, and one due to using floating point arithmetic. Our theoretical results and numerical experiments indicate that our method can improve performance when each of these error sources are present individually, as well as when they are present at the same time.[J30] 2020 Resolvability of Hamming Graphs
Abstract: A subset of vertices in a graph is called resolving when the geodesic distances to those vertices uniquely distinguish every vertex in the graph. Here, we characterize the resolvability of Hamming graphs in terms of a constrained linear system and deduce a novel but straightforward characterization of resolvability for hypercubes. We propose an integer linear programming method to assess resolvability rapidly, and provide a more costly but definite method based on Gröbner bases to determine whether or not a set of vertices resolves an arbitrary Hamming graph. As proof of concept, we identify a resolving set of size 77 in the metric space of all octapeptides (i.e., proteins composed of eight amino acids) with respect to the Hamming distance; in particular, any octamer may be readily represented as a 77-dimensional real-vector. Representing k-mers as low-dimensional numerical vectors may enable new applications of machine learning algorithms to symbolic sequences.[J29] 2020 Fast Randomized Matrix and Tensor Interpolative Decomposition Using CountSketch
Abstract: We propose a new fast randomized algorithm for interpolative decomposition of matrices which utilizes CountSketch. We then extend this approach to the tensor interpolative decomposition problem introduced by Biagioni et al. (J. Comput. Phys. 281, pp. 116-134, 2015). Theoretical performance guarantees are provided for both the matrix and tensor settings. Numerical experiments on both synthetic and real data demonstrate that our algorithms maintain the accuracy of competing methods, while running in less time, achieving at least an order of magnitude speed-up on large matrices and tensors.[J28] 2020 Guarantees for the Kronecker Fast Johnson-Lindenstrauss Transform Using a Coherence and Sampling Argument
Abstract: In the recent paper [Jin, Kolda & Ward, arXiv:1909.04801], it is proved that the Kronecker fast Johnson-Lindenstrauss transform (KFJLT) is, in fact, a Johnson-Lindenstrauss transform, which had previously only been conjectured. In this paper, we provide an alternative proof of this, for when the KFJLT is applied to Kronecker vectors, using a coherence and sampling argument. Our proof yields a different bound on the embedding dimension, which can be combined with the bound in the paper by Jin et al. to get a better bound overall. As a stepping stone to proving our result, we also show that the KFJLT is a subspace embedding for matrices with columns that have Kronecker product structure. Lastly, we compare the KFJLT to four other sketch techniques in numerical experiments on both synthetic and real-world data.[J27] 2020 Nonstationary Modeling With Sparsity for Spatial Data via the Basis Graphical Lasso
Abstract: Many modern spatial models express the stochastic variation component as a basis expansion with random coefficients. Low rank models, approximate spectral decompositions, multiresolution representations, stochastic partial differential equations and empirical orthogonal functions all fall within this basic framework. Given a particular basis, stochastic dependence relies on flexible modeling of the coefficients. Under a Gaussianity assumption, we propose a graphical model family for the stochastic coefficients by parameterizing the precision matrix. Sparsity in the precision matrix is encouraged using a penalized likelihood framework. Computations follow from a majorization-minimization approach, a byproduct of which is a connection to the graphical lasso. The result is a flexible nonstationary spatial model that is adaptable to very large datasets. We apply the model to two large and heterogeneous spatial datasets in statistical climatology and recover physically sensible graphical structures. Moreover, the model performs competitively against the popular LatticeKrig model in predictive cross-validation, but substantially improves the Akaike information criterion score[J26] 2020 Analyzing the super-resolution characteristics of focused-spot illumination approaches
Abstract: Significance: It is commonly assumed that using the objective lens to create a tightly focused light spot for illumination provides a twofold resolution improvement over the Rayleigh resolution limit and that resolution improvement is independent of object properties. Nevertheless, such an assumption has not been carefully examined. We examine this assumption by analyzing the performance of two super-resolution methods, known as image scanning microscopy (ISM) and illumination-enhanced sparsity (IES). Aim: We aim to identify the fundamental differences between the two methods, and to provide examples that help researchers determine which method to utilize for different imaging conditions. Approach: We input the same image datasets into the two methods and analyze their restorations. In numerical simulations, we design objects of distinct brightness and sparsity levels for imaging. We use biological imaging experiments to verify the simulation results. Results: The resolution of IES often exceeds twice the Rayleigh resolution limit when imaging sparse objects. A decrease in object sparsity negatively affects the resolution improvement in both methods. Conclusions: The IES method is superior for imaging sparse objects with its main features being bright and small against a dark, large background. For objects that are largely bright with small dark features, the ISM method is favorable.[J25] 2020 Efficient Solvers for Sparse Subspace Clustering
Abstract: Sparse subspace clustering (SSC) clusters n points that lie near a union of low-dimensional subspaces. The SSC model expresses each point as a linear or affine combination of the other points, using either ℓ1 or ℓ0 regularization. Using ℓ1 regularization results in a convex problem but requires O(n2) storage, and is typically solved by the alternating direction method of multipliers which takes O(n3) flops. The ℓ0 model is non-convex but only needs memory linear in n, and is solved via orthogonal matching pursuit and cannot handle the case of affine subspaces. This paper shows that a proximal gradient framework can solve SSC, covering both ℓ1 and ℓ0 models, and both linear and affine constraints. For both ℓ1 and ℓ0, algorithms to compute the proximity operator in the presence of affine constraints have not been presented in the SSC literature, so we derive an exact and efficient algorithm that solves the ℓ1 case with just O(n2) flops. In the ℓ0 case, our algorithm retains the low-memory overhead, and is the first algorithm to solve the SSC-ℓ0 model with affine constraints. Experiments show our algorithms do not rely on sensitive regularization parameters, and they are less sensitive to sparsity misspecification and high noise.[J24] 2020 Optimization and Learning with Information Streams: Time-varying Algorithms and Applications
Abstract: There is a growing cross-disciplinary effort in the broad domain of optimization and learning with streams of data, applied to settings where traditional batch optimization techniques cannot produce solutions at time scales that match the inter-arrival times of the data points due to computational and/or communication bottlenecks. Special types of online algorithms can handle this situation, and this article focuses on such time-varying optimization algorithms, with emphasis on Machine Leaning and Signal Processing, as well as data-driven Control. Approaches for the design of time-varying or online first-order optimization methods are discussed, with emphasis on algorithms that can handle errors in the gradient, as may arise when the gradient is estimated. Insights on performance metrics and accompanying claims are provided, along with evidence of cases where algorithms that are provably convergent in batch optimization may perform poorly in an online regime. The role of distributed computation is discussed. Illustrative numerical examples for a number of applications of broad interest are provided to convey key ideas.[J23] 2019 Safe Feature Elimination for Non-Negativity Constrained Convex Optimization
Abstract: Inspired by recent work on safe feature elimination for 1-norm regularized least-squares, we develop strategies to eliminate features from convex optimization problems with non-negativity constraints. Our strategy is safe in the sense that it will only remove features/coordinates from the problem when they are guaranteed to be zero at a solution. To perform feature elimination we use an accurate, but not optimal, primal-dual feasible pair, making our methods robust and able to be used on ill-conditioned problems. We supplement our feature elimination problem with a method to construct an accurate dual feasible point from an accurate primal feasible point; this allows us to use a first-order method to find an accurate primal feasible point, then use that point to construct an accurate dual feasible point and perform feature elimination. Under reasonable conditions, our feature elimination strategy will eventually eliminate all zero features from the problem. As an application of our methods we show how safe feature elimination can be used to robustly certify the uniqueness of non-negative least-squares (NNLS) problems. We give numerical examples on a well-conditioned synthetic NNLS problem and a on set of 40000 extremely ill-conditioned NNLS problems arising in a microscopy application.[J22] 2019 Improved Fixed-Rank Nystrom Approximation via QR Decomposition: Practical and Theoretical Aspects
Abstract: The Nystrom method is a popular technique that uses a small number of landmark points to compute a fixed-rank approximation of large kernel matrices that arise in machine learning problems. In practice, to ensure high quality approximations, the number of landmark points is chosen to be greater than the target rank. However, for simplicity the standard Nystrom method uses a sub-optimal procedure for rank reduction. In this paper, we examine the drawbacks of the standard Nystrom method in terms of poor performance and lack of theoretical guarantees. To address these issues, we present an efficient modification for generating improved fixed-rank Nystrom approximations. Theoretical analysis and numerical experiments are provided to demonstrate the advantages of the modified method over the standard Nystrom method. Overall, the aim of this paper is to convince researchers to use the modified method, as it has nearly identical computational complexity, is easy to code, has greatly improved accuracy in many cases, and is optimal in a sense that we make precise.[J21] 2019 Stochastic Lanczos estimation of genomic variance components for linear mixed-effects models
Abstract: Background Linear mixed-effects models (LMM) are a leading method in conducting genome-wide association studies (GWAS) but require residual maximum likelihood (REML) estimation of variance components, which is computationally demanding. Previous work has reduced the computational burden of variance component estimation by replacing direct matrix operations with iterative and stochastic methods and by employing loose tolerances to limit the number of iterations in the REML optimization procedure. Here, we introduce two novel algorithms, stochastic Lanczos derivative-free REML (SLDF_REML) and Lanczos first-order Monte Carlo REML (L_FOMC_REML), that exploit problem structure via the principle of Krylov subspace shift-invariance to speed computation beyond existing methods. Both novel algorithms only require a single round of computation involving iterative matrix operations, after which their respective objectives can be repeatedly evaluated using vector operations. Further, in contrast to existing stochastic methods, SLDF_REML can exploit precomputed genomic relatedness matrices (GRMs), when available, to further speed computation. Results Results of numerical experiments are congruent with theory and demonstrate that interpreted-language implementations of both algorithms match or exceed existing compiled-language software packages in speed, accuracy, and flexibility. Conclusions Both the SLDF_REML and L_FOMC_REML algorithms outperform existing methods for REML estimation of variance components for LMM and are suitable for incorporation into existing GWAS LMM software implementations.[J20] 2019 On Quasi-Newton Forward-Backward Splitting: Proximal Calculus and Convergence
Abstract: We introduce a framework for quasi-Newton forward--backward splitting algorithms (proximal quasi-Newton methods) with a metric induced by diagonal ± rank-r symmetric positive definite matrices. This special type of metric allows for a highly efficient evaluation of the proximal mapping. The key to this efficiency is a general proximal calculus in the new metric. By using duality, formulas are derived that relate the proximal mapping in a rank-r modified metric to the original metric. We also describe efficient implementations of the proximity calculation for a large class of functions; the implementations exploit the piece-wise linear nature of the dual problem. Then, we apply these results to acceleration of composite convex minimization problems, which leads to elegant quasi-Newton methods for which we prove convergence. The algorithm is tested on several numerical examples and compared to a comprehensive list of alternatives in the literature. Our quasi-Newton splitting algorithm with the prescribed metric compares favorably against state-of-the-art. The algorithm has extensive applications including signal processing, sparse recovery, machine learning and classification to name a few.[J19] 2019 Adapting Regularized Low Rank Models for Parallel Architectures
Abstract: We introduce a reformulation of regularized low-rank recovery models to take advantage of GPU, multiple CPU, and hybridized architectures. Low-rank recovery often involves nuclear-norm minimization through iterative thresholding of singular values. These models are slow to fit and difficult to parallelize because of their dependence on computing a singular value decomposition at each iteration. Regularized low-rank recovery models also incorporate non-smooth terms to separate structured components (e.g. sparse outliers) from the low-rank component, making these problems more difficult. Using Burer-Monteiro splitting and marginalization, we develop a smooth, non-convex formulation of regularized low-rank recovery models that can be fit with first-order solvers. We develop a computable certificate of convergence for this non-convex program, and use it to establish bounds on the suboptimality of any point. Using robust principal component analysis (RPCA) as an example, we include numerical experiments showing that this approach is an order-of-magnitude faster than existing RPCA solvers on the GPU. We also show that this acceleration allows new applications for RPCA, including real-time background subtraction and MR image analysis.[J18] 2018 Template Polyhedra and Bilinear Optimization
Abstract: In this paper, we study the template polyhedral abstract domain using connections to bilinear optimization techniques. The connections between abstract interpretation and convex optimization approaches have been studied for nearly a decade now. Specifically, data flow constraints for numerical domains such as polyhedra can be expressed in terms of bilinear constraints. Algorithms such as policy and strategy iteration have been proposed for the special case of bilinear constraints that arise from template polyhedra wherein the desired invariants conform to a fixed template form. In particular, policy iteration improves upon a known post-fixed point by alternating between solving for an improved post-fixed point against finding certificates that are used to prove the new fixed point. In the first part of this paper, we propose a policy iteration scheme that changes the template on the fly in order to prove a target reachability property of interest. We show how the change to the template naturally fits inside a policy iteration scheme, and thus, propose a scheme that updates the template matrices associated with each program location. We demonstrate that the approach is effective over a set of benchmark instances, wherein, starting from a simple predefined choice of templates, the approach is able to infer appropriate template directions to prove a property of interest. However, it is well known that policy iteration can end up “stuck” in a saddle point from which future iterations cannot make progress. In the second part of this paper, we study this problem further by empirically comparing policy iteration with a variety of other approaches for bilinear programming. These approaches adapt well-known algorithms to the special case of bilinear programs as well as using off-the-shelf tools for nonlinear programming. Our initial experience suggests that policy iteration seems to be the most advantageous approach for problems arising from abstract interpretation, despite the potential problems of getting stuck at a saddle point.[J17] 2018 Achieving superresolution with illumination-enhanced sparsity
Abstract: Recent advances in superresolution fluorescence microscopy have been limited by a belief that surpassing two-fold resolution enhancement of the Rayleigh resolution limit requires stimulated emission or the fluorophore to undergo state transitions. Here we demonstrate a new superresolution method that requires only image acquisitions with a focused illumination spot and computational post-processing. The proposed method utilizes the focused illumination spot to effectively reduce the object size and enhance the object sparsity and consequently increases the resolution and accuracy through nonlinear image post-processing. This method clearly resolves 70nm resolution test objects emitting ~530nm light with a 1.4 numerical aperture (NA) objective, and, when imaging through a 0.5NA objective, exhibits high spatial frequencies comparable to a 1.4NA widefield image, both demonstrating a resolution enhancement above two-fold of the Rayleigh resolution limit. More importantly, we examine how the resolution increases with photon numbers, and show that the more-than-two-fold enhancement is achievable with realistic photon budgets.[J16] 2017 Preconditioned Data Sparsification for Big Data with Applications to PCA and K-means
Abstract: We analyze a compression scheme for large data sets that randomly keeps a small percentage of the components of each data sample. The benefit is that the output is a sparse matrix and therefore subsequent processing, such as PCA or K-means, is significantly faster, especially in a distributed-data setting. Furthermore, the sampling is single-pass and applicable to streaming data. The sampling mechanism is a variant of previous methods proposed in the literature combined with a randomized preconditioning to smooth the data. We provide guarantees for PCA in terms of the covariance matrix, and guarantees for K-means in terms of the error in the center estimators at a given step. We present numerical evidence to show both that our bounds are nearly tight and that our algorithms provide a real benefit when applied to standard test data sets, as well as providing certain benefits over related sampling approaches.[J15] 2016 Efficient Adjoint Computation for Wavelet and Convolution Operators (Lecture Notes)
Abstract: First-order optimization algorithms, often preferred for large problems, require the gradient of the differentiable terms in the objective function. These gradients often involve linear operators and their adjoints, which must be applied rapidly. We consider two example problems and derive methods for quickly evaluating the required adjoint operator. The first example is an image deblurring problem, where we must compute efficiently the adjoint of multi-stage wavelet reconstruction. Our formulation of the adjoint works for a variety of boundary conditions, which allows the formulation to generalize to a larger class of problems. The second example is a blind channel estimation problem taken from the optimization literature where we must compute the adjoint of the convolution of two signals. In each example, we show how the adjoint operator can be applied efficiently while leveraging existing software.[J14] 2015 Designing Statistical Estimators That Balance Sample Size, Risk, and Computational Cost
Abstract: This paper proposes a tradeoff between computational time, sample complexity, and statistical accuracy that applies to statistical estimators based on convex optimization. When we have a large amount of data, we can exploit excess samples to decrease statistical risk, to decrease computational cost, or to trade off between the two. We propose to achieve this tradeoff by varying the amount of smoothing applied to the optimization problem. This work uses regularized linear regression as a case study to argue for the existence of this tradeoff both theoretically and experimentally. We also apply our method to describe a tradeoff in an image interpolation problem.[J13] 2014 Convex Optimization for Big Data: Scalable, randomized, and parallel algorithms for big data analytics
Abstract: This article reviews recent advances in convex optimization algorithms for Big Data, which aim to reduce the computational, storage, and communications bottlenecks. We provide an overview of this emerging field, describe contemporary approximation techniques like first-order methods and randomization for scalability, and survey the important role of parallel and distributed computation. The new Big Data algorithms are based on surprisingly simple principles and attain staggering accelerations even on classical problems.[J12] 2014 An Algorithm for Splitting Parallel Sums of Linearly Composed Monotone Operators, with Applications to Signal Recovery
Abstract: We present a new primal-dual splitting algorithm for structured monotone inclusions in Hilbert spaces and analyze its asymptotic behavior. A novelty of our framework, which is motivated by image recovery applications, is to consider inclusions that combine a variety of monotonicity-preserving operations such as sums, linear compositions, parallel sums, and a new notion of parallel composition. The special case of minimization problems is studied in detail, and applications to signal recovery are discussed. Numerical simulations are provided to illustrate the implementation of the algorithm.[J11] 2013 Improving IMRT delivery efficiency with iteratively reweighted L1-minimization for inverse planning
Abstract:[J10] 2012 A Non-Uniform Sampler for Wideband Spectrally-Sparse Environments
Abstract: We present a wide bandwidth, compressed sensing based non-uniform sampling (NUS) system with a custom sample-=and-hold chip designed to take advantage of a low average sampling rate. By sampling signals non-uniformly, the average sample rate can be more than a magnitude lower than the Nyquist rate, provided that these signals have a relatively low information content as measured by the sparsity of their spectrum. The hardware design combines a wideband Indium-Phosphide (InP) heterojunction bipolar transistor (HBT) sample-and-hold with a commercial off-the-shelf (COTS) analog-to-digital converter (ADC) to digitize an 800 MHz to 2 GHz band (having 100 MHz of non-contiguous spectral content) at an average sample rate of 236 Msps. Signal reconstruction is performed via a non-linear compressed sensing algorithm, and the challenges of developing an efficient GPU implementation are discussed. Measured biterror-rate (BER) data for a GSM channel is presented, and comparisons to a conventional wideband 4.4 Gsps ADC are made.[J9] 2012 A Compressed Sensing Parameter Extraction Platform for Radar Pulse Signal Acquisition
Abstract: In this paper we present a complete (hardware/ software) sub-Nyquist rate (× 13) wideband signal acquisition chain capable of acquiring radar pulse parameters in an instantaneous bandwidth spanning 100 MHz-2.5 GHz with the equivalent of 8 effective number of bits (ENOB) digitizing performance. The approach is based on the alternative sensing-paradigm of compressed sensing (CS). The hardware platform features a fully-integrated CS receiver architecture named the random-modulation preintegrator (RMPI) fabricated in Northrop Grumman's 450 nm InP HBT bipolar technology. The software back-end consists of a novel CS parameter recovery algorithm which extracts information about the signal without performing full time-domain signal reconstruction. This approach significantly reduces the computational overhead involved in retrieving desired information which demonstrates an avenue toward employing CS techniques in power-constrained real-time applications. The developed techniques are validated on CS samples physically measured by the fabricated RMPI and measurement results are presented. The parameter estimation algorithms are described in detail and a complete description of the physical hardware is given.[J8] 2011 Dynamical Behavior Near a Liquid-Liquid Phase Transition in Simulations of Supercooled Water
Abstract: We examine the behavior of the diffusion coefficient of the ST2 model of water over a broad region of the phase diagram via molecular dynamics simulations. The ST2 model has an accessible liquid–liquid transition between low-density and high-density phases, making the model an ideal candidate to explore the impacts of the liquid–liquid transition on dynamics. We locate characteristic dynamical loci in the phase diagram and compare them with the previously investigated thermodynamic loci. The low-density liquid phase shows a crossover from non-Arrhenius to Arrhenius behavior, signaling the onset of a crossover from fragile-to-strong behavior. We explain this crossover in terms of the asymptotic approach of the low-density liquid to a random tetrahedral network and show that the temperature dependence of the diffusion coefficient over a wide temperature range can be simply related to the concentration of defects in the network. Our findings thus confirm that the low-density phase of ST2 water is a well-defined metastable liquid.[J7] 2011 Templates for Convex Cone Problems with Applications to Sparse Signal Recovery
Abstract: This paper develops a general framework for solving a variety of convex cone problems that frequently arise in signal processing, machine learning, statistics, and other fields. The approach works as follows: first, determine a conic formulation of the problem; second, determine its dual; third, apply smoothing; and fourth, solve using an optimal first-order method. A merit of this approach is its flexibility: for example, all compressed sensing problems can be solved via this approach. These include models with objective functionals such as the total-variation norm, ||Wx||1 where W is arbitrary, or a combination thereof. In addition, the paper introduces a number of technical contributions such as a novel continuation scheme and a novel approach for controlling the step size, and applies results showing that the smooth and unsmoothed problems are sometimes formally equivalent. Combined with our framework, these lead to novel, stable and computationally efficient algorithms. For instance, our general implementation is competitive with state-of-the-art methods for solving intensively studied problems such as the LASSO. Further, numerical experiments show that one can solve the Dantzig selector problem, for which no efficient large-scale solvers exist, in a few hundred iterations. Finally, the paper is accompanied with a software release. This software is not a single, monolithic solver; rather, it is a suite of programs and routines designed to serve as building blocks for constructing complete algorithms.[J6] 2011 NESTA: a fast and accurate first-order method for sparse recovery
Abstract: Accurate signal recovery or image reconstruction from indirect and possibly undersampled data is a topic of considerable interest; for example, the literature in the recent field of compressed sensing is already quite immense. Inspired by recent breakthroughs in the development of novel first-order methods in convex optimization, most notably Nesterov's smoothing technique, this paper introduces a fast and accurate algorithm for solving common recovery problems in signal processing. In the spirit of Nesterov's work, one of the key ideas of this algorithm is a subtle averaging of sequences of iterates, which has been shown to improve the convergence properties of standard gradient-descent algorithms. This paper demonstrates that this approach is ideally suited for solving large-scale compressed sensing reconstruction problems as 1) it is computationally efficient, 2) it is accurate and returns solutions with several correct digits, 3) it is flexible and amenable to many kinds of reconstruction problems, and 4) it is robust in the sense that its excellent performance across a wide range of problems does not depend on the fine tuning of several parameters. Comprehensive numerical experiments on realistic signals exhibiting a large dynamic range show that this algorithm compares favorably with recently proposed state-of-the-art methods. We also apply the algorithm to solve other problems for which there are fewer alternatives, such as total-variation minimization, and convex programs seeking to minimize the l1 norm of Wx under constraints, in which W is not diagonal.[J5] 2010 Quantum state tomography via compressed sensing
Abstract: We establish methods for quantum state tomography based on compressed sensing. These methods are specialized for quantum states that are fairly pure, and they offer a significant performance improvement on large quantum systems. In particular, they are able to reconstruct an unknown density matrix of dimension d and rank r using O(rd log^2 d) measurement settings, compared to standard methods that require d^2 settings. Our methods have several features that make them amenable to experimental implementation: they require only simple Pauli measurements, use fast convex optimization, are stable against noise, and can be applied to states that are only approximately low-rank. The acquired data can be used to certify that the state is indeed close to pure, so no a priori assumptions are needed. We present both theoretical bounds and numerical simulations.[J4] 2007 Relation between the Widom line and the breakdown of the Stokes-Einstein relation in supercooled water]
Abstract: Supercooled water exhibits a breakdown of the Stokes–Einstein relation between the diffusion constant D and the alpha relaxation time τα. For water simulated with two different potentials, TIP5P and ST2, we find that the temperature of the decoupling of diffusion and alpha relaxation correlates with the temperature of the maximum in specific heat that corresponds to crossing the Widom line TW(P). Specifically, we find that our results for Dτα/T collapse onto a single “master curve” if temperature is replaced by T − TW(P). We further find that the size of the mobile molecule clusters (dynamical heterogeneities) increases sharply near TW(P). Moreover, our calculations of mobile particle cluster size <n(t*)>w for different pressures, where t* is the time for which the mobile particle cluster size is largest, also collapse onto a single master curve if T is replaced by T − TW(P). The crossover to a more locally structured low density liquid (LDL) as T → TW(P) appears to be well correlated both with the breakdown of the Stokes–Einstein relation and with the growth of dynamic heterogeneities. Our results are consistent with the possibility that the breakdown of the SE relation in supercooled water at low pressures is associated with the hypothesized liquid–liquid phase transition.[J3] 2006 Fractional Stokes-Einstein and Debye-Stokes-Einstein Relations in a Network-Forming Liquid
Abstract: We study the breakdown of the Stokes-Einstein (SE) and Debye-Stokes-Einstein (DSE) relations for translational and rotational motion in a prototypical model of a network-forming liquid, the ST2 model of water. We find that the emergence of fractional SE and DSE relations at low temperature is ubiquitous in this system, with exponents that vary little over a range of distinct physical regimes. We also show that the same fractional SE relation is obeyed by both mobile and immobile dynamical heterogeneities of the liquid.[J2] 2005 The Dynamics of Falling Dominoes
Abstract:Conference papers
- Workshop on Responsibly Building the Next Generation of Multimodal Foundational Models (NeurIPS workship RBFM). openreview PDF
- Model Reduction and Surrogate Modeling (MORe2024).
- Theory of Quantum Computation, Communication and Cryptography (TQC2024), Sept 2024, Okinawa / hybrid.
- 2024 IEEE Conference on Computational Imaging Using Synthetic Apertures (CISA). [DOI] related poster from 2024 CCTSI Research Innovations in Health AI
- 2024 IEEE Conference on Computational Imaging Using Synthetic Apertures (CISA). [DOI]
- QIP (Quantum Information Processing), as poster, Ghent, February 6--10.
- Conference on Lasers and Electro-Optics (CLEO).
- ICML. [arXiv]
- In Asilomar conference on Signals, Systems and Computers. [DOI]
- SPIE BiOS, San Francisco, CA. [DOI]
- Machine Learning workshop, at the 2019 IEEE International Conference on Big Data (BigData 2019), Dec. 9-12, Los Angeles.. [DOI] [arXiv] [code]
- 2nd IEEE Data Science Workshop, June 2019 Minneapolis. [DOI] [arXiv]
- in INNS Big Data and Deep Learning (Sestri Levante, Genova, Italy 16-18 April 2019). [DOI] [arXiv] Published version appears in Recent Advances in Big Data and Deep Learning
- Conference on Artificial Intelligence (AAAI 2018) San Francisco, Feb 2018. [DOI] A preliminary, shorter version appears as Randomized Clustered Nystrom for Large-Scale Kernel Machines
- SIAM Workshop on Dimension Reduction, Pittsburg, July 9-10. link or CU Scholar website
- IEEE GlobalSIP (Global Conference on Signal and Information Processing. [arXiv] Also available via CU Scholar website
- as extended abstract at EAGE (Madrid, Spain, June). [arXiv]
- NIPS, Montreal. link
- NIPS. link
- accepted in ICIP Melbourne.
- ICML, Atlanta, (the v3 on arXiv is our expanded version of our NIPS conference version). [arXiv] For fun, see the ICML 2013 review boards. There is an earlier version of this paper on the JMLR website.
- NIPS (Lake Tahoe) (recieved a "spotlight presentation" as one of the top 5% of submissions). [arXiv] official link
[C37] 2025 Aligning to What? Limits to RLHF Based Alignment
Abstract: Reinforcement Learning from Human Feedback (RLHF) is increasingly used to align large language models (LLMs) with human preferences. However, the effectiveness of RLHF in addressing underlying biases remains unclear. This study investigates the relationship between RLHF and both covert and overt biases in LLMs, particularly focusing on biases against African Americans. We applied various RLHF techniques (DPO, ORPO, and RLOO) to Llama 3 8B and evaluated the covert and overt biases of the resulting models using matched-guise probing and explicit bias testing. We performed additional tests with DPO on different base models and datasets; among several implications, we found that SFT before RLHF calcifies model biases. Additionally, we extend the tools for measuring covert-biases to multi-modal models. Through our experiments we collect evidence that indicates that current alignment techniques are inadequate for nebulous tasks such as mitigating covert biases, highlighting the need for capable datasets, data curating techniques, or alignment tools.[C36] 2024 Aligning to What? Limits to RLHF Based Alignment
Abstract: Reinforcement Learning from Human Feedback (RLHF) is increasingly used to align large language models (LLMs) with human preferences. However, the effectiveness of RLHF in addressing underlying biases remains unclear. This study investigates the relationship between RLHF and both covert and overt biases in LLMs, particularly focusing on biases against African Americans. We applied various RLHF techniques (DPO, ORPO, RLOO) to Llama 3 8B and evaluated the resulting models using matched-guise probing and explicit bias testing.Our findings suggest that RLHF may not effectively align LLMs as intended. In most cases, RLHF either worsened both covert and overt biases or left them relatively unchanged compared to the base model. These results indicate that current RLHF techniques fail to address underlying biases introduced during pretraining, particularly for ambiguous objectives like harmlessness. Our study highlights the need for the development of improved techniques to ensure genuine alignment of LLMs with abstract alignment goals.[C35] 2024 (POSTER) Surrogate-based Line Search for Stochastic Subspace Descent
Abstract: Gradient-free optimization (GFO) is the optimization approach when the gradient of the objective function is unavailable, commonly happen in black-box optimizations. Traditional GFO methods that estimate gradients using finite differences face significant challenges in high-dimensional spaces, as the number of required function evaluations grows with dimensionality. To address these challenges, Kozak et al. introduced the Stochastic Subspace Descent (SSD) method, which enhances efficiency by estimating gradients within a randomly selected subspace. This approach has demonstrated substantial effectiveness, particularly in cases where the objective function exhibits weak dependence on most of its inputs—a situation frequently encountered in practice. However, the optimization performance of SSD, particularly regarding step size selection, remains suboptimal due to unknown variables such as the Lipschitz constant of the objective function gradient and the projected energy of the effective objective function onto the subspace. To overcome these limitations, recent developments have utilized reduced-order or surrogate models, such as polynomial chaos and deep learning-based surrogates, which offer cost-effective approximations of the original expensive models while maintaining a consistent performance. Building on these advancements, this work proposes a novel method that leverages reduced-order surrogate models to refine the step size selection process through a surrogate-adjusted line search method. This approach aims to harmonize the efficiency of surrogate models with the robustness of SSD, thereby enhancing the overall optimization process. This paper makes several significant contributions to the field of gradient-free optimization; 1. We introduce a novel surrogate-adjusted line search algorithm that utilizes reduced-order surrogate models to approximate the optimal step size, effectively addressing the prevalent challenge of step size tuning in gradient-free optimization. 2. We provide a comprehensive theoretical analysis of the surrogate-adjusted line search algorithm, which includes a rigorous convergence analysis and a detailed discussion on the trade-offs between the number of iterations and the accuracy of surrogate construction required for optimal step size adjustment. 3. We validate the effectiveness of our proposed method through extensive numerical experiments across various applications. The results confirm the advantages of the proposed algorithm, showcasing its potential to significantly improve the performance of gradient-free optimization methods.[C34] 2024 (POSTER) Experiment Design for Minimax Fidelity Estimation
Abstract:[C33] 2024 Deep Learning with Enforced Data Consistency
Abstract: Current deep learning methods for MRI image reconstruction handle data consistency in one of two ways, (1) add a penalty term into the loss function, or (2) add a data consis-tency layer into the network architecture. Adding a penalty to the loss function does not enforce data consistency; data inconsistent results are still often generated, which can result in erroneous reconstructions. The data consistency layer improves reconstruction quality on out-of-distribution data and largely eliminates hallucinations. However, it is too compu-tationally expensive for training with parallel imaging data. In this manuscript we explore a computationally efficient approximation to hard data consistency. We present results when adding this data consistency layer into two existing networks designed for MRI reconstruction. After retraining with the additional consistency layer, the networks show improved out-of-distribution performance and suppression of hallucinations.[C32] 2024 Compressed Sensing with Homodyne Detection
Abstract: Compressed sensing and partial Fourier sampling are two methods for accelerating MRI scans. Compressed sensing relies on the sparsity of the image in a separate domain to reduce the sampling burden and partial Fourier sampling relies on an inherent Hermitian symmetry for the same purpose. Partial Fourier with homodyne detection explicitly takes advantage of the fact that the phase of the image is low-bandwidth. Conventional combinations of these methods do not use homodyne detection; instead, the image is reconstructed by solving the same optimization algorithm as that of compressed sensing; the only difference is that a significant portion (approximately 3/8) of the samples are eliminated. In this manuscript, we reconstruct the image by solving a convex optimization problem that explicitly incorporates the low-bandwidth phase estimate of homodyne detection. We show that this method improves image quality for a given sampling burden.[C31] 2023 (POSTER) Versatile fidelity estimation with confidence
Abstract:[C30] 2022 Broadband Spectroscopic Imaging Using Dual Frequency Comb Spectroscopy and Compressive Sensing
Abstract:[C29] 2022 (POSTER) Empirical Results Suggest Quasi-Monte Carlo Sampling Increases Accuracy in the Estimation of Annual Energy Production from Operational Data
Abstract: Quasi-Monte Carlo sampling was implemented in the OpenOA annual energy production analysis method. Preliminary results show that this method improves the order of the asymptotic convergence of OpenOA’s AEP estimates.[C28] 2022 A Strategy for Synthetic Aperture Sequence Design Using Numerical Optimization
Abstract: Modifying acoustic transmission represents one way of improving image quality for ultrasound imaging. This has been demonstrated in the transmit encoding model for synthetic aperture imaging [1], which allows for great flexibility in apodization and time delay of each array element. After decoding these recorded echoes, geometric focusing can be applied to produce an image whose quality depends highly on the specific encoding used. Commonly used encoding schemes, such as those which produce focused beams or planewaves, are often motivated from geometric principles. We note that these geometrically intuitive schemes represent only a narrow subset of all possible encoding schemes, and demonstrate that even small modification to transmit parameters can lead to meaningfully improved resolution, field of view, and contrast in images. These modified encodings often cannot be derived through application of fundamental geometric principles. In response to these challenges, we propose a machine learning (ML) framework for sequence design that identifies a set of optimal transmit parameters, in the sense that they minimize some loss metric on beamformed images. Additionally, we describe critical implementation details that make backpropagation in our approach computationally feasible.[C27] 2021 Superresolution photoacoustic tomography using random speckle illumination and second order moments
Abstract: Idier et al. [IEEE Trans. Comput. Imaging 4(1), 2018] propose a method which achieves superresolution in the microscopy setting by leveraging random speckle illumination and knowledge about statistical second order moments for the illumination patterns and model noise. This is achieved without any assumptions on the sparsity of the imaged object. In this paper, we show that their technique can be extended to photoacoustic tomography. We propose a simple algorithm for doing the reconstruction which only requires a small number of linear algebra steps. It is therefore much faster than the iterative method used by Idier et al. We also propose a new representation of the imaged object based on Dirac delta expansion functions.[C26] 2021 A Sampling Based Method for Tensor Ring Decomposition
Abstract: We propose a sampling-based method for computing the tensor ring (TR) decomposition of a data tensor. The method uses leverage score sampled alternating least squares to fit the TR cores in an iterative fashion. By taking advantage of the special structure of TR tensors, we can efficiently estimate the leverage scores and attain a method which has complexity sublinear in the number of input tensor entries. We provide high-probability relative-error guarantees for the sampled least squares problems. We compare our proposal to existing methods in experiments on both synthetic and real data. Our method achieves substantial speedup -- sometimes two or three orders of magnitude -- over competing methods, while maintaining good accuracy. We also provide an example of how our method can be used for rapid feature extraction.[C25] 2020 Nuclear Norm Based Spectrum Estimation for Molecular Dynamic Simulations
Abstract: Molecular dynamic (MD) simulations are used to probe molecular systems in regimes not accessible to physical experiments. A common goal of these simulations is to compute the power spectral density (PSD) of some component of the system such as particle velocity. In certain MD simulations, only a few time locations are observed, which makes it difficult to estimate the autocorrelation and PSD. This work develops a novel nuclear norm minimization-based method for this type of sub-sampled data, based on a parametric representation of the PSD as the sum of Lorentzians. We show results on both synthetic data and a test system of methanol.[C24] 2021 Stochastic Gradient Langevin Dynamics with Variance Reduction
Abstract: Stochastic gradient Langevin dynamics (SGLD) has gained the attention of optimization researchers due to its global optimization properties. This paper proves an improved convergence property to local minimizers of nonconvex objective functions using SGLD accelerated by variance reductions. Moreover, we prove an ergodicity property of the SGLD scheme, which gives insights on its potential to find global minimizers of nonconvex objectives.[C23] 2020 Computational super-resolution microscopy: leveraging noise model, regularization and sparsity to achieve highest resolution
Abstract: We report progress in algorithm development for a computation-based super-resolution microscopy technique. Building upon previous results, we examine our recently implemented microscope system and construct alter- native processing algorithms. Based on numerical simulations results, we evaluate the performance of each algorithm and determine the one most suitable for our super-resolution microscope[C22] 2019 One-Pass Sparsified Gaussian Mixtures
Abstract: We present a one-pass sparsified Gaussian mixture model (SGMM). Given N data points in P dimensions, X, the model fits K Gaussian distributions to X and (softly) classifies each point to these clusters. After paying an up-front cost of O(NPlogP) to precondition the data, we subsample Q entries of each data point and discard the full P-dimensional data. SGMM operates in O(KNQ) time per iteration for diagonal or spherical covariances, independent of P, while estimating the model parameters in the full P-dimensional space, making it one-pass and hence suitable for streaming data. We derive the maximum likelihood estimators for the parameters in the sparsified regime, demonstrate clustering on synthetic and real data, and show that SGMM is faster than GMM while preserving accuracy.[C21] 2019 Online Sparse Subspace Clustering
Abstract: This paper focuses on the sparse subspace clustering problem, and develops an online algorithmic solution to cluster data points on-the-fly, without revisiting the whole dataset. The strategy involves an online solution of a sparse representation (SR) problem to build a (sparse) dictionary of similarities where points in the same subspace are considered "similar," followed by a spectral clustering based on the obtained similarity matrix. When the SR cost is strongly convex, the online solution converges to within a neighborhood of the optimal time-varying batch solution. A dynamic regret analysis is performed when the SR cost is not strongly convex.[C20] 2019 Perturbed Proximal Descent to Escape Saddle Points for Non-convex and Non-smooth Objective Functions
Abstract: We consider the problem of finding local minimizers in non-convex and non-smooth optimization. Under the assumption of strict saddle points, positive results have been derived for first-order methods. We present the first known results for the non-smooth case, which requires different analysis and a different algorithm.[C19] 2018 Low-rank Tucker decomposition of large tensors using TensorSketch
Abstract: We propose two randomized algorithms for low-rank Tucker decomposition of tensors. The algorithms, which incorporate sketching, only require a single pass of the input tensor and can handle tensors whose elements are streamed in any order. To the best of our knowledge, ours are the only algorithms which can do this. We test our algorithms on sparse synthetic data and compare them to multiple other methods. We also apply one of our algorithms to a real dense 38 GB tensor representing a video and use the resulting decomposition to correctly classify frames containing disturbances.[C18] 2018 Randomized Clustered Nystrom for Large-Scale Kernel Machines
Abstract: The Nystrom method is a popular technique for generating low-rank approximations of kernel matrices that arise in many machine learning problems. The approximation quality of the Nystrom method depends crucially on the number of selected landmark points and the selection procedure. In this paper, we introduce a randomized algorithm for generating landmark points that is scalable to large high-dimensional data sets. The proposed method performs K-means clustering on low-dimensional random projections of a data set and thus leads to significant savings for high-dimensional data sets. Our theoretical results characterize the tradeoffs between accuracy and efficiency of the proposed method. Moreover, numerical experiments on classification and regression tasks demonstrate the superior performance and efficiency of our proposed method compared with existing approaches.[C17] 2017 Estimating Active Subspaces with Randomized Gradient Sampling
Abstract: In this work, we present an ecient method for estimating active subspaces using only random observations of gradient vectors. Our method is based on the bi-linear representation of low-rank gradient matrices with a novel initialization step for alternating minimization.[C16] 2017 Robust Partially-Compressed Least-Squares
Abstract: Randomized matrix compression techniques, such as the Johnson-Lindenstrauss transform, have emerged as an effective and practical way for solving large-scale problems efficiently. With a focus on computational efficiency, however, forsaking solutions quality and accuracy becomes the trade-off. In this paper, we investigate compressed least-squares problems and propose new models and algorithms that address the issue of error and noise introduced by compression. While maintaining computational efficiency, our models provide robust solutions that are more accurate--relative to solutions of uncompressed least-squares--than those of classical compressed variants. We introduce tools from robust optimization together with a form of partial compression to improve the error-time trade-offs of compressed least-squares solvers. We develop an efficient solution algorithm for our Robust Partially-Compressed (RPC) model based on a reduction to a one-dimensional search. We also derive the first approximation error bounds for Partially-Compressed least-squares solutions. Empirical results comparing numerous alternatives suggest that robust and partially compressed solutions are effectively insulated against aggressive randomized transforms.[C15] 2016 A Randomized Approach to Efficient Kernel Clustering
Abstract: Kernel-based K-means clustering has gained popularity due to its simplicity and the power of its implicit non-linear representation of the data. A dominant concern is the memory requirement since memory scales as the square of the number of data points. We provide a new analysis of a class of approximate kernel methods that have more modest memory requirements, and propose a specific one-pass randomized kernel approximation followed by standard K-means on the transformed data. The analysis and experiments suggest the method is accurate, while requiring drastically less memory than standard kernel K-means and significantly less memory than Nystrom based approximations.[C14] 2015 Efficient Dictionary Learning via Very Sparse Random Projections
Abstract: Performing signal processing tasks on compressive measurements of data has received great attention in recent years. In this paper, we extend previous work on compressive dictionary learning by showing that more general random projections may be used, including sparse ones. More precisely, we examine compressive K-means clustering as a special case of compressive dictionary learning and give theoretical guarantees for its performance for a very general class of random projections. We then propose a memory and computation efficient dictionary learning algorithm, specifically designed for analyzing large volumes of high-dimensional data, which learns the dictionary from very sparse random projections. Experimental results demonstrate that our approach allows for reduction of computational complexity and memory/data access, with controllable loss in accuracy.[C13] 2015 General Optimization Framework for Robust and Regularized 3D FWI
Abstract: Scarcity of hydrocarbon resources and high exploration risks motivate the development of high fidelity algorithms and computationally viable approaches to exploratory geophysics. Whereas early approaches considered least-squares minimization, recent developments have emphasized the importance of robust formulations, as well as formulations that allow disciplined encoding of prior information into the inverse problem formulation. The cost of a more flexible optimization framework is a greater computational complexity, as least-squares optimization can be performed using straightforward methods (e.g., steepest descent, Gauss-Newton, L-BFGS), whilst incorporation of robust (non-smooth) penalties requires custom changes that may be difficult to implement in the context of a general seismic inversion workflow. In this study, we propose a generic, flexible optimization framework capable of incorporating a broad range of noise models, forward models, regularizers, and reparametrization transforms. This framework covers seamlessly robust noise models (such as Huber and Student's t), as well as sparse regularizers, projected constraints, and Total Variation regularization. The proposed framework is also expandable --- we explain the adjustments that are required for any new formulation to be included. Lastly, we conclude with few numerical examples demonstrating the versatility of the formulation.[C12] 2014 Time--Data Tradeoffs by Aggressive Smoothing
Abstract: This paper proposes a tradeoff between sample complexity and computation time that applies to statistical estimators based on convex optimization. As the amount of data increases, we can smooth optimization problems more and more aggressively to achieve accurate estimates more quickly. This work provides theoretical and experimental evidence of this tradeoff for a class of regularized linear inverse problems.[C11] 2014 QUIC & DIRTY: A Quadratic Approximation Approach for Dirty Statistical Models
Abstract: In this paper, we develop a family of algorithms for optimizing superposition-structured” or “dirty” statistical estimators for high-dimensional problems involving the minimization of the sum of a smooth loss function with a hybrid regularization. Most of the current approaches are first-order methods, including proximal gradient or Alternating Direction Method of Multipliers (ADMM). We propose a new family of second-order methods where we approximate the loss function using quadratic approximation. The superposition structured regularizer then leads to a subproblem that can be efficiently solved by alternating minimization. We propose a general active subspace selection approach to speed up the solver by utilizing the low-dimensional structure given by the regularizers, and provide convergence guarantees for our algorithm. Empirically, we show that our approach is more than 10 times faster than state-of-the-art first-order approaches for the latent variable graphical model selection problems and multi-task learning problems when there is more than one regularizer. For these problems, our approach appears to be the first algorithm that can extend active subspace ideas to multiple regularizers."[C10] 2014 A variational approach to stable principal component pursuit
Abstract: We introduce a new convex formulation for stable principal component pursuit (SPCP) to decompose noisy signals into low-rank and sparse representations. For numerical solutions of our SPCP formulation, we first develop a convex variational framework and then accelerate it with quasi-Newton methods. We show, via synthetic and real data experiments, that our approach offers advantages over the classical SPCP formulations in scalability and practical parameter selection.[C9] 2014 Joint low-rank representation and matrix completion under a singular value thresholding framework
Abstract: Matrix completion is the process of estimating missing entries from a matrix using some prior knowledge. Typically, the prior knowledge is that the matrix is low-rank. In this paper, we present an extension of standard matrix completion that leverages prior knowledge that the matrix is low-rank and that the data samples can be efficiently represented by a fixed known dictionary. Specifically, we compute a low-rank representation of a data matrix with respect to a given dictionary using only a few observed entries. A novel modified version of the singular value thresholding (SVT) algorithm named joint low-rank representation and matrix completion SVT (J-SVT) is proposed. Experiments on simulated data show that the proposed J-SVT algorithm provides better reconstruction results compared to standard matrix completion[C8] 2014 Metric learning with rank and sparsity constraints
Abstract: Choosing a distance preserving measure or metric is fundamental to many signal processing algorithms, such as k-means, nearest neighbor searches, hashing, and compressive sensing. In virtually all these applications, the efficiency of the signal processing algorithm depends on how fast we can evaluate the learned metric. Moreover, storing the chosen metric can create space bottlenecks in high dimensional signal processing problems. As a result, we consider data dependent metric learning with rank as well as sparsity constraints. We propose a new non-convex algorithm and empirically demonstrate its performance on various datasets; a side benefit is that it is also much faster than existing approaches. The added sparsity constraints significantly improve the speed of multiplying with the learned metrics without sacrificing their quality.[C7] 2013 A proximal splitting method for inf-convolutive variational models in image recovery
Abstract:[C6] 2013 Scalable and accurate quantum tomography from fewer measurements
Abstract: —Quantum tomography is a matrix recovery problem that is not amenable to nuclear-norm minimization. We review existing recovery strategies and propose new ones, with an emphasis on methods with provable guarantees. An outcome of this is a new method with guarantees that is more accurate and more scalable than previous methods.[C5] 2013 Randomized Singular Value Projection
Abstract: Affine rank minimization algorithms typically rely on calculating the gradient of a data error followed by a singular value decomposition at every iteration. Because these two steps are expensive, heuristic approximations are often used to reduce computational burden. To this end, we propose a recovery scheme that merges the two steps with randomized approximations, and as a result, operates on space proportional to the degrees of freedom in the problem. We theoretically establish the estimation guarantees of the algorithm as a function of approximation tolerance. While the theoretical approximation requirements are overly pessimistic, we demonstrate that in practice the algorithm performs well on the quantum tomography recovery problem.[C4] 2013 Sparse projections onto the simplex
Abstract: Most learning methods with rank or sparsity constraints use convex relaxations, which lead to optimization with the nuclear norm or the l1-norm. However, several important learning applications cannot benefit from this approach as they feature these convex norms as constraints in addition to the non-convex rank and sparsity constraints. In this setting, we derive efficient sparse projections onto the simplex and its extension, and illustrate how to use them to solve high-dimensional learning problems in quantum tomography, sparse density estimation and portfolio selection with non-convex constraints.[C3] 2012 A quasi-Newton proximal splitting method
Abstract: A new result in convex analysis on the calculation of proximity operators in certain scaled norms is derived. We describe efficient implementations of the proximity calculation for a useful class of functions; the implementations exploit the piece-wise linear nature of the dual problem. The second part of the paper applies the previous result to acceleration of convex minimization problems, and leads to an elegant quasi-Newton method. The optimization method compares favorably against state-of-the-art alternatives. The algorithm has extensive applications including signal processing, sparse recovery and machine learning and classification.[C2] 2012 A 100MHz-2GHz 12.5x sub-Nyquist Rate Receiver in 90nm CMOS
Abstract: A fully-integrated, high-speed, wideband receiver called the random modulation pre-integrator is realized in IBM 90nm digital CMOS. It achieves an effective instantaneous bandwidth of 2GHz, with >54dB dynamic range. Most notably, the aggregate digitization rate is fs =320MSPS, 12.5x below the Nyquist rate. Signal recovery can be accomplished for any signal with a concise representation. The system is validated using radarpulses and tones as the input and recovering the time-domain waveforms.[C1] 2012 Design and implementation of a fully integrated compressed-sensing signal acquisition system
Abstract: Compressed sensing (CS) is a topic of tremendous interest because it provides theoretical guarantees and computationally tractable algorithms to fully recover signals sampled at a rate close to its information content. This paper presents the design of the first physically realized fully-integrated CS based Analog-to-Information (A2I) pre-processor known as the Random-Modulation Pre-Integrator (RMPI) [1]. The RMPI achieves 2GHz bandwidth while digitizing samples at a rate 12.5x lower than the Nyquist rate. The success of this implementation is due to a coherent theory/algorithm/hardware co-design approach. This paper addresses key aspects of the design, presents simulation and hardware measurements, and discusses limiting factors in performance.Book chapters
- Dual Smoothing Techniques for Variational Matrix Decomposition, S. Becker and A. Aravkin, in “Robust Low-Rank and Sparse Matrix Decomposition: Applications in Image and Video Processing”, T. Bouwmans, N. Aybat, E. Zahzah, eds. CRC Press, 2016. arXiv
Selected Talks
Here I list a few talks that have video or slides or websites
- “Research in Applied Math at CU”, Math Circles talk (K-12), Sept 15 2024. Slides.
- “Randomization methods for big-data”, at CU CS Dept colloquium Oct 26 2023, and Purdue Math dept Applied Math seminar Oct 23 2023.  Slides.
    - For related sketching code:
- Random sketching operators in Matlab and Python, as part of the class I teach. This has Gaussian sketch, Fast Johnson Lindenstrauss (with FFT or a fast Hadamard), and count sketch. The Hadamard and count sketch have C functions to make them extra fast.
- For the tensor product sketches, which is more complicated, see Osman’s github page
 
- “High-Probability convergence and algorithmic stability for stochastic gradient descent”, ML-MTP: Machine Learning in Montpellier: Theory and Practice seminar, September 2022. Slides. Similar versions given in Arcachon (Curves and Surfaces, June 2022) and Nice (October 2022)
- “Stochastic Subspace Descent: Stochastic gradient-free optimization, with applications to PDE-constrained optimization” Slides (2.7 MB pdf), at ICCOPT, Lehigh University July 2022. Similar version given January 2020 in Denver (joint MAA meetings).
- “Optimization for statistical estimators: Applications to quantum fidelity estimation” Slides (3.7 MB pdf), at Conference on the Mathematics of Complex Data, KTH Royal Institute of Technology, Stockholm. June 13-16 2022
- “Matrix Completion and Robust PCA”, University of Colorado Boulder, Computer Science department colloquium, Boulder, CO. Nov 20 2014. video
- Advances in first-order methods: constraints, non-smoothness and faster convergence, Minisymposium, SIAM Imaging Science, Philadelphia. Slides, May 22 2012
- “TFOCS: Flexible First-order Methods for Rank Minimization”, at the Low-rank Matrix Optimization minisymposium at the 2011 SIAM Conference on Optimization in Darmstadt. Here’s the RPCA video (Matlab code to generate this can be found at the Demo page on the TFOCS website. May 19 2011
- Quick intro to convex optimization talk for Patrick Sanan’s department “ACM^tea”, Oct 23 2009
Technical Reports
- Variational Entropy Search for Adjusting Expected Improvement, Nuojin Cheng, Stephen Becker, 2024. Superseded by A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization, Nuojin Cheng, Leonard Papenmeier, Stephen Becker, Luigi Nardi, 2025.
- Locality-sensitive hashing in function spaces, Will Shand and Stephen Becker, 2020
- Tensor Robust Principal Component Analysis: Better recovery with atomic norm regularization, Derek Driggs, Stephen Becker and Jordan Boyd-Graber, 2019
- URV Factorization with Random Orthogonal System Mixing, Stephen Becker, James Folberth, Laura Grigori, 2017
- The Chen-Teboulle algorithm is the proximal point algorithm, from 2011 but posted 2019, shows the Chen-Teboulle algorithm admits a more aggressive stepsize than via the original analysis.
- Exact linesearch for LASSO discusses exact step-size selection for piecewise quadratic objective functions, with code. 2016.
Theses
- 
    2011 Practical Compressed Sensing: modern data acquisition and signal processing, California Institute of Technology (PhD thesis). Co-winner, Carey prize. Since 2004, the field of compressed sensing has grown quickly and seen tremendous interest because it provides a theoretically sound and computationally tractable method to stably recover signals by sampling at the information rate. This thesis presents in detail the design of one of the world’s first compressed sensing hardware devices, the random modulation pre-integrator (RMPI). The RMPI is an analog-to-digital converter (ADC) that bypasses a current limitation in ADC technology and achieves an unprecedented 8 effective number of bits over a bandwidth of 2.5 GHz. Subtle but important design considerations are discussed, and state-of-the-art reconstruction techniques are presented. Inspired by the need for a fast method to solve reconstruction problems for the RMPI, we develop two efficient large-scale optimization methods, NESTA and TFOCS, that are applicable to a wide range of other problems, such as image denoising and deblurring, MRI reconstruction, and matrix completion (including the famous Netflix problem). While many algorithms solve unconstrained l1 problems, NESTA and TFOCS can solve the constrained form of l1 minimization, and allow weighted norms. In addition to l1 minimization problems such as the LASSO, both NESTA and TFOCS solve total-variation minimization problem. TFOCS also solves the Dantzig selector and most variants of the nuclear norm minimization problem. A common theme in both NESTA and TFOCS is the use of smoothing techniques, which make the problem tractable, and the use of optimal first-order methods that have an accelerated convergence rate yet have the same cost per iteration as gradient descent. The conic dual methodology is introduced in TFOCS and proves to be extremely flexible, covering such generic problems as linear programming, quadratic programming, and semi-definite programming. A novel continuation scheme is presented, and it is shown that the Dantzig selector benefits from an exact-penalty property. Both NESTA and TFOCS are released as software packages available freely for academic use.
- 
    2005 Translational and Rotational Dynamics of Supercooled Water, Wesleyan University (undergraduate thesis). Co-winner, Bertman prize. This thesis presents the results of extensive simulations of the ST2 model of water designed to explore both thermodynamic and dynamic properties in super- cooled states. Constant volume simulations were performed at 1,233 state points, with densities ranging from 0.80 g/cm3 to 1.20 g/cm3 in steps of .01 g/cm3; along these isochores, simulations were run at temperatures from 250 K to 400 K in 5 K intervals. Our results qualitatively reproduce many of the expected properties of supercooled water such as the density anomaly and a novel low-temperature phase transition between two liquid states. The Stokes-Einstein and Debye-Stokes- Einstein equations provide a simple hydrodynamic relation between viscosity and diffusion. These relations hold for simple liquids, but are known to fail near a glass transition. For ST2 water, we find that the Debye-Stokes-Einstein equation does not hold at low temperatures. Furthermore, the data indicate that rotational diffusion is enhanced at low temperatures relative to translational diffusion. We also uncover an unexpected connection between dynamics and thermodynamics; specifically, we find that the structural relaxation time τα along the critical iso-chore provides a precursor to the liquid-liquid phase transition at temperatures up to 150 K above the critical temperature. We observe a nearly identical sig- nature in τα at a higher density which may indicate a second liquid-liquid phase transition, adding credibility to the recent idea that there may be more than one liquid-liquid critical point.
Student theses
Theses of my students (after 2019, CU stopped posting these online, so I’m posting local copies when I haven them):
In the pipeline
- K. Aditi (PhD), 202?
- Nic Rummel (PhD), 202?
- Drona Khurana, 202?
PhD theses
- Multi-fidelity Uncertainty Quantification and Optimization, Nuojin Cheng, 2025, PhD (main advisor: Alireza Doostan, Aerospace). This thesis presents four novel bi-fidelity modeling approaches designed to enhance computational e!ciency and accuracy in uncertainty quantification and optimization. First, Bi-fidelity Boosting (BFB) introduces an e”ective sketching-based subsampling method, accompanied by theoretical analysis of how inter-model correlation impacts performance. Second, the Bi-fidelity Variational Auto-encoder (BF-VAE) leverages deep generative models and transfer learning to achieve high performance with minimal high-fidelity data, also revealing connections between multi-fidelity learning and information bottleneck theory. Third, Langevin Bi-fidelity Importance Sampling (L-BF-IS) develops an efficient score-based Metropolis-Hastings importance sampling estimator for uncertainty quantification, whose e”ectiveness is linked to the discrepancy between model failure probability measures. Finally, a bi-fidelity zero-order optimization framework employs local multi-fidelity surrogates and an Armijo-based line search for optimal step sizes, demonstrating strong empirical performance supported by theoretical convergence guarantees under specific conditions. Collectively, these contributions advance multi-fidelity modeling by providing efficient, theoretically grounded methods for tackling complex computational challenges.
- Computational Methods of Optimization and Geometry Processing for Physics Applications, Jacob Spainhour, 2025, PhD. This thesis presents novel computational methods in two distinct, yet not disjoint subfields of applied mathematics, and is consequently divided into two parts. In Part I, we consider methods of numerical optimization as applied to problems in scientific experimentation, in which a limited quantity of available resources must be carefully allocated to maximize the performance of the experiment. In the first chapter of Part I, this takes the for of discovering optimized ultrasound transmission sequences which, when imaged in the REFoCUS framework, produce high quality images. In the second chapter of Part I, this takes the form of discovering optimized measurement protocols for the verification of quantum states through fidelity estimation. In Part II, we consider methods of computational geometry as applied to problems in multiphysics simulation, in which geometrically complex objects must be computationally handled in a way that is reliable and robust to numerical imperfection. In the first and second chapter of Part II, we develop methods of evaluating the generalized winding number for general curves and surfaces, from which we derive methods of determining containment for geometric objects which otherwise do not define an interior volume. In the third chapter of Part II, we develop a method of 3D material interface reconstruction by means of optimally defining a collection of half-spaces whose convex intersection has geometric moments as close as possible to provided reference data.
- Minimax Optimal Estimation of Expectation Values, Akshay Seshadri, 2025, PhD (co-advised with Emanuel Knill, NIST) 
- Numerical Methods for Non-uniform Data Sources, Kevin Doherty, 2024, PhD. local PDF copy This thesis surveys and creates methods to allow for a mathematically consistent treatment of non-uniform data sources in machine learning and data compression. These methods are fundamentally rooted in numerical methods for quadrature and interpolation, but are leveraged with appropriate computational tools and techniques to adapt to their respective domains and problem types.
- Mathematical Formulations with Uncertain Data in
Optimization and Inverse Problems, Richard Clancy, 2022, PhD This dissertation broadly focuses on incorporating uncertainty into mathematical models and can be broken into three distinct sections. In the first (Ch. 2 and 3), we consider design matrix uncertainty for linear regression. The motivation is that many regressors or covariates used to build linear models, such as survey responses and other data subjective in nature, are intrinsically uncertain. We approach the problem from two angles: 1) using robust optimization and 2) an approximation method to construct an otherwise cumbersome objective for maximum likelihood estimation. Expressions for (sub)gradients are provided allowing for the use of efficient solvers. We illustrate the merit of both methods on synthetic data. In the second section (Ch. 4 and 5), we focus on novel trust region methods that accept uncertainty in objective and gradient evaluations to reduce the computational burden. We introduce an algorithm named TROPHY (Trust Region Optimization using a Precision HierarchY) which uses variable precision arithmetic to reduce communication, storage, and energy costs. The other project uses a Hermite interpolation based framework to build model objectives using function and derivative information. Since not all partial derivatives are available or are expensive to compute, we make use of uncertain gradients to improve performance beyond standard interpolation methods. We also propose the use of sketching to ameliorate issues with redundant data and reduce the burden of inverting very large matrices. The final section (Ch. 6) investigates how operator uncertainty impacts the ability to solve inverse problems accurately. Our goal is to localize regions of brain activity using optically pumped magnetometers which are novel sensors that show promise for use in magnetoencephalography. Through a series of simulations, we establish guidelines for sensor count, noise level, and forward model fidelity needed to localize brain activity to a specified accuracy.
- First-Order Methods for Online and Stochastic Optimization, and Approximate Compiling,
Liam Madden, 2022, PhD (co-advised with Emiliano Dall’Anese, CU ECEE). local PDF copy The oracle complexity model for optimization has been the source of many advances in optimization algorithms with applications to machine learning, artificial intelligence, and highdimensional statistics. For example, many adaptive methods have their beginnings as algorithms with tune-able hyper-parameters appearing in theoretical complexity bounds. This thesis investigates three different problems from the oracle complexity model. The first problem, presented in Chapter 2, considers the online smooth convex optimization problem, motivated by time-varying applications. Two important results are a lower bound on the upper bounds of general online first-order methods and an algorithm with a matching upper bound. The algorithm presented applies Nesterov’s method to a stale cost function until sufficient convergence has been achieved at which point the cost function is updated to the current one. The second problem, presented in Chapter 3, considers the approximate compiling problem of quantum computing. We focus on the CNOT+rotation gate set and consider three different CNOT patterns. Despite the non-convexity of the problem, we find that local minima perform well for random target circuits. We also find that simple CNOT patterns seem to achieve the lower bound on the number of CNOTs required to exactly compile random target circuits. We also use the optimization framework to explore and find new decompositions of particular target circuits such as the Toffoli gate. The third problem, presented in Chapter 4, considers the stochastic smooth non-convex optimization problem, motivated by machine learning applications. Two important results are a Freedman-type concentration inequality that breaks through the sub-exponential threshold to heavier-tailed martingale difference sequences and a high probability convergence bound for stochastic gradient descent with sub-Weibull gradient noise.
- Measuring Image Resolution in Super-Resolution Microscopy and Bayesian Estimation of Population Size and Overlap and Vaccine Effectiveness, Erik Johnson, 2021, PhD (main advisor Dan Larremore, CU CS) This thesis is a compilation of three separate projects. Although the three projects are quite distinct in that they do not directly build on one another, the common thread is statistics and computation applied to medicine and biology. In the first project, we develop a Bayesian statistical model to quantify the size of and overlap between two populations from subsamples from each population. Our model produces joint estimates of the size of each population and the amount of overlap between the populations. We apply our model to compare the genetic repertoires of malaria parasites. In particular, we use our method to construct parasite genetic similarity networks with which we analyze the evolution and transmission of malaria in Ecuador before and after an outbreak in 2012-2013. We also use our method’s inferences to argue that the polymerase chain reaction primers used to study an important malaria parasite gene are biased. In the second project, motivated by the COVID-19 pandemic, we develop a Bayesian statistical model to estimate vaccine effectiveness using data from a common vaccine effectiveness study design called a test-negative design. Since disease positivity is typically assessed using diagnostic tests that have imperfect sensitivity and specificity, our model adjusts for imperfect diagnostic tests along with other potential confounders. In addition, our framework allows data streams from multiple diagnostic test types to be rigorously combined which makes our model flexible for analyses and meta-analyses of test-negative designs. In the third project, we study image resolution metrics that are applicable to modern super-resolution fluorescent microscopy techniques. After showing that the primary existing quantitative image resolution metric called Fourier ring correlation is not ideal, we argue that image resolution in super-resolution microscopy might best be analyzed using information theory. Towards that end, we propose a mutual information-based image resolution metric and we study our metric’s behavior through a few simple examples.
- Topics in Matrix and Tensor Computations, Osman Malik, 2021, PhD. local PDF copy This dissertation looks at matrices and tensors from a computational perspective. An important problem in both matrix and tensor computations is decomposition. Several chapters in this dissertation deal with this problem. Chapter 2 develops randomized algorithms for Tucker tensor decomposition which are faster and can handle larger datasets than competing methods. These algorithms are the first ever one-pass methods for Tucker decomposition to appear in the literature. Chapter 4 develops randomized algorithms with guarantees for both matrix and tensor interpolative decomposition (ID). A key contribution is the first ever theoretical guarantees for any randomized tensor ID algorithm. Chapter 7 develops a randomized algorithm for the tensor ring decomposition. By using leverage score sampling, the resulting iterative algorithm has a per iteration cost which is sublinear in the number of input tensor entries. Chapter 8 considers the binary matrix factorization problem and develops a QUBO formulation for solving it on special purpose hardware like the Fujitsu Digital Annealer and the D-Wave Quantum Annealer. In the tensor decomposition works mention above, randomization—more specifically, matrix sketching—is a key tool used to make algorithms faster and more efficient. In Chapter 3 we dive deeper into this topic and consider one such sketch, the Kronecker fast Johnson–Lindenstrauss transform (KFJLT). We provide a novel proof that the KFJLT indeed is a Johnson–Lindenstrauss transform. Randomization is also used in Chapter 5 to improve the performance of fast matrix multiplication algorithms plagued by either numerical rounding error or error in the computation formula itself. An upshot of this work is a simple method that can help make fast algorithms such as Strassen’s more robust, especially on low-precision hardware like GPUs. Machine learning is a field where tensor methods have received considerable attention. In Chapter 6, we develop a tensor based graph neural network that can be used for a variety of prediction tasks on time varying graphs. We achieve competitive performance in prediction tasks, including in a COVID-19 contact tracing application.
- Randomization in Statistical Machine Learning, Zhishen (Leo) Huang, 2020, PhD. local PDF copy Supervised learning and reinforcement learning problems are often formulated as optimization problems for training. The optimization algorithms themselves bear interest from the mathematical point of view. This thesis discusses the usage of randomization in optimization, which makes possible what corresponding deterministic algorithms are unable to achieve. Applying randomization in algorithms has the capability of reducing the time or space complexity at the expense of potential failure to provide a valid solution. Early prominent examples of randomized algorithms include quicksort, Karger’s algorithm for min-cut problem and the Bloom filter. In this thesis, randomization is introduced into optimization algorithms and is used for data compression. This thesis considers first-order optimization algorithms due to their practicality for large- scale application. The first chapter considers the minimization of nonconvex and nonsmooth objectives, where we give probabilistic guarantees for the proximal gradient descent method to converge to local minima [HB20a]. The second chapter varies the randomization format for gradient descent where Gaussian noise is injected at each iteration step. We point out the ergodicity property of such variation, which is not available for a deterministic version of gradient descent. The third chapter considers using the sketching technique to compress data and evaluate statistics solely based on the sketched dataset [HB20b]. We give theoretical guarantee for the evaluation accuracy of autocorrelation from data sketches and demonstrate numerical performance on molecular dynamic simulation data and synthetic data. The fourth chapter considers a deterministic algorithm for integer-constrained programming, where we suggest a finer convex relaxation where the primal problem is reformulated by Fenchel-Rockafellar duality, and separated into two subproblems based on pre-selected support.
- Iterative stochastic optimization for large-scale machine learning and statistical inverse problems, David Kozak, 2020, PhD (main advisor: Luis Tenorio, Colorado School of Mines, Applied Math & Stat)
- Stokes, Gauss, and Bayes walk into a bar…, Eric Kightley, 2019, PhD This thesis consists of three distinct projects. The first is a study of microbial aggregate fragmentation, in which we develop a dynamical model of aggregate deformation and breakage and use it to obtain a post-fragmentation density function. The second and third projects deal with dimensionality reduction in machine learning problems. In the second project, we derive a one-pass sparsified Gaussian mixture model to perform clustering analysis on high-dimensional streaming data. The model estimates parameters in dense space while storing and performing computations in a compressed space. In the final project, we build an expert system classifier with a Bayesian network for use on high-volume streaming data. Our approach is specialized to reduce the number of observations while obtaining sufficient labeled training data in a regime of extreme class-imbalance and expensive oracle queries.
- Non-Convex Optimization and Applications to Bilinear Programming and Super-Resolution Imaging, Jessica Gronski, 2019, PhD Bilinear programs and Phase Retrieval are two instances of nonconvex problems that arise in engineering and physical applications, and both occur with their fundamental difficulties. In this thesis, we consider various methods and algorithms for tackling these challenging problems and discuss their effectiveness. Bilinear programs (BLPs) are ubiquitous in engineering applications, economics, and operations research, and have a natural encoding to quadratic programs. They appear in the study of Lyapunov functions used to deduce the stability of solutions to differential equations describing dynamical systems. For multivariate dynamical systems, the problem formulation for computing an appropriate Lyapunov function is a BLP. In electric power systems engineering, one of the most practically important and well-researched subfields of constrained nonlinear optimization is Optimal Power Flow wherein one attempts to optimize an electric power system subject to physical constraints imposed by electrical laws and engineering limits, which can be naturally formulated as a quadratic program. We study the relationship between data flow constraints for numerical domains such as polyhedra and bilinear constraints. The problem of recovering an image from its Fourier modulus, or intensity, measurements emerges in many physical and engineering applications. The problem is known as Fourier phase retrieval wherein one attempts to recover the phase information of a signal in order to accurately reconstruct it from estimated intensity measurements by applying the inverse Fourier transform. The problem of recovering phase information from a set of measurements can be formulated as a quadratic program. This problem is well-studied but still presents many challenges. The resolution of an optical device is defined as the smallest distance between two objects such that the two objects can still be recognized as separate entities. Due to the physics of diffraction, and the way that light bends around an obstacle, the resolving power of an optical system is limited. This limit, known as the diffraction limit, was first introduced by Ernst Abbe in 1873. Obtaining the complete phase information would enable one to perfectly reconstruct an image; however, the problem is severely ill-posed and the leads to a specialized type of quadratic program, known as super-resolution imaging, wherein one attempts to learn phase information beyond the limits of diffraction and the limitations imposed by the imaging device.
- Fast and Reliable Methods in Numerical Linear Algebra, Signal Processing, and Image Processing, James Folberth, 2018, PhD In this dissertation we consider numerical methods for a problem in each of numerical linear algebra, digital signal processing, and image processing for super-resolution fluorescence microscopy. We consider first a fast, randomized mixing operation applied to the unpivoted Householder QR factorization. The method is an adaptation of a slower randomized operation that is known to provide a rank-revealing factorization with high probability. We perform a number of experiments to highlight possible uses of our method and give evidence that our algorithm likely also provides a rank-revealing factorization with high probability. In the next chapter we develop fast algorithms for computing the discrete, narrowband cross-ambiguity function (CAF) on a downsampled grid of delay values for the purpose of quickly detecting the location of peaks in the CAF surface. Due to the likelihood of missing a narrow peak on a downsampled grid of delay values, we propose methods to make our algorithms robust against missing peaks. To identify peak locations to high accuracy, we propose a two-step approach: first identify a coarse peak location using one of our delay-decimated CAF algorithms, then compute the CAF on a fine, but very small, grid around the peak to find its precise location. Runtime experiments with our Cplusplus implementations show that our delay-decimated algorithms can give more than an order of magnitude improvement in overall runtime to detect peaks in the CAF surface when compared against standard CAF algorithms. In the final chapter we study non-negative least-squares (NNLS) problems arising from a new technique in super-resolution fluorescence microscopy. The image formation task involves solving many tens of thousands of NNLS problems, each using the same matrix, but different right-hand sides. We take advantage of this special structure by adapting an optimal first-order method to efficiently solve many NNLS problems simultaneously. Our NNLS problems are extremely ill-conditioned, so we also experiment with using a block-diagonal preconditioner and the alternating direction method of multipliers (ADMM) to improve convergence speed. We also develop a safe feature elimination strategy for general NNLS problems. It eliminates features only when they are guaranteed to have weight zero at an optimal point. Our strategy is inspired by recent works in the literature for l1-regularized least-squares, but a notable exception is that we develop our method to use an inexact, but feasible, primal-dual point pair. This allows us to use feature elimination reliably on the extremely ill-conditioned NNLS problems from our microscopy application. For an example image reconstruction, we use our feature elimination strategy to certify that the reconstructed super-resolved image is unique.
- Randomized Algorithms for Large-Scale Data Analysis, Farhad Pourkamali Anaraki, 2017, PhD  Massive high-dimensional data sets are ubiquitous in all scientific disciplines. Extracting meaningful information from these data sets will bring future advances in fields of science and engineering. However, the complexity and high-dimensionality of modern data sets pose unique computational and statistical challenges. The computational requirements of analyzing large-scale data exceed the capacity of traditional data analytic tools. The challenges surrounding large high-dimensional data are felt not just in processing power, but also in memory access, storage requirements, and communication costs. For example, modern data sets are often too large to fit into the main memory of a single workstation and thus data points are processed sequentially without a chance to store the full data. Therefore, there is an urgent need for the development of scalable learning tools and efficient optimization algorithms in today’s high-dimensional data regimes.A powerful approach to tackle these challenges is centered around preprocessing high-dimensional data sets via a dimensionality reduction technique that preserves the underlying geometry and structure of the data. This approach stems from the observation that high-dimensional data sets often have intrinsic dimension which is significantly smaller than the ambient dimension. Therefore, information-preserving dimensionality reduction methods are valuable tools for reducing the memory and computational requirements of data analytic tasks on large-scale data sets.Recently, randomized dimension reduction has received a lot of attention in several fields, including signal processing, machine learning, and numerical linear algebra. These methods use random sampling or random projection to construct low-dimensional representations of the data, known as sketches or compressive measurements. These randomized methods are effective in modern data settings since they provide a non-adaptive data-independent mapping of high-dimensional data into a low-dimensional space. However, such methods require strong theoretical guarantees to ensure that the key properties of original data are preserved under a randomized mapping. This dissertation focuses on the design and analysis of efficient data analytic tasks using randomized dimensionality reduction techniques. Specifically, four efficient signal processing and machine learning algorithms for large high-dimensional data sets are proposed: covariance estimation and principal component analysis, dictionary learning, clustering, and low-rank approximation of positive semidefinite kernel matrices. These techniques are valuable tools to extract important information and patterns from massive data sets. Moreover, an efficient data sparsification framework is introduced that does not require incoherence and distributional assumptions on the data. A main feature of the proposed compression scheme is that it requires only one pass over the data due to the randomized preconditioning transformation, which makes it applicable to streaming and distributed data settings. The main contribution of this dissertation is threefold: (1) strong theoretical guarantees are provided to ensure that the proposed randomized methods preserve the key properties and structure of high-dimensional data; (2) tradeoffs between accuracy and memory/computation savings are characterized for a large class of data sets as well as dimensionality reduction methods, including random linear maps and random sampling; (3) extensive numerical experiments are presented to demonstrate the performance and benefits of our proposed methods compared to prior works.
Masters theses
- Long Short-Term Memory Networks to Improve Aerodynamic Coefficient Estimation for Aerocapture, Dominic Rudakevych, Professional Masters program / Draper Scholar, 2025  Aerocapture is a method for orbital insertion from a hyperbolic trajectory being considered for NASA’s proposed 2030’s flagship mission to Uranus. By traveling through the planet’s atmosphere to generate drag, aerocapture greatly reduces the fuel needed when firing retrograde thrusters, allowing for larger payloads or a less powerful launch vehicle. Despite thes theoretical benefits, aerocapture has never flown on any planetary missions due to thin margin of error and in-situ corrections necessary to properly execute the maneuver. Critical to the guidance and control algorithms are the aerodynamic coefficients. We propose using a neural network to learn the nonlinear relationship between the raw sensor data and these aerodynamic coefficients. Specifically, we explore how network architectures designed for time dependent data, like Long-Short Term Memory (LSTM) neural networks, can produce aerodynamic coefficient estimates akin to that of a computational fluid dynamics (CFD) based lookup table, while providing more robust coefficient estimation when large environmental perturbations ar experienced. Improving force and moment coefficient estimation would improve aerocapture by providing more accurate aerodynamic coefficients for use in guidance and control algorithms. This work considers multiple sensed data sources and aerodynamic coefficient data along with LSTM network architectures for model training to maximize an aerocapture maneuver’s success rate when tested in a Monte Carlo simulation. Along with this, sensitivity analyses were conducted on model hyperparameters to account for relationship complexity. Results were compared against traditional aerodynamic coefficient lookup tables within the Fully Numeric Predictor-corrector Aerocapture Guidance (FNPAG) algorithm to draw conclusions for the model’s performance.
- A simple randomized algorithm for approximating the spectral norm of streaming data, Spencer Shortt, 2023 (Math dept, MS presentation)
- Regularized Saddle-Free Newton: Saddle Avoidance and Efficient Implementation, Cooper Simpson, 2022, Masters We present a new second-order method for unconstrained non-convex optimization, which we dub Regularized Saddle-Free Newton (R-SFN). This work builds upon a number of recent ideas related to improving the practical performance of the classic Newton’s method. In particular, we develop a nonlinear transformation to the Hessian which ensures it is positive definite at each iteration by approximating the matrix absolute value and regularizing with a scaled gradient norm. While our method applies to C2 objectives with Lipschitz Hessian, our analysis will require the existence of a third continuous derivative. Given this, we show that with an appropriately random initialization our method avoids saddle points almost surely. Furthermore, the form of our nonlinear transformation facilitates an efficient matrix-free approach to computing the update via Krylov based quadrature, making our method scalable to high dimensional problems.
- Stochastic Lanczos Likelihood Estimation of Genomic Variance Components, Richard Border, 2018, Masters Genomic variance components analysis seeks to estimate the extent to which interindividual variation in a given trait can be attributed to genetic similarity. Likelihood estimation of such models involves computationally expensive operations on large, dense, and unstructured matrices of high rank. As a result, standard estimation procedures relying on direct matrix methods become prohibitively expensive as sample sizes increase. We propose a novel estimation procedure that uses the Lanczos process and stochastic Lanczos quadrature to approximate the likelihood for an initial choice of parameter values. Then, by identifying the variance components parameter space with a family of shifted linear systems, we are able to exploit the Krylov subspace shift-invariance property to efficiently compute the likelihood for all additional parameter values of interest in linear time. Numerical experiments using simulated data demonstrate increased performance relative to conventional methods with little loss of accuracy.
- A comparison of spectral estimation techniques, Marc Thomson, 2019, Masters; code at github. Compared with experiments, molecular dynamics (MD) simulations provide a quick and inexpensive way to study the properties of chemical systems. In many cases, it is necessary to extract spectral data from these simulations, such as infrared or Raman spectra. For instance, to validate that the computational system matches a physical system, the spectral ‘fingerprints’ can be examined. For complicated systems, Raman spectroscopy calculations are computationally expensive, providing an incentive to reduce the amount of data required. Currently, spectral estimation from MD simulations relies on the discrete Fourier transform (DFT); however, alternative methods can more precisely model the spectra using fewer data points. These methods are particularly effective when prior knowledge of the spectral shape is considered. Several methods, including the direct regression, Welch power estimation, the regularized resolvent transform (RRT), and a modified version of the filter diagonalization method (FDM) are compared to the DFT when applied to MD simulations of methanol and sodium chloride. We propose a novel modification of the FDM, including use of the LASSO (least absolute shrinkage and selection operator) to improve the method’s accuracy. Moreover, ‘windowing’ present in FDM is modified to produce a significantly more accurate spectrum. The performance of these methods is then compared with each other to determine which methods are prone to include incorrect spectral features or lack correct spectral features. In brief, the modified FDM and RRT far outperformed other methods: the modified FDM produces the lowest rate incorrect spectral peaks while the RRT produces the lowest rate of missing peaks.
- Optimization for High-Dimensional Data Analysis, Derek Driggs, 2017, Masters As modern datasets continue to grow in size, they are also growing in complexity. Data are more often being recorded using multiple sensors, creating large, multidimensional datasets that are difficult to analyze. In this thesis, we explore methods to accelerate low-rank recovery algorithms for data analysis, with an emphasis on Robust Principal Component Analysis (RPCA). We also develop a tensor-based approach to RPCA that preserves the inherent structure of multidimensional datasets, allowing for improved analysis. Both of our approaches use nuclear-norm regularization with Burer-Monteiro factorization (or higher-order generalizations thereof) to transform convex but expensive programs into non-convex programs that can be solved efficiently. We supplement our non-convex programs with a certificate of optimality that can be used to bound the suboptimality of each iterate. We demonstrate that both of these approaches allow for new applications of RPCA in fields involving multidimensional datasets; for example, we show that our methods can be used for real-time video processing as well as the analysis of fMRI brain-scans. Traditionally, these tasks have been considered too demanding for low-rank recovery algorithms.
Undergraduate theses
- Improving Existing Bayesian Optimization Software by Incorporating Gradient Information, Alexey Yermakov (CS and APPM), CS Capstone senior thesis, 2023 This tutorial demonstrates how to use gradient information in Bayesian Optimization. Traditionally, Bayesian Optimization is a zeroth-order method that does not utilize gradient information because oftentimes the black-box functions being optimized does not provide gradients. However, there may be some cases where gradient information is available [1]. As a result, derivative-enabled Bayesian Optimization may use the additional gradient information. We will call zeroth-order Bayesian Optimization ZOBO and first-order Bayesian Optimization FOBO.
- A Generalization of S-Divergence to Symmetric Cone Programming via Euclidean Jordan Algebra (local copy), Zhuochen (Jaden) Wang, 2022, BS Symmetric cone programming encompasses a vast majority of tractable convex optimization problems, including linear programming, semidefinite programming, and second-order cone programming. It turns out that we can generalize many results from semidefinite matrices to symmetric cones under the abstract framework of Euclidean Jordan algebra. In particular, S-divergence was previously proposed as a numerical alternative to Riemannian distance for the Hermitian positive definite cone. The goal of this thesis is to generalize S-divergence to symmetric cones and prove that its nice properties in the matrix case are preserved. Specifically, we wish to show that S-divergence induces a metric on the cone and is geodesically-convex. After an extensive exposition of necessary background, we successfully proved most of our claims, with only one conjecture remaining to be proven.
Professional Masters “culminating experience”
- Austin Wagenknecht, 2022, Digital Signal Processing Methods in Music Information Retrieval
- Jacob Tiede, 2021, Getting Started in Computational Genetics in Python
Miscellany
- Open Problem: Property Elicitation and Elicitation Complexity, Rafael Frongillo, Ian Kash, Stephen Becker. 29th Annual Conference on Learning Theory (COLT), pp. 1655–1658, 2016
- Trig identities that I used in high school and college