Andrew Cheng

About Me

I am a second-year PhD student in Computer Science at Harvard University, advised by Melanie Weber. My research lies at the interface of machine learning, optimization, and statistics, with a focus on Riemannian optimization and high-dimensional learning dynamics. I am also interested in incorporating the feature learning of neural networks into regression methods and in developing novel optimization algorithms for large-scale models. My research has been supported in part by the NSERC Postgraduate Scholarship, the Harvard Dean’s Competitive Fund for Promising Scholarship, the NSF, and FQRNT.

Previously, I was a Master's student at McGill University and the Montreal Institute for Learning Algorithms (Mila), supervised by Courtney Paquette and Elliot Paquette. I completed my Bachelor's degree in Statistics and Computer Science at McGill University.

In 2022 and 2023, I interned as a deep learning researcher at ValenceLabs, where I worked on generative models for drug discovery.

Recent Events

December 2024

Two workshop papers at the Optimization for Machine Learning (OPT-ML) workshop at NeurIPS

Presented results from our papers "High Dimensional First Order Mini-Batch Algorithms on Quadratic Problems" and "Structured Regularization for Constrained Optimization on the SPD Manifold."

November 2024

Structured Regularization for SPD Optimization with Side Information

Published at the 60th Annual Allerton Conference on Communication, Control, and Computing. Read it here.

Research

2024

Structured Regularization on the Positive Definite Manifold

Authors: Andrew Cheng, Melanie Weber

We developed a framework for structured regularization on the positive definite manifold to induce sparsity in the solution or incorporate side information. Our framework preserves properties that allow provable non-asymptotic convergence to the global optimum of non-convex problems on the SPD manifold.
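As an illustrative template (a generic formulation; the specific regularizers and structural assumptions studied in the paper are not reproduced here), structured regularization on the SPD manifold can be sketched as

\[
\min_{X \succ 0} \; f(X) + \lambda\, r(X),
\]

where \( f \) is the data-fitting objective, \( r \) is a structure-inducing penalty (e.g., promoting sparsity or encoding side information), and \( \lambda > 0 \) controls the strength of the regularization.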

Under Review

Disciplined Geodesically Convex Programming

Authors: Andrew Cheng, Vaibhav Dixit, Melanie Weber

We extend the theory of disciplined convex programming (DCP) to the geodesically convex setting, providing a framework for optimization on Cartan-Hadamard manifolds, with a particular focus on the symmetric positive definite manifold. A sketch of the kind of rules involved appears below.
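As an illustration of the rules such a framework certifies (standard geodesic convexity calculus, not the paper's specific atom library or rule system): if \( f \) and \( g \) are geodesically convex on a Cartan-Hadamard manifold, then so are

\[
\alpha f + \beta g \quad (\alpha, \beta \ge 0) \qquad \text{and} \qquad \max\{f, g\},
\]

since convexity along every geodesic is preserved under nonnegative combinations and pointwise maxima. A disciplined framework combines rules of this type with a library of atoms of known geodesic convexity to verify the convexity of a user-specified expression by construction.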

Under Review

High Dimensional First Order Mini-Batch Algorithms on Quadratic Problems

Authors: Andrew Cheng, Kiwon Lee, Courtney Paquette

We show that, for general mini-batch first-order algorithms (Nesterov acceleration, SGD with momentum, etc.), any quadratic statistic \( \mathcal{R}: \mathbb{R}^d \to \mathbb{R} \) of the iterates concentrates around a deterministic quantity given by the solution of a Volterra integral equation. Examples of quadratic statistics include, but are not limited to, the training loss and excess risk of empirical risk minimization (in-distribution and generalization error) for ridge regression and the random features model. By analyzing the Volterra integral equation, we can characterize the optimal mini-batch size (i.e., the critical batch size) and the optimal hyperparameters (learning rate, momentum, etc.) for these algorithms.
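As a schematic of the limiting object (the exact forcing function \( F \) and kernel \( K \) depend on the data spectrum, batch size, and algorithmic hyperparameters, and are derived in the paper rather than reproduced here), the deterministic equivalent \( \mathcal{R}(t) \) of the statistic along the algorithm's trajectory solves a Volterra integral equation of the second kind,

\[
\mathcal{R}(t) = F(t) + \int_0^{t} K(t, s)\, \mathcal{R}(s)\, \mathrm{d}s,
\]

and it is by analyzing this equation that the critical batch size and the optimal hyperparameters are characterized.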

Workshop (Preprint soon)

2022

Trajectory of Mini-Batch Momentum

Authors: Kiwon Lee, Andrew Cheng, Elliot Paquette, Courtney Paquette

This work, presented at NeurIPS 2022, studies the dynamics of mini-batch SGD with momentum in stochastic optimization for machine learning, providing insight into the trajectory of the iterates and the algorithm's convergence properties.

Read the Full Paper (NeurIPS 2022)