Yau, Chung Yiu

I am a fourth-year PhD student in the Department of SEEM at CUHK, supervised by Prof. Hoi-To Wai. I graduated with a BSc in Computer Science from CUHK in 2021. My research focuses on distributed optimization algorithms for machine learning and deep learning.

Highlights

tldr: We propose the first asynchronous decentralized optimization algorithm that utilizes the primal-dual framework on random graphs with randomly sparsified communication. This algorithm operates in practical scenarios such as decentralized systems with unstable pairwise communication and asynchronous gradient computation.
This figure demonstrates how FSPDA tolerates different levels of sparse communication on sparse random graphs while converging to the same magnitude of stationarity, because the sparsity error has only a transient effect. The consensus error is the only quantity dominantly affected by the sparsity error.
EMC\(^2\): Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence
 Code Poster  Video Slides
tldr: We apply MCMC sampling to draw negative samples for optimizing the global contrastive loss, an upper bound of InfoNCE. Our algorithm EMC\(^2\) improves upon the baselines in small-batch training.
EMC\(^2\) shows fast convergence on the global contrastive loss using a batch size of 4 samples per step, training ResNet-18 on a subset of STL-10 with SGD.
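To give a rough feel for what MCMC negative sampling means, here is a generic Metropolis-Hastings sketch. It is not the EMC\(^2\) algorithm itself: the function name `mh_negative_sampler`, the `scores` list (standing in for similarities between an anchor and candidate negatives), the `temperature` parameter, and the uniform proposal are all assumptions made for illustration.

```python
import math
import random


def mh_negative_sampler(scores, steps, temperature=1.0, seed=0):
    """Run a Metropolis-Hastings chain over candidate negative indices.

    The target distribution is p(j) proportional to exp(scores[j] / temperature).
    All names here are hypothetical and chosen only for this sketch.
    """
    rng = random.Random(seed)
    n = len(scores)
    current = rng.randrange(n)  # arbitrary initial state of the chain
    samples = []
    for _ in range(steps):
        proposal = rng.randrange(n)  # symmetric (uniform) proposal
        log_ratio = (scores[proposal] - scores[current]) / temperature
        # Accept with probability min(1, exp(log_ratio)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        samples.append(current)
    return samples


# Candidates with a higher score are visited more often by the chain.
chain = mh_negative_sampler([0.0, 0.0, 5.0], steps=2000)
```

The appeal of a chain like this is that each step only compares two scores, so no normalizing sum over all candidates is needed.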

Publications

tldr: We propose a compressed decentralized optimization algorithm that utilizes a contractive compressor and the primal-dual framework, and analyze its convergence when using exact gradients on nonconvex objective functions.
tldr: A decentralized optimization algorithm that supports time-varying graphs, communication compression and asynchronous local updates. This algorithm is constructed from a primal-dual framework and is closely connected to the class of gradient tracking algorithms.
Network Effects in Performative Prediction Games
tldr: We study the existence of equilibrium solutions in a networked performative prediction game, with performative distribution shift on one graph and cooperative aggregation on another graph.
tldr: A decentralized optimization algorithm built upon gradient tracking that supports communication compression such as sparsification and quantization.
tldr: An analysis of the lower bound on the communication cost of distributed optimization for overparameterized problems, and a class of algorithms with a matching upper bound in terms of model dimension.
tldr: We study the performative prediction problem in a networked setting, where each learner's data distribution has a local distribution shift while multiple learners on the network seek a consensus solution.
DAP-BERT: Differentiable Architecture Pruning of BERT
 Code
tldr: A model pruning method for BERT that optimizes a knowledge distillation objective.
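Several entries above mention communication compression by sparsification and contractive compressors. As a minimal sketch of that idea, the snippet below implements top-k sparsification, a standard example of a contractive compressor. The function name `top_k_sparsify` is hypothetical and the code is not taken from any of the listed papers.

```python
def top_k_sparsify(vector, k):
    """Keep the k largest-magnitude entries of `vector` and zero out the rest.

    Hypothetical helper for illustration. Top-k satisfies the contraction
    property ||C(x) - x||^2 <= (1 - k/d) * ||x||^2 for a d-dimensional x.
    """
    if not 0 <= k <= len(vector):
        raise ValueError("k must be between 0 and len(vector)")
    # Indices sorted by decreasing magnitude; the first k are kept.
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]


x = [0.5, -3.0, 1.0, 2.0]
compressed = top_k_sparsify(x, 2)  # only two entries need to be transmitted
```

Only the kept entries and their indices need to be sent to a neighbor, which is where the communication saving comes from.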