Yau, Chung Yiu

I am a 4th-year PhD student at CUHK, Department of SEEM, supervised by Prof. Hoi-To Wai. I received a BSc in Computer Science from CUHK in 2021. My research focuses on distributed optimization algorithms for machine learning and deep learning.

Highlights

tldr: We propose the first asynchronous decentralized optimization algorithm built on a primal-dual framework with random graphs and randomly sparsified communication. The algorithm operates in practical scenarios such as decentralized systems with unstable pairwise communication and asynchronous gradient computation.
EMC\(^2\): Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence
 Code Poster  Video Slides
tldr: We apply MCMC sampling to draw negative samples for optimizing an upper bound of the InfoNCE contrastive loss, improving performance over the baselines in small-batch training (a toy sketch of the sampling idea follows below).
EMC\(^2\) shows fast convergence with a batch size of 4 samples per step when training ResNet-18 on an STL-10 subset with SGD.
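Below is a minimal toy sketch of the idea in the EMC\(^2\) summary above: a Metropolis-Hastings chain whose stationary distribution favours hard negatives (probability proportional to exp(similarity)), feeding an InfoNCE-style loss with one sampled negative per anchor. It is an illustration under my own simplifications, not the paper's algorithm; the function names, single-negative setup, and hyperparameters are assumptions.

```python
# Toy sketch only: MCMC (Metropolis-Hastings) negative sampling for a
# contrastive loss. The chain's stationary distribution over the candidate
# pool is proportional to exp(sim(query, candidate)), so it favours hard
# negatives. Function names and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F

def mcmc_negative_indices(query, pool, state, n_steps=5):
    """Run one Metropolis-Hastings chain per query over a pool of candidates.

    query: (B, d) normalized query embeddings
    pool:  (N, d) normalized candidate embeddings
    state: (B,)   current chain positions (indices into pool)
    """
    B, N = query.size(0), pool.size(0)
    logits = query @ pool.t()                        # (B, N) similarity scores
    rows = torch.arange(B)
    for _ in range(n_steps):
        proposal = torch.randint(0, N, (B,))         # symmetric uniform proposal
        log_accept = logits[rows, proposal] - logits[rows, state]
        accept = torch.rand(B).log() < log_accept    # Metropolis acceptance test
        state = torch.where(accept, proposal, state)
    return state

def small_batch_contrastive_loss(anchor, positive, pool, state, temperature=0.5):
    """InfoNCE-style loss with one MCMC-sampled negative per anchor.
    In practice the chain state would persist across training iterations."""
    neg = pool[mcmc_negative_indices(anchor, pool, state)]
    pos_logit = (anchor * positive).sum(dim=1) / temperature
    neg_logit = (anchor * neg).sum(dim=1) / temperature
    logits = torch.stack([pos_logit, neg_logit], dim=1)     # (B, 2)
    labels = torch.zeros(anchor.size(0), dtype=torch.long)  # positive is class 0
    return F.cross_entropy(logits, labels)

if __name__ == "__main__":
    B, N, d = 4, 256, 32   # tiny batch, as in the small-batch regime above
    enc = lambda z: F.normalize(z, dim=1)
    anchor, positive = enc(torch.randn(B, d)), enc(torch.randn(B, d))
    pool, state = enc(torch.randn(N, d)), torch.zeros(B, dtype=torch.long)
    print(small_batch_contrastive_loss(anchor, positive, pool, state))
```

With a symmetric uniform proposal, the acceptance test compares only two inner products per step, so the sampler never normalizes over the whole candidate pool; this is the appeal of MCMC negative sampling when the batch is small.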

Publications

tldr: A decentralized optimization algorithm that supports time-varying graphs, communication compression, and asynchronous local updates. The algorithm is constructed from a primal-dual framework and is closely connected to the class of gradient tracking algorithms.
Network Effects in Performative Prediction Games
tldr: We study the existence of equilibrium solutions in a networked performative prediction game, with performative distribution shift on one graph and cooperative aggregation on another graph.
tldr: A decentralized optimization algorithm built upon gradient tracking that supports communication compression such as sparsification and quantization (see the sketch at the end of this list).
tldr: An analysis of the communication-cost lower bound for distributed optimization of overparameterized problems, together with a class of algorithms whose upper bound matches it in terms of model dimension.
tldr: We study the performative prediction problem in a networked setting, where each learner's data distribution undergoes a local distribution shift while multiple learners on the network seek a consensus solution.
DAP-BERT: Differentiable Architecture Pruning of BERT
 Code
tldr: A model pruning method for BERT that optimizes a knowledge distillation objective.
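Since several entries above build on gradient tracking, here is a minimal NumPy sketch of the plain (uncompressed) gradient tracking baseline on a ring graph, solving a distributed least-squares problem. It is a generic illustration under my own assumptions (fixed doubly stochastic mixing matrix, hand-picked step size) and deliberately omits the message compression and asynchrony that the papers above add.

```python
# Minimal sketch of plain gradient tracking on a ring graph with NumPy,
# solving a distributed least-squares problem. It shows the uncompressed
# baseline only; message compression (sparsification/quantization) and
# asynchrony are omitted, and names/step sizes are illustrative assumptions.
import numpy as np

def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix of a ring graph (self-weight 1/2)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] = 0.25
        W[i, (i + 1) % n] = 0.25
    return W

def gradient_tracking(A_list, b_list, step=0.005, iters=5000):
    """Agent i holds f_i(x) = 0.5 * ||A_i x - b_i||^2; all agents converge to
    the minimizer of sum_i f_i using only neighbor communication."""
    n, d = len(A_list), A_list[0].shape[1]
    W = ring_mixing_matrix(n)
    grad = lambda i, x: A_list[i].T @ (A_list[i] @ x - b_list[i])
    X = np.zeros((n, d))                                  # local iterates
    G = np.stack([grad(i, X[i]) for i in range(n)])       # gradient trackers
    for _ in range(iters):
        X_new = W @ X - step * G                          # mix with neighbors, then descend
        G = W @ G + np.stack([grad(i, X_new[i]) - grad(i, X[i]) for i in range(n)])
        X = X_new
    return X.mean(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A_list = [rng.standard_normal((10, 3)) for _ in range(5)]
    b_list = [rng.standard_normal(10) for _ in range(5)]
    x_hat = gradient_tracking(A_list, b_list)
    A, b = np.vstack(A_list), np.concatenate(b_list)
    # x_hat should be close to the centralized least-squares solution
    print(np.linalg.norm(x_hat - np.linalg.lstsq(A, b, rcond=None)[0]))
```

The tracker variable G lets every agent estimate the network-average gradient, which removes the bias that plain decentralized gradient descent suffers under heterogeneous local objectives.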