Tsai-Shien Chen
Ph.D. Student at UC Merced
tsaishienchen [at] gmail.com

Biography

I am a second-year Ph.D. student at the University of California, Merced, advised by Ming-Hsuan Yang. I am also currently a research intern on the Creative Vision team at Snap Inc., where I work with Aliaksandr Siarohin and Sergey Tulyakov. My recent research interests are controllable video synthesis and creation. Previously, I obtained my M.S. and B.S. degrees from National Taiwan University, where I worked with Shao-Yi Chien. If you would like to learn more about me, see my [CV] (updated in March 2024) or reach out to me at tsaishienchen [at] gmail.com!

2023 May - Now
Research Intern @ Snap Inc.
2022 Aug. - Now
Ph.D. Student @ UC Merced
2019 Sep. - 2022 Mar.
Master's Student @ NTU

Selected Publications

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
[ website ] [ arXiv ] [ code ]
We introduce Panda-70M, a large-scale video dataset with high-quality automatic caption annotations.
Computer Vision and Pattern Recognition (CVPR), 2024
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
[ website ] [ arXiv ]
We introduce Snap Video, a transformer-based text-to-video (T2V) model whose design allows us to efficiently train a T2V model with billions of parameters for the first time.
Computer Vision and Pattern Recognition (CVPR), 2024
[Highlight, acceptance rate: 2.8%]
Motion-Conditioned Diffusion Model for Controllable Video Synthesis
[ website ] [ arXiv ] [ paper ]
We introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes.
arXiv preprint, 2023
Incremental False Negative Detection for Contrastive Learning
[ OpenReview ] [ arXiv ] [ paper ] [ slides ] [ poster ]
We highlight the adverse effect of false negatives on self-supervised contrastive learning. To address the issue, we introduce IFND. As training progresses and the embedding space becomes more semantically structured, IFND incrementally detects more reliable false negatives and explicitly removes them during contrastive learning.
International Conference on Learning Representations (ICLR), 2022
Orientation-aware Vehicle Re-identification with Semantics-guided Part Attention Network
[ website ] [ arXiv ] [ paper ] [ video ] [ slides ] [ code ]
In this paper, we propose SPAN to predict the spatial attention map for each vehicle view given only image-level labels for training. We also introduce a distance metric that emphasizes the differences in co-occurring vehicle views.
European Conference on Computer Vision (ECCV), 2020
[Oral, acceptance rate: 2.1%]
Viewpoint-Aware Channel-Wise Attentive Network for Vehicle Re-Identification
[ paper ] [ video ] [ slides ]
We propose VCAM, which enables our framework to reweight the importance of each feature map channel-wise according to the viewpoint of the input vehicle image. With the aid of VCAM, we obtain promising results on the 2020 AI City Challenge. We also explore the interpretability of how VCAM improves performance.
Computer Vision and Pattern Recognition (CVPR) Workshops, 2020
