Biography

Welcome! I am a third-year Ph.D. candidate at the University of California, Merced, advised by the amazing Ming-Hsuan Yang. I am also a research intern at Snap, where I am privileged to work with Aliaksandr Siarohin, Sergey Tulyakov, Jun-Yan Zhu, and Kfir Aberman. My research focuses on building advanced video generation models with groundbreaking applications. Previously, I received my M.S. and B.S. from National Taiwan University. If you would like to learn more about me, see my [CV] (updated in Feb 2025) or reach out to me at tsaishienchen [at] gmail.com!

I am honored to have received the Graduate Student Opportunity Program Fellowship.

2023 May - Now

Research Intern @ Snap

Creative Vision

2022 Aug. - Now

Ph.D. Candidate @ UC Merced

Vision and Learning Lab

2019 Sep. - 2022 Mar.

M.S. @ NTU

Media IC & System Lab

Selected Publications

See the full publication list in my [CV].

Multi-subject Open-set Personalization in Video Generation

Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Yuwei Fang, Kwot Sin Lee, Ivan Skorokhodov, Kfir Aberman, Jun-Yan Zhu, Ming-Hsuan Yang, Sergey Tulyakov

[ website ] [ arXiv ] [ code ] [ video ] [ slides ] [ poster ]

A video model with built-in multi-subject, open-set personalization capabilities for both foreground objects and background.

CVPR, 2025

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Ekaterina Deyneka, Hsiang-wei Chao, Byung Eun Jeon, Yuwei Fang, Hsin-Ying Lee, Jian Ren, Ming-Hsuan Yang, Sergey Tulyakov

[ website ] [ arXiv ] [ code ] [ video ] [ slides ] [ poster ]

A large-scale video dataset with high-quality automatic caption annotations.

CVPR, 2024

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Ekaterina Deyneka, Tsai-Shien Chen, Anil Kag, Yuwei Fang, Aleksei Stoliar, Elisa Ricci, Jian Ren, Sergey Tulyakov

[ website ] [ arXiv ] [ video ]

A FiT-based T2V model that enables efficient training with billions of parameters.

CVPR, 2024 [Highlight]

Motion-Conditioned Diffusion Model for Controllable Video Synthesis

Tsai-Shien Chen, Chieh Hubert Lin, Hung-Yu Tseng, Tsung-Yi Lin, Ming-Hsuan Yang

[ website ] [ arXiv ]

A conditional diffusion model that generates a video from a starting image frame and a set of strokes.

arXiv preprint, 2023

Incremental False Negative Detection for Contrastive Learning

Tsai-Shien Chen, Wei-Chih Hung, Hung-Yu Tseng, Shao-Yi Chien, Ming-Hsuan Yang

[ OpenReview ] [ arXiv ] [ slides ] [ poster ]

As contrastive learning progresses and the embedding space becomes more semantically structured, we incrementally detect more reliable false negatives and explicitly remove them.

ICLR, 2022

Orientation-aware Vehicle Re-identification with Semantics-guided Part Attention Network

Tsai-Shien Chen, Chih-Ting Liu, Chih-Wei Wu, Shao-Yi Chien

[ website ] [ arXiv ] [ code ] [ video ] [ slides ]

Predicts a spatial attention map for each vehicle view using only image-level labels for training, and introduces a distance metric that emphasizes differences in co-occurring vehicle views.

ECCV, 2020 [Oral]

Viewpoint-Aware Channel-Wise Attentive Network for Vehicle Re-Identification

Tsai-Shien Chen, Man-Yu Lee, Chih-Ting Liu, Shao-Yi Chien

[ arXiv ] [ video ] [ slides ]

A framework that reweights the importance of each feature map channel-wise according to the viewpoint of the input vehicle image.

CVPR Workshops, 2020
