Me
Tsai-Shien Chen
Ph.D. Student at UC Merced
Captured at the beautiful Santa Monica Beach, where I spent my summers of 2023 and 2024. Sending prayers to everyone impacted by the LA wildfires. 🙏

Biography

Welcome! I am a third-year Ph.D. student at the University of California, Merced, advised by Ming-Hsuan Yang. I am also a research intern at Snap, where I am privileged to work with Aliaksandr Siarohin, Sergey Tulyakov, Jun-Yan Zhu, and Kfir Aberman. My research focuses on building advanced video generation models and their applications. Previously, I received my M.S. and B.S. from National Taiwan University. If you would like to learn more about me, here is my [CV] (updated in Jan 2025), or reach out to me at tsaishienchen [at] gmail.com!

I am honored to receive the Graduate Student Opportunity Program Fellowship.

2023 May - Now
Research Intern @ Snap
2022 Aug. - Now
Ph.D. Student @ UC Merced
2019 Sep. - 2022 March
Master's Student @ NTU

Selected Publications

Check the full publication list in my [CV]
Multi-subject Open-set Personalization in Video Generation
[ website ] [ arXiv ] [ code ]
A video model with built-in multi-subject, open-set personalization capabilities for both foreground objects and background.
arXiv preprint, 2025
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
[ website ] [ arXiv ] [ code ] [ video ] [ slides ] [ poster ]
A large-scale video dataset with high-quality automatic caption annotations.
CVPR, 2024
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
[ website ] [ arXiv ] [ video ]
A FiT-based T2V model, allowing efficient training at the scale of billions of parameters.
CVPR, 2024 [Highlight]
Motion-Conditioned Diffusion Model for Controllable Video Synthesis
[ website ] [ arXiv ]
A conditional diffusion model that generates a video from a starting image frame and a set of strokes.
arXiv preprint, 2023
Incremental False Negative Detection for Contrastive Learning
[ OpenReview ] [ arXiv ] [ slides ] [ poster ]
As the embedding space becomes more semantically structured over the course of contrastive learning, we incrementally detect more reliable false negatives and explicitly remove them.
ICLR, 2022
Orientation-aware Vehicle Re-identification with Semantics-guided Part Attention Network
[ website ] [ arXiv ] [ code ] [ video ] [ slides ]
Predicts the spatial attention map for each vehicle view using only image-level labels for training, and introduces a distance metric emphasizing the differences in co-occurring vehicle views.
ECCV, 2020 [Oral]
Viewpoint-Aware Channel-Wise Attentive Network for Vehicle Re-Identification
[ arXiv ] [ video ] [ slides ]
A framework that channel-wisely reweights the importance of each feature map according to the viewpoint of the input vehicle image.
CVPR Workshops, 2020