Research Scientist - Seed Multimodal Interaction and World Model

San Jose·Computer vision·data
Apply on ByteDance (TikTok) →

About the Team Established in 2023, the ByteDance Seed team is dedicated to pioneering new paths toward artificial general intelligence. We aspire to advance the frontier of intelligence to drive progress for both technology and society. With a long-term vision for the AI sector, the Seed team's research spans MLLM, GenMedia, AI for Science, and Robotics. We maintain a global presence with laboratories and career opportunities across China, Singapore, and the United States. To date, we have launched industry-leading general foundation models and cutting-edge multimodal capabilities. Our technology powers over 50 application scenarios — including Doubao, Jimeng, TRAE, Dola and Dreamnia — and serves enterprise customers through Volcano Engine and BytePlus. Third-party data shows that the Doubao App ranks first in user volume in the Chinese market, while Doubao foundation models lead the industry in average daily token consumption. The Seed Multimodal Interaction and World Model team is dedicated to developing models that have boast human-level multimodal understanding and interaction capabilities. The team also aspires to advance the exploration and development of multimodal assistant products Responsibilities - Research and development large-scale multimodal foundation models - Develop unified modeling frameworks that integrate video, audio, and language, with a focus on visual latent reasoning - Explore Reinforcement Learning-based approaches to bridge understanding and generation for multimodal visual reasoning - Collaborate with researchers to evaluate models on tasks involving world modeling, reasoning, and instruction-conditioned generation

More open roles at ByteDance (TikTok)