Tech Lead, Research Scientist/Engineer - AI Infrastructure

San Jose·Algorithm·engineering

We are seeking an experienced Research Scientist or Engineer to help define and build the next generation of AI infrastructure. In this role, you will work at the intersection of large-scale systems, AI, and emerging hardware to design infrastructure that enables reliable, efficient, and scalable AI workloads at ByteDance. You will work closely with tech leaders, architects, and product teams to translate evolving AI requirements into robust infrastructure architectures. The role involves identifying emerging trends in AI algorithms and systems, designing scalable system architectures, and driving innovations that improve performance, reliability, and cost efficiency across the AI stack.

About the team:
We are a lean architect & research team responsible for defining the next generation of AI infrastructure at ByteDance. AI is a fast-evolving horizon: pretraining, RL, and agentic workloads each reshape the requirements faster than traditional cloud abstractions can absorb, and our team is built to keep pace rather than simply react. We approach the problem as an end-to-end AI factory: a tightly coupled production system spanning data, applications, software infrastructure, chips, energy, and the broader supply chain. In this role, you will work at the intersection of large-scale systems, AI, emerging hardware, and the cognitive foundations of intelligent agents (including next-generation AI memory systems informed by cognitive science and psychology), designing scalable architectures and driving innovations across the full AI factory stack.

Responsibilities:
AI Factory Architecture - Design and evaluate scalable architectures across the full AI factory (compute, storage, networking, chips, power, and the data and application layers) for large-scale training, RL, and inference workloads. Develop technical proposals for supply-chain and energy constraints alongside silicon and software trade-offs.
Research & Technology Exploration - Track emerging trends across AI systems, distributed training and RL, and hardware acceleration, as well as adjacent fields such as cognitive science and psychology that inform AI memory and reasoning substrates. Build prototypes and share insights through technical reports.
AI Memory & System Performance Optimization - Analyze and optimize performance across the ML stack (scheduling, networking, storage, training and RL frameworks, and emerging AI memory systems for long-horizon agents) through benchmarking and bottleneck analysis.
Cross-Team Technical Alignment - Work across research, engineering, hardware, data-center, and product teams to translate AI workload requirements into scalable solutions and drive cross-team initiatives spanning the full AI factory.
