Tech Lead, AML Orchestration

San Jose·R&D·other

About the Team
The Applied Machine Learning (AML) team builds the next-generation machine learning algorithms and platforms that power ByteDance’s recommendation systems, ads ranking, and search ranking. We drive significant impact on ByteDance’s core businesses, focusing on scalable infrastructure, efficient orchestration, and world-class ML systems.

Role Overview
We are seeking a Tech Lead, AML Orchestration, to own and advance ByteDance’s distributed orchestration platforms. This leader will oversee a team of Machine Learning Engineers specializing in orchestration and scheduling, and will guide the technical strategy for resource efficiency, distributed training, and online inference systems. The role requires deep expertise in large-scale distributed systems and orchestration frameworks, along with strong cross-team collaboration.

Responsibilities
- Lead, mentor, and grow a team of orchestration-focused ML engineers; set technical vision and ensure engineering excellence.
- Design and optimize distributed orchestration and scheduling strategies across large-scale Kubernetes/Godel environments, ensuring efficiency, reliability, and scalability.
- Drive initiatives for autoscaling, resource multiplexing, and preemption across heterogeneous workloads and clusters, including multi-datacenter and multi-cloud setups.
- Partner with framework, platform, and research teams to build next-generation distributed training and serving systems for ultra-large, high-dimensional recommendation models.
- Architect robust and elastic online orchestration frameworks for large-scale inference, supporting evolving recommendation and ads models.
- Stay ahead of trends in orchestration, scheduling, and distributed computing, incorporating best practices and emerging technologies.
