Backend Engineer - AML Framework Development (Search, Ads, and Recommendation Direction)

Singapore·R&D·engineering
Apply on ByteDance (TikTok) →

About The Team
The mission of our AML team is to build the next-generation AI infrastructure and recommendation platform for ads ranking, search ranking, and live & e-commerce ranking at our company. We also drive substantial impact on the company's core businesses.

Responsibilities
- Iterate on the underlying architecture of the large model inference engine and own end-to-end GPU performance optimization. Using techniques such as operator fusion and compilation optimization, deeply optimize GPU memory access, compute pipelines, and asynchronous stream scheduling to eliminate inference bottlenecks, improve single-card inference throughput, and reduce inference latency.
- Adapt the inference engine to the full range of GPU/NPU hardware architectures, improve its generality and hardware adaptability, and build a high-performance, low-loss foundation for large model inference.
- Lead the design, development, and optimization of distributed parallel solutions for large model inference, with a focus on multi-dimensional parallel strategies such as tensor parallelism (TP), pipeline parallelism (PP), sequence parallelism, and MoE expert parallelism. These address core problems such as splitting and deploying ultra-large models across multiple cards, high cross-card communication overhead, load imbalance, and low parallel efficiency.
- Track cutting-edge developments in large model inference, GPU high-performance computing, distributed parallelism, and cache optimization; benchmark against mainstream inference frameworks such as vLLM and TensorRT-LLM; deliver solutions and technical innovations; continuously improve the performance and cost advantages of the inference system; and build the team's core technical strengths.
