Research Scientist, ByteBrain Infrastructure Operation

San Jose·Infrastructure·engineering

Team Introduction

ByteBrain is ByteDance’s AI for Infrastructure (AI4Infra) platform, dedicated to improving the efficiency, reliability, and intelligence of large-scale infrastructure systems through AI and machine learning. ByteBrain supports a wide range of infrastructure domains, including AI data center supply chains, databases, storage systems, networking, containers and virtualization, and big data platforms, powering infrastructure optimization at massive scale.

This role sits at the intersection of: Operations Research × AIOps × AI for Infra

You will have the opportunity to solve some of the most challenging optimization problems behind large-scale AI data centers while pioneering the next generation of AI-powered decision-making systems, where LLMs and optimization algorithms work together to improve efficiency, resource utilization, and operational intelligence across ByteDance’s global infrastructure.

Responsibilities

- Design and develop AI, machine learning, and optimization algorithms to improve the efficiency, reliability, and performance of large-scale infrastructure systems. Areas may include AIOps, operations research, software engineering, and system optimization.
- Drive the deployment, scaling, and continuous improvement of algorithms in production environments, supporting large-scale services.
- Identify optimization opportunities and emerging challenges from real-world infrastructure scenarios, translating them into impactful research and engineering solutions.
- Conduct cutting-edge research and publish high-quality papers in top-tier conferences and journals.
