Tech Lead Cloud Site Reliability Engineer - DCS Cloud

Seattle·R&D·engineering

Our Infrastructure Engineering team supports the company's fast growth by building and operating hyper-scale datacenters, managing the lifecycle of the server fleet, providing cloud solutions, and developing infrastructure services that are scalable and reliable.

We have three subgroups for this role:
- Cloud Host Delivery, Delivery & Standardization
- Cloud Host Operation, Operation Efficiency & Reliability
- Cloud Management & Security

Responsibilities - What You'll Do
- Design, build, scale, and operate ByteDance's global infrastructure, including large-scale systems spanning public and private clouds.
- Develop tools, automation frameworks, visualizations, and monitoring systems to streamline operations and drive optimization of global infrastructure.
- Create, manage, and standardize cloud AMIs/images for use across multiple environments, ensuring strict alignment with the company's global compliance standards.
- Thrive in a fast-paced environment, engaging in technical operations and on-call rotations to address incidents related to cloud, OS, network, performance, and reliability.
- Drive improvements across the entire infrastructure lifecycle, from ideation and design through development, deployment, user support, and continuous refinement.
