Site Reliability Engineer, Hybrid Cloud Operation and Delivery - Data Infrastructure
Our team is responsible for the infrastructure systems of the hybrid cloud, including IaaS, PaaS, SaaS, and AI model products. We strive to be a leading Site Reliability Engineering (SRE) team in the industry, driving reliability, scalability, and performance at scale. As part of the SRE team, you will tackle complex, large-scale challenges, leveraging your expertise in coding, algorithms, complexity analysis, and distributed system design.

We foster a culture of diversity, intellectual curiosity, and open collaboration. Engineers are empowered with strong ownership, autonomy, and the opportunity to work across a wide range of impactful projects.

What you will be doing:
- Deliver products in hybrid cloud scenarios, including cloud platform planning, software deployment, and resource expansion, and collaborate with R&D teams to complete project delivery.
- Operate cloud platform environments for internal and external customers, including daily alarm handling, on-call support, and change management, and ensure the stability of the cloud platform during important event periods.
- Work with the R&D team to strengthen the stability of cloud products, continuously improving capabilities in high-availability architecture, disaster recovery, and alarm monitoring, based on on-site experience with large-scale systems.
- Continuously improve hybrid cloud serviceability, participate in the standardized SOW for O&M and delivery of new product versions, and build SRE serviceability acceptance standards to improve implementation efficiency.