Lead Performance and Optimization Engineer

at AMD
IN, Bangalore-Design Center · Engineering

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

We are seeking a Performance Engineer with strong expertise in server-class CPUs, CPU microarchitecture, and ML inference, responsible for benchmarking, analysing, and optimizing CPU inference performance using EPYC-optimized ML libraries (e.g., ZenDNN) with common frameworks (PyTorch, TensorFlow, ONNX Runtime). The role includes hands-on work in performance debugging, OS/BIOS tuning, thread/core affinity, multi-instance execution, and Python/scripting-based automation.

THE PERSON:

The ideal candidate should be passionate about software engineering and possess the leadership skills to drive sophisticated issues to resolution. They should be able to communicate effectively and work well with different teams across AMD.

KEY RESPONSIBILITIES:

Performance Engineering & Optimization
- Run and optimize ML inference workloads on CPUs using EPYC-optimized libraries (ZenDNN), improving throughput/latency across single-instance and multi-instance scenarios.
- Configure and tune NUMA, HugePages, SMT, power/performance modes, CPU isolation, scheduler settings, scaling governors, and other OS/BIOS parameters.
- Design and validate thread/core affinity strategies for single-instance, multi-instance, multi-socket, and framework-level multi-instance execution models.
- Optimize workload behaviour through NUMA-aware locality, thread scheduling/pinning, batch-size tuning, operator-level parallelism, and other CPU-focused techniques.
- Contribute to multi-instance execution framework development, including policies for instance partitioning, core allocation, memory distribution, and orchestration of parallel runs on large EPYC systems.

Benchmarking & Analysis
- Develop and run structured benchmarks across EPYC SKUs, core counts, caching/topology variations, sockets, and diverse batch sizes.
- Analyze scaling for single-instance vs. multi-instance execution, instance placement strategies, and workload isolation.
- Use perf, VTune, ftrace/trace-cmd, PMU counters, and flame graphs to identify bottlenecks in compute, memory, thread scheduling, or instance-level competition.
- Perform root-cause analysis for regressions in latency, throughput, multi-instance efficiency, memory bandwidth, and pipeline behaviour.

ML Inference Domain Knowledge
- Understand how ML frameworks execute models on CPU, including tensor shapes/layouts, operator behavior, threading models, kernel dispatch, scheduling strategies, and multi-instance runtime interactions.
- Interpret how model architecture and operator composition influence performance across single and multiple concurrent inference instances.
- Collaborate with ZenDNN and kernel/ops teams to relay findings and help guide kernel/operator-level improvements.

Automation & Tooling
- Build automation pipelines for single-instance and multi-instance benchmarking, profiling, orchestration, scaling studies, and regression detection.
- Develop Python/Bash tooling that manages instance spawning, CPU core partitioning, memory pinning, performance data capture, reporting, and visual dashboards.
- Maintain reproducible experiment workflows for both single-instance and multi-instance configurations.

Required Skills & Qualifications
- Strong understanding of CPU architecture: pipelines, caches, TLB, NUMA, SMT/HT, vector units (AVX2/AVX-512/VNNI/BF16/INT8), and memory hierarchy.
- 8 to 12 years in performance engineering, systems optimization, or low-level execution on Linux.
- Hands-on experience with Linux tuning and server-class OS/BIOS configuration.
- Proficiency with perf, VTune, PMU counters, ftrace/trace-cmd, flame graphs, and multi-instance profiling.
- Strong knowledge of ML inference execution (tensors, operators, threading models) on CPU backends.
- Strong Python and Bash for automation and performance tooling.
- Experience in multi-core scaling, thread affinity, scheduler behavior, concurrency techniques, and multi-instance execution strategies.
- Familiarity with PyTorch, TensorFlow, and ONNX Runtime for running inference workloads.

Benefits offered are described: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.

This posting is for an existing vacancy.
