ML Evaluation Specialist, Human Data
At Apple, we don’t just build products; we build experiences fueled by world-class data. The Human-centered AI team within Apple Services Engineering is looking for an ML Evaluation Specialist, Human Data to join our Data Quality and Operations division and lead complex, multi-stakeholder operations in data collection, curation, annotation, and human evaluation across Apple Music, App Store, TV+, Podcasts, and Books.
In this role, you will own the operational strategy and continuous improvement of large-scale, multilingual human data programs, from designing onboarding scaffolds that progressively build annotator calibration, to analyzing annotator behavior patterns to identify where automation can offload low-judgment decisions, to enforcing quality frameworks that close the loop between annotator struggle and task redesign. You will identify where human judgment is essential and where it could be better directed, then build the scaffolding, automation, and feedback systems that let annotators focus their cognitive energy where it matters most. Because this work cuts across engineering, data science, research, procurement, and legal, a critical part of the role is serving as the connective tissue between teams who each own a piece of this space, aligning on shared standards, surfacing gaps, and ensuring that insights from the annotation layer inform upstream decisions about task design and tooling. You will bring a point of view on human data best practices and translate it into scalable, human-centered approaches that make generative AI features safer and more reliable.
The ideal candidate brings a rare combination of technical depth and program execution skills. You are comfortable designing and deploying sophisticated data pipelines in the morning, then presenting quality remediation strategies to stakeholders in the afternoon. You care deeply about data quality and human alignment, have a creative and systematic approach to finding and fixing problems, and find motivation in wide-ranging work whose impact shows up in everyday Apple experiences.