Senior Software Engineer specializing in LLM evaluation and AI code generation quality assessment, backed by 10+ years
of production engineering across backend systems, cloud infrastructure, and CI/CD automation.
Currently designing evaluation frameworks across multiple assessment platforms, having built 130+ evaluation tasks,
scoring rubrics, and
golden reference implementations that benchmark AI coding agents against production-grade repositories in Python, C++,
Java, and Go, identifying systematic reasoning failures that directly inform model improvement.
A deep production engineering background provides the technical foundation for credible code quality assessment. That
background includes performance optimization (55% response time improvement on a 10k+ daily transaction CRM), legacy
system modernization (.NET to ASP.NET Core), and training 200+ professionals in AWS and cloud infrastructure.