Lead teams of 3-15 researchers with budgets up to $1.2M. Coordinate cross-functional efforts spanning 5+ ORNL divisions and industry partners (HPE, IBM, AMD).
27 peer-reviewed publications including SC21 Best Paper and SC25 Distinguished Paper. SC25 Proceedings Vice Chair. Regular speaker at SC, ISC, and SMC conferences on HPC operations and AI systems.
Author of research that established “Operational Data Analytics” as a recognized HPC subfield. Publications span storage systems, energy efficiency, digital twins, and AI for scientific computing.
Co-founded the ODA community (ODA is now a listed topic in major CFPs). Organized SC Birds-of-a-Feather sessions for 4 years. ExaDigiT collaboration spans 25+ institutions. Active in the EE HPC WG and in multiple PC roles.
Python as daily driver in production for 7+ years. Built 100 GiB/day Kafka/Spark pipelines for Frontier power monitoring. TB-scale analytics with Pandas/Dask. Test-driven development with pytest.
8+ years working on Summit/Frontier at ORNL. Led NVMe vendor evaluation for 4,600+ node deployment (5 vendors). System telemetry collection and analysis; deep knowledge of HPC system architecture, workloads, power delivery, and cooling behavior.
Early Kubernetes adopter at ORNL (2018). GitLab CI/CD with runners on K8s and HPC. Production deployments: Redis, PostgreSQL, Kafka, MinIO, Apache Druid, Spark clusters.
Production LLM agent systems with LangChain and Claude SDK. RAG pipelines with ChromaDB. Applied these to HPC operational analytics and report automation.
Prometheus/Grafana for HPC monitoring. Industrial telemetry (BACnet, Modbus) for facility systems. Built air-gapped telemetry bridge for secure data transfer. LD_PRELOAD instrumentation expertise.
Linux kernel I/O optimization for SSDs (2x IOPS, ATC'14). Led NVMe evaluation across 5 vendors. 91k LoC cross-platform C instrumentation framework. Production experience on Linux, AIX, HP-UX, Solaris.