Experience

  1. Research Scientist, Analytics & AI Methods at Scale (AAIMS) Group

    Oak Ridge National Laboratory (ORNL)
    • Founding member of AAIMS group; technical lead for operational data analytics and AI systems. Lead teams of 3-5 core researchers with coordination across 15-20 cross-functional collaborators.
    • SC21 Best Paper for Summit power efficiency analysis across 27,000+ GPUs.
    • Built Frontier power monitoring infrastructure processing 100+ GiB/day via Kafka/Spark pipelines, enabling real-time operational insights.
    • Core contributor to ExaDigiT digital twin framework (R&D 100 Award), coordinating 25+ institutions for exascale facility modeling.
    • Designed LLM-based systems for predictive analytics and operational data queries, achieving 26% accuracy improvement over baseline.
  2. Research Associate / Postdoctoral Research Associate

    Oak Ridge National Laboratory (ORNL)
    • Developed HPC storage middleware optimizations for burst buffer systems, published at IPDPS'19.
    • Led NVMe vendor evaluation for Summit/Frontier deployment, testing 5 vendors across 4,600+ node configurations.
    • Designed Cooling Intelligence system for Summit, projecting 20% energy savings through predictive thermal management.
    • Built foundational telemetry infrastructure and analytics pipelines for ORNL Leadership Computing Facility operations.
  3. Research Assistant

    Seoul National University, South Korea
    • Developed cross-layer SSD optimizations, integrating custom FTLs, OS enhancements, and FPGA-based emulation and prototyping.
    • Designed high-performance SSD storage architectures for HPC, reducing tail latency in key-value stores.
    • Developed a custom key-value storage engine using Samsung SSD garbage collection APIs, improving latency consistency with a 6-9x reduction in 99.9999th-percentile read latency.
  4. Research Engineer | Software Engineer

    TmaxSoft
    • Designed and developed a non-intrusive, LD_PRELOAD-based middleware transaction instrumentation framework, enabling end-to-end performance monitoring of enterprise applications. Built function-hooking transaction latency monitoring modules for products such as BEA Tuxedo, TmaxSoft Tmax, and Oracle, using function interception and lock-free shared-memory IPC.
    • Led deployment of the application instrumentation layer for the LG Display Zero Failure Project (LG Display Ltd.), delivering a function-interception-based middleware transaction monitoring system for their mission-critical Manufacturing Execution System (MES).
  5. Software Developer

    Samsung Networks, South Korea (merged into Samsung SDS)
    • Maintained and enhanced NMSPlus 3.0–3.1, a network monitoring system collecting SNMP, ping, and NetFlow statistics from Cisco, Alcatel, and Juniper devices.
    • Developed SNMP-based data collection modules for ATM switches and L4 switches, expanding network monitoring capabilities.

Education

  1. Ph.D., Electrical Engineering and Computer Science (MA & Ph.D. integrated)

    Seoul National University
    Dissertation: “OS I/O Stack Optimizations for Flash Solid-State Drives”, supervised by Heonyoung Yeom.
  2. B.Sc., Computer Science

    Korea University
Capabilities
Leadership & Impact
Project & Team Leadership

Lead teams of 3-15 researchers with budgets up to $1.2M. Coordinate cross-functional efforts spanning 5+ ORNL divisions and industry partners (HPE, IBM, AMD).

Technical Communication

27 peer-reviewed publications including SC21 Best Paper and SC25 Distinguished Paper. SC25 Proceedings Vice Chair. Regular speaker at SC, ISC, and SMC conferences on HPC operations and AI systems.

Technical Writing

Author of research that established “Operational Data Analytics” as a recognized HPC subfield. Publications span storage systems, energy efficiency, digital twins, and AI for scientific computing.

Cross-Org Coordination & Community Building

Co-founded ODA community (now in major CFPs). Organized SC Birds-of-a-Feather sessions for 4 years. ExaDigiT collaboration spans 25+ institutions. Active in EE HPC WG and multiple PC roles.

Technical Skills
Python & Data Engineering

Python as daily driver for 7+ years in production. Built 100+ GiB/day Kafka/Spark pipelines for Frontier power monitoring. TB-scale analytics with Pandas/Dask. Test-driven development with pytest.

HPC & Systems

8+ years working on Summit/Frontier at ORNL. Led NVMe vendor evaluation for 4,600+ node deployment (5 vendors). System telemetry collection and analysis; deep knowledge of HPC system architecture, workloads, power delivery, and cooling behavior.

Kubernetes & Infrastructure

Early Kubernetes adopter at ORNL (2018). GitLab CI/CD with runners on K8s and HPC. Production deployments: Redis, PostgreSQL, Kafka, MinIO, Apache Druid, Spark clusters.

AI/ML Systems

Production LLM agent systems with LangChain and Claude SDK. RAG pipelines with ChromaDB. Applied these to HPC operational analytics and report automation.

Monitoring & Telemetry

Prometheus/Grafana for HPC monitoring. Industrial telemetry (BACnet, Modbus) for facility systems. Built air-gapped telemetry bridge for secure data transfer. LD_PRELOAD instrumentation expertise.

Systems & Storage

Linux kernel I/O optimization for SSDs (2x IOPS, ATC'14). Led NVMe evaluation across 5 vendors. 91k LoC cross-platform C instrumentation framework. Production experience on Linux, AIX, HP-UX, Solaris.