I am a principal-level software engineer with more than 20 years of experience building GPU-accelerated media pipelines, SDKs, and computer vision systems. My background spans major technology companies including AMD, Microsoft, Amazon, and Skype, where I have consistently worked on performance-critical products that move from research prototypes into production.
I specialize in C/C++ performance optimization, GPU programming with CUDA and ROCm/HIP, and modern machine learning engineering. I also work with Transformers, RAG, LLMs, PyTorch, Docker, and CI/CD, with a strong focus on practical deployment and measurable performance gains.
At AMD, I led architecture and delivery for SmartAccess Video, a distributed GPU processing SDK that improved transcoding speed by more than 60%. I also owned profiling and optimization efforts across CPU and GPU bottlenecks, using tools such as GPUView, CodeXL, and VTune.
My experience includes technical leadership and cross-functional collaboration across enterprise integrations, media platforms, and real-time communication systems. I have worked on products and partnerships that supported large-scale launches, improved playback quality, and enabled hardware-accelerated implementations across multiple device ecosystems.
In consulting and independent work, I have delivered low-latency AR/VR streaming infrastructure, optimized video pipelines for gaming latency, redesigned scene analysis systems, and deployed automated content moderation solutions. These projects reflect my ability to combine systems engineering, applied AI, and product delivery.
More recently, I have built AI and ML projects such as autonomous job search agents, RAG assistants, and local-first computer vision tools. I continue to focus on agentic AI, LLM workflows, and production-ready ML systems while maintaining my core strength in GPU systems and media engineering.
Architected and led SmartAccess Video, AMD’s distributed GPU processing SDK. Achieved more than 60% transcoding speed improvements through intelligent workload distribution across integrated and discrete GPUs. Drove prototypes to production with engineering teams and ISV partners. Owned performance profiling and optimization across CPU and GPU bottlenecks.
Grew enterprise integrations by 25% through technical evangelism, MDM/EMM portal partnerships, and developer enablement programs. Supported enterprise accounts in Digital Twins and IoT implementations and advised on REST API architecture for multi-tenancy and unattended access.
Led end-to-end integration of Amazon Video Player on living-room SoC devices. Supported Prime Video’s launch in more than 200 countries. Delivered next-generation players with DASH and HEVC support and improved monetization through optimized ad insertion.
Owned feature development and optimization across Skype’s real-time communication pipeline. Coordinated cross-functional teams and hardware partners. Drove performance improvements in video encoding and decoding through hardware-accelerated implementations.
Led a team of 8 to 12 engineers delivering AMD’s GPU-based video codec pipeline from pre-silicon emulation through production. Architected and implemented H.264, MPEG-2, VC-1, MVC, H.263, and MPEG-4 codec components. Developed error recovery and concealment algorithms for video decoding firmware.
Delivered low-latency AR/VR streaming infrastructure, optimized Microsoft Xbox video pipelines to reduce real-time gaming latency, built GPU-accelerated image and video filtering, redesigned video scene analysis pipelines, and deployed automated content moderation systems that integrate ML models.