👨‍💻 Software Engineer

Gaurang Vishwakarma

Specializing in ML Systems, Distributed Systems, and Infrastructure

Production C++ & Python • Distributed LLM Training • Kubernetes & HPC • Open Source Systems at Scale

500K+ Project Users Worldwide

HPC Clusters Worked On

Years of Open Source Experience

Technical Skills

AI and ML

PyTorch
LLMs
LoRA
Fine-Tuning
RAG
GPU Clusters

HPC and System Tools

MPI
Slurm
CUDA/ROCm
Linux
Kubernetes
Ansible

Development

Modern C++
Python
React/Next.js + JavaScript
Bash Shell Scripting
CI/CD
Docker/Singularity

Systems Engineering

Building ML Systems at Scale

Focused on building and optimizing large-scale ML systems, distributed computing infrastructure, and production-grade tooling for high-performance workloads. Experience spanning modern ML frameworks, container orchestration, and HPC clusters.

Core Expertise

ML Systems: Distributed LLM training, LoRA fine-tuning, RAG architectures, PyTorch optimization
Infrastructure: Kubernetes, Docker, Slurm, Terraform, Ansible, CI/CD pipelines
Systems Programming: Production C++ and Python, Bash automation, performance optimization
Open Source Leadership: Maintainer of a Linux distribution (Arka Linux GUI) serving 500K+ users globally

HPC & Accelerator Infrastructure

Production experience across six HPC and accelerator clusters: ARCHER2, Cirrus, DKRZ Levante, PSC Bridges-2, EIDF GPU Cluster, and EIDF Cerebras Cluster.

Hands-on benchmarking and optimization on NVIDIA A100, H100, and H200 GPUs, AMD MI210 and MI300X accelerators, and the Cerebras CS-3 wafer-scale system.

Featured Projects

LLM LoRA Fine-Tuning & Performance Optimization

Distributed LoRA fine-tuning pipelines for large language models, optimized for performance across heterogeneous accelerator clusters. Focused on scalable training, reproducibility, and benchmarking across GPU and wafer-scale systems.

PyTorch • Distributed Systems • Slurm • Kubernetes • Docker

Arka Linux GUI - Open Source OS

Open-source Linux operating system and GUI stack used as a daily driver by 500,000+ users worldwide. Focused on system reliability, modular build pipelines, and long-term maintainability across diverse hardware. Backed by an active community on the project's support platforms.

C++ • Qt/QML • Python • Systems Engineering • Open Source

Scholar Sense - RAG System

Production-grade Retrieval-Augmented Generation (RAG) system for large-scale document analysis. Built scalable NLP pipelines for embedding, indexing, and LLM-based inference using containerized microservices.

Python • RAG • ChromaDB • Flask • Kubernetes

oschat - Real-Time Communication Platform

High-performance real-time communication platform supporting thousands of concurrent WebSocket connections. Designed for low-latency messaging, horizontal scalability, and production cloud deployment.

TypeScript • Next.js • WebSocket • Kubernetes • GCP

Beyond Engineering

Outside of building distributed systems and optimizing ML pipelines, I'm a performing drummer with 30+ live shows at indoor venues and arenas. Music draws on the same creative problem-solving and sense of rhythm that drive great engineering.