👨‍💻 Software Engineer

Gaurang Vishwakarma

Specializing in ML Systems, Distributed Systems, and Infrastructure

Production C++ & Python • Distributed LLM Training • Kubernetes & HPC • Open Source Systems at Scale

500K+ Project Users Worldwide
6 HPC Clusters Worked On
5+ Years Open Source Experience

Technical Skills

AI and ML

  • PyTorch
  • LLMs
  • LoRA
  • Fine-Tuning
  • RAG
  • GPU Clusters

HPC and System Tools

  • MPI
  • Slurm
  • CUDA/ROCm
  • Linux
  • Kubernetes
  • Ansible

Development

  • Modern C++
  • Python
  • React/Next + JavaScript
  • Bash Shell Scripting
  • CI/CD
  • Docker/Singularity

Systems Engineering

Building ML Systems at Scale

Core Expertise

  • ML Systems: Distributed LLM training, LoRA fine-tuning, RAG architectures, PyTorch optimization
  • Infrastructure: Kubernetes, Docker, Slurm, Terraform, Ansible, CI/CD pipelines
  • Systems Programming: Production C++ and Python, Bash automation, performance optimization
  • Open Source Leadership: Maintainer of a Linux distribution serving 500K+ users globally

HPC & Accelerator Infrastructure

Production experience across six supercomputing clusters: ARCHER2, Cirrus, DKRZ Levante, PSC Bridges-2, EIDF GPU Cluster, and EIDF Cerebras Cluster.

Hands-on benchmarking and optimization across diverse hardware accelerators including NVIDIA A100, H100, H200, AMD MI210, MI300X, and Cerebras CS-3.


Featured Projects

LLM LoRA Fine-Tuning & Performance Optimization

Distributed LoRA fine-tuning pipelines for large language models, optimized for performance across heterogeneous accelerator clusters. Focused on scalable training, reproducibility, and benchmarking across GPU and wafer-scale systems.

PyTorch • Distributed Systems • Slurm • Kubernetes • Docker

Arka Linux GUI - Open Source OS

Open-source Linux operating system and GUI stack used as a daily driver by 500,000+ people worldwide. Focused on system reliability, modular build pipelines, and long-term maintainability across diverse hardware. Backed by an active community on the project's support platforms.

C++ • Qt/QML • Python • Systems Engineering • Open Source

Scholar Sense - RAG System

Production-grade Retrieval-Augmented Generation (RAG) system for large-scale document analysis. Built scalable NLP pipelines for embedding, indexing, and LLM-based inference using containerized microservices.

Python • RAG • ChromaDB • Flask • Kubernetes

oschat - Real-Time Communication Platform

High-performance real-time communication platform supporting thousands of concurrent WebSocket connections. Designed for low-latency messaging, horizontal scalability, and production cloud deployment.

TypeScript • Next.js • WebSocket • Kubernetes • GCP

Beyond Engineering

Outside of building distributed systems and optimizing ML pipelines, I'm a performing drummer with 30+ live shows across indoor venues and arenas. Music brings the same creative problem-solving and rhythm that drive great engineering.
