Explore my work in hardware systems, RTL development, embedded systems, and more.
An agentic AI framework that automates FPGA timing closure on Vivado Design Checkpoints (DCPs). It combines a Python wrapper around RapidWright with Vivado and RapidWright MCP servers, allowing an AI agent to analyze timing reports and iteratively apply optimization strategies. Supported strategies include high-fanout net splitting, pblock-based re-placement, and physical optimization. A dashboard lets users either run the AI-guided flow or manually pick optimizations to recover negative slack after place and route.
More Info
A custom RTL parser for NASDAQ ITCH market data streams, written and verified in SystemVerilog. Supports 6 of the 22 ITCH message types and implements cycle-accurate logic for extracting more than 10 distinct fields, including multi-byte fields that span word boundaries in the incoming stream. Verified with a self-checking SystemVerilog testbench containing over 150 directed and randomized test cases covering field decoding, error handling, and message alignment across the supported message set.
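As an illustration of the word-boundary problem, here is a minimal Python reference model, not the project's RTL. ITCH integer fields are big-endian; the 8-byte word width, the helper names, and the sample message layout are assumptions made for this sketch.

```python
WORD_BYTES = 8  # assumed bus width; the project's actual word size is not stated


def words_to_bytes(words):
    """Flatten fixed-width words (earliest byte in the most significant position)."""
    return b"".join(w.to_bytes(WORD_BYTES, "big") for w in words)


def extract_field(words, offset, length):
    """Return the big-endian unsigned integer at [offset, offset + length)."""
    data = words_to_bytes(words)
    if offset + length > len(data):
        raise ValueError("field extends past the end of the buffered words")
    return int.from_bytes(data[offset:offset + length], "big")


# Hypothetical message: 1-byte type, 2-byte locate code, 2-byte tracking number,
# then a 6-byte timestamp at offset 5. The timestamp occupies bytes 5..10, so it
# straddles the boundary between word 0 (bytes 0..7) and word 1 (bytes 8..15).
message = bytes([0x41, 0x00, 0x01, 0x00, 0x02,
                 0x00, 0x00, 0x00, 0x00, 0x30, 0x39,
                 0, 0, 0, 0, 0])
words = [int.from_bytes(message[i:i + WORD_BYTES], "big")
         for i in range(0, len(message), WORD_BYTES)]

timestamp = extract_field(words, offset=5, length=6)
```

A software model like this can serve as the golden reference that a self-checking testbench compares against; in hardware, the same extraction typically needs the tail of the previous word held in a register until the next word arrives.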
More Info
A full RV32I RISC-V CPU implementation that combines two VLSI design methodologies: a hand-laid-out custom datapath from a previous project, and an automatically placed-and-routed control unit synthesized from SystemVerilog. The custom standard cell library was packaged into Liberty (.lib) and LEF formats so Synopsys Design Compiler could synthesize the controller and Cadence Innovus could place and route it. The placed-and-routed controller was then imported back into Virtuoso and integrated with the bitsliced datapath to form the complete CPU. Includes an additional flow where the entire CPU is automatically placed and routed, using the custom register file packaged as a hard macro.
More Info
A full-custom VLSI implementation of a single-cycle RV32I RISC-V processor datapath, designed entirely in Cadence Virtuoso from the transistor level up. The datapath is built as 32 stacked instances of a single hand-laid-out bitslice containing the register file, ALU, comparison unit, and PC logic. All standard cells (NAND, NOR, AND, OR, XOR, inverters, flip-flops, muxes, etc.) were custom-designed with manually drawn schematics and layouts, verified through DRC and LVS. The project demonstrates an end-to-end physical design flow from individual CMOS gates up to a working processor.
More Info
A Unix-like operating system kernel written from scratch in C and RISC-V assembly for 64-bit RISC-V. Includes a full memory subsystem with Sv39 virtual memory and lazy allocation, processes with fork() and exec(), preemptive scheduling, a complete syscall interface, and a custom FAT-inspired filesystem (NGFS) built on top of a block cache and VirtIO block driver. User-facing components include UNIX-style pipes, an ELF loader, a feature-rich shell supporting pipes, redirection, and background execution, and a set of standard user programs (cat, ls, wc, xargs, touch, rm, date, init).
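For readers unfamiliar with Sv39, the sketch below shows how a virtual address is split for the three-level page-table walk: three 9-bit virtual page numbers plus a 12-bit page offset, with bits 63..39 required to match bit 38. It is a Python illustration of the address format only; the function name is hypothetical and nothing here is taken from the kernel's C source.

```python
PAGE_OFFSET_BITS = 12  # 4 KiB pages
VPN_BITS = 9           # 512 entries per page table
LEVELS = 3             # Sv39 uses a three-level walk


def split_sv39(va):
    """Split a 64-bit virtual address into (vpn2, vpn1, vpn0, page_offset)."""
    if not 0 <= va < (1 << 64):
        raise ValueError("address must fit in 64 bits")
    # Bits 63..38 (26 bits) must be all zeros or all ones for a canonical address.
    upper = va >> 38
    if upper not in (0, (1 << 26) - 1):
        raise ValueError("non-canonical Sv39 address")
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    vpn = [(va >> (PAGE_OFFSET_BITS + VPN_BITS * level)) & ((1 << VPN_BITS) - 1)
           for level in range(LEVELS)]
    return vpn[2], vpn[1], vpn[0], offset


example = split_sv39((1 << 30) | (2 << 21) | (3 << 12) | 0x45)
```

With lazy allocation, a walk that reaches an invalid leaf entry raises a page fault, and the fault handler can then map a fresh page at the faulting address instead of treating the access as an error.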
More Info
A 3D datacenter telemetry and monitoring system that simulates an 8-rack topology and captures power, thermal, network, airflow, and storage metrics at 1 Hz. The system implements health-severity logic across five operational domains to track, classify, and visualize incidents in real time, with severity transitions reflected in the 3D view as the simulation runs. Telemetry is exported through Prometheus and surfaced on three purpose-built Grafana dashboards, with careful attention to data consistency between the simulation core, metrics layer, and visualization stack.
More Info
A CUDA implementation of GPT-2's forward pass (inference) optimized for NVIDIA GPUs. Implements and tunes all core transformer kernels from scratch (multi-head self-attention with causal masking, layer normalization, matrix multiplication, GELU activation, residual connections, and the input embedding/encoder) to run an autoregressive decoder-only language model end-to-end on the GPU. Profiled and iteratively optimized using NVIDIA's profiling tools, with kernel-level optimizations such as shared-memory tiling, memory coalescing, kernel fusion, and warp-level primitives. The result is a working GPT-2 you can prompt for text completion.
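Two of these kernels are small enough to state as a host-side reference model. The Python sketch below shows the tanh approximation of GELU used by GPT-2 and a causally masked, numerically stable softmax over attention scores. It is an illustration of the math, not the project's CUDA code, and the function names are hypothetical.

```python
import math


def gelu_tanh(x):
    """Tanh approximation of GELU, as used in the original GPT-2."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def causal_softmax(scores):
    """Row i attends only to columns 0..i; later columns get weight 0."""
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]
        peak = max(visible)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        masked_tail = [0.0] * (len(row) - i - 1)
        weights.append([e / total for e in exps] + masked_tail)
    return weights


attention = causal_softmax([[0.0, 0.0, 0.0]] * 3)
```

A reference like this is handy for checking kernel output element by element within a floating-point tolerance, especially after fusing the mask and softmax into one kernel.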
More Info
An FPGA-based rhythm game built from scratch in SystemVerilog, featuring 640×480 VGA output at 60 Hz, PS/2 keyboard input, and 44.1 kHz audio playback. Core RTL modules, including the VGA controller, audio pipeline, keyboard interface, and game-logic state machines, were all custom-designed and integrated on the FPGA fabric. Supports SD-card-based song loading for up to 10 playable tracks, with timing-accurate note scheduling synchronized to the audio clock. A MicroBlaze soft CPU was also explored as an alternative path for audio scheduling and asset management.
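The 640×480 at 60 Hz mode has well-known industry-standard timing, which the Python sketch below models with plain counters. The constants are the standard values (800×525 total, 25.175 MHz pixel clock); whether the project uses exactly these numbers or a rounded 25 MHz clock is not stated, so treat this as a reference model rather than the project's controller.

```python
# Industry-standard 640x480 @ 60 Hz timing, in pixel clocks (horizontal) and lines (vertical).
H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33
PIXEL_CLOCK_HZ = 25_175_000

H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK  # 800 clocks per line
V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK  # 525 lines per frame


def refresh_rate_hz():
    """Frames per second implied by the pixel clock and total frame size."""
    return PIXEL_CLOCK_HZ / (H_TOTAL * V_TOTAL)


def sync_signals(x, y):
    """Return (hsync_active, vsync_active, visible) for counter position (x, y)."""
    hsync = H_VISIBLE + H_FRONT <= x < H_VISIBLE + H_FRONT + H_SYNC
    vsync = V_VISIBLE + V_FRONT <= y < V_VISIBLE + V_FRONT + V_SYNC
    visible = x < H_VISIBLE and y < V_VISIBLE
    return hsync, vsync, visible
```

The nominal refresh rate works out to slightly under 60 Hz, which is why audio-clocked note scheduling is a more reliable time base for a rhythm game than counting video frames.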
More Info
A compact 8-channel, 24-bit ADC daughterboard designed for high-precision analog signal acquisition at 1 kHz sample rate. The board exposes both SPI and I2C interfaces for host communication, includes RC input filtering on each channel for unipolar single-ended measurements, and was custom-routed for clean analog and digital signal integrity. The final design runs on a single 3.3 V supply, uses 0603 passives throughout, and features castellated edges for easy reflow-mounting onto carrier boards, reducing the overall footprint while keeping the module reusable across projects.
More Info
A pet-owner resource website built as a multi-page web app, designed to help current and prospective pet owners manage the day-to-day realities of pet ownership. Includes a vaccine diary for tracking shots, a name picker, species-specific information pages for dogs, cats, birds, and aquatic pets, and embedded maps for finding nearby vets, pet-friendly restaurants, and walking areas. Also features curated guides on common pet scams and adoption advice.
More Info
An educational web platform built to help middle and high school students at DPS Sharjah (Grades 8 to 10) excel in their exams through curated study resources and sample papers. The site is organized subject by subject, with dedicated pages for each course, downloadable practice documents, and a clean navigation structure designed for quick access during exam prep.
More Info