FPGA AI Hardware Accelerator

SystemVerilog · Xilinx · AXI4 · Vitis HLS

A pipelined systolic-array AI accelerator for FP16 matrix multiplication that sustains a throughput of one output per clock cycle.

Architecture

The core is a parameterizable systolic array: the SystemVerilog generator can produce any N×N configuration, with data automatically forwarded between neighboring processing elements (PEs). The FP16 multiply-accumulate (MAC) blocks were generated with Vitis HLS.
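To illustrate how data forwarding between PEs produces a matrix product, here is a minimal cycle-level Python model of an N×N systolic array. It assumes an output-stationary dataflow (each PE keeps its own accumulator while A values move right and B values move down every cycle) and emulates FP16 rounding with Python's `struct` half-precision format. Neither the dataflow nor the rounding behavior is taken from the actual RTL or HLS blocks, and the function names `fp16` and `systolic_matmul` are hypothetical.

```python
import struct


def fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def systolic_matmul(a, b):
    """Multiply N×N matrices a and b on a simulated output-stationary array.

    Assumption: row i of `a` enters the left edge delayed by i cycles and
    column j of `b` enters the top edge delayed by j cycles (input skew).
    Each cycle, every PE latches the A value from its left neighbor and the
    B value from the PE above, then accumulates their FP16 product.
    """
    n = len(a)
    acc = [[0.0] * n for _ in range(n)]    # per-PE accumulators
    a_reg = [[0.0] * n for _ in range(n)]  # A operand held in each PE
    b_reg = [[0.0] * n for _ in range(n)]  # B operand held in each PE
    cycles = 3 * n - 2                     # last pair reaches PE(n-1, n-1) at t = 3n-3
    for t in range(cycles):
        # Walk from bottom-right to top-left so each PE reads its neighbors'
        # values from the previous cycle before they are overwritten.
        for i in reversed(range(n)):
            for j in reversed(range(n)):
                if j > 0:
                    a_in = a_reg[i][j - 1]          # forwarded from the left
                else:
                    k = t - i                       # skewed edge input
                    a_in = a[i][k] if 0 <= k < n else 0.0
                if i > 0:
                    b_in = b_reg[i - 1][j]          # forwarded from above
                else:
                    k = t - j                       # skewed edge input
                    b_in = b[k][j] if 0 <= k < n else 0.0
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
                acc[i][j] = fp16(acc[i][j] + fp16(a_in * b_in))
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The input skew is what lets PE(i, j) see matching `a[i][k]` and `b[k][j]` pairs using only nearest-neighbor wiring. In this model the last product reaches the bottom-right PE after 3N-2 cycles; the pipeline depth of real FP16 MAC blocks would add latency on top of that.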

The accelerator connects to on-chip BRAM through a Xilinx AXI BRAM Controller, which bridges the systolic array and memory over a standard AXI4 interface.
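As an illustration of how FP16 operands might be located behind the BRAM controller, the sketch below computes the AXI byte address of a matrix element stored row-major, the bus-aligned address of the data beat that contains it, and which 16-bit lane of that beat holds it. The row-major packing, the base address, and the data widths are assumptions for the example, not details of this design, and `fp16_element_location` is a hypothetical helper.

```python
def fp16_element_location(base_addr, row, col, n_cols, bus_bits=32):
    """Locate element (row, col) of a row-major FP16 matrix in AXI memory.

    Returns (byte_address, beat_address, lane). Lane 0 occupies bits [15:0]
    of the data beat, following AXI's little-endian byte-lane ordering.
    """
    bytes_per_elem = 2                       # FP16 is 16 bits wide
    bus_bytes = bus_bits // 8                # bytes per data beat
    byte_addr = base_addr + (row * n_cols + col) * bytes_per_elem
    offset = byte_addr % bus_bytes           # position inside the beat
    beat_addr = byte_addr - offset           # align down to the bus width
    lane = offset // bytes_per_elem          # which 16-bit slot in the beat
    return byte_addr, beat_addr, lane


# Element (1, 3) of a 4-column matrix at a hypothetical base 0x40000000:
print([hex(v) for v in fp16_element_location(0x40000000, 1, 3, 4)])
```

With a 32-bit bus each beat carries two FP16 values, so wider buses (for example 64 bits) reduce the number of AXI beats needed to feed one row of the array.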

[Figure: block diagram and resource utilization]

Toolchain

Used Verilator for simulation, Yosys for synthesis, and OpenSTA for early static timing analysis, enabling rapid iteration during development. Final synthesis and timing closure were done in Vivado, targeting Xilinx silicon.

Results

Sustained a throughput of one output per cycle for matrix-multiply operations. Because the design is parameterizable, the array size can be scaled to fit the resource budget of the target FPGA.
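Since an N×N array needs N² PEs, a first-order estimate of the largest array a device can hold is N = floor(sqrt(usable DSPs / DSPs per PE)). The sketch below applies that bound to DSP slices only. The per-PE DSP cost, the device budget, and the 10% reserve margin are hypothetical inputs (the real cost depends on how Vitis HLS maps each FP16 MAC), and LUT, FF, and BRAM limits would need the same check.

```python
import math


def max_array_size(dsp_budget, dsp_per_pe, reserve_pct=10):
    """Largest N such that an N×N array fits in the usable DSP budget.

    reserve_pct holds back a percentage of DSPs for other logic; it is a
    hypothetical planning margin, not a value from this design.
    """
    usable = dsp_budget * (100 - reserve_pct) // 100  # DSPs left for the array
    pes = usable // dsp_per_pe                        # PEs the budget can afford
    return math.isqrt(pes)                            # largest N with N*N <= pes


# Hypothetical device with 360 DSPs and a MAC costing 2 DSPs per PE:
print(max_array_size(360, 2))  # 324 usable DSPs -> 162 PEs -> N = 12
```

Rounding down to a square keeps the design within budget; a non-square array or a different dataflow could use the leftover PEs, at the cost of a less regular generator.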
