Thursday, October 30, 2025
This digital design project addresses a critical performance bottleneck in high-speed FPGA network acceleration: external memory latency. In 100 Gb/s monitoring systems, updating a flow counter requires a Read-Modify-Write (RMW) sequence. Traditional pipelines frequently stall during these sequences. The stalls are worst when consecutive requests target the same address, because each update must wait for the previous one to be written back before it can read the current value. The result is significant throughput degradation.
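To make the stall concrete, here is a minimal Python cycle model of a conventional RMW pipeline. It is not the project's SystemVerilog. It assumes hypothetical timing (a fixed READ_LATENCY of 10 cycles and a 1-cycle write) and a simple hazard rule: a request that hits an address with an RMW still in flight waits until that write commits. The function name stalling_rmw_cycles is illustrative.

```python
# Hypothetical timing parameters; not taken from the project.
READ_LATENCY = 10   # cycles from read request to data return
WRITE_LATENCY = 1   # cycles to commit the modified value


def stalling_rmw_cycles(addresses: list[int]) -> int:
    """Return the cycle at which the last RMW commits in a pipeline that
    issues one request per cycle but stalls any request whose address
    matches an RMW still in flight (read-after-write hazard)."""
    busy_until: dict[int, int] = {}  # address -> cycle its pending write commits
    issue_cycle = 0
    last_commit = 0
    for addr in addresses:
        # Stall until any in-flight RMW to the same address has committed.
        issue_cycle = max(issue_cycle, busy_until.get(addr, 0))
        commit = issue_cycle + READ_LATENCY + WRITE_LATENCY
        busy_until[addr] = commit
        last_commit = max(last_commit, commit)
        issue_cycle += 1  # the next request may issue on the following cycle
    return last_commit
```

Under these assumed parameters, six requests to distinct addresses overlap and finish at cycle 16. Six requests to the same address serialize and finish at cycle 66.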
To solve this, I engineered a novel ALU architecture in SystemVerilog that uses a specialized Transaction Table to aggregate updates to the same address. By exploiting mathematical associativity rather than data locality, the system masks memory latency and maintains high throughput even under high-collision workloads.
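The associativity argument can be written out for the common case of counter increments. Assume w-bit counters with wraparound addition and a stored value c_0. If updates d_1, …, d_k all target that counter, applying them one RMW at a time gives the same result as adding their sum once:

$$
\Big(\cdots\big((c_0 + d_1) + d_2\big) + \cdots\Big) + d_k \;\equiv\; c_0 + \underbrace{(d_1 + d_2 + \cdots + d_k)}_{s_k} \pmod{2^w}
$$

A table entry therefore only needs to hold the running partial sum s_k and can issue a single read and a single write for the whole group.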
Technical Implementation & Architecture
The core innovation is a local Transaction Table that intercepts and stores partial sums from multiple updates targeting the same memory address. This allows the ALU to accept new requests immediately without waiting for the slow external memory handshake to complete.
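The following behavioral Python sketch illustrates the aggregation idea. It uses the same kind of hypothetical timing (10-cycle read latency, 1-cycle write) and makes simplifications the real design presumably handles differently: the table has unlimited entries, one request arrives per cycle, and an entry is written back as soon as its read data returns. The name aggregated_rmw and the [partial_sum, ready_cycle] entry layout are illustrative assumptions, not taken from the project.

```python
# Hypothetical timing parameters; not taken from the project.
READ_LATENCY = 10   # cycles from read request to data return
WRITE_LATENCY = 1   # cycles to commit the write-back


def aggregated_rmw(memory: dict[int, int], requests: list[tuple[int, int]],
                   width: int = 64) -> int:
    """Apply (address, increment) requests, one per cycle, through an
    aggregating transaction table and return the cycle of the last write-back.

    Hypothetical model: unlimited table entries, no full-table stalls."""
    mask = (1 << width) - 1              # w-bit counters wrap around
    table: dict[int, list[int]] = {}     # addr -> [partial_sum, ready_cycle]
    last_commit = 0

    def retire(now: float) -> None:
        """Write back every entry whose read data has returned by `now`."""
        nonlocal last_commit
        ready_addrs = [a for a, (_, ready) in table.items() if ready <= now]
        for addr in ready_addrs:
            partial, ready = table.pop(addr)
            memory[addr] = (memory.get(addr, 0) + partial) & mask
            last_commit = max(last_commit, ready + WRITE_LATENCY)

    for cycle, (addr, delta) in enumerate(requests):
        retire(cycle)
        if addr in table:
            # Collision: merge into the pending partial sum instead of stalling.
            table[addr][0] = (table[addr][0] + delta) & mask
        else:
            # Miss: allocate an entry and launch the (slow) memory read.
            table[addr] = [delta, cycle + READ_LATENCY]

    retire(float("inf"))  # drain entries still waiting on memory
    return last_commit
```

In this toy model, six increments of 1 to the same address finish at cycle 11 (one read plus one write) and leave the counter at 6. The model only illustrates the mechanism and is not a substitute for the project's measured results.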
Key engineering highlights include:
Quantitative simulation results under a high-collision workload (6 consecutive operations targeting the same address) demonstrate a massive performance gain over standard pipelined architectures:


