Advanced Encryption Standard (AES) implementations on Field Programmable Gate
Arrays (FPGAs) commonly maximize throughput at the cost of consuming large
amounts of FPGA slice logic. High resource usage limits a system's ability
to implement other functions (such as video processing or machine learning)
that must share the same FPGA resources. In this paper, we address the
shared-resource challenge by proposing and evaluating a low-area yet
high-throughput AES architecture. In contrast to existing work, our
DSP/RAM-Based Low-CLB Usage (DRAB-LOCUS) architecture leverages block RAM tiles
and Digital Signal Processing (DSP) slices to implement the AES SubBytes,
MixColumns, and AddRoundKey sub-round transformations, reducing resource usage
by a factor of 3 over traditional approaches. To achieve area efficiency, we
built an inner-pipelined architecture using the internal registers of block RAM
tiles and DSP slices. Our DRAB-LOCUS architecture features a 12-stage pipeline
capable of producing 7.055 Gbps of interleaved encrypted or decrypted data
while using only 909 look-up tables, 593 flip-flops, 16 block RAMs, and 18 DSP
slices in the target device.
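
As a point of reference for the three sub-round transformations named above,
the following is a minimal software sketch of SubBytes, MixColumns, and
AddRoundKey as defined by the AES standard. It is not the DRAB-LOCUS hardware
design: the function names (gf_mul, build_sbox, sub_bytes, mix_column,
add_round_key) are hypothetical, and the comments about which FPGA primitive
might hold each step are only an illustrative reading of the abstract.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B  # reduce by x^8 + x^4 + x^3 + x + 1
        b >>= 1
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox():
    """Build the 256-entry AES S-box: field inverse, then affine transform."""
    sbox = []
    for x in range(256):
        # Brute-force multiplicative inverse; 0 maps to 0 by convention.
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        sbox.append(
            inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63
        )
    return sbox


# A fixed lookup table like this is the kind of content a ROM-configured
# block RAM could hold (assumption for illustration only).
SBOX = build_sbox()


def sub_bytes(state):
    """SubBytes: replace every state byte with its S-box entry."""
    return [SBOX[b] for b in state]


def mix_column(col):
    """MixColumns on one 4-byte column (multiply by the fixed AES matrix)."""
    a0, a1, a2, a3 = col
    return [
        gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3,
        a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3,
        a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3),
        gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2),
    ]


def add_round_key(state, round_key):
    """AddRoundKey: bytewise XOR of state and round key, a pure bitwise
    operation of the kind a wide logic unit could evaluate (assumption)."""
    return [s ^ k for s, k in zip(state, round_key)]
```

For example, mix_column([0xDB, 0x13, 0x53, 0x45]) yields
[0x8E, 0x4D, 0xA1, 0xBC], a commonly used MixColumns check vector. Because
SubBytes is a pure table lookup and AddRoundKey is a pure XOR, neither step
inherently requires general-purpose slice logic, which is consistent with the
motivation stated in the abstract.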