

## Applications Development with FPGAs: Simulate but Verify



G. Wenes, J. Maltby, D. Strenski *CUG 2005* Albuquerque, NM


## Simulate but Verify

- Applications Development with FPGAs
  - Reconfigurable Computing (RC)
    - Applications acceleration with dedicated or special-purpose HW
    - Programmable environment
  - Not a familiar or common programming paradigm in HPC

- Emerging Class of Applications in Complex Network Analysis (CNA)
  - Verification, rather than simulation, at their core
  - Roots in circuit design but ...
    - Much indebted to theoretical computer science
  - HPC-class applications
    - EDA (electronic design automation)
    - KDD (knowledge discovery in databases)









## XD1 Architecture and FPGAs






**Chassis Rear** 



## Cray XD1 System Architecture



### Compute

- 12 AMD Opteron 32/64-bit x86 processors
- High Performance Linux

### RapidArray Interconnect

- 12 communications processors
- 1 Tb/s switch fabric

### Active Management

- Dedicated processor

### Application Acceleration

- 6 co-processors (FPGAs)

### Processors directly connected via integrated switch fabric

## **Application Acceleration FPGA**

Application Accelerator



### **Application Acceleration**

- Reconfigurable Computing
- Tightly coupled to Opteron
- FPGA acts like a programmable coprocessor
- Performs vector operations
- Well-suited for:
  - Searching, sorting, signal processing, audio/video/image manipulation, encryption, error correction, coding/decoding, packet processing, random number generation.

## SuperLinear speedup for key algorithms



## **Applications Acceleration**





## **Application Acceleration FPGA**

Well suited for:

Searching, sorting, signal processing, audio/video/image manipulation, encryption, error correction, coding/decoding, packet processing, random number generation ...

But also:



Seismic imaging, molecular dynamics, bioinformatics, ...

Fine-grained parallelism applied for 100x potential speedup





## **Reconfigurable Computing**

The Barriers to Reconfigurable Computing

- Starving the FPGA
  - Bandwidth and latency to the FPGA limited by PCI bus
- FPGA, Processor Interaction
  - Job scheduling, Linux integration, memory mapping
- Programming Tools
  - Programming hardware requires special tools, special expertise









## Processor to FPGA and vice versa



- Since the Acceleration FPGA is connected to the local processing node through its HyperTransport I/O bus, the FPGA can be accessed directly using reads and writes.
- Additionally, a node can also transfer large blocks of data to and from the Acceleration FPGA using a simple DMA engine in the FPGA's RapidArray Transport Core.
- The Acceleration FPGA can also directly access the memory of a processor. Read and write requests can be performed in bursts of up to 64 bytes.
- The Acceleration FPGA can access processor memory without interrupting the processor.
- Memory coherency is maintained by the processor.
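The 64-byte burst limit means a larger transfer has to be issued as a sequence of requests. A minimal sketch of that splitting, assuming nothing about alignment rules (the slide does not state any); the function name, the start address, and the 200-byte length are hypothetical:

```python
MAX_BURST_BYTES = 64  # from the text: bursts of up to 64 bytes

def split_into_bursts(start_addr, length, max_burst=MAX_BURST_BYTES):
    """Yield (address, size) pairs covering [start_addr, start_addr + length)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    offset = 0
    while offset < length:
        size = min(max_burst, length - offset)
        yield (start_addr + offset, size)
        offset += size

# Hypothetical 200-byte transfer starting at address 0x1000.
bursts = list(split_into_bursts(0x1000, 200))
for addr, size in bursts:
    print(f"{addr:#06x}: {size} bytes")
```

The final request is shorter than the others whenever the length is not a multiple of the burst size.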





## Processor to FPGA and vice versa



|              | array | pointer | memcpy |
|--------------|-------|---------|--------|
| Write (MB/s) | 1260  | 1320    | 1320   |
| Read (MB/s)  | 5.94  | 5.95    | 6.01   |

J. Tripp et al, FCCM'05
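The asymmetry between the two rows is easier to appreciate as transfer times. A minimal sketch using only the memcpy column of the table; the 64 MB payload is an arbitrary size chosen for illustration:

```python
# Measured bandwidths from the memcpy column of the table, in MB/s.
WRITE_MB_PER_S = 1320.0
READ_MB_PER_S = 6.01

def transfer_seconds(size_mb, bandwidth_mb_per_s):
    """Time to move size_mb megabytes at a sustained bandwidth."""
    if bandwidth_mb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_mb / bandwidth_mb_per_s

payload_mb = 64.0  # hypothetical payload size, for illustration only
write_s = transfer_seconds(payload_mb, WRITE_MB_PER_S)
read_s = transfer_seconds(payload_mb, READ_MB_PER_S)
ratio = WRITE_MB_PER_S / READ_MB_PER_S

print(f"write: {write_s:.3f} s, read: {read_s:.2f} s, ratio: {ratio:.0f}x")
```

With these figures, reads are roughly 220 times slower than writes, which suggests structuring an application so that the processor rarely has to read results back this way.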







## **FPGA Linux API**

- Administration Commands
  - fpga\_open: allocate and open fpga
  - fpga\_close: close allocated fpga
  - fpga\_load: load binary into fpga
- Control Commands
  - fpga\_start: start fpga (release from reset)
  - fpga\_stop: stop fpga
- Status Commands
  - fpga\_status: get status of fpga
- Data Commands
  - fpga\_put: put data to fpga ram
  - fpga\_get: get data from fpga ram
- Interrupt/Blocking Commands
  - fpga\_intwait: blocks process, waits for fpga interrupt

# Programmer sees get/put and message passing programming model
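The order in which these commands would typically be used can be sketched with an in-memory stand-in. This is not the Cray API: the slide names the commands but not their signatures, so the class `MockFpga`, its arguments, the RAM offsets, and the Python callable standing in for a loaded binary are all assumptions made for illustration.

```python
# Hypothetical stand-in for the command set; every signature is an assumption.
IN_OFFSET, OUT_OFFSET = 0, 1    # made-up RAM regions for input and output

class MockFpga:
    def __init__(self):
        self.is_open = False
        self.kernel = None      # stands in for the loaded binary
        self.running = False
        self.done = False
        self.ram = {}           # offset -> list of words

    def open(self):             # cf. fpga_open
        self.is_open = True

    def load(self, kernel):     # cf. fpga_load; kernel is a Python callable
        assert self.is_open, "open before load"
        self.kernel = kernel

    def start(self):            # cf. fpga_start (release from reset)
        assert self.kernel is not None, "load before start"
        self.running = True

    def put(self, offset, data):    # cf. fpga_put
        self.ram[offset] = list(data)

    def intwait(self):          # cf. fpga_intwait
        # A real call would block until the device raises an interrupt;
        # this mock simply computes the result synchronously.
        assert self.running, "start before waiting"
        self.ram[OUT_OFFSET] = [self.kernel(x) for x in self.ram.get(IN_OFFSET, [])]
        self.done = True

    def get(self, offset):      # cf. fpga_get
        return list(self.ram[offset])

    def status(self):           # cf. fpga_status
        return {"running": self.running, "done": self.done}

    def stop(self):             # cf. fpga_stop
        self.running = False

    def close(self):            # cf. fpga_close
        self.is_open = False

fpga = MockFpga()
fpga.open()
fpga.load(lambda x: x * x)      # toy "binary": square each word
fpga.start()
fpga.put(IN_OFFSET, [1, 2, 3, 4])
fpga.intwait()
result = fpga.get(OUT_OFFSET)
fpga.stop()
fpga.close()
print(result)
```

The host code only puts data, waits, and gets results, which is the get/put view of the device described on this slide.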








## **Applications Development Framework**




## **FPGA** Development Flow





## **Three-Phase Implementation**

- Traditional Programming Model
  - VHDL, Verilog
- Off-The-Shelf Libraries
  - Cray and third party acceleration libraries
  - Prepackaged, turnkey *applications*
- High-Level Compilers
  - C, Graphical, Matlab, ...

- Leverage existing IP cores (Xilinx, OpenCores.org) or industry/academic initiatives (OpenFPGA)
- Academic collaborators





## First Level Abstraction: VHDL

### http://www.eda.org/fphdl/

### **Floating-Point HDL Packages Home Page**

Working on a floating point synthesis package for VHDL and Verilog based on IEEE 754. We are a task force assigned to the 1076.3 working group and will be releasing our code as part of that IEEE PAR.

### **Group Objectives:**

Create a parameterized package for variable width floating point in both VHDL and Verilog.

[Screenshot: VHDL editor showing `v_iodecoder.vhd`, with signal assignments such as `rd_tifr <= '0';` and port declarations such as `rd_io, wr_io : in std_logic;`]

## Higher Level Abstractions: C Languages

### Requirements

- At a minimum:
  - Multi-threaded
  - Communication and synchronization channels
  - Bit level manipulations
- Others:
  - Good-to-excellent QoR
  - Target architectures
  - Standards?





## Higher Level Abstractions: Mobius (www.codetronix.com)

- Pascal-like, CSP-based language
  - Types, records, arrays, floating-point arithmetic
- Synchronization and communication by handshaking over channels
- Generates HW, SW, or HW/SW code
- General-purpose and dataflow algorithms





## **FPGA** Applications: **DES**







## **FPGA** Applications: **AES**





## **FPGA** Applications








## **FPGA** Applications: 1D Convolution



 $P^*(i) = \sum_{n=-N}^{N} w(n) P(i-n)$ 
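As a software reference for the sum above, a minimal sketch follows. Treating $P$ as zero outside its bounds is an assumption (the slide does not say how edges are handled), and the names `convolve_1d`, `signal`, and `weights` are illustrative only.

```python
def convolve_1d(p, w, half_width):
    """Return P*(i) = sum_{n=-N}^{N} w(n) P(i-n) for every i.

    p: list of samples P(0), ..., P(len(p) - 1)
    w: list of 2N+1 weights, where w[n + N] holds w(n)
    half_width: N
    Samples outside p are treated as zero (an assumption).
    """
    if len(w) != 2 * half_width + 1:
        raise ValueError("expected 2N+1 weights")
    out = []
    for i in range(len(p)):
        acc = 0.0
        for n in range(-half_width, half_width + 1):
            j = i - n
            if 0 <= j < len(p):
                acc += w[n + half_width] * p[j]
        out.append(acc)
    return out

signal = [1.0, 2.0, 3.0, 4.0]
weights = [0.25, 0.5, 0.25]     # w(-1), w(0), w(1)
smoothed = convolve_1d(signal, weights, half_width=1)
print(smoothed)
```

The $2N+1$ multiply-accumulates for each output are independent of one another, which is the kind of fine-grained parallelism a hardware convolution unit can exploit.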

## **FPGA** Applications: 1D Convolution



**Convolution** Unit







## **Example: Smith-Waterman**







## **Biosciences: Molecular Design**

- RC for Molecular Modeling & Docking
  - Reduced Precision
    - Fidelity of observed macro phenomena
    - High frequency/low frequency
  - Critical resource: hard multipliers
  - O(N logN) methods







## Sample Applications for Acceleration

Kirchhoff Pre-stack Time Migration

- Estimated (preliminary) speedups of 10-50x versus a 2.4 GHz Pentium 4
  - 50M Kirchhoff summations per second
- Power consumption: less than 2x
- Footprint: about 1x
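Taken together, the speedup and power figures imply a gain in performance per watt. A rough sketch of that arithmetic, assuming the power figure is relative to the same Pentium 4 baseline (the slide does not say so explicitly); the function name is illustrative:

```python
def perf_per_watt_gain(speedup, power_ratio):
    """Relative performance per watt: speedup divided by relative power draw."""
    if power_ratio <= 0:
        raise ValueError("power_ratio must be positive")
    return speedup / power_ratio

POWER_RATIO_BOUND = 2.0     # upper bound on relative power from the slide
low = perf_per_watt_gain(10.0, POWER_RATIO_BOUND)
high = perf_per_watt_gain(50.0, POWER_RATIO_BOUND)
print(f"performance per watt improves by more than {low:.0f}x to {high:.0f}x")
```

Because the power figure is an upper bound, these values are lower bounds on the gain at each end of the speedup range.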



(TAMU)

### Cost Analysis

- Commodity industry cost structure is driven by
  - Cost of infrastructure
  - Cost of operation (power, cooling, ...)
- Move processing infrastructure closer to acquisition infrastructure





## **Verification Languages**





## **Verification Languages and LTL**

- Linear Temporal Logic (LTL)
  - The usual Boolean propositional logic (AND, OR, NOT)
  - Temporal operators (NEXT, UNTIL)
  - Quantifiers (FORALL, EXIST-ONE)
- In EDA:
  - Mathematical basis of verification languages
    - Mathematical syntax is awkward
- In CS:
  - Concurrency and computer-aided SW verification
- In CNA (complex network analysis):
  - Control (output regulation, stabilization, ...) of complex systems
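To make the Boolean and temporal operators concrete, here is a minimal evaluator sketch. It checks formulas over finite traces, which is a simplifying assumption (LTL is usually defined over infinite sequences), and it does not model the quantifiers. The tuple encoding and the proposition names `req` and `grant` are hypothetical.

```python
# Formulas are nested tuples; a trace is a list of sets of true propositions.
def holds(formula, trace, i=0):
    """Evaluate a formula at position i of a finite trace."""
    op = formula[0]
    if op == "atom":
        return formula[1] in trace[i]
    if op == "not":
        return not holds(formula[1], trace, i)
    if op == "and":
        return holds(formula[1], trace, i) and holds(formula[2], trace, i)
    if op == "or":
        return holds(formula[1], trace, i) or holds(formula[2], trace, i)
    if op == "next":
        # Strong NEXT: false at the last position of a finite trace.
        return i + 1 < len(trace) and holds(formula[1], trace, i + 1)
    if op == "until":
        # f UNTIL g: g holds at some k >= i, and f holds at every step before k.
        for k in range(i, len(trace)):
            if holds(formula[2], trace, k):
                return True
            if not holds(formula[1], trace, k):
                return False
        return False
    raise ValueError(f"unknown operator: {op}")

req, grant = ("atom", "req"), ("atom", "grant")
good_trace = [{"req"}, {"req"}, {"grant"}]
bad_trace = [{"req"}, set(), {"grant"}]

print(holds(("until", req, grant), good_trace))   # req holds until grant arrives
print(holds(("until", req, grant), bad_trace))    # req is dropped before grant
print(holds(("next", req), good_trace))
```

In the second trace `req` lapses before `grant` appears, so the UNTIL formula fails even though `grant` eventually holds.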



## What is Network Analysis...?

### Goal:

Develop a comprehensive, mathematically rigorous, and empirically grounded framework within which to understand complex networks, and apply that understanding to real-world problems.

### **Applications of interest:**

- Social systems (e.g., terrorist networks, WMD programs, socioeconomic systems);
- Biological networks (e.g., gene regulatory networks, metabolic, protein interaction);
- Technological systems (e.g., EP grids);
- Information systems (e.g., www);

### **Analysis:**


- Information extraction
- Network control



## What are Networks ...?

### Nodes:

- Extremely simple dynamic systems
  - (low cognition agents, Boolean), ...

### Edges:

- Uni/bi-directional

### Interaction (Control, Communication, and Synchronization):

- Linear Temporal Logic (LTL)

### Questions asked:

- Is there a solution?
- Finite number of transitions?
- Is there a path?
- What is the shortest path?
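The last two questions map onto a standard graph search. A minimal sketch using breadth-first search on a directed graph; the example network and node names are made up for illustration:

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Breadth-first search over a directed graph given as {node: [successors]}.

    Returns the nodes on a shortest path, or None if goal is unreachable.
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for succ in edges.get(node, []):
            if succ not in parent:
                parent[succ] = node
                queue.append(succ)
    return None

# Hypothetical network with uni-directional edges.
network = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "f": ["a"]}
path_ae = shortest_path(network, "a", "e")
path_af = shortest_path(network, "a", "f")
print(path_ae, path_af)
```

A `None` result answers "Is there a path?" in the negative; otherwise the returned list has the fewest possible edges.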



## **Gene Regulatory Networks in LTL**

### Drosophila gene regulatory network

Standard model for segment polarity gene network admits finite bisimulation which preserves state space equilibrium structure and, therefore, gene expression patterns.
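The notion of an equilibrium in a Boolean network can be illustrated by brute-force enumeration. In the sketch below, the three genes and their update rules are hypothetical and are not the segment polarity model; the code only finds fixed points and does not construct a bisimulation.

```python
from itertools import product

# Hypothetical three-gene Boolean network, for illustration only.
RULES = {
    "a": lambda s: s["a"] or s["b"],        # a sustains itself once activated
    "b": lambda s: s["a"] and not s["c"],   # b needs a and is repressed by c
    "c": lambda s: not s["a"],              # c is on only when a is off
}
GENES = sorted(RULES)

def step(state):
    """Synchronous update: every gene applies its rule to the current state."""
    return {g: bool(RULES[g](state)) for g in GENES}

def equilibria():
    """Enumerate all 2^n states and keep those that map to themselves."""
    fixed = []
    for bits in product([False, True], repeat=len(GENES)):
        state = dict(zip(GENES, bits))
        if step(state) == state:
            fixed.append(state)
    return fixed

for eq in equilibria():
    print({g: int(v) for g, v in eq.items()})
```

Exhaustive enumeration grows as $2^n$ in the number of genes, which is one reason a reduced, equilibrium-preserving abstraction of the state space is attractive.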



gene interaction network

sample vertex update rules



### Drosophila gene regulatory network (Colbaugh, Glass analysis)

[Figure: wg and en expression, with panels for the mutant equilibrium and the normal equilibrium]