Co-processor Offloading

Applied to Passive Coherent Location

With Doppler and Bearing Data


Joseph Milburn, Master's Dissertation


AIM: To accelerate a filtering algorithm for tracking aircraft by offloading its most computationally expensive sections to a co-processor with a custom hardware architecture.



Filtering Algorithm:

The algorithm, developed by Dr Norman Morrison, uses non-linear differential correction to estimate the state vector of a target from Doppler and bearing data.

The algorithm is suitable for use with a low-cost Passive Coherent Location (PCL) system. PCL radar systems use existing broadcast transmitters, such as FM radio and television, instead of operating their own, and they replace high-cost hardware with algorithmic complexity. As a result, their procurement, operation and maintenance costs are low. This is of particular benefit for air traffic control in developing African countries. Many African airports lack radar-assisted air traffic control, which contributes to Africa's ratio of aircraft accidents to flying hours, the highest in the world.


Figure: Air traffic control with a PCL system

The Doppler and bearing input is fed into an Expanding Memory Polynomial (EMP) filter of degree 1 for initialization. EMP filtering finds the straight line that best fits all observations received so far, in the least-squares sense. An initial state vector, specifying the aircraft's position and velocity in two dimensions, is formed from the smoothed data.
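For equally spaced observations, a degree-1 EMP filter produces the same estimate as a least-squares straight-line fit over all data received so far; it simply computes that estimate recursively. The sketch below does not reproduce Morrison's recursive update equations. Instead it refits the line in batch form each time a sample arrives, which gives the same answer at a higher cost. The function names `fit_line` and `expanding_memory_estimates` are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_line(times: Sequence[float], obs: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of obs ~ a + b * t; returns (intercept a, slope b)."""
    n = len(times)
    if n < 2:
        raise ValueError("a degree-1 fit needs at least two observations")
    t_mean = sum(times) / n
    y_mean = sum(obs) / n
    s_tt = sum((t - t_mean) ** 2 for t in times)
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, obs))
    if s_tt == 0:
        raise ValueError("observation times must not all be equal")
    slope = s_ty / s_tt
    return y_mean - slope * t_mean, slope


def expanding_memory_estimates(
    times: Sequence[float], obs: Sequence[float]
) -> List[Tuple[float, float]]:
    """Refit over the growing data set after each new sample.

    Each entry is (smoothed value at the latest time, rate of change),
    i.e. a position/velocity pair for one observed coordinate.
    """
    estimates = []
    for k in range(2, len(times) + 1):
        a, b = fit_line(times[:k], obs[:k])
        estimates.append((a + b * times[k - 1], b))
    return estimates
```

Applied separately to each observed coordinate, the final entry supplies the position and velocity components of an initial state vector.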

After initialization, control is passed to the differential correction algorithm. The algorithm uses the following inputs:

         Y(n): a set of radar observations (Doppler and bearing) gathered over time by a network of receivers.

         rcvr_fd_σ and rcvr_ψ_σ: the variances of the Doppler (fd) and bearing (ψ) observations for each receiver

         Xbar and Xbardot: the initial approximation to the state vector from track initialization

         Φ(n,n-1): the transition matrix that propagates the state between successive time instants: $X_n = \Phi(n,n-1)\,X_{n-1}$
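To illustrate the relation between the state vector and the transition matrix, the sketch below builds Φ for a constant-velocity model in two dimensions and applies it to a previous state. The state ordering [x, y, x_dot, y_dot] and the constant-velocity model are assumptions chosen to match the 2-D position/velocity initial state. The Φ used in the actual algorithm, which models a polynomial trajectory in three dimensions, may differ. `transition_matrix` and `propagate` are hypothetical names.

```python
from typing import List

Matrix = List[List[float]]


def transition_matrix(dt: float) -> Matrix:
    """Constant-velocity Phi(n, n-1) for the assumed state [x, y, x_dot, y_dot].

    dt is the time elapsed between instants n-1 and n.
    """
    return [
        [1.0, 0.0, dt, 0.0],   # x_n = x_{n-1} + dt * x_dot_{n-1}
        [0.0, 1.0, 0.0, dt],   # y_n = y_{n-1} + dt * y_dot_{n-1}
        [0.0, 0.0, 1.0, 0.0],  # velocity is unchanged
        [0.0, 0.0, 0.0, 1.0],
    ]


def propagate(phi: Matrix, x_prev: List[float]) -> List[float]:
    """Compute X_n = Phi(n, n-1) X_{n-1} as a matrix-vector product."""
    return [sum(p * x for p, x in zip(row, x_prev)) for row in phi]
```

For example, `propagate(transition_matrix(2.0), [0.0, 0.0, 1.0, 2.0])` moves a target at the origin with velocity (1, 2) to position (2, 4) and leaves its velocity unchanged.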

Differential correction combines the data from several radar receivers using the minimum variance rule. New data is incorporated as it arrives, so the model is continually corrected in an optimal fashion. The model produced by the algorithm is a polynomial that describes the aircraft's trajectory in three dimensions.
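The core of a minimum-variance differential correction can be sketched as an iterated weighted least-squares (Gauss-Newton) update. Each observation is weighted by the inverse of its variance, and the state is corrected repeatedly until the update becomes negligible. This is a generic sketch, not Morrison's exact formulation. The observation function `h`, the finite-difference Jacobian, and the name `differential_correction` are illustrative assumptions. The real algorithm derives the observation model from the receiver geometry and the transition matrix Φ(n,n-1).

```python
import numpy as np


def differential_correction(h, x0, y, variances, iterations=20, eps=1e-6, tol=1e-10):
    """Iterated minimum-variance (weighted Gauss-Newton) correction of a state.

    h         : maps a state vector to the vector of predicted observations
    x0        : initial state approximation (e.g. from track initialization)
    y         : stacked observations from all receivers
    variances : variance of each observation; weights are their inverses
    """
    x = np.asarray(x0, dtype=float).copy()
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)  # inverse-variance weights
    for _ in range(iterations):
        h_x = h(x)
        residual = y - h_x
        # Forward-difference Jacobian of h evaluated at the current state.
        jac = np.empty((y.size, x.size))
        for j in range(x.size):
            step = np.zeros_like(x)
            step[j] = eps
            jac[:, j] = (h(x + step) - h_x) / eps
        # Normal equations: (J^T W J) delta = J^T W residual.
        jtw = jac.T * w  # scales column i of J^T by w[i], i.e. J^T W
        delta = np.linalg.solve(jtw @ jac, jtw @ residual)
        x = x + delta
        if np.linalg.norm(delta) < tol:
            break
    return x
```

As a toy usage example, bearings to a stationary target measured from three receivers at known positions can be fed in as `h = lambda x: np.arctan2(x[1] - ry, x[0] - rx)`. Starting from a nearby guess, the iteration then recovers the target position. Observations with smaller variance pull the estimate harder, which is the minimum variance rule at work.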

The algorithm is computationally intensive and involves multiplications of high-dimensional matrices.

Research objectives:

         Identify the most computationally intensive areas of the algorithm.

         Port the algorithm to a hardware platform with a High Performance Computing (HPC) component

         Compare the performance of the algorithm on the HPC system with its performance on the base system

Hardware and Implementation:

The traditional way of increasing computing performance, raising the clock speed and transistor density of von Neumann architecture CPUs, has hit a wall. One limit is physical: transistors cannot keep shrinking indefinitely. The other is thermal: higher clock speeds bring excessive power consumption and heat dissipation. Innovations in computer architecture must therefore take over from the traditional method. One approach that has received attention recently is the use of co-processor accelerator boards. These boards offer dramatic power and space savings over other forms of HPC, while delivering impressive performance gains over conventional serial computing architectures.

The co-processor used for this project was the ClearSpeed Advance e620 accelerator board, which fits into the PCI Express (PCIe) slot of a conventional computing platform. The e620 is claimed to be the fastest and most power-efficient double-precision (64-bit) floating-point processor in the world. It consists of two CSX600 embedded parallel processors that communicate via the ClearConnect bus bridge ports. Each CSX600 contains 96 execution cores, or processing elements (PEs), each of which is a Very Long Instruction Word (VLIW) core. The card has 1 GB of local DDR2-400 SDRAM shared between the two CSX600s through a common 32- or 64-bit address space. At one end of the ClearConnect bus is an FPGA that implements the host interface, and it can be reprogrammed for performance or functionality upgrades. Under standard operation the board sustains 50 GFLOPS while drawing 25 W, about 2 GFLOPS per watt, which is an impressive performance-per-watt figure. The card costs around US$8,000 before volume discounts.

The CSX600 is programmed in C, and the ClearSpeed software supports Level 3 BLAS, FFTW and LAPACK. Code is enabled to run on the CSX600 by calling ClearSpeed library functions. The runtime environment of the ClearSpeed software package uses heuristics to decide how to split processing between the host CPU and the Advance accelerator card.
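The trade-off behind deciding whether to offload a matrix-matrix multiplication can be shown with a simple cost model. Multiplying an m x k matrix by a k x n matrix takes about 2mnk floating-point operations, and offloading only pays if the card's compute time plus the time to move the matrices over PCIe beats the host's compute time. This is a hypothetical illustration, not ClearSpeed's actual runtime heuristic, which is not documented here. The host rate and link bandwidth are caller-supplied assumptions, the 50 GFLOPS card rate is the sustained figure quoted for the e620, and transfer latency is ignored.

```python
def dgemm_flops(m: int, n: int, k: int) -> float:
    """Approximate flop count for C (m x n) = A (m x k) @ B (k x n): 2*m*n*k."""
    return 2.0 * m * n * k


def should_offload(m: int, n: int, k: int,
                   host_gflops: float,
                   link_gbytes_per_s: float,
                   card_gflops: float = 50.0) -> bool:
    """Hypothetical heuristic: offload if card compute + transfer beats the host.

    Assumes A and B are sent to the card and C is returned, 8 bytes per
    double-precision value, with no fixed per-transfer latency.
    """
    flops = dgemm_flops(m, n, k)
    host_time = flops / (host_gflops * 1e9)
    bytes_moved = 8.0 * (m * k + k * n + m * n)
    card_time = flops / (card_gflops * 1e9) + bytes_moved / (link_gbytes_per_s * 1e9)
    return card_time < host_time
```

Because compute grows with the cube of the matrix dimension while data movement grows only with its square, large multiplications favour the card and small ones stay on the host. This is why high-dimensional matrix-matrix products are the natural candidates for offloading.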


Figure: ClearSpeed Advance architecture block diagram


Figure: CSX600 MTAP architecture

The hardware used is installed at the Centre for High Performance Computing (CHPC), a division of the Council for Scientific and Industrial Research (CSIR) in South Africa.






Radar Remote Sensing Group
Department of Electrical Engineering
University of Cape Town

Contact Details:

Joseph Milburn




CHPC Contact:


CSIR Campus
Lower Hope St.
Rosebank 7700
Cape Town
South Africa

Phone: +27 (0)21 658 2758


RRSG Contact:


Room 6.17 Menzies 6th floor

Radar Remote Sensing Group
Department of Electrical Engineering
University of Cape Town
Private Bag
Rondebosch 7701
South Africa

Phone: +27 (0)21 650 3756