(PhD06) Modeling Performance for Reconfigurable Computing with Structured Parallelism
Performance Analysis and Optimization
Time: Monday, June 17th, 1pm - 6pm
Description: The research examines how reconfigurable computing can be integrated in the context of structured parallelism.
Algorithmic skeletons have been proposed to decouple the semantics of an algorithm from its implementation. They allow a programmer to specify the algorithm in an abstract and concise way without hardware-dependent instructions. A runtime then uses the description in the form of skeletons to exploit the available hardware parallelism. One source of hardware parallelism is reconfigurable devices, such as FPGAs. They can offer speedups or energy savings compared to CPUs and GPUs but require additional configuration phases; some even allow partial reconfiguration.
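As a minimal illustration of this decoupling, the following C++17 sketch assumes a hypothetical `map_reduce` skeleton whose execution strategy is selected by a backend tag. The names `sequential_backend`, `chunked_backend`, and `sum_of_squares` are invented for this example and are not taken from the library used in this work; the chunked backend only stands in for a parallel or reconfigurable device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical backend tags: the algorithm description does not depend on them.
struct sequential_backend {};
struct chunked_backend { std::size_t chunks; };  // stand-in for a parallel device

// Skeleton: apply `f` to every element and combine the results with `op`.
template <class T, class Map, class Reduce>
T map_reduce(sequential_backend, const std::vector<T>& in, T init, Map f, Reduce op) {
    for (const T& x : in) init = op(init, f(x));
    return init;
}

// Same semantics, different execution strategy: each chunk is reduced on its
// own (here sequentially; a runtime could hand each chunk to a device), then
// the partial results are combined. Assumes `op` is associative and `init`
// is its identity element.
template <class T, class Map, class Reduce>
T map_reduce(chunked_backend b, const std::vector<T>& in, T init, Map f, Reduce op) {
    assert(b.chunks > 0);
    const std::size_t n = in.size();
    T result = init;
    for (std::size_t c = 0; c < b.chunks; ++c) {
        const std::size_t begin = c * n / b.chunks;
        const std::size_t end = (c + 1) * n / b.chunks;
        T partial = init;
        for (std::size_t i = begin; i < end; ++i) partial = op(partial, f(in[i]));
        result = op(result, partial);
    }
    return result;
}

// The algorithm itself: a sum of squares, written once without hardware details.
template <class Backend>
long sum_of_squares(Backend backend, const std::vector<long>& v) {
    return map_reduce(backend, v, 0L,
                      [](long x) { return x * x; },
                      [](long a, long b) { return a + b; });
}
```

Calling `sum_of_squares(sequential_backend{}, v)` and `sum_of_squares(chunked_backend{3}, v)` runs the same algorithm description with two different execution strategies, which is the property a skeleton runtime relies on when choosing hardware.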
Integrating FPGAs poses several challenges:
1. The execution time of high-level code on FPGAs is hard to estimate. One solution could be to assume worst-case, memory-bound performance.
2. Partial reconfiguration adds complexity that needs to be accounted for by the cost model.
3. When the CPU and the FPGA share the same connection to the memory, they inevitably influence each other's execution, adding more complexity to consider.
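The three challenges above can be combined into a simple first-order estimate. The sketch below is only an illustration under stated assumptions, not the cost model developed in this work: the kernel is treated as purely memory-bound (time equals bytes moved divided by available bandwidth), reconfiguration is a fixed additive cost that is paid only when the required configuration is not yet loaded, and contention on the shared memory link is modeled as a constant fraction of bandwidth left to the FPGA. All names and parameters are hypothetical.

```cpp
#include <cassert>

// All names and parameters below are illustrative assumptions.
struct fpga_cost_params {
    double bytes_moved;           // data read and written by the kernel, in bytes
    double memory_bandwidth;      // peak bandwidth of the shared memory link, in bytes/s
    double fpga_bandwidth_share;  // fraction of that bandwidth left to the FPGA, in (0, 1]
    double reconfiguration_time;  // time to load the (partial) configuration, in seconds
    bool already_configured;      // true if the required configuration is loaded
};

// Worst-case memory-bound estimate: transfer time over the FPGA's share of the
// memory bandwidth, plus the reconfiguration time if a new configuration is needed.
inline double estimate_fpga_time(const fpga_cost_params& p) {
    assert(p.memory_bandwidth > 0.0);
    assert(p.fpga_bandwidth_share > 0.0 && p.fpga_bandwidth_share <= 1.0);
    const double transfer = p.bytes_moved / (p.memory_bandwidth * p.fpga_bandwidth_share);
    const double reconfig = p.already_configured ? 0.0 : p.reconfiguration_time;
    return transfer + reconfig;
}

// Offload only if the FPGA estimate beats a given CPU time estimate (in seconds).
inline bool should_offload(const fpga_cost_params& p, double cpu_time) {
    return estimate_fpga_time(p) < cpu_time;
}
```

For example, moving 8 GB over a 16 GB/s link of which the FPGA gets half takes 1 s; with a 0.25 s reconfiguration the estimate is 1.25 s, so offloading would only pay off against a CPU estimate above that value.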
The research questions are the following:
- Given a composition of algorithmic skeletons, how can the execution be mapped to the hardware, especially if CPUs and FPGAs are available at the same time?
- Given the memory bottleneck, are there useful HPC applications for which FPGAs programmed with high-level tools are worthwhile when targeting performance?
- What is a cost model for offloading to reconfigurable devices in a high-level and possibly distributed programming context?
The implementation is built around an already existing C++ PGAS library. It uses C++17 ranges to describe the algorithmic skeletons.
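To give an idea of how a range-style pipeline can describe a skeleton composition, the following C++17 sketch builds a lazy description that is only evaluated later. It does not use the API of the PGAS library mentioned above; `map_with`, `mapped_view`, and `evaluate_sum` are hypothetical names, and `evaluate_sum` merely stands in for a runtime that would decide where the described work executes.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical lazy description of a "map" skeleton: it stores the function
// but does not run anything until the pipeline is evaluated.
template <class F>
struct map_stage { F f; };

template <class F>
map_stage<F> map_with(F f) { return {std::move(f)}; }

// A lazy view: a reference to the source range plus the function to apply.
// The source must outlive the view.
template <class Range, class F>
struct mapped_view {
    const Range& source;
    F f;
};

// `range | map_with(f)` builds the description; no element is touched yet.
template <class T, class F>
mapped_view<std::vector<T>, F> operator|(const std::vector<T>& r, map_stage<F> s) {
    return {r, std::move(s.f)};
}

// Stages compose: mapping over a mapped view fuses both functions into one.
template <class Range, class F, class G>
auto operator|(mapped_view<Range, F> v, map_stage<G> s) {
    auto fused = [f = std::move(v.f), g = std::move(s.f)](const auto& x) { return g(f(x)); };
    return mapped_view<Range, decltype(fused)>{v.source, std::move(fused)};
}

// Stand-in for the runtime: evaluates the description as a map followed by a sum.
template <class Range, class F, class T>
T evaluate_sum(const mapped_view<Range, F>& v, T init) {
    for (const auto& x : v.source) init = init + v.f(x);
    return init;
}
```

Because the pipeline is only a description, a change to the source data made after the pipeline was built is still visible when it is evaluated. The same separation would let a runtime inspect the composition before committing to a device.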