February 2nd, Wednesday 14:15, Jacobs Building Room 303

Title: Task Superscalar Multiprocessors

Lecturer : Yoav Etsion

Lecturer homepage : http://www.bsc.es/staff.php?p_id=369

Affiliation : Barcelona Supercomputing Center (BSC)

 

Parallel programming is notoriously difficult and is still considered an artisan's job. Recently, the shift towards on-chip parallelism has brought this issue to the forefront. Commonly referred to as the Programmability Wall, this problem has already motivated the development of simplified parallel programming models, most notably task-based models.

In this talk, I will present Task Superscalar Multiprocessors, a conceptual multiprocessor organization that operates by dynamically uncovering task-level parallelism in a sequential stream of tasks. Task superscalar multiprocessors target an emerging class of task-based dataflow programming models, and thus enable programmers to exploit manycore systems effectively while simplifying the programming model.

The key component in the design is the Task Superscalar Pipeline, an abstraction of instruction-level out-of-order pipelines that operates at the task level and can be embedded into any manycore fabric to manage cores as functional units. Just as out-of-order pipelines dynamically uncover parallelism in a sequential instruction stream and drive multiple functional units, the task superscalar pipeline uncovers task-level parallelism in a stream of tasks generated by a sequential thread. Using intuitive programmer annotations of task inputs and outputs, the task superscalar pipeline dynamically detects inter-task data dependencies, identifies task-level parallelism, and executes tasks out of order. I will describe the design of the task superscalar pipeline and discuss how it tackles the scalability limitations of instruction-level out-of-order pipelines.
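As an illustration only, and not the hardware design presented in the talk, the following Python sketch shows how input/output annotations on a sequential task stream are enough to derive inter-task dependencies and a parallel execution order. The names `build_dependencies`, `schedule_waves`, and `example_tasks` are hypothetical, and the sketch conservatively treats read-after-write, write-after-write, and write-after-read hazards as ordering constraints; it makes no claim about how the actual pipeline handles each of them.

```python
from collections import defaultdict

def build_dependencies(tasks):
    """tasks: list of (name, inputs, outputs) in sequential program order.
    Returns {task name: set of earlier task names it must wait for}."""
    last_writer = {}              # operand -> most recent task that wrote it
    readers = defaultdict(list)   # operand -> tasks that read it since that write
    deps = {}
    for name, inputs, outputs in tasks:
        waits = set()
        for obj in inputs:
            if obj in last_writer:        # read-after-write
                waits.add(last_writer[obj])
        for obj in outputs:
            if obj in last_writer:        # write-after-write
                waits.add(last_writer[obj])
            waits.update(readers[obj])    # write-after-read
        deps[name] = waits
        # Record this task's accesses only after its own hazards are computed.
        for obj in inputs:
            readers[obj].append(name)
        for obj in outputs:
            last_writer[obj] = name
            readers[obj] = []
    return deps

def schedule_waves(tasks):
    """Group tasks into waves; all tasks in one wave could run concurrently."""
    deps = build_dependencies(tasks)
    done, waves = set(), []
    pending = [name for name, _, _ in tasks]
    while pending:
        # Dependencies only point to earlier tasks, so some task is always ready.
        ready = [t for t in pending if deps[t] <= done]
        waves.append(ready)
        done.update(ready)
        pending = [t for t in pending if t not in done]
    return waves

# Hypothetical task stream: (name, annotated inputs, annotated outputs).
example_tasks = [
    ("t1", [], ["A"]),
    ("t2", [], ["B"]),
    ("t3", ["A"], ["C"]),
    ("t4", ["B"], ["D"]),
    ("t5", ["C", "D"], ["E"]),
    ("t6", [], ["A"]),   # overwrites A, so it waits for t1 (writer) and t3 (reader)
]

if __name__ == "__main__":
    for i, wave in enumerate(schedule_waves(example_tasks), start=1):
        print(f"wave {i}: {wave}")
```

Although the six tasks are issued by a single sequential thread, the annotations alone show that they can run in three waves of two tasks each, which is the kind of parallelism the pipeline is meant to uncover dynamically.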

Finally, I will present simulation results demonstrating that the design can decode tasks at a rate faster than one task every 60ns and dynamically uncover data dependencies among as many as ~50,000 in-flight tasks, using 7MB of on-chip eDRAM storage. This configuration achieves speedups of 95-255x (average 183x) over sequential execution for nine scientific benchmarks, running on a simulated multiprocessor with 256 cores.
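A back-of-envelope reading of these figures can be sketched as follows. It assumes a single decode front-end issuing exactly one task every 60ns to all 256 cores in steady state; that simplification and the helper names are assumptions for illustration, not part of the reported results.

```python
def tasks_per_second(ns_per_task):
    """Decode throughput implied by a fixed decode time per task."""
    return 1e9 / ns_per_task

def min_avg_task_ns(ns_per_task, cores):
    """Average task length needed so that decode alone can keep every core busy:
    each core receives a new task only once every (cores * ns_per_task) ns."""
    return ns_per_task * cores

def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup that was achieved."""
    return speedup / cores

if __name__ == "__main__":
    print(f"decode throughput: {tasks_per_second(60) / 1e6:.1f} million tasks/s")
    print(f"minimum average task length: {min_avg_task_ns(60, 256) / 1000:.2f} us")
    print(f"average efficiency on 256 cores: {parallel_efficiency(183, 256):.1%}")
```

Under these assumptions, tasks averaging roughly 15 microseconds or longer would not be limited by the decode rate, and the average 183x speedup corresponds to about 71% of ideal linear scaling on 256 cores.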

Short bio: Yoav Etsion is a Juan de la Cierva Fellow and a Senior Researcher at the Barcelona Supercomputing Center (BSC), where he is a member of the Heterogeneous Architectures Group. His research interests span all aspects of computing systems, in particular computer architecture, operating systems, and parallel systems. Yoav received his BSc, MSc, and PhD in Computer Science from the Hebrew University in 1998, 2002, and 2009, respectively.