Project Abstract

The OptiMMA project will enable the mapping of emerging, dynamic software applications on complex Multi-Processor Systems-on-Chip (MP-SoC).

The main research objective is to realize a breakthrough in the handling of scalable, dynamic resource requests in embedded software. This is achieved through a runtime resource manager that mediates between the embedded software and the hardware platform. The resource manager manages, at runtime, the memory storage, energy consumption, bandwidth and computation resources of the embedded system. The technology targets economic actors in Flanders that specialize in multimedia and telecommunication applications on mobile devices, medical imaging devices, embedded software design, hardware platform design, design tools, etc.


During the first 2.5 years of the project we have realized a prototype tool-assisted approach for optimizing applications for heterogeneous systems-on-chip. The input to the toolchain consists of a software description and a platform description. The software description is ultimately a high-level synchronous data flow model annotated with timing information. This model can either be written by hand (with timing estimates in that case) or constructed from an existing application. The platform description is a high-level description detailing the processing elements, memories and interconnect that make up the target hardware.
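To make the notion of a timing-annotated synchronous data flow model concrete, here is a minimal sketch. It is not the OptiMMA input format: the class names, field names and the three-actor example are all hypothetical. It only shows the standard consistency check for such a model, which solves the balance equations to find how often each actor fires per iteration (requires Python 3.9 or later for `math.lcm`).

```python
from dataclasses import dataclass
from fractions import Fraction
from math import lcm


@dataclass(frozen=True)
class Actor:
    name: str
    exec_time_us: float  # timing annotation: estimated or measured


@dataclass(frozen=True)
class Channel:
    src: str
    dst: str
    produce: int  # tokens written per firing of src
    consume: int  # tokens read per firing of dst


def repetition_vector(actors, channels):
    """Return the smallest firing counts that balance every channel.

    Assumes a connected graph; raises ValueError if the graph is not
    connected or if the token rates are inconsistent.
    """
    rate = {actors[0].name: Fraction(1)}
    changed = True
    while changed:  # propagate rates along channels until nothing changes
        changed = False
        for ch in channels:
            if ch.src in rate and ch.dst not in rate:
                rate[ch.dst] = rate[ch.src] * ch.produce / ch.consume
                changed = True
            elif ch.dst in rate and ch.src not in rate:
                rate[ch.src] = rate[ch.dst] * ch.consume / ch.produce
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    for ch in channels:  # every balance equation must hold
        if rate[ch.src] * ch.produce != rate[ch.dst] * ch.consume:
            raise ValueError("inconsistent token rates")
    scale = lcm(*(r.denominator for r in rate.values()))
    return {name: int(r * scale) for name, r in rate.items()}


# Hypothetical example: read -> filter -> write, where filter needs
# two tokens from read for every firing.
ACTORS = [Actor("read", 40.0), Actor("filter", 120.0), Actor("write", 30.0)]
CHANNELS = [Channel("read", "filter", 1, 2), Channel("filter", "write", 1, 1)]
```

For this example the check reports that `read` fires twice per iteration while `filter` and `write` fire once each.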

The software and hardware descriptions are fed into an interactive tool that identifies the different runtime scenarios of the application. For each scenario the design-time exploration tool determines the best resource mappings: software tasks to processing elements (including the voltage level at which to run each processor), data to memories, and communication to network elements. The optimization targets energy savings while satisfying all application constraints, such as QoS and throughput requirements, and all platform constraints, such as the capacity of the processing elements, memories and communication resources.
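The following sketch illustrates the idea of design-time exploration on a deliberately tiny problem. It is not the project's constraint-based optimizer: it uses brute-force enumeration, covers only task-to-processor mapping and voltage selection (no memory or communication mapping), and assumes energy proportional to V² per cycle with the same constant for every processor. All task names, cycle counts and operating points are made up for illustration.

```python
from itertools import product


def explore(task_kcycles, pe_points, deadline_ms):
    """Return (energy, task->PE mapping, PE->operating point) with the
    lowest energy such that no PE is busy longer than the deadline.

    task_kcycles: task name -> workload in kilocycles
    pe_points:    PE name -> list of (volts, MHz) operating points
    Returns None if no mapping meets the deadline.
    """
    tasks = list(task_kcycles)
    pes = list(pe_points)
    best = None
    for assignment in product(pes, repeat=len(tasks)):
        for points in product(*(pe_points[pe] for pe in pes)):
            setting = dict(zip(pes, points))
            busy_ms = {pe: 0.0 for pe in pes}
            energy = 0.0
            for task, pe in zip(tasks, assignment):
                volts, mhz = setting[pe]
                busy_ms[pe] += task_kcycles[task] / mhz  # kcycles / MHz = ms
                energy += volts ** 2 * task_kcycles[task]  # arbitrary units
            if max(busy_ms.values()) > deadline_ms:
                continue  # violates the timing constraint
            if best is None or energy < best[0]:
                best = (energy, dict(zip(tasks, assignment)), setting)
    return best


# Hypothetical workload and platform.
TASKS = {"a": 1000, "b": 600}
PLATFORM = {
    "risc": [(1.5, 100), (2.0, 200)],
    "vliw": [(1.2, 200), (1.8, 500)],
}
RESULT = explore(TASKS, PLATFORM, deadline_ms=6)
```

With a 6 ms deadline, running both tasks on one processor is either too slow or forces the highest voltage, so the search places task `a` on the VLIW and task `b` on the RISC, both at their lowest operating point. Enumeration like this grows exponentially with the number of tasks, which is why a real tool needs a smarter search.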

The toolchain was validated on two application case studies: a 3D game engine and a medical imaging application.

First, we applied it to a Wavelet Subdivision Surface (WSS) based scalable 3D graphics game engine (3D-WSS) to demonstrate the research objectives on a generic heterogeneous multiprocessor SoC. Starting from the source code of the application, we used the toolchain to map it to an extended TI-OMAP3530 multiprocessor comprising four RISC processors (StrongARM 1100x) and two VLIWs with eight FUs each (TI-C64X+), connected by a bus. The StrongARM 1100x supports DVFS (Dynamic Voltage and Frequency Scaling) knobs, with operating points (1.48V, 133MHz) and (1.96V, 200MHz). The TI-C64X+ also supports DVFS knobs, with operating points (1.8V, 500MHz) and (1.2V, 200MHz).
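As a rough illustration of what these DVFS knobs offer, the sketch below applies the textbook CMOS model in which dynamic energy per cycle scales with V². This is an assumption made here for illustration, not the energy model used by the toolchain; it ignores static power and any difference in switched capacitance. The function name is hypothetical.

```python
def dvfs_tradeoff(low, high):
    """Compare two (volts, MHz) operating points for a fixed cycle count.

    Returns (energy_ratio, time_ratio) of the low point relative to the
    high point, assuming dynamic energy per cycle proportional to V^2.
    """
    (v_lo, f_lo), (v_hi, f_hi) = low, high
    energy_ratio = (v_lo / v_hi) ** 2  # same cycles, lower voltage
    time_ratio = f_hi / f_lo           # same cycles, lower clock
    return energy_ratio, time_ratio


# Operating points quoted in the text above.
STRONGARM = dvfs_tradeoff((1.48, 133), (1.96, 200))
C64X = dvfs_tradeoff((1.2, 200), (1.8, 500))
```

Under this model the low StrongARM point uses about 0.57 times the dynamic energy of the high point while taking about 1.5 times as long, and the low TI-C64X+ point uses about 0.44 times the energy while taking 2.5 times as long.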

Using the toolchain we implemented the customizable runtime components for the processing elements, memories and shared bus communication. These include the scenario detection logic that determines at runtime which scenario the application is in, and the runtime resource managers for processing elements, memories and communication.
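The interplay between scenario detection and a runtime resource manager can be sketched as a lookup into configurations precomputed at design time. Everything below is hypothetical: the workload metric, the thresholds, the scenario names and the configuration contents are placeholders, and the actual runtime components may be organized quite differently.

```python
from bisect import bisect_right

# Hypothetical scenario boundaries on some workload metric
# (for example, the amount of geometry to render in a frame).
THRESHOLDS = [10_000, 50_000]
SCENARIOS = ["low", "medium", "high"]

# Hypothetical design-time results: one configuration per scenario.
CONFIGS = {
    "low": {"volts": 1.48, "mhz": 133},
    "medium": {"volts": 1.96, "mhz": 200},
    "high": {"volts": 1.8, "mhz": 500},
}


def detect_scenario(workload):
    """Classify a workload value into one of the predefined scenarios."""
    return SCENARIOS[bisect_right(THRESHOLDS, workload)]


class RuntimeManager:
    """Applies the precomputed configuration of the current scenario."""

    def __init__(self):
        self.scenario = None
        self.reconfigurations = 0

    def update(self, workload):
        scenario = detect_scenario(workload)
        if scenario != self.scenario:  # reconfigure only on a change
            self.scenario = scenario
            self.reconfigurations += 1
        return CONFIGS[self.scenario]
```

The point of the design is that the expensive optimization happens at design time, so the runtime cost is limited to a classification and a table lookup, and the platform is only reconfigured when the scenario actually changes.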

Our experimental results for the game engine show energy savings from 50% up to a factor of 8 while satisfying all application constraints.

In another experiment we focused on the performance of the constraint-based optimization process and compared it to state-of-the-art design-time optimization tools. These can roughly be subdivided into two groups: industrial tools and academic tools. The former explore mappings of tasks to processing elements and make worst-case assumptions for memories and communication. The academic tools go one step further and take either memory or communication cost into account while using worst-case assumptions for the other. This contrasts with our approach, which takes all of them into account simultaneously. As software we used a medical imaging application that helps physicians detect brain tumors by extracting contours from images of the brain (the cavity detector application). We used the toolchain to do the design-time exploration. The hardware platform was the one described above. Our evaluation shows that compared to industrial tools we gain from ∼70% to 4× on the performance axis, or between 25% and 70% on the energy axis. Compared to academic tools we gain between 30% and 2× on the performance axis, or between 5% and 25% on the energy axis. While producing better results, our tools also produce them faster.



 


