Jornada de Seguimiento de Proyectos, 2009
Programa Nacional de Tecnologías Informáticas

Hardware Multitasking on 1, 2 and 3 Dimension FPGA Architectures: Task Scheduling and Placement Techniques and Defragmentation Strategies (TIN2006-03274)

Daniel Mozos
Facultad de Informática, Universidad Complutense de Madrid

Abstract

The main goal of this project is to develop a framework of techniques to allow efficient hardware multitasking on dynamically reconfigurable FPGAs. The target architecture models range from commercial 1D-reconfigurable architectures, such as Virtex, to academic architectures with very promising features, such as 3-dimensional ones. They also include 2D architectures, which are currently the focus of interest of many researchers. The set of techniques we want to develop will include enhanced versions of the basic set of scheduling strategies from a previous project, now taking into account task data dependencies and task pre-emption. We will also study the allocation of space for task configuration and data on the different architecture models. Moreover, we will tackle the free-space fragmentation problem and propose solutions involving local and global defragmentation processes. Finally, we will develop techniques to reduce the penalty due to the reconfiguration latency, such as a memory hierarchy and task prefetch and preload.

Keywords: Dynamically reconfigurable hardware, multitasking, reconfiguration latency

1 Project goals

The main objective of this project is to develop the appropriate support for an efficient hardware multitasking system on top of dynamically reconfigurable FPGAs. This technology is frequently used in embedded systems, and it is especially well suited to the demanding requirements of multimedia applications on portable devices.
The specific goals of this project are:

1.- Extending the basic scheduling policies developed in our previous project to take into account inter-task dependencies, time constraints and task priorities and, in particular, the possibility of evicting a task in order to make room for another one with higher priority. In this case, the state of the evicted task must be saved so that its execution can properly resume as soon as enough resources are available.

2.- Developing a memory manager that assigns the memory resources available in the FPGAs to the running tasks. This manager will guarantee that all the tasks have access to their data no matter where they have been allocated. In order to guarantee this access and the I/O requirements, different communication network topologies will be studied.

3.- Studying the implications of working with 3D FPGAs and developing the basic infrastructure to manage a hardware multitasking system in this new technology. The first step will be to select the most appropriate model to represent the architecture and the tasks in three dimensions. Then, we will analyse different possibilities for allocating the tasks in the 3D resources, taking into account that a task must always have access to a communication network, that the space must be managed in a near-optimal way in order to prevent fragmentation, and that a suitable allocation for each task must be found as fast as possible. Hence, we will try to develop a task allocation heuristic that minimises fragmentation while keeping its complexity low enough to be applicable at run time.

4.- Proposing architectural improvements and methodologies to reduce the penalties due to dynamic reconfigurations. The reconfiguration process introduces two different types of penalty. On the one hand, its latency generates delays in the system execution.
On the other hand, carrying out a reconfiguration requires moving a large amount of data from an external memory to the FPGA, and this process involves a significant energy penalty. Both penalties can be reduced if a configuration memory hierarchy is included in the system. Thus, those configurations that are likely to be executed in the near future can be stored in an on-chip configuration cache that will be faster and more energy efficient. In addition, we will also study the possibility of working with multi-context FPGAs. Finally, we will extend our previous scheduling and replacement techniques to optimise the use of the configuration memory hierarchy.

5.- Developing a fragmentation manager that monitors the fragmentation level of the FPGA and takes actions to reduce it. These actions can be triggered by the user, or by the manager itself if the fragmentation level surpasses a given threshold. In addition, we will consider the possibility of applying a defragmentation process to the whole platform or just to a certain region.

Time scheduling: [chart of modules MO1 to MO5 across Year 1, Year 2 and Year 3]

2 Level of success reached in the Project

The level of success reached in the project after two years has been very high. I would like to point out some modifications to the time scheduling of the modules, caused by small changes in the research staff. These modifications are shifts in time between modules: modules MO1 and MO2 have been moved to the last 18 months of the project, and modules MO4 and MO5 have been brought forward to the first two years. At this moment, therefore, modules MO4 and MO5 are finished, MO3 is well advanced, and MO1 and MO2 are in their first stages. The scientific results of each module are explained below.

MO3: We have developed a task allocation algorithm that decides at run time where to place the tasks that make up an application.
One of the main problems in these algorithms is keeping track of the free space on the 3D FPGA. We have used a vertex list that represents the perimeter of the free space. To find a place for a new task, the algorithm only traverses the vertices in this list. When a task can be placed at more than one point, a set of heuristics that we have developed chooses the point that maximises a cost function. The definition of a suitable cost function is another of our achievements. The final goal when defining the cost function is that the 3D FPGA should be able to accommodate as many tasks as possible.

MO4: In order to reduce the reconfiguration overhead, we have studied two problems: the reconfiguration latency and the reconfiguration energy penalty. Regarding the reconfiguration latency, our goal is to drastically reduce the reconfiguration overhead, making partial reconfiguration of dynamically reconfigurable hardware resources effective even for highly dynamic applications that demand frequent partial reconfigurations and have very tight deadlines. To this end, our reconfiguration manager applies two different techniques at run time: prefetch scheduling and replacement. In our experiments these two techniques have eliminated between 93% and 100% of the initial execution time overhead. The second problem tackled in this module is the reduction of the reconfiguration energy consumption. Carrying out run-time reconfigurations often involves a costly overhead not only in execution time but also in energy consumption, and our approach tries to reduce this energy overhead as well. To this end, a configuration memory hierarchy is proposed, with a shared memory layer consisting of a module optimised for performance combined with a module optimised for energy-efficient accesses.
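As an illustration only, the following minimal Python sketch shows one way a two-module shared layer of this kind could be exploited. It assumes that each module has a fixed load time and load energy per configuration and that each configuration has a known slack before it is needed. All names and figures (`Module`, `Config`, `map_configuration`, the numeric values) are hypothetical, and the simple rule shown is not the algorithm developed in the project.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    load_time: float    # time to load one configuration (hypothetical units)
    load_energy: float  # energy to load one configuration (hypothetical units)


@dataclass
class Config:
    name: str
    slack: float  # time available before the configuration is needed


def map_configuration(cfg: Config, fast: Module, efficient: Module) -> Module:
    """Use the energy-efficient module whenever its load time fits in the slack.

    Falling back to the fast module otherwise means the choice never delays
    the configuration beyond what the fast module alone would achieve.
    """
    if efficient.load_time <= cfg.slack:
        return efficient
    return fast


# Hypothetical figures: the fast module is quicker but costs more energy.
fast = Module("fast", load_time=1.0, load_energy=4.0)
efficient = Module("efficient", load_time=3.0, load_energy=1.0)

configs = [Config("A", slack=0.5), Config("B", slack=5.0)]
mapping = {c.name: map_configuration(c, fast, efficient).name for c in configs}
print(mapping)
```

Under these assumed figures, configuration A has too little slack for the slower module and goes to the fast one, while B can afford the energy-efficient module.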
For this hierarchy, we have developed a mapping algorithm that decides where to load each configuration in order to achieve significant energy savings without introducing any performance degradation. In our experiments, our algorithm found a mapping that is optimal for performance while consuming 22.5% less energy per configuration loaded.

MO5: We have developed a Defragmentation Manager that considers two different defragmentation heuristics, each one for a different kind of situation:
• First, a routine, preventive defragmentation is initiated if an alarm is fired when a highly fragmented FPGA status is detected. This preventive defragmentation is desirable but not urgent, and is performed only if the time constraints of the currently running tasks are not too severe.
• Second, an urgent, on-demand defragmentation is initiated if an arriving task cannot find a suitable location in the FPGA even though there is enough free area to accommodate it. This emergency defragmentation tries to make room by moving a single currently running task.

The results of these two defragmentation heuristics are very encouraging: they considerably reduce the number of rejected tasks in complex and dynamic applications.

As mentioned previously, modules MO1 and MO2 have not yet produced significant results because they are in their first stages.

3 Result indicators

• Publications
So far the project has been very successful in terms of publications in top-level international conferences and journals. The publications in this period include:
  • 5 papers in international journals listed in the JCR index.
  • 12 papers in international conferences.
  • 4 papers in national conferences.
The full list of publications is given in the references section at the end of this report.

• Human resource training
Seven Master's Theses were successfully completed by members of the project team under the supervision of researchers of the team:
  • C. González Calvo.
Un gestor de ejecución de grafos de tareas para sistemas multitarea dinámicamente reconfigurables. Universidad Complutense de Madrid, Jun. 2008. Supervisor: J. Resano Ezcaray.
  • J. A. Clemente Barreira. Planificación de grafos de tareas para sistema multi-proceso dinámicamente reconfigurables. Universidad Complutense de Madrid, Sept. 2008. Supervisor: J. Resano Ezcaray.
  • L. Sánchez Conde. Gestión de área reconfigurable y jerarquía de memoria para multitarea HW sobre FPGAs. Universidad Complutense de Madrid, Jun. 2007. Supervisor: D. Mozos Muñoz.
  • J. A. Valero Martín. Colocación de tareas hardware en sistemas reconfigurables de tres dimensiones. Universidad Complutense de Madrid, Jun. 2007. Supervisor: J. Septién del Castillo.
  • M. A. García de Dios. Entorno de desarrollo para ubicación de tareas en multitarea hardware 2D. Universidad Complutense de Madrid, Jun. 2007. Supervisor: H. Mecha López.
  • R. Sánchez Delgado. Gestor de tareas para hardware dinámicamente reconfigurable 2D. Universidad Complutense de Madrid, Jun. 2007. Supervisor: H. Mecha López.
  • C. Gallego Carricondo. Métricas de fragmentación y estrategias de defragmentación sobre hardware dinámicamente reconfigurable. Universidad Complutense de Madrid, Jun. 2007. Supervisor: D. Mozos Muñoz.

Three Ph.D. Theses developed by members of the group are in their final steps:
  • J. Tabero Godino. Técnicas de ubicación de tareas y defragmentación en sistemas dinámicamente reconfigurables. Presentation date: 15 March 2009. Supervisors: J. Septién del Castillo, H. Mecha López.
  • S. Román Navarro. Planificación de ejecución multitarea hardware en dispositivos dinámicamente reconfigurables basada en particiones. Expected presentation date: June 2009. Supervisors: J. Septién del Castillo, H. Mecha López, D. Mozos Muñoz.
  • E. Pérez Ramo. Técnicas de reducción del consumo sobre hardware dinámicamente reconfigurable. Expected presentation date: June 2009. Supervisors: J. Resano Ezcaray, D.
Mozos Muñoz.

• Collaboration with international research groups
Our group has a strong research relationship with the Interuniversity MicroElectronics Center (IMEC) research centre in Leuven, Belgium. This collaboration has led to several internships of members of our group at IMEC and, in this period, to two joint journal papers and one joint paper in an international conference, as can be seen in the list of references.

• Collaboration with national research groups
Our group has a research relationship with the Instituto Nacional de Técnica Aeroespacial (INTA). This collaboration has produced several joint publications and the Ph.D. Thesis of J. Tabero Godino, a researcher at that institution.

4 References

Journal papers (indexed in the ISI JCR):

[FeMo07] J. Fernández-Conde, D. Mozos. “Efficient scheduling for mobile time-constrained environments”, IET Electronics Letters, vol. 43, no. 22, pp. 1214-1215, 25 October 2007.

[RCGM08] J. J. Resano, J.A. Clemente, C. González, D. Mozos, F. Catthoor. “Efficiently scheduling run-time reconfigurations”, ACM Transactions on Design Automation of Electronic Systems, vol. 13, pp. 58-70, 2008.

[RMMS08] S. Román, D. Mozos, H. Mecha, J. Septién. “Constant Complexity Scheduling for Hardware Multitasking in 2D Reconfigurable FPGAs”, IET Computers & Digital Techniques, vol. 2, no. 6, pp. 401-412, 2008.

[RRMC07] E.P. Ramo, J. Resano, D. Mozos, F. Catthoor. “Memory hierarchy for high-performance and energy-aware reconfigurable systems”, IET Computers & Digital Techniques, vol. 1, no. 5, pp. 565-571, 2007.

[TSMM07] J. Tabero, J. Septién, H. Mecha, D. Mozos. “Allocation Heuristics and Defragmentation Measures for Reconfigurable Systems Management Integration”, Integration, the VLSI Journal, vol. 41, no. 2, pp. 281-296, Feb. 2008.

International conferences:

[CGRM08a] J.A. Clemente, C.
González, J. Resano, D. Mozos. “Task-graph management for reconfigurable multi-tasking systems”, 4th International Workshop on Reconfigurable Communication Centric System-on-Chips (ReCoSoc) 2008, Barcelona, Jul. 2008.

[CGRM08c] J.A. Clemente, C. González, J. Resano, D. Mozos. “A Hardware Task-Graph Scheduler for Reconfigurable Multi-tasking Systems”, Proc. IEEE International Conference on Reconfigurable Computing and FPGAs, pp. 79-84, Cancún, Mexico, December 2008.

[FeMo08] J. Fernández-Conde, D. Mozos. “Pull vs. Hybrid: Comparing Scheduling Algorithms for Asymmetric Time-Constrained Environments”, Proceedings of the International Conference on Wireless Networks (ICWN'08), Las Vegas, 2008.

[GMSR08a] A. L. González, H. Mecha, J. Septién, S. Román, D. Mozos. “Synthesis of Relocatable Tasks and Implementation of a Task Communication Bus in a General Purpose Hardware System”, Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms, ERSA'08.

[MaMo08] J. J. Marfil, D. Mozos. “Optimizing Reconfigurable Hardware for Genomic Sequences Comparison”, IV IEEE Southern Programmable Logic Conference, SPL’08, Bariloche, Argentina, pp. 225-228, 2008.

[PiVM08] C. Piñeiro, C. Valbuena, H. Mecha. “Polynomic curve based representation system implemented using FPGAs”, IV Southern Programmable Logic Conference, SPL’08, Bariloche, Argentina, March 2008.

[PRMC07] E. Perez-Ramo, J. Resano, D. Mozos, F. Catthoor. “Reducing the Reconfiguration Overhead: A Survey of Techniques”, International Conference on Engineering of Reconfigurable Systems and Algorithms, ERSA’07, Las Vegas, June 2007.

[RCGG07] J. Resano, J. A. Clemente, C. Gonzalez, J. L. Garcia, D. Mozos. “HW implementation of a task manager for reconfigurable systems”, International Conference on Engineering of Reconfigurable Systems and Algorithms, ERSA’07, Las Vegas, June 2007.

[SMMT08] J. Septién, D. Mozos, H. Mecha, J. Tabero, M.A.
García. “Perimeter Quadrature-based metric for estimating FPGA fragmentation in 2D HW multitasking”, Proceedings of the International Parallel and Distributed Processing Symposium, 2008.

[SSMM08] L. Sanchez, J. Septien, D. Mozos, H. Mecha. “FPGA Resource Management Using Internal RAM as Data Cache”, Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms, ERSA'08.

[VSMM08] J.A. Valero, J. Septien, D. Mozos, H. Mecha. “Resource Management for Hw Multitasking in Three Dimensional FPGAs”, Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms, ERSA'08.

[VSMM09] J.A. Valero, J. Septien, D. Mozos, H. Mecha. “3D FPGA resource management and fragmentation metric for HW multitasking”, accepted in the International Parallel and Distributed Processing Symposium, 2009.

National conferences:

[GCRM07] C. González, J.A. Clemente, J. Resano, D. Mozos. “Un sistema para la gestión eficiente del HW reconfigurable”, Jornadas de Computación Reconfigurable y Aplicaciones, Zaragoza 2007, pp. 163-167.

[GMMS07] A. González, H. Mecha, D. Mozos, J. Septién. “Un sistema HW reconfigurable y flexible para la ejecución de aplicaciones de propósito general. Obtención de tareas reubicables y bus de interconexión entre las mismas”, Jornadas de Computación Reconfigurable y Aplicaciones, Zaragoza 2007.

[GMMS08] A. L. González, H. Mecha López, D. Mozos, J. Septién del Castillo. “Implementación de un sistema multitarea en Virtex IV”, Jornadas de Computación Reconfigurable y Aplicaciones, JCRA’08, Madrid, Sep. 2008.

[CGRM08b] J.A. Clemente, C. González, J. Resano, D. Mozos. “Implementaciones HW y SW de un gestor de ejecución de grafos de tareas en un sistema multitarea reconfigurable”, VIII Jornadas sobre Computación Reconfigurable y Aplicaciones (JCRA) 2008, Madrid, Sep. 2008.