Computer Organization
Computer Architecture (Arquitectura de Computadoras)
Arturo Díaz Pérez
Centro de Investigación y de Estudios Avanzados del IPN
Laboratorio de Tecnologías de Información
adiaz@cinvestav.mx

Levels of Organization
SPARCstation 20
Computer
  Processor: Control, Datapath
  Memory
  Devices: Input, Output
Workstation Design Target:
  25% of cost on Processor
  25% of cost on Memory (minimum memory size)
  Rest on I/O devices, power supplies, box

The SPARCstation 20
[Figure: SPARCstation 20 block diagram. Labels: Memory SIMMs, SIMM Bus, Memory Controller, MBus, MBus Slots 0-1, MSBI, SBus, SBus Slots 0-3, SEC, MACIO, SCSI Bus (Disk, Tape), External Bus (Keyboard & Mouse, Floppy Disk)]

The Underlying Interconnect
SPARCstation 20
  SIMM Bus (to the Memory Controller)
  Processor/Mem Bus: MBus
  Sun's High Speed I/O Bus: SBus
  Standard I/O Bus: SCSI Bus
  Low Speed I/O Bus: External Bus
  Interface blocks: MSBI, SEC, MACIO

Processor and Caches
SPARCstation 20
[Figure: an MBus Module plugged into MBus Slot 0 or MBus Slot 1. The module holds the Processor (Registers, Datapath, Control, Internal Cache) and an External Cache]

Memory
SPARCstation 20
[Figure: Memory Controller connected over the SIMM Bus to SIMM Slots 0-7; each DRAM SIMM carries several DRAM chips]

Input and Output (I/O) Devices
SPARCstation 20
♦ SCSI Bus: Standard I/O Devices
♦ SBus: High Speed I/O Devices
♦ External Bus: Low Speed I/O Devices
[Figure: Disk and Tape on the SCSI Bus; SBus Slots 0-3 on the SBus; the SEC and MACIO blocks; Keyboard & Mouse and Floppy Disk on the External Bus]

Standard I/O Devices
SPARCstation 20
♦ SCSI = Small Computer System Interface
♦ A standard interface (IBM, Apple, HP, Sun, etc.)
♦ Computers and I/O devices communicate with each other over it
♦ The hard disk is one I/O device that resides on the SCSI Bus
[Figure: Disk and Tape on the SCSI Bus]

High Speed I/O Devices
SPARCstation 20
♦ SBus is Sun's own high-speed I/O bus
♦ The SS20 has four SBus slots where we can plug in I/O devices
♦ Examples: graphics accelerator, video adaptor, etc.
♦ High speed and low speed are relative terms
[Figure: SBus Slots 0-3 on the SBus]

Slow Speed I/O Devices
SPARCstation 20
♦ There are only four SBus slots in the SS20: "seats" are expensive
♦ The speed of some I/O devices is limited by human reaction time, which is very, very slow by computer standards
♦ Examples: keyboard and mouse
♦ No reason to use up one of the expensive SBus slots
[Figure: Keyboard & Mouse and Floppy Disk on the External Bus]

Summary
♦ All computers consist of five components
  ■ Processor: (1) datapath and (2) control
  ■ (3) Memory
  ■ (4) Input devices and (5) Output devices
♦ Not all "memory" is created equal
  ■ Cache: fast (expensive) memory is placed closer to the processor
  ■ Main memory: less expensive memory, so we can have more
♦ Interfaces are where the problems are: between functional units, and between the computer and the outside world
♦ Need to design against constraints of performance, power, area and cost

Summary: Computer System Components
Proc, Caches, Busses, Memory, adapters, Controllers
I/O Devices: Disks, Displays,
Keyboards, Networks
♦ All have interfaces & organizations

Processor Architecture Review

Levels of Representation
High Level Language Program
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
  (Compiler)
Assembly Language Program
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
  (Assembler)
Machine Language Program
    0000 1010 1100 0101 1001 1111 0110 1000
    1100 0101 1010 0000 0110 1000 1111 1001
    1010 0000 0101 1100 1111 1001 1000 0110
    0101 1100 0000 1010 1000 0110 1001 1111
  (Machine Interpretation)
Control Signal Specification
    ALUOP[0:3] <= InstReg[9:11] & MASK

Execution Cycle
  Instruction Fetch: Obtain instruction from program storage
  Instruction Decode: Determine required actions and instruction size
  Operand Fetch: Locate and obtain operand data
  Execute: Compute result value or status
  Result Store: Deposit results in storage for later use
  Next Instruction: Determine successor instruction

Top 10 80x86 Instructions
  Rank  Instruction              Integer average (% of total executed)
  1     load                     22%
  2     conditional branch       20%
  3     compare                  16%
  4     store                    12%
  5     add                       8%
  6     and                       6%
  7     sub                       5%
  8     move register-register    4%
  9     call                      1%
  10    return                    1%
        Total                    96%
° Simple instructions dominate instruction frequency

Machine Organization

Basic Processor Architecture
What is in a microprocessor today?
♦ Integer Unit (was 32 bits, going to 64)
  ■ Register File
  ■ ALU
  ■ Logical / Shifts
  ■ PC Unit
♦ Floating Point Unit (64 bits)
  ■ Register File
  ■ Adder / Multiplier / Divide
♦ Virtual Memory Support
  ■ TLB
♦ Memory System (split I/D)
  ■ Fast cache memory, and associated controller

Block Diagram
[Figure: ICache and ITLB on the instruction side, DCache and DTLB on the data side, connected to the Integer Unit and Floating Point Unit through the PC Bus, Inst Bus, Addr Bus and Data Bus]
Quite simplified
• No external interface

Integer Unit
Core of the machine
♦ Main part is a 32-bit (moving to 64-bit) dataflow
♦ Register file
  ■ Holds intermediate results
  ■ Almost all machines have at least 32 registers
  ■ Some have register windows (SPARC, 2900)
  ■ Multi-ported
  ■ Need 2 read / 1 write for each instruction
  ■ Bypass logic for pipelining

Integer Unit
♦ Execute Unit
  ■ Shifter (bits / bytes)
  ■ ALU
  ■ Integer Mult / Div
♦ Ld/St interface
  ■ Address generation
  ■ MDRout, MDRin, Addr registers
♦ Sequences instructions (Program Counter)
  ■ Needs an incrementer and adder
  ■ Ports to transfer PC to / from registers
  ■ Some registers for holding state

Floating Point Unit
Usually performs IEEE-compatible FP
Has lots of hard stuff in it: denormalized numbers, FP exceptions, rounding modes
Hardware often handles only the common case and traps to software for the rest
♦ Register file
  ■ 16 to 32 double-precision (64-bit) registers
♦ Adder
  ■ Often pipelined
  ■ Contains large shifters to align numbers as well as an adder
♦ Multiplier
  ■ Build tree multipliers / pipelined
  ■ Sometimes partial trees and iterate
♦ Divider
  ■ Either SRT algorithm, or iterative using the multiplier

Virtual Memory
All modern processors use virtual addresses
♦ Internal operations generate a virtual address
  ■ Address needs to be mapped to a physical memory location
  ■ Mapping is done by the Operating System
    » Contains protection information too
    » Allows OS to move virtual memory to disk
    » Allows OS to run multiple programs on the same machine
♦ Problems it causes for the hardware
  ■ Need to translate the address before the memory fetch
    » Need to store all the translations
    » Translation must be fast
  ■ Sometimes the requested address is not in memory
  ■ Sometimes the requested translation is not where you want it
    » Both cause a machine exception that needs to be handled

Memory Translation
Translating addresses is usually done using a small cache
♦ Store frequently used translations in a Translation Lookaside Buffer
  ■ Really a translation cache
  ■ Usually pretty small, 64 to 1K entries
  ■ Stores mapping from virtual page # to physical page #
  ■ Pages are 4K bytes and getting larger
  ■ New TLBs support super-pages (very large pages)
[Figure: TLB organization. A CAM holds the Virtual Page; a RAM holds the Physical Page and Protection Bits]

Memory Translation
[Figure: TLB organization. A CAM holds the Virtual Page; a RAM holds the Physical Page and Protection Bits]
  ■ Problem is what happens on a TLB miss?
  ■ Take an exception, or use a hardware FSM to reload?

Memory System
♦ Usually multi-level with caches
♦ Need memory to keep up with the processor
  ■ Can't use DRAM
  ■ Use a fast SRAM to hold the working set of the program
♦ Most accesses go to this fast memory
♦ If data is not present, the cache misses
  ■ Fetch data from memory (or a larger cache)
♦ First-level cache often integrated on chip
  ■ Separate I/D caches for more bandwidth
[Figure: cache lookup. The Physical Addr selects a line holding a Tag and Word0-Word3; a comparator (Cmp) checks the tag to produce Hit?, and a Mux selects the Data word]
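
The cache lookup in the Memory System figure (tag compare, hit signal, word mux) can be sketched in software. This is a minimal model, not the SPARCstation's actual cache: it assumes a direct-mapped organization, 4-byte words, and 256 lines, none of which the slides specify. Only the four words per line (Word0 to Word3) come from the figure. The names `DirectMappedCache`, `Line`, and `split` are hypothetical.

```python
from dataclasses import dataclass, field

WORD_BYTES = 4        # assumption: 32-bit words
WORDS_PER_LINE = 4    # Word0..Word3, as in the slide's figure
NUM_LINES = 256       # assumption: hypothetical cache size


@dataclass
class Line:
    valid: bool = False
    tag: int = 0
    words: list = field(default_factory=lambda: [0] * WORDS_PER_LINE)


class DirectMappedCache:
    """Toy direct-mapped cache in front of a word-addressed backing store."""

    def __init__(self, memory):
        self.memory = memory  # dict: word address -> value (stands in for DRAM)
        self.lines = [Line() for _ in range(NUM_LINES)]
        self.hits = 0
        self.misses = 0

    @staticmethod
    def split(addr):
        """Split a physical byte address into (tag, index, word offset)."""
        word_addr = addr // WORD_BYTES
        offset = word_addr % WORDS_PER_LINE
        line_addr = word_addr // WORDS_PER_LINE
        return line_addr // NUM_LINES, line_addr % NUM_LINES, offset

    def read(self, addr):
        tag, index, offset = self.split(addr)
        line = self.lines[index]
        if line.valid and line.tag == tag:   # the "Cmp" box: Hit?
            self.hits += 1
        else:                                # miss: fetch the whole line
            self.misses += 1
            base = (addr // WORD_BYTES // WORDS_PER_LINE) * WORDS_PER_LINE
            line.words = [self.memory.get(base + i, 0)
                          for i in range(WORDS_PER_LINE)]
            line.tag = tag
            line.valid = True
        return line.words[offset]            # the "Mux" box selects one word


if __name__ == "__main__":
    dram = {i: i * 10 for i in range(64)}
    cache = DirectMappedCache(dram)
    for a in (0, 4, 12, 16):
        print(a, cache.read(a))
    print("hits:", cache.hits, "misses:", cache.misses)
```

Reading addresses 0, 4 and 12 touches one line (one miss, then two hits), while address 16 starts a new line. Two addresses that share an index but differ in tag evict each other, which is the conflict behavior a direct-mapped design trades for its simple single-comparator lookup.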

Machine Performance
♦ Depends on the average time between instruction fetches
  ■ Execution time = Ninst * CPI * Tcycle
♦ Indirectly related to how long it takes to complete an instruction
  ■ Can start the next instruction before the previous one is finished
  ■ Relation is set by the amount of ILP and the pipeline structure
♦ Also depends on the memory system design
  ■ What percentage of references hit in the cache
  ■ How long it takes when they miss

Pipelining
A way of exploiting instruction level parallelism
[Figure: four instructions, each passing through stages 1 to 4, overlapped so that each starts one stage after the previous one]

Pipelining
♦ Time per instruction
  ■ TPI = CPI * CPU cycle time
♦ Speedup
  Speedup = (TPI without pipeline) / (TPI with pipeline) = Number of pipeline stages
  ■ Requires all stages to be perfectly balanced
  ■ No latch overhead
  ■ Real speedup will be less
♦ Not visible to programmer
  ■ That is in the ideal case
  ■ Instruction scheduling depends on the pipeline

Pipelined Execution
♦ Ideally we get this:
                     Clock number
  Instruction        1    2    3    4    5    6    7    8    9
  Instruction i      IF   ID   EX   MEM  WB
  Instruction i+1         IF   ID   EX   MEM  WB
  Instruction i+2              IF   ID   EX   MEM  WB
  Instruction i+3                   IF   ID   EX   MEM  WB
  Instruction i+4                        IF   ID   EX   MEM  WB
♦ But in real life there are pipeline hazards:
  ■ Structural
    » Some resource is not available this cycle
  ■ Data
    » Data needed has not been produced yet
  ■ Control
    » Which instruction to execute is not known

Modern Processor Architecture
[Figure: R10000, 233 MHz]

Today's Conventional Microprocessors
♦ Instruction sets
  ■ CISC, RISC
♦ Advanced memory systems
  ■ (caches, memory, virtual memory)
♦ Advanced Instruction Level Parallelism
  ■ (pipelining, superscalar, vectors, VLIW)
♦ Storage systems (I/O)
♦ Interconnection Technology
♦ Basic parallel processing
  ■ Dual core
  ■ Quad core

In 10 Years!
[Figure: R10000, 233 MHz]

Using the Silicon
[Figure: alternative ways to spend the available silicon. Labels: PE, M, FFT, MMX, VIZ, RC5, More Cache, CISC, MPP, Vector, 64-way Superscalar, Reconfigurable Logic, Reconfigurable Processor]

Summary
♦ Modern processors have a pipelined architecture
♦ All stages in a pipeline must be balanced
♦ Several resources are used for different purposes
♦ Not all instructions take the same time
  ■ Clocks per instruction
♦ Memory access instructions are among the most frequent (34%)
♦ The main idea for increasing performance is to exploit instruction level parallelism
♦ Performance is a major concern in designing modern processors, since more space is available
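
The summary's points about clocks per instruction and pipelining can be made concrete with a short sketch. It evaluates Execution time = Ninst * CPI * Tcycle and the ideal pipeline timing from the Pipelined Execution table. The function names are hypothetical, and two simplifications are assumed: the unpipelined machine spends one cycle per stage on every instruction, and `effective_cpi` charges cache-miss stalls only to data accesses (a common textbook approximation, not something the slides state).

```python
from fractions import Fraction


def execution_time(n_inst, cpi, t_cycle):
    """Execution time = Ninst * CPI * Tcycle."""
    return n_inst * cpi * t_cycle


def pipelined_cycles(n_inst, stages):
    """Cycles on an ideal pipeline with no hazards.

    The first instruction takes `stages` cycles to fill the pipeline;
    after that, one instruction completes per cycle.
    """
    return stages + n_inst - 1


def ideal_speedup(n_inst, stages):
    """Speedup over an unpipelined machine (assumed: `stages` cycles each)."""
    return Fraction(n_inst * stages, pipelined_cycles(n_inst, stages))


def effective_cpi(base_cpi, mem_fraction, miss_rate, miss_penalty):
    """Base CPI plus average stall cycles from data-cache misses (assumed model)."""
    return base_cpi + mem_fraction * miss_rate * miss_penalty


if __name__ == "__main__":
    # Five instructions on a five-stage pipeline, as in the table: 9 clocks.
    print(pipelined_cycles(5, 5))
    print(float(ideal_speedup(5, 5)))
    # With many instructions the speedup approaches the stage count.
    print(float(ideal_speedup(1_000_000, 5)))
    # 34% memory instructions; miss rate and penalty are made-up inputs.
    print(effective_cpi(1.0, 0.34, 0.05, 20))
```

For short instruction sequences the fill time dominates, so the speedup stays well below the number of stages; only long, hazard-free streams approach the ideal figure quoted on the Pipelining slide.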