Review of Computer Organization and Architecture

Laboratorio de
Tecnologías de Información
Computer Organization
Arquitectura de Computadoras
Arturo Díaz Pérez
Centro de Investigación y de Estudios Avanzados del IPN
Laboratorio de Tecnologías de Información
adiaz@cinvestav.mx
Levels of Organization
Example: SPARCstation 20
Workstation Design Target:
♦ 25% of cost on Processor
♦ 25% of cost on Memory (minimum memory size)
♦ Rest on I/O devices, power supplies, box
[Figure: a Computer is made of a Processor (Control and Datapath), Memory, and Devices (Input and Output)]
The SPARCstation 20
[Figure: SPARCstation 20 block diagram. Labels: Memory SIMMs, SIMM Bus, Memory Controller, MBus (Slots 0-1), MSBI, SBus (Slots 0-3), SEC, MACIO, SCSI Bus (Disk, Tape), External Bus (Keyboard & Mouse, Floppy Disk)]
The Underlying Interconnect
SPARCstation 20 buses:
♦ Processor/Mem Bus: MBus
♦ Memory: SIMM Bus, through the Memory Controller
♦ Sun's High Speed I/O Bus: SBus
♦ Standard I/O Bus: SCSI Bus
♦ Low Speed I/O Bus: External Bus
[Figure: SPARCstation 20 block diagram showing these buses and the MSBI, SEC, and MACIO blocks]
Processor and Caches
[Figure: SPARCstation 20 MBus Module on the MBus (Slots 0-1). Labels: Processor (Registers, Datapath, Control, Internal Cache) and External Cache]
Memory
[Figure: SPARCstation 20 memory system. SIMM Slots 0-7 connect to the Memory Controller over the Memory SIMM Bus; each DRAM SIMM carries several DRAM chips]
Input and Output (I/O) Devices
SPARCstation 20:
♦ SCSI Bus: Standard I/O Devices
♦ SBus: High Speed I/O Devices
♦ External Bus: Low Speed I/O Devices
[Figure: SPARCstation 20 I/O subsystem. Labels: SBus with Slots 0-3, SEC, MACIO; SCSI Bus with Disk and Tape; External Bus with Keyboard & Mouse and Floppy Disk]
Standard I/O Devices
SPARCstation 20:
♦ SCSI = Small Computer Systems Interface
♦ A standard interface (IBM, Apple, HP, Sun, etc.)
♦ Computers and I/O devices communicate with each other over it
♦ The hard disk is one I/O device that resides on the SCSI Bus
[Figure: Disk and Tape attached to the SCSI Bus]
High Speed I/O Devices
SPARCstation 20:
♦ SBus is Sun's own high speed I/O bus
♦ The SS20 has four SBus slots where we can plug in I/O devices
♦ Examples: graphics accelerator, video adaptor, etc.
♦ High speed and low speed are relative terms
[Figure: SBus with Slots 0-3]
Slow Speed I/O Devices
SPARCstation 20:
♦ There are only four SBus slots in the SS20: "seats" are expensive
♦ The speed of some I/O devices is limited by human reaction time, which is very slow by computer standards
♦ Examples: keyboard and mouse
♦ No reason to use up one of the expensive SBus slots
[Figure: Keyboard & Mouse and Floppy Disk attached to the External Bus]
Summary
♦ All computers consist of five components
■ Processor: (1) datapath and (2) control
■ (3) Memory
■ (4) Input devices and (5) Output devices
♦ Not all "memory" is created equal
■ Cache: fast (expensive) memory is placed closer to the processor
■ Main memory: less expensive memory, so we can have more of it
♦ Interfaces are where the problems are: between functional units, and between the computer and the outside world
♦ Need to design against constraints of performance, power, area and cost
Summary: Computer System Components
♦ Proc
♦ Caches
♦ Busses
♦ Memory
♦ Adapters
♦ Controllers
♦ I/O Devices: Disks, Displays, Keyboards, Networks
♦ All have interfaces & organizations
Processor Architecture Review
Levels of Representation
High Level Language Program:
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;

  (Compiler)

Assembly Language Program:
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)

  (Assembler)

Machine Language Program:
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111

  (Machine Interpretation)

Control Signal Specification:
    ALUOP[0:3] <= InstReg[9:11] & MASK
Execution Cycle
♦ Instruction Fetch: Obtain instruction from program storage
♦ Instruction Decode: Determine required actions and instruction size
♦ Operand Fetch: Locate and obtain operand data
♦ Execute: Compute result value or status
♦ Result Store: Deposit results in storage for later use
♦ Next Instruction: Determine successor instruction
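The steps of the execution cycle can be sketched as a loop over a toy machine. This is only an illustration under stated assumptions: the tuple instruction format, the opcode names (`load`, `add`, `store`, `halt`), the four-register file, and the function name `run` are all invented for this example and do not correspond to any real instruction set.

```python
def run(program, memory):
    """Toy execution cycle. The instruction format is hypothetical."""
    regs = [0] * 4
    pc = 0
    while True:
        instr = program[pc]                  # instruction fetch
        op, *args = instr                    # instruction decode
        if op == "halt":
            return regs
        if op == "load":
            rd, addr = args
            value = memory[addr]             # operand fetch (from memory)
            regs[rd] = value                 # result store
        elif op == "add":
            rd, rs, rt = args
            a, b = regs[rs], regs[rt]        # operand fetch (from registers)
            result = a + b                   # execute
            regs[rd] = result                # result store
        elif op == "store":
            rs, addr = args
            memory[addr] = regs[rs]          # result store (to memory)
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc += 1                              # next instruction (no branches here)


# Usage: add the words at addresses 0 and 1, write the sum to address 2.
program = [("load", 0, 0), ("load", 1, 1), ("add", 2, 0, 1),
           ("store", 2, 2), ("halt",)]
memory = [3, 4, 0]
final_regs = run(program, memory)
```

A real processor overlaps these steps across instructions; the loop above performs them strictly one after another.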
Top 10 80x86 Instructions
Rank  Instruction              Integer average (% of total executed)
1     load                     22%
2     conditional branch       20%
3     compare                  16%
4     store                    12%
5     add                       8%
6     and                       6%
7     sub                       5%
8     move register-register    4%
9     call                      1%
10    return                    1%
      Total                    96%
° Simple instructions dominate instruction frequency
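As a quick check on the instruction mix, the percentages can be tabulated and combined. The dictionary below copies the ten figures from the slide; the variable names are chosen for this example. The listed values are rounded, so they add up to 95 rather than the 96% total shown on the slide.

```python
# Integer-average frequencies from the slide, in percent of executed instructions.
mix = {
    "load": 22,
    "conditional branch": 20,
    "compare": 16,
    "store": 12,
    "add": 8,
    "and": 6,
    "sub": 5,
    "move register-register": 4,
    "call": 1,
    "return": 1,
}

memory_access = mix["load"] + mix["store"]        # instructions that touch memory
top_four = sum(sorted(mix.values(), reverse=True)[:4])
listed_total = sum(mix.values())

print(f"memory access: {memory_access}%")
print(f"top four instructions: {top_four}%")
print(f"sum of listed (rounded) entries: {listed_total}%")
```

Loads and stores together account for 34% of executed instructions, and the four most frequent instructions alone account for 70%.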
Machine Organization
Basic Processor Architecture
What is in a microprocessor today?
♦ Integer Unit (was 32 bits, going to 64)
■ Register File
■ ALU
■ Logical / Shifts
■ PC Unit
♦ Floating Point Unit (64 bits)
■ Register File
■ Adder / Multiplier / Divide
♦ Virtual Memory Support
■ TLB
♦ Memory System (split I/D)
■ Fast cache memory, and associated controller
Block Diagram
[Figure: quite simplified processor block diagram, with no external interface shown. Blocks: ICache, ITLB, DTLB, DCache, Integer Unit, Floating Point Unit. Buses: PC Bus, Addr Bus, Inst Bus, Data Bus]
Integer Unit
Core of the machine
♦ Main part is a 32 bit (moving to 64 bit) dataflow
♦ Register file
■ Holds intermediate results
■ Almost all machines have at least 32 registers
■ Some have register windows (SPARC, 29000)
■ Multi-ported
■ Need 2 read ports / 1 write port for each instruction
■ Bypass logic for pipelining
Integer Unit
♦ Execute Unit
■ Shifter (bits / bytes)
■ ALU
■ Integer Mult / Div
♦ Ld/St interface
■ Address generation
■ MDRout, MDRin, Addr registers
♦ Sequences instructions (Program Counter)
■ Needs an incrementer and adder
■ Ports to transfer PC to / from registers
■ Some registers for holding state
Floating Point Unit
Usually performs IEEE compatible FP
♦ Has lots of hard stuff in it
■ Denorms, FP exceptions, rounding modes
■ Hardware often handles only the common case and traps to software for the rest
♦ Register file
■ 16 to 32 double precision (64bit) registers
♦ Adder
■ Often pipelined
■ Contains large shifters to align numbers as well as an adder
♦ Multiplier
■ Build tree multipliers / pipelined
■ Sometimes partial trees and iterate
♦ Divider
■ Either SRT algorithm, or iterative using the multiplier
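The "hard stuff" named on the Floating Point Unit slide (denorms, exceptions, rounding) can be observed from software. The sketch below assumes Python floats are IEEE 754 double precision with the default round-to-nearest-even mode, which is the case on common platforms; the variable names are chosen for this example.

```python
import math
import sys

smallest_normal = sys.float_info.min      # 2**-1022, smallest normalized double
denorm = smallest_normal / 2              # below the normal range, but not zero
smallest_denorm = 5e-324                  # 2**-1074, smallest denormalized double
underflow = smallest_denorm / 2           # a tie between 0 and 2**-1074: rounds to 0.0
overflow = sys.float_info.max * 2         # exceeds the largest double: becomes inf

print(denorm > 0.0)            # gradual underflow keeps a nonzero value
print(underflow == 0.0)        # nothing smaller can be represented
print(math.isinf(overflow))    # overflow saturates to infinity
```

Each of these cases needs extra alignment, normalization, or flag handling in hardware, which is why some designs leave them to a software trap handler.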
Virtual Memory
All modern processors use virtual addresses
♦ Internal operations generate a virtual address
■ Address needs to be mapped to a physical memory location
■ Mapping is done by the Operating System
» Contains protection information too
» Allows OS to move virtual memory to disk
» Allows OS to run multiple programs on same machine
♦ Problems it causes for the hardware
■ Need to translate address before memory fetch
» Need to store all the translations
» Translation must be fast
■ Sometimes the requested address is not in memory
■ Sometimes the requested translation is not where you want it
» Both cause a machine exception that needs to be handled
Memory Translation
Translating addresses is usually done using a small cache
♦ Store frequently used translations in a Translation Lookaside Buffer
■ Really a translation cache
■ Usually pretty small, 64 to 1K entries
■ Stores mapping from virtual page # to physical page #
■ Pages are 4K bytes and getting larger
■ New TLBs support super-pages (very large pages)
[Figure: TLB structure. A CAM holds the Virtual Page; a RAM holds the Physical Page and Protection Bits]
Memory Translation
[Figure: TLB structure. A CAM holds the Virtual Page; a RAM holds the Physical Page and Protection Bits]
■ The problem is what happens on a TLB miss
■ Take an exception, or use a hardware FSM to reload?
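A TLB lookup and reload can be sketched in software as follows. This is a minimal model under stated assumptions: the page table is a plain dictionary from virtual page number to physical page number, the replacement policy is FIFO, and the class and method names (`TLB`, `translate`) are invented for this example. Protection bits are omitted. Real TLBs perform the lookup in parallel with a CAM rather than with a dictionary.

```python
PAGE_SIZE = 4096  # 4K-byte pages, as mentioned on the slide


class TLB:
    def __init__(self, page_table, capacity=64):
        self.page_table = page_table    # virtual page # -> physical page #
        self.capacity = capacity
        self.entries = {}               # cached translations, in insertion order
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.entries:                 # TLB miss
            self.misses += 1
            if vpn not in self.page_table:          # no mapping at all
                raise KeyError(f"page fault: virtual page {vpn} is not mapped")
            if len(self.entries) >= self.capacity:  # evict the oldest entry (FIFO)
                oldest = next(iter(self.entries))
                del self.entries[oldest]
            self.entries[vpn] = self.page_table[vpn]   # reload from page table
        return self.entries[vpn] * PAGE_SIZE + offset


# Usage: two mapped pages and a one-entry TLB, to force evictions.
tlb = TLB({0: 5, 1: 9}, capacity=1)
paddr = tlb.translate(0x10)     # miss, then reload; maps into physical page 5
```

The reload step here plays the role of either the exception handler or the hardware FSM from the slide; the model does not distinguish between them.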
Memory System
♦ Usually multi-level with caches
♦ Need memory to keep up with processor
■ Can't use DRAM alone: it is too slow
■ Use a fast SRAM to hold working set of program
♦ Most accesses go to this fast memory
♦ If data is not present, the cache misses
■ Fetch data from memory (or larger cache)
♦ First level cache often integrated on chip
■ Separate I/D caches for more bandwidth
[Figure: cache lookup. The Physical Addr selects a line holding Tag, Word0, Word1, Word2, Word3; Cmp compares the tag to produce Hit?; a Mux selects the Data word]
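The tag compare and word select in the cache figure can be modeled as below. This sketch assumes a direct-mapped cache with four words per line and eight lines, word addressing, and a Python list as the backing memory; those sizes and the names `DirectMappedCache` and `read` are assumptions made for this example, not taken from the slides.

```python
WORDS_PER_LINE = 4   # Word0..Word3, as in the figure
NUM_LINES = 8        # assumed cache size for this example


class DirectMappedCache:
    def __init__(self, memory):
        self.memory = memory                        # backing store: list of words
        self.tags = [None] * NUM_LINES
        self.lines = [[0] * WORDS_PER_LINE for _ in range(NUM_LINES)]
        self.hits = 0
        self.misses = 0

    def read(self, word_addr):
        word = word_addr % WORDS_PER_LINE           # mux select
        block = word_addr // WORDS_PER_LINE
        index = block % NUM_LINES                   # which line to check
        tag = block // NUM_LINES
        if self.tags[index] == tag:                 # Cmp -> Hit?
            self.hits += 1
        else:                                       # miss: fetch the whole line
            self.misses += 1
            base = block * WORDS_PER_LINE
            self.lines[index] = self.memory[base:base + WORDS_PER_LINE]
            self.tags[index] = tag
        return self.lines[index][word]              # Mux -> Data


# Usage: sequential reads miss once per line, then hit on the rest of the line.
memory = list(range(100, 164))                      # 64 words of dummy data
cache = DirectMappedCache(memory)
values = [cache.read(a) for a in range(4)]
```

Fetching a whole line on a miss is what makes the next three sequential reads hit.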
Machine Performance
♦ Depends on the average time between instruction fetches
■ Execution time = Ninst * CPI * Tcycle
♦ Indirectly related to how long it takes to complete an instruction
■ Can start next instruction before previous one is finished
■ Relation is set by the amount of ILP and pipeline structure
♦ Also depends on the memory system design
■ What percentage of refs. hit in the cache
■ How long does it take when they miss
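A worked example shows how the formula and the memory system interact. The function `execution_time` is the slide's Ninst * CPI * Tcycle. The function `effective_cpi` is a common textbook approximation that adds stall cycles from cache misses to a base CPI; it is an assumption brought in for this example, not something the slides state. All numeric values are made up for illustration.

```python
def execution_time(n_inst, cpi, t_cycle):
    """Ninst * CPI * Tcycle, in the same time unit as t_cycle."""
    return n_inst * cpi * t_cycle


def effective_cpi(base_cpi, refs_per_inst, miss_rate, miss_penalty_cycles):
    """Base CPI plus average memory stall cycles per instruction (assumed model)."""
    return base_cpi + refs_per_inst * miss_rate * miss_penalty_cycles


# Hypothetical workload: 1 instruction fetch plus 0.34 data references per instruction.
n_inst = 1_000_000
t_cycle = 4e-9                       # assumed 4 ns clock period
cpi_perfect = effective_cpi(1.0, 1.34, 0.0, 50)
cpi_real = effective_cpi(1.0, 1.34, 0.02, 50)

print(execution_time(n_inst, cpi_perfect, t_cycle))
print(execution_time(n_inst, cpi_real, t_cycle))
```

With these assumed numbers, a 2% miss rate and a 50-cycle miss penalty more than double the CPI, from 1.0 to about 2.34.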
Pipelining
A way of exploiting instruction level parallelism
[Figure: four instructions, each divided into stages 1, 2, 3, 4]
Pipelining
♦ Time per instruction
■ TPI = CPI * CPU cycle time
♦ Speedup
■ Speedup = (TPI without pipeline) / (TPI with pipeline) = Number of pipeline stages
■ Requires all stages to be perfectly balanced
■ No latch overhead
■ Real speedup will be less
♦ Not visible to programmer
■ That is in the ideal case
■ Instruction scheduling depends on the pipeline
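The effect of unbalanced stages and latch overhead on speedup can be checked numerically. The sketch assumes that the unpipelined machine takes the sum of all stage delays per instruction, that the pipelined machine completes one instruction per cycle, and that the cycle time is set by the slowest stage plus the latch overhead. The function name and the delay values are chosen for this example.

```python
def pipeline_speedup(stage_delays, latch_overhead=0.0):
    """Speedup = TPI without pipeline / TPI with pipeline (assumes CPI of 1 when pipelined)."""
    tpi_without = sum(stage_delays)                  # one long cycle per instruction
    tpi_with = max(stage_delays) + latch_overhead    # slowest stage sets the clock
    return tpi_without / tpi_with


ideal = pipeline_speedup([2, 2, 2, 2, 2])            # balanced, no overhead
real = pipeline_speedup([2, 2, 3, 2, 1], 0.5)        # unbalanced, with latch overhead

print(ideal)
print(real)
```

Both examples have the same total logic delay of 10 time units, but the balanced pipeline reaches the full speedup of 5 while the unbalanced one stays below 3.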
Pipelined Execution
♦ Ideally we get this:

                          Clock number
Instruction Number    1    2    3    4    5    6    7    8    9
Instruction i         IF   ID   EX   MEM  WB
Instruction i+1            IF   ID   EX   MEM  WB
Instruction i+2                 IF   ID   EX   MEM  WB
Instruction i+3                      IF   ID   EX   MEM  WB
Instruction i+4                           IF   ID   EX   MEM  WB

♦ But in real life there are pipeline hazards:
■ Structural
» Some resource is not available this cycle
■ Data
» Data needed has not been produced yet
■ Control
» Which instruction to execute is not known
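The ideal, hazard-free schedule follows a simple rule: instruction i+k enters IF at clock k+1 and advances one stage per clock. The sketch below encodes that rule; the function names `stage_at` and `total_clocks` are invented for this example, and hazards are deliberately not modeled.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_at(k, clock):
    """Stage occupied by instruction i+k at the given clock (1-based), or None."""
    position = clock - 1 - k
    if 0 <= position < len(STAGES):
        return STAGES[position]
    return None


def total_clocks(n_instructions):
    """Clocks needed to finish n instructions in an ideal pipeline."""
    return n_instructions + len(STAGES) - 1


# Usage: print the diagram for five instructions.
n = 5
for k in range(n):
    cells = [(stage_at(k, c) or "").ljust(4) for c in range(1, total_clocks(n) + 1)]
    print(f"i+{k}  " + " ".join(cells))
```

Five instructions finish in 9 clocks instead of the 25 that strictly sequential execution of five stages each would take.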
Modern Processor Architecture
R10000 233 MHz
Today's Conventional Microprocessors
♦ Instruction sets
■ CISC, RISC
♦ Advanced memory systems
■ (caches, memory, virtual memory)
♦ Advanced Instruction Level Parallelism
■ (pipelining, superscalar, vectors, VLIW)
♦ Storage systems (I/O)
♦ Interconnection Technology
♦ Basic parallel processing
■ Dual core
■ Quad core
In 10 Years!
R10000 233 MHz
Using the Silicon
[Figure: alternative ways of using the silicon. Labels: More Cache; CISC; special-purpose units (MMX, VIZ, RC5, FFT); Vector; 64-way Superscalar; MPP (an array of PEs with memories M); Reconfigurable Logic; Reconfigurable Processor]
Summary
♦ Modern processors have a pipeline architecture
♦ All stages in a pipeline must be balanced
♦ Several resources are used for different purposes
♦ Not all instructions take the same time
■ Clocks per instruction
♦ Memory access instructions are among the most frequent (34%)
♦ The main idea for increasing performance is to exploit instruction level parallelism
♦ Performance is a major concern in designing modern processors, since more space is available