COMPUTER COMPONENTS Pham Thanh Giang Institute of Information Technology ptgiang@ioit.ac.vn COMPONENTS OF A COMPUTER Same components for all kinds of computer Desktop, server, embedded Input/output includes User-interface devices Display, keyboard, mouse Storage devices Hard disk, CD/DVD, flash Network adapters For communicating with other computers HARDWARE & SOFTWARE Hardware All of the electronic and mechanical equipment in a computer is called the hardware. Examples include: • • • • • • • • • Motherboard Hard disk RAM Power supply Processor Case Monitor Keyboard Mouse HARDWARE & SOFTWARE Software The term software is used to describe computer programs that perform a task or tasks on a computer system. Software can be grouped as follows: • System software - Operating System etc. • Utility programs - Antivirus etc. •Applications Software - Word, SolidWorks etc. PC COMPONENTS Computer system - collection of electronic and mechanical devices operating as a unit. The main parts are: 1.System unit 2.Monitor 2 3.Keyboard 5 4.Mouse 5.Speakers 3 1 4 SYSTEM UNIT The system unit is the main container for system devices. It protects the delicate electronic and mechanical devices from damage. Typical system unit devices include: • Motherboard • CPU (Processor) • Memory •Disk drives • Ports - USB etc. • Power supply • Expansion cards - sound card, network card, graphics card etc. PERIPHERALS Peripherals are devices that connect to the system unit using cables or wireless technologies. Typical peripherals include: • Monitor • Keyboard •Printer • Plotter • Scanner • Speakers Plotter PROCESSOR (CPU) An integrated circuit (IC) supplied on a single silicon chip. It’s function is to control all the computers functions. The main processor manufacturers are: • AMD - Athlon and Turion (mobile) • Intel - Pentium and Centrino (mobile) AMD Processor COMPUTER PROGRAM Computer program - a series of instructions. When a program is run, the processor carries out these instructions in an orderly fashion. Typical instructions include: • Arithmetic - addition, subtraction etc • Logical - comparing data and acting according to the result • Move - move data from place to place within the computer system - memory to the processor for addition - memory to a printer or disk drive etc. PROCESSOR SPEED Processor speed - measured in megahertz (MHz) or Gigahertz (GHz) - the speed of the system clock (clock speed) within the processor and it controls how fast instructions are executed: • 1 MHz - 1 million clock ticks every second • 1 GHz - 1 billion clock ticks every second Latest trend - multi-core processors can have two, three or four processor cores on a single chip. RANDOM ACCESS MEMORY (RAM) •Primary storage - main computer memory. Data, programs currently in use are held in RAM •Volatile - contents of memory are lost if the computer is turned off •Module - memory IC’s on a circuit board Memory Module IC’s MEMORY Memory is sold in modules: • DIMM’s (dual inline memory module) for desktop computers • SODIMM’s (small outline dual inline memory module) for notebook computers. DIMM Module SODIMM Module MEMORY DIMM’s and SODIMM’s are available in modules of 256MB, 512MB, 1GB, 2GB, 4GB, 8GB The current technology is called DDR (double data ram) and there are three types: DDR1, DDR2, DDR3 Any particular computer system is only compatible with one type. Module capacity Module name Module type Module speed MOTHERBOARD Mainboard or system board - the main circuit board for the computer system. All device in the computer system will either be part of the motherboard or connected to it. Memory Sockets Processor Socket Chipset PCI Slots Ports Graphics Slot PROCESSOR SOCKET Processor socket - different processors require different sockets and a motherboard must be chosen to suit the processor intended for use: •Socket 478 - Intel Pentium IV • Socket 775 - Intel Dual Core and Core Duo • Socket 754 - AMD Athlon •Socket 939 - AMD Athlon 64 • Socket AM2 - AMD Athlon X2 CHIPSET Chipset - controls data flow around the computer. It consists of two chips: •Northbridge - data flow between memory and processor - data flow between the processor and the graphic's card • Southbridge - controls data flow to the devices - USB, IDE, SATA, LAN and Audio - controls PCI slots and onboard graphics BUSES Buses - a path through which data can be sent to the different parts of the computer system. Main buses: Processor Front Side Bus Graphics Slot PC-Express or AGP Northbridge Graphics Bus RAM Memory Bus All Memory Internal Bus Southbridge PCI Slots PCI Bus IDE SATA USB LAN Audio PCI Bus Onboard Graphics POWER SUPPLY A computer power supply has a number of functions: • Converts Alternating current (AC) Direct current (DC) • Transforms mains voltage (240 Volts) to the voltages required by the computer. The main voltages are: • 12 volts for the disk drives as they have motors • 3.3 and 5 volts for the circuit boards in the computer POWER SUPPLY • Uses advances power management (APM) to allow the computer go into a standby mode • Some have a switch to toggle between 240 volt supplies and 110 volt supplies. • The main connections are: 3 1 4 2 1 Main connector Connects to the motherboard and supplies the 3.3 and 5 volt supply for the board. 2 SATA connector Connects SATA drives 3 Berg connector Connects floppy disk drives 4 Molex connector Connects IDE hard drives and optical drives. PORTS Computer ports are interfaces between peripheral devices and the computer. They are mainly found at the back of the computer but are often also built into the front of the computer chassis for easy access. Ports at the back of the computer Ports at the front of the computer COMPUTER PERFORMANCE 21 PERFORMANCE AND COST Which computer is fastest? Not so simple Scientific simulation – FP performance Program development – Integer performance Database workload – Memory, I/O MEASURING EXECUTION TIME Elapsed time Total response time, including all aspects Processing, I/O, OS overhead, idle time Determines system performance CPU time Time spent processing a given job Discounts I/O time, other jobs’ shares Comprises user CPU time and system CPU time Different programs are affected differently by CPU and system performance DEFINING PERFORMANCE What is important to whom? Computer system user Minimize elapsed time for program: tresp = tend – tstart Called response time Computer center manager Maximize completion rate = #jobs/second Called throughput WHAT IS PERFORMANCE FOR US? For computer architects CPU time = time spent running a program Intuitively, bigger should be faster, so: Performance = 1/X, where X is response time: CPU execution, etc. Elapsed time = CPU time + I/O wait We will concentrate on CPU time IMPROVE PERFORMANCE Improve (a) response time or (b) throughput? Faster CPU Helps both (a) and (b) Add more CPUs Helps (b) and perhaps (a) due to less queueing CPU CLOCKING Operation of digital hardware governed by a constant-rate clock Clock period Clock (cycles) Data transfer and computation Update state Clock period: duration of a clock cycle e.g., 250ps = 0.25ns = 250×10–12s Clock frequency (rate): cycles per second e.g., 4.0GHz = 1/ 0.25ns = 4000MHz = 4.0×109Hz CPU TIME Performance improved by Reducing number of clock cycles Increasing clock rate Hardware designer must often trade off clock rate against cycle count Number of clock cycles Clock cycle CPU Time CPU Clock Cycles Clock Cycle Time CPU Clock Cycles Clock Rate CPU TIME EXAMPLE Computer A: 2GHz clock, 10s CPU time Designing Computer B Aim for 6s CPU time Can do faster clock, but causes 1.2 × clock cycles How fast must Computer B clock be? Clock Rate B Clock Cycles B 1.2 Clock Cycles A CPU Time B 6s Clock Cycles A CPU Time A Clock Rate A 10s 2GHz 20 109 1.2 20 109 24 109 Clock Rate B 4GHz 6s 6s INSTRUCTION COUNT AND CPI Instruction Count for a program Determined by program, ISA and compiler Average cycles per instruction Determined by CPU hardware If different instructions have different CPI Average CPI affected by instruction mix Clock Cycles Instructio n Count Cycles per Instructio n CPU Time Instructio n Count CPI Clock Cycle Time Instructio n Count CPI Clock Rate CPI EXAMPLE Computer A: Cycle Time = 250ps, CPI = 2.0 Computer B: Cycle Time = 500ps, CPI = 1.2 Same ISA Which is faster, and by how much? CPU Time CPU Time A B Instructio n Count CPI Cycle Time A A I 2.0 250ps I 500ps Instructio n Count CPI Cycle Time B B I 1.2 500ps I 600ps B I 600ps 1.2 CPU Time I 500ps A CPU Time A is faster… …by this much CPI IN MORE DETAIL If different instruction types take different numbers of cycles n Clock Cycles (CPIi Instructio n Count i ) i1 Weighted average CPI n Clock Cycles Instructio n Count i CPI CPIi Instructio n Count i1 Instructio n Count Relative frequency CPI EXAMPLE Alternative compiled code sequences using instructions in type INT, FP, MEM Type INT FP MEM CPI for type 1 2 3 IC in Program 1 2 1 2 5 IC in Program 2 4 1 1 6 Program 1: IC = 5 Clock Cycles = 2×1 + 1×2 + 2×3 = 10 Avg. CPI = 10/5 = 2.0 Program 2: IC = 6 Clock Cycles = 4×1 + 1×2 + 1×3 =9 Avg. CPI = 9/6 = 1.5 IRON LAW PROCESSOR PERFORMANCE Time Execution time = --------------Program = Instructions Program Cycles X X Instructions Time Cycle (code size) (CPI) (cycle time) Architecture --> Implementation --> Realization Compiler Designer Processor Designer Chip Designer IRON LAW Instructions/Program Instructions executed, not static code size Determined by algorithm, compiler, ISA Cycles/Instruction Determined by ISA and CPU organization Overlap among instructions reduces this term Time/cycle Determined by technology, organization, clever circuit design IRON LAW EXAMPLE Machine A: clock 1ns, CPI 2.0, for program x Machine B: clock 2ns, CPI 1.2, for program x Which is faster and how much? Time/Program = instr/program x cycles/instr x sec/cycle Time(A) = N x 2.0 x 1 = 2N Time(B) = N x 1.2 x 2 = 2.4N Compare: Time(B)/Time(A) = 2.4N/2N = 1.2 So, Machine A is 20% faster than Machine B for this program IRON LAW EXAMPLE Keep clock(A) @ 1ns and clock(B) @2ns For equal performance, if CPI(B)=1.2, what is CPI(A)? Time(B)/Time(A) = 1 = (Nx2x1.2)/(Nx1xCPI(A)) CPI(A) = 2.4 IRON LAW EXAMPLE Keep CPI(A)=2.0 and CPI(B)=1.2 For equal performance, if clock(B)=2ns, what is clock(A)? Time(B)/Time(A) = 1 = (N x 2.0 x clock(A))/(N x 1.2 x 2) clock(A) = 1.2ns IRON LAW Example 1: How much is execution time of a program which executes 3 billion instructions in a processor. The processor spends 2 cycles on each instruction and is working at 3GHz. Example2: A program contains 50 billion instruction whose composition is as follows: 10 billion branch instructions, CPI=4 15 billion load instructions, CPI=2 5 billion store instructions, CPI=3 20 billion integer-type instructions, CPI=1 Evaluate the execution time for above program. 39 SUMMARY Time and performance: Machine A n times faster than Machine B If Time(B)/Time(A) = n Iron Law: Performance = Time/program = = Instructions Program (code size) X Cycles X Instruction (CPI) Other Metrics: MIPS and MFLOPS Beware of peak and omitted details Time Cycle (cycle time) OTHER METRICS MIPS and MFLOPS Million Instructions Per Second Million Floating Point Operations Per Second MIPS = instruction count/(execution time x 106) = clock rate/(CPI x 106) But MIPS has serious shortcomings OTHER METRICS MFLOPS = FP ops in program/(execution time x 106) Assuming FP ops independent of compiler and ISA Often safe for numeric codes: matrix size determines # of FP ops/program However, not always safe: Missing instructions (e.g. FP divide) Optimizing compilers Relative MIPS and normalized MFLOPS Adds to confusion 4 PROBLEMS WITH MIPS E.g. without FP hardware, an FP op may take 50 single-cycle instructions With FP hardware, only one 2-cycle instruction Thus, adding FP hardware: – CPI increases (why?) – Instructions/program decreases (why?) – Total execution time decreases BUT, MIPS gets worse! 50/50 => 2/1 50 => 1 50 => 2 50 MIPS => 2 MIPS PROBLEMS WITH MIPS Ignores program Usually used to quote peak performance Ideal conditions => guaranteed not to exceed! When is MIPS ok? Same compiler, same ISA E.g. same binary running on CPU type, such as: AMD Jaguar, Intel Core i7 Why? Instr/program is constant and can be factored out AMDAHL’S LAW Motivation for optimizing common case Speedup = old time / new time = new rate / old rate Let an optimization speed fraction f of time by a factor of s New_time = (1-f) x old_time + f x (old_time/s) Speedup = old_time / new_time Speedup = old_time / ((1-f) x old_time + f x (old_time/s)) Speedup 1 1 f f oldtime f f oldtime oldtime s 1 1 f f s AMDAHL’S LAW EXAMPLE Your boss asks you to improve performance by: Improve the ALU used 95% of time by 10% Improve memory pipeline used 5% of time by 10x f 95% 5% 5% s 1.10 10 ∞ Speedup 1.094 1.047 1.052 Speedup 1 f 1 f s AMDAHL’S LAW: LIMIT 1 1 lim s f 1 f 1 f s Speedup Make common case fast: 10 9 8 7 6 5 4 3 2 1 0 0 0.2 0.4 0.6 f 0.8 1 AMDAHL’S LAW: LIMIT 1 •Consider uncommon case! •If (1-f) is nontrivial 1 lim s f 1 f 1 f s –Speedup is limited! •Particularly true for exploiting parallelism in the large, where large s is not cheap –GPU with e.g. 1024 processors (shader cores) –Parallel portion speeds up by s (1024x) –Serial portion of code (1-f) limits speedup E.g. 10% serial portion: 1/0.1 = 10x speedup with 1000 cores 4 POWER TRENDS In CMOS IC technology Power Capacitive load Voltage 2 Frequency ×30 5V → 1V ×1000 REDUCING POWER Suppose a new CPU has 85% of capacitive load of old CPU 15% voltage and 15% frequency reduction Pnew Cold 0.85 (Vold 0.85) 2 Fold 0.85 4 0.85 0.52 2 Pold Cold Vold Fold The power wall We can’t reduce voltage further We can’t remove more heat How else can we improve performance? CONCLUDING REMARKS Cost/performance is improving Due to underlying technology development Hierarchical layers of abstraction In both hardware and software Instruction set architecture The hardware/software interface Execution time: the best performance measure Power is a limiting factor Use parallelism to improve performance