1 of 28

Multicore Processors

2 of 28

Performance Issues

  • Moore's law
    • Improve organization
    • Increase clock frequency
  • Parallelism
    • Pipelining
    • Superscalar
    • Simultaneous multithreading (SMT)
  • Diminishing returns
    • A more complex system requires more logic
    • Larger chip
      • Harder to control

3 of 28

Alternatives

4 of 28

Intel Hardware Trends

5 of 28

Increased Complexity

  • Power dissipation grows exponentially with chip density and clock frequency
    • Use more chip area for L2 cache
    • Lower power demand, more efficient
      • Projection for 2015
    • 100 billion transistors on a 300 mm² chip
      • 100 MB cache
      • 1 billion transistors for logic
  • Pollack’s rule:
    • Performance is roughly proportional to the square root of the increase in complexity
      • Doubling complexity gives about 40% more performance
  • A multicore system offers a nearly linear relationship between complexity and performance
  • A single core will not use up the entire cache
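
Pollack's rule and the near-linear multicore case can be compared with a few lines of arithmetic. This is a minimal sketch: the function names are illustrative, and the multicore figure is an idealized upper bound that assumes perfectly parallel work.

```python
import math


def pollack_speedup(complexity_ratio: float) -> float:
    """Single-core performance gain predicted by Pollack's rule."""
    return math.sqrt(complexity_ratio)


def ideal_multicore_speedup(complexity_ratio: float) -> float:
    """Idealized bound: extra complexity is spent on more identical cores."""
    return complexity_ratio


for ratio in (2, 4, 8):
    print(
        f"{ratio}x complexity: "
        f"one bigger core {pollack_speedup(ratio):.2f}x, "
        f"more cores (ideal) {ideal_multicore_speedup(ratio):.2f}x"
    )
```

Doubling complexity gives a factor of about 1.41 for a single core, which is the "40% more performance" quoted above.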

6 of 28

Power

7 of 28

8 of 28

9 of 28

Chip Utilization of Transistors

10 of 28

Software

  • Performance gains come through parallelism
  • Serial code that cannot be parallelized limits the gain
  • With 10% of such code, an 8-processor system achieves a speedup of only about 4.7
  • Communication, coordination, and cache coherence add further overhead
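
The 4.7 figure follows from the relation commonly known as Amdahl's law, speedup = 1 / (f + (1 - f) / N), where f is the serial fraction and N the number of processors. The slide does not name the law, so treat the sketch below as one plausible reading of where the number comes from; the function name is illustrative.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Speedup over one processor when only part of the code runs in parallel."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# 10% serial code on 8 processors, as on the slide.
print(f"{amdahl_speedup(0.10, 8):.1f}")  # 4.7

# Adding processors helps less and less: the limit is 1 / serial_fraction.
for n in (8, 16, 64, 1024):
    print(n, f"{amdahl_speedup(0.10, n):.2f}")
```

This model ignores communication and cache-coherence overhead, so real speedups are typically lower.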

11 of 28

Applications That Benefit

  • Databases
  • Servers handling independent transactions
  • Multi-threaded applications
    • Lotus Domino, Siebel CRM
  • Multi-process applications
    • Oracle, SAP, PeopleSoft
  • Java applications
    • The Java VM is multithreaded, with its own scheduling and memory management
    • Sun’s Java Application Server, BEA’s Weblogic, IBM Websphere, Tomcat
  • Multi-instance applications
    • One application running multiple times
  • E.g. Valve game software

12 of 28

Multicore Organization Alternatives

13 of 28

Shared L2 Cache

  • Fewer cache misses
  • No redundant copies of shared data
  • The principle of locality sometimes does not hold for threads; a shared cache can give such threads more space
  • Shared memory makes inter-process communication easier

14 of 28

Individual Core Architecture

  • Intel Core Duo uses superscalar cores
  • Intel Core i7 uses simultaneous multi-threading (SMT)
    • Scales up number of threads supported
      • Four SMT cores, each supporting four threads, appear as 16 cores

15 of 28

Intel x86 Multicore Organization - Core Duo (1)

  • 2006
  • Two x86 superscalar cores, shared L2 cache
  • Dedicated L1 cache per core
    • 32KB instruction and 32KB data
  • Thermal control unit per core
    • Manages chip heat dissipation
    • Maximize performance within constraints
    • Improved ergonomics
  • Advanced Programmable Interrupt Controller (APIC)
    • Inter-process interrupts between cores
    • Routes interrupts to appropriate core
    • Includes timer so OS can interrupt core

16 of 28

Intel x86 Multicore Organization - Core Duo (2)

  • Power Management Logic
    • Monitors thermal conditions and CPU activity
    • Adjusts voltage and power consumption
    • Can switch individual logic subsystems on and off
  • 2MB shared L2 cache
    • Dynamic allocation
    • MESI support for L1 caches
    • Extended to support multiple Core Duo in SMP
      • L2 data shared between local cores or external
  • Bus interface

17 of 28

Intel x86 Multicore Organization - Core i7

  • November 2008
  • Four x86 SMT processors
  • Dedicated L2, shared L3 cache
  • Speculative pre-fetch for caches
  • On chip DDR3 memory controller
    • Three 8 byte channels (192 bits) giving 32GB/s
    • No front side bus
  • QuickPath Interconnection
    • Cache coherent point-to-point link
    • High speed communications between processor chips
    • 6.4G transfers per second, 16 bits per transfer
    • Dedicated bi-directional pairs
    • Total bandwidth 25.6GB/s
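
The bandwidth figures on this slide can be checked with simple arithmetic. The sketch below uses only the numbers given above; the implied memory transfer rate is derived from them and is an inference, not a figure stated on the slide.

```python
# QuickPath link: 6.4 G transfers per second, 16 bits per transfer,
# one dedicated link in each direction.
qpi_transfers_per_s = 6.4e9
qpi_bits_per_transfer = 16
qpi_directions = 2

qpi_per_direction_gb_s = qpi_transfers_per_s * qpi_bits_per_transfer / 8 / 1e9
qpi_total_gb_s = qpi_per_direction_gb_s * qpi_directions
print(f"QPI per direction: {qpi_per_direction_gb_s:.1f} GB/s")  # 12.8
print(f"QPI total:         {qpi_total_gb_s:.1f} GB/s")          # 25.6

# DDR3 controller: three channels, 8 bytes wide each.
channels = 3
bytes_per_channel = 8
bus_width_bits = channels * bytes_per_channel * 8
print(f"Memory bus width:  {bus_width_bits} bits")               # 192

# Transfer rate needed to reach the quoted 32 GB/s (derived, not from the slide).
implied_transfers_per_s = 32e9 / (channels * bytes_per_channel)
print(f"Implied rate:      {implied_transfers_per_s / 1e9:.2f} GT/s")
```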

18 of 28

ARM11 MPCore

  • Up to 4 processors each with own L1 instruction and data cache
  • Distributed interrupt controller
  • Timer per CPU
  • Watchdog
    • Warning alerts for software failures
    • Counts down from predetermined values
    • Issues warning at zero
  • CPU interface
    • Interrupt acknowledgement, masking and completion acknowledgement
  • CPU
    • Single ARM11 called MP11
  • Vector floating-point unit
    • FP co-processor
  • L1 cache
  • Snoop control unit
    • L1 cache coherency
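
The watchdog behaviour listed above (count down from a preset value, warn at zero) can be modelled in software as follows. This is only a behavioural sketch: the class and method names are hypothetical and do not correspond to the MPCore register interface.

```python
class Watchdog:
    """Toy model of a countdown watchdog (names are hypothetical)."""

    def __init__(self, reload_value: int) -> None:
        self.reload_value = reload_value
        self.counter = reload_value
        self.warned = False

    def kick(self) -> None:
        """Healthy software reloads the counter before it expires."""
        self.counter = self.reload_value
        self.warned = False

    def tick(self) -> bool:
        """Count down one step; return True once the counter has hit zero."""
        if self.counter > 0:
            self.counter -= 1
        if self.counter == 0:
            self.warned = True
        return self.warned


wd = Watchdog(reload_value=3)
wd.tick()
wd.tick()
wd.kick()            # software is alive, so the countdown restarts
print([wd.tick() for _ in range(3)])  # [False, False, True]
```

If the software hangs and stops calling `kick`, the counter reaches zero and the warning is raised.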

19 of 28

ARM11 MPCore Block Diagram

20 of 28

ARM11 MPCore Interrupt Handling

  • Distributed Interrupt Controller (DIC) collates from many sources
  • Masking
  • Prioritization
  • Distribution to target MP11 CPUs
  • Status tracking
  • Software interrupt generation
  • Number of interrupts independent of MP11 CPU design
  • Memory mapped
  • Accessed by CPUs via private interface through SCU
  • Can route interrupts to single or multiple CPUs
  • Provides inter-process communication
    • Thread on one CPU can cause activity by thread on another CPU

21 of 28

DIC Routing

  • Direct to specific CPU
  • To defined group of CPUs
  • To all CPUs
  • OS can generate interrupt to:
    • All but self
    • Self
    • Other specific CPU
  • Typically combined with shared memory for inter-process communication
  • 16 interrupt ids available for inter-process communication
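
The three software-generated routing options can be expressed as a small target-selection function. This is a sketch of the routing rule only, assuming a 4-CPU MPCore; the names `IpiTarget` and `send_ipi` are hypothetical and are not part of any ARM API.

```python
from enum import Enum
from typing import Optional, Set


class IpiTarget(Enum):
    ALL_BUT_SELF = "all but self"
    SELF = "self"
    SPECIFIC = "other specific CPU"


def send_ipi(
    ipi_id: int,
    source_cpu: int,
    mode: IpiTarget,
    specific_cpu: Optional[int] = None,
    num_cpus: int = 4,
) -> Set[int]:
    """Return the set of CPUs that would receive the interrupt."""
    if not 0 <= ipi_id <= 15:
        raise ValueError("only 16 interrupt IDs (0-15) are available for IPIs")
    all_cpus = set(range(num_cpus))
    if source_cpu not in all_cpus:
        raise ValueError("unknown source CPU")
    if mode is IpiTarget.ALL_BUT_SELF:
        return all_cpus - {source_cpu}
    if mode is IpiTarget.SELF:
        return {source_cpu}
    if specific_cpu is None or specific_cpu not in all_cpus:
        raise ValueError("a valid target CPU is required")
    return {specific_cpu}


print(sorted(send_ipi(3, 1, IpiTarget.ALL_BUT_SELF)))         # [0, 2, 3]
print(sorted(send_ipi(3, 1, IpiTarget.SELF)))                 # [1]
print(sorted(send_ipi(3, 1, IpiTarget.SPECIFIC, specific_cpu=2)))  # [2]
```

In practice the message payload would sit in shared memory, with the interrupt serving only as the notification.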

22 of 28

Interrupt States

  • Inactive
    • Non-asserted
    • Completed by that CPU but pending or active in others
  • Pending
    • Asserted
    • Processing not started on that CPU
  • Active
    • Started on that CPU but not complete
    • Can be pre-empted by higher priority interrupt
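
From the point of view of one CPU, the three states form a simple cycle. The sketch below encodes that cycle as a transition table; the event names are assumptions chosen to match the acknowledgement and completion steps of the CPU interface, and pre-emption is not modelled.

```python
from enum import Enum


class State(Enum):
    INACTIVE = "inactive"
    PENDING = "pending"
    ACTIVE = "active"


# Allowed transitions for one interrupt as seen by one CPU.
TRANSITIONS = {
    (State.INACTIVE, "assert"): State.PENDING,      # source raises the interrupt
    (State.PENDING, "acknowledge"): State.ACTIVE,   # CPU starts processing
    (State.ACTIVE, "complete"): State.INACTIVE,     # CPU signals completion
}


def step(state: State, event: str) -> State:
    """Apply one event, rejecting transitions the table does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state.name}") from None


state = State.INACTIVE
for event in ("assert", "acknowledge", "complete"):
    state = step(state, event)
    print(event, "->", state.name)
```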

23 of 28

Interrupt Sources

  • Inter-process Interrupts (IPI)
    • Private to CPU
    • ID0-ID15
    • Software triggered
    • Priority depends on target CPU not source
  • Private timer and/or watchdog interrupt
    • ID29 and ID30
  • Legacy FIQ line
    • Legacy FIQ pin, per CPU, bypasses interrupt distributor
    • Directly drives interrupts to CPU
  • Hardware
    • Triggered by programmable events on associated interrupt lines
    • Up to 224 lines
    • Start at ID32
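
The ID ranges above can be summarised as a lookup function. This sketch covers only the ranges named on the slide; IDs it does not mention are reported as such rather than guessed, and the function name is illustrative.

```python
def classify_interrupt_id(interrupt_id: int) -> str:
    """Map an interrupt ID to the source category listed on the slide."""
    # 224 hardware lines starting at ID32 end at ID255.
    if not 0 <= interrupt_id <= 255:
        raise ValueError("interrupt ID out of range")
    if interrupt_id <= 15:
        return "inter-process interrupt (software triggered)"
    if interrupt_id in (29, 30):
        return "private timer or watchdog"
    if interrupt_id >= 32:
        return "hardware interrupt line"
    return "not described on the slide"


for i in (0, 15, 20, 29, 30, 32, 255):
    print(i, classify_interrupt_id(i))
```

The legacy FIQ line is absent here because it bypasses the interrupt distributor.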

24 of 28

ARM11 MPCore Interrupt Distributor

25 of 28

Recommended Reading

  • Stallings chapter 18
  • ARM web site

26 of 28

Intel Core i7 Block Diagram

27 of 28

Intel Core Duo Block Diagram

28 of 28

Performance Effect of Multiple Cores