1 of 43

Video/Graphics of a Modern Desktop Board and Its Linux Programming

Dr A Sahu

Dept of Comp Sc & Engg.

IIT Guwahati

2 of 43

Outline

  • Intel 945 Motherboard architecture
  • GMCH
  • ICH7 (8254,8259,8237)
  • PCI and PCI Express
  • Video RAM, built-in GPU
  • DirectX, OpenGL, OpenCL
  • Advanced GPUs from ATI and AMD
    • Introduction to Nvidia CUDA programming

3 of 43

Intel 945 Express Chipset

Intel Pentium D Processor

  • 82945 GMCH/MCH (North Bridge)
    • DDR2 (two channels)
    • Intel GMA 950 Graphics
    • PCI Express* x16 Graphics
    • Support for Media Ext Card
  • 82801GR ICH7 (I/O Controller Hub 7, South Bridge)
    • 4 Serial ATA ports
    • Integrated Matrix Storage Technology
    • 6 PCI slots
    • BIOS support
    • Intel HD Audio
    • 8 high-speed USB ports
    • 6 PCI Express* x1 slots
    • Intel Pro 100/1000 LAN
    • Intel Active Management Technology

4 of 43

82945 : GMCH/MCH

  • Graphics and Memory Controller Hub
  • Graphics Interface (GI) and PCI Express for Graphics card support
  • Host Interface (HI)
    • Connects to the processor and supports Hyper-Threading (HT), interrupt delivery, a 12-deep in-order queue, etc.
  • System Memory Interface (SMI)
    • Connects to two-channel DDR2
  • Direct Media Interface (DMI)
    • Connects to ICH7

5 of 43

82801: ICH7

  • I/O Controller Hub version 7 (South Bridge)
  • Enhanced DMA controller, interrupt controller, and timer
    • Two cascaded 8259 PICs
    • One 82C54 PIT (Intel)
    • One 8237 DMA
  • Low Pin Count (LPC) Interface
  • PCI and PCI Express (Peripheral Component Interconnect)
  • AC97 & HD Audio Codec
  • Serial Peripheral Interface (SPI) Support
  • Firmware support (BIOS)
  • ACPI, SATA, USBs

6 of 43

Introduction

  • Peripherals : HD monitor
  • Interfaces : Intermediate Hardware
    • Nvidia GPU card
  • Interfaces : Intermediate Software/Program
    • Nvidia GPU driver

7 of 43

Migration from Char to Graphics/Video

  • Character display (80x25 characters, 5x7 pixels each = 400x175)
  • CRT Monitor (400x600, 640x480, 800x600)
  • LCD Monitor (1024x768, 1280x1024, …)
  • Graphics are visually more appealing
  • Display Line, Circle, Rectangle, Curve, Polygon
    • Characters are drawn using these primitives
    • TrueType fonts

8 of 43

Multiplexed 1024x768 pixel display

  • 1024x768 pixel LCD: columns 0, 1, 2, 3, … 1023; rows 0, 1, 2, … 767
  • Row counter and column counter, driven by CLK > 1024x768x50 Hz
  • Frame buffer: R, G, B at 8x3 = 24 bits per pixel
  • Refresh screen 50 times a second
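
The clock bound on this slide can be worked out directly. The following is a minimal sketch, assuming one pixel per clock tick and no blanking intervals (real display timings add them, so an actual pixel clock would be higher). The function names are illustrative, not taken from the slides.

```c
#include <stdint.h>

/* Minimum pixel clock in Hz: one pixel per clock tick, ignoring blanking
 * intervals (an assumption; real video timings add them). */
uint64_t min_pixel_clock_hz(uint32_t width, uint32_t height, uint32_t refresh_hz)
{
    return (uint64_t)width * height * refresh_hz;
}

/* Bytes per second that must be read out of the frame buffer to keep the
 * screen refreshed at the given color depth. */
uint64_t scanout_bytes_per_sec(uint32_t width, uint32_t height,
                               uint32_t refresh_hz, uint32_t bits_per_pixel)
{
    return min_pixel_clock_hz(width, height, refresh_hz) * bits_per_pixel / 8;
}
```

For the 1024x768 display at 50 Hz this gives a clock of at least 39,321,600 Hz, and at 24 bits per pixel about 118 MB of frame-buffer reads per second.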

9 of 43

Frame Buffer (24 Bit Pixel)

Pixels in Frame Buffer → Pixels on the Screen (24 bits per pixel)

Graphical representation of 24-bit color

10 of 43

Graphics Cards

  • GPU: a specialized processor that accelerates 2D and 3D graphics-primitive operations
  • Lots of floating-point operations
  • Accelerates primitives
    • Line, circle, polygon, mesh, projection, sphere, …

11 of 43

Graphics System

  • 3D application
  • 3D API: OpenGL or DirectX/3D (issues 3D API commands)
  • CPU-GPU boundary: GPU command & data stream
  • GPU command stage: emits the vertex index stream and pretransformed vertices
  • Programmable vertex processor: pretransformed vertices in, transformed vertices out
  • Primitive assembly: outputs assembled polygons, lines & points
  • Rasterization & interpolation: outputs the pixel location stream and rasterized pretransformed fragments
  • Programmable fragment processors: output transformed fragments
  • Raster operations: apply pixel updates
  • Frame buffer

12 of 43

Graphics System

  • Vertices (x,y,z) → Vertex Processing (Vertex Shader) → Pixel Processing (Pixel Shader) → Pixel (R,G,B)
  • Memory: Memory System, Texture Memory, Frame Buffer

13 of 43

Access to video memory

  • We create a Linux device-driver that gives applications access to the graphics frame-buffer
  • The frame buffer is accessed through the PCI Express slot
  • Assume a Graphics card is installed in your system

14 of 43

The role of a device-driver

  • user space: user application ↔ standard “runtime” libraries (call / ret)
  • kernel space: Operating System kernel (syscall / sysret) ↔ device-driver module (call / ret)
  • hardware: device-driver module ↔ hardware device (out / in); RAM and i/o memory

A device-driver is a software module that controls a hardware device in response to OS kernel requests relayed, often, from an application

15 of 43

Raster Display Technology

The graphics screen is a two-dimensional array of picture elements (‘pixels’)

Each pixel’s color is an individually programmable mix of red, green, and blue

These pixels are redrawn sequentially, left-to-right, by rows from top to bottom
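
The raster ordering described above determines where each pixel lives in a linear frame buffer. A minimal sketch, assuming rows are packed with no padding between them (real controllers may use a larger row pitch); the function name is illustrative.

```c
#include <stddef.h>

/* Byte offset of pixel (x, y) in a linear frame buffer stored in raster
 * order: rows from top to bottom, pixels left to right within a row.
 * Assumes each row occupies exactly width * bytes_per_pixel bytes. */
size_t pixel_offset(size_t x, size_t y, size_t width, size_t bytes_per_pixel)
{
    return (y * width + x) * bytes_per_pixel;
}
```

With a 1024-pixel-wide screen and 4 bytes per pixel, the first pixel of the second row sits at offset 4096.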

16 of 43

Special “dual-ported” memory

  • VRAM (16 MB): accessed by both the CPU and the CRT
  • RAM (2048 MB): accessed by the CPU

17 of 43

How much VRAM is needed?

  • This depends on
    • the total number of pixels
    • the number of bits-per-pixel
  • The total number of pixels
    • Determined by the screen’s width and height
    • 1280-by-960= 1,228,800 pixels
  • The number of bits-per-pixel (“color depth”) is a programmable parameter (varies from 1 to 32)
  • Certain types of applications also need to use extra VRAM
    • for multiple displays, or for “special effects” like computer game animations
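
The two factors listed above combine into a simple size calculation. This is a minimal sketch that ignores any per-row padding a real controller might require; the function name is illustrative.

```c
#include <stdint.h>

/* Bytes of VRAM needed to hold one full-screen image.
 * Rounds up to a whole byte so that depths below 8 bits per pixel work. */
uint64_t vram_bytes_needed(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    uint64_t total_bits = (uint64_t)width * height * bits_per_pixel;
    return (total_bits + 7) / 8;
}
```

For the 1280-by-960 example at 32 bits per pixel this is 4,915,200 bytes, a little under 5 MB, so a single screen image fits comfortably in 16 MB of VRAM.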

18 of 43

How ‘truecolor’ works

pixel (one longword, 32 bits):

  • alpha: bits 24-31
  • red: bits 16-23
  • green: bits 8-15
  • blue: bits 0-7

The intensity of each color-component within a pixel is an 8-bit value

0.5, 0, 1, 0

0, 0.5, 0

Alpha represents a pre-multiplied value
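
Packing and unpacking a truecolor pixel is a matter of shifts and masks. A minimal sketch, assuming the layout with blue in the lowest byte and alpha in the highest; the helper names are illustrative.

```c
#include <stdint.h>

/* Build a 32-bit truecolor pixel from four 8-bit components:
 * alpha in bits 24-31, red in 16-23, green in 8-15, blue in 0-7. */
uint32_t pack_argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* Extract the individual 8-bit components again. */
uint8_t pixel_alpha(uint32_t p) { return (uint8_t)(p >> 24); }
uint8_t pixel_red(uint32_t p)   { return (uint8_t)(p >> 16); }
uint8_t pixel_green(uint32_t p) { return (uint8_t)(p >> 8); }
uint8_t pixel_blue(uint32_t p)  { return (uint8_t)p; }
```

For example, `pack_argb(0, 255, 128, 0)` yields 0x00FF8000, an orange pixel with full red and half green.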

19 of 43

x86 uses “little-endian” order

VRAM offset:   0   1   2   3   4   5   6   7   8   9   10
Byte:          B   G   R   A   B   G   R   A   B   G   R

Video Screen: offsets 0-3 hold the first pixel, 4-7 the second, 8-10… the third

“truecolor” graphics-modes use 4-bytes per picture-element
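
The byte order can be made explicit in code. This is a minimal sketch that stores and loads a pixel one byte at a time, so it produces the little-endian layout regardless of the host CPU; the function names are illustrative.

```c
#include <stdint.h>

/* Store a 32-bit pixel at p in little-endian order: the least significant
 * byte (blue) goes to the lowest address. */
void store_pixel_le(uint8_t *p, uint32_t pixel)
{
    p[0] = (uint8_t)pixel;          /* blue  */
    p[1] = (uint8_t)(pixel >> 8);   /* green */
    p[2] = (uint8_t)(pixel >> 16);  /* red   */
    p[3] = (uint8_t)(pixel >> 24);  /* alpha */
}

/* Reassemble the 32-bit pixel from four consecutive bytes. */
uint32_t load_pixel_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

On an x86 machine a plain 32-bit store has the same effect, which is why a program can write whole longwords to VRAM and still get the B, G, R, A byte sequence.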

20 of 43

Some operating system issues

  • Linux is a “protected-mode” operating system
  • I/O devices normally are not directly accessible
  • Linux on x86 platforms uses “virtual memory”
  • Privileged software must “map” the VRAM
  • A device-driver module is needed: ‘vram.c’
  • We can compile it using: $ mmake vram
  • Device-node: # mknod /dev/vram c 98 0
  • Make it ‘writable’: # chmod a+w /dev/vram

21 of 43

Our ‘vram.c’ module

  • It’s a character-mode Linux device-driver
  • It implements four device-file ‘methods’:
    • ‘read()’: lets a program read from video memory
    • ‘write()’: lets a program write to video memory
    • ‘llseek()’: lets a program ‘move’ the file’s pointer
    • ‘mmap()’: lets a program ‘map’ vram to user-space
  • It also implements a pseudo-file that lets users view the RADEON X300 graphics controller’s PCI Configuration Space parameter-values:

$ cat /proc/vram

22 of 43

What is PCI?

  • It’s an acronym for “Peripheral Component Interconnect” and refers to a collection of industry standards for devices used in PCs
  • An Intel-sponsored initiative (from 1992-9) having several ambitious goals:
      • Reduce diversity inherent in legacy PC devices
      • Improve speed and efficiency of data-transfers
      • Eliminate (or reduce) platform dependencies
      • Simplify adding/removing peripheral adapters
      • Lower PC’s total consumption of electrical power

23 of 43

PCI Configuration Space

A non-volatile parameter-storage area for each PCI device-function (64 doublewords)

  • PCI Configuration Space Header (16 doublewords – fixed format)
  • PCI Configuration Space Body (48 doublewords – variable format)

24 of 43

Example: Header Type 0

16 doublewords, each shown from bit 31 (left) to bit 0 (right)

Dword   bits 31-24         bits 23-16         bits 15-8          bits 7-0
0       Device ID                             Vendor ID
1       Status Register                       Command Register
2       Class Code (Class/SubClass/ProgIF)                       Revision ID
3       BIST               Header Type        Latency Timer      Cache Line Size
4       Base Address 0
5       Base Address 1
6       Base Address 2
7       Base Address 3
8       Base Address 4
9       Base Address 5
10      CardBus CIS Pointer
11      Subsystem Device ID                   Subsystem Vendor ID
12      Expansion ROM Base Address
13      reserved                                                 capabilities pointer
14      reserved
15      Maximum Latency    Minimum Grant      Interrupt Pin      Interrupt Line
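
The fixed header layout makes it easy to decode a raw copy of the first 64 bytes of configuration space. A minimal sketch follows; the struct and function names are illustrative. On Linux such bytes can typically be read from the per-device `config` file under `/sys/bus/pci/devices/`, but here the buffer is simply passed in.

```c
#include <stdint.h>

/* A few fields of a Type 0 PCI configuration header. */
struct pci_header_fields {
    uint16_t vendor_id;    /* offset 0x00 */
    uint16_t device_id;    /* offset 0x02 */
    uint8_t  revision_id;  /* offset 0x08 */
    uint8_t  prog_if;      /* offset 0x09 */
    uint8_t  subclass;     /* offset 0x0A */
    uint8_t  base_class;   /* offset 0x0B */
    uint8_t  header_type;  /* offset 0x0E, low 7 bits */
    uint32_t bar0;         /* offset 0x10 */
};

/* Configuration space is little-endian: low byte at the lower offset. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode selected fields from the 64-byte (16-doubleword) header. */
struct pci_header_fields parse_pci_header(const uint8_t *cfg)
{
    struct pci_header_fields h;
    h.vendor_id   = read_le16(cfg + 0x00);
    h.device_id   = read_le16(cfg + 0x02);
    h.revision_id = cfg[0x08];
    h.prog_if     = cfg[0x09];
    h.subclass    = cfg[0x0A];
    h.base_class  = cfg[0x0B];
    h.header_type = cfg[0x0E] & 0x7F;  /* bit 7 flags a multi-function device */
    h.bar0        = read_le32(cfg + 0x10);
    return h;
}
```

A header whose first four bytes are 02 10 60 5B, for instance, decodes to vendor 0x1002 and device 0x5B60.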

25 of 43

Examples of VENDOR-IDs

  • 0x8086 – Intel Corporation
  • 0x1022 – Advanced Micro Devices, Inc
  • 0x1002 – ATI Technologies, Inc (My office machine)
  • 0x10EC – RealTek, Incorporated
  • 0x10DE – Nvidia Corporation
  • 0x10B7 – 3Com Corporation
  • 0x101C – Western Digital, Inc
  • 0x1014 – IBM Corporation
  • 0x0E11 – Compaq Corporation
  • 0x1057 – Motorola Corporation
  • 0x106B – Apple Computers, Inc
  • 0x5333 – S3 Graphics, Inc

26 of 43

Examples of DEVICE-IDs

  • 0x5347: ATI RAGE128 SG
  • 0x4C58: ATI RADEON LX
  • 0x5950: ATI RS480
  • 0x436E: ATI IXP300 SATA
  • 0x438C: ATI IXP600 IDE
  • 0x5B60: ATI Radeon X300

See this Linux header-file for lots more examples:

</usr/src/linux/include/linux/pci_ids.h>

27 of 43

Defined PCI Class Codes

  • 0x00: Legacy Device (i.e., built before class-codes were defined)
  • 0x01: Mass Storage controller
  • 0x02: Network controller
  • 0x03: Display controller
  • 0x04: Multimedia device
  • 0x05: Memory Controller
  • 0x06: Bridge device
  • 0x07: Simple Communications controller
  • 0x08: Base System peripherals
  • 0x09: Input device
  • 0x0A: Docking stations
  • 0x0B: Processors
  • 0x0C: Serial Bus controllers
  • 0x0D: Wireless controllers
  • 0x0E: Intelligent I/O controllers
  • 0x0F: Encryption/Decryption controllers
  • 0x10: Satellite Communications controllers
  • 0x11: Data Acquisition and Signal Processing controllers

28 of 43

Example of Sub-Class Codes

  • Class Code 0x01: Mass Storage controller
    • 0x00: SCSI controller
    • 0x01: IDE controller
    • 0x02: Floppy Disk controller
    • 0x03: IPI controller
    • 0x04: RAID controller
    • 0x80: Other Mass Storage controller

29 of 43

Example of Sub-Class Codes

  • Class Code 0x02: Network controller
    • 0x00: Ethernet controller
    • 0x01: Token Ring controller
    • 0x02: FDDI controller
    • 0x03: ATM controller
    • 0x04: ISDN controller
    • 0x80: Other Network controller

30 of 43

Example of Sub-Class codes

  • Class Code 0x03: Display Controller
    • 0x00: VGA-compatible controller
    • 0x01: XGA controller
    • 0x02: 3D controller
    • 0x80: Other display controller
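
The class, sub-class, and programming-interface bytes together form the 24-bit Class Code field of the configuration header. A minimal sketch of splitting that field and testing for a VGA-compatible display controller; the function names are illustrative.

```c
#include <stdint.h>

/* The 24-bit class code holds the base class in its high byte, the
 * sub-class in the middle byte, and the programming interface in the
 * low byte. */
uint8_t pci_base_class(uint32_t class_code) { return (uint8_t)(class_code >> 16); }
uint8_t pci_subclass(uint32_t class_code)   { return (uint8_t)(class_code >> 8); }
uint8_t pci_prog_if(uint32_t class_code)    { return (uint8_t)class_code; }

/* Class 0x03 (display controller) with sub-class 0x00 (VGA-compatible). */
int pci_is_vga_compatible(uint32_t class_code)
{
    return pci_base_class(class_code) == 0x03 &&
           pci_subclass(class_code) == 0x00;
}
```

A driver that only cares about ordinary graphics cards could use a check like this to skip every device whose class code is not 0x0300xx.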

31 of 43

Hardware details may differ

  • Graphics controllers use vendor-specific mechanisms to perform similar operations
  • There’s a common core of compatibility with IBM’s VGA (Video Graphics Array) developed in the mid-1980s
  • But since IBM’s loss of market dominance, each manufacturer has added enhancements which employ incompatible programming interfaces
  • You need a vendor’s manual! (Download from vendor site)

32 of 43

The ‘frame-buffer’

  • Today’s PCI graphics systems all provide a dedicated amount of display memory to control the screen-image’s pixel-coloring
  • But how much memory will vary with price
  • And its location within the CPU’s physical address-space can’t be predicted because it depends upon what other PCI devices are installed (and mapped) during startup

33 of 43

The ‘base address’ fields

  • The PCI Configuration Header has several so-called Base Address fields, and vendors use one of these to hold the frame-buffer’s starting address and to indicate how much vram the video controller can actually use
  • The Linux kernel provides driver-writers with some convenient functions for getting the location and size of the frame-buffer
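
What those kernel helpers hide is the encoding of a 32-bit memory Base Address register. The sketch below decodes one in plain C: the low four bits are flags, the rest is the address, and the region size is derived from the value read back after all 1s have been written to the register. The function names are illustrative, and the sketch covers 32-bit memory BARs only.

```c
#include <stdint.h>

/* Bit 0 is 0 for a memory region and 1 for an I/O-port region. */
int bar_is_memory(uint32_t bar)
{
    return (bar & 0x1u) == 0;
}

/* Bit 3 of a memory BAR marks the region as prefetchable. */
int bar_is_prefetchable(uint32_t bar)
{
    return bar_is_memory(bar) && (bar & 0x8u) != 0;
}

/* The low four bits of a memory BAR are flags, not address bits. */
uint32_t bar_mem_base(uint32_t bar)
{
    return bar & ~0xFu;
}

/* Size of the region, computed from the value the register returns
 * after 0xFFFFFFFF has been written to it: the device hard-wires the
 * low address bits to zero, so inverting and adding one gives the size. */
uint32_t bar_mem_size(uint32_t readback)
{
    return ~(readback & ~0xFu) + 1u;
}
```

For example, a read-back value of 0xFF000008 corresponds to a 16 MB prefetchable memory region.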

34 of 43

ATI Radeon uses Base Address 0

  • Our ‘vram.c’ module’s initialization routine employs these kernel helper-functions:

#include <linux/pci.h>

struct pci_dev *devp;   // will point to the kernel's structure for our device

// get a pointer to the PCI device's Linux data-structure
devp = pci_get_device( VENDOR_ID, DEVICE_ID, NULL );
if ( !devp ) return -ENODEV;   // device is not present

// get starting address and length for memory-resource 0
vram_base = pci_resource_start( devp, 0 );
vram_size = pci_resource_len( devp, 0 );

35 of 43

Reading from ‘vram’

  • You can use our ‘fileview’ utility to see the current contents of the video frame-buffer

$ fileview /dev/vram

  • Our ‘vram.c’ driver’s ‘read()’ method gets invoked when an application-program attempts to ‘read’ from the ‘/dev/vram’ device-file
  • The read-method is implemented by our driver using ‘ioremap()’ (and ‘iounmap()’) to temporarily map a 4KB-page of physical vram to the kernel’s virtual address-space
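
Because only one 4 KB page is mapped at a time, a single call can copy no further than the end of the current page. The following user-space sketch models that arithmetic; it assumes the driver clamps each transfer this way, which the slides do not confirm, and the names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define VRAM_PAGE_SIZE 4096u

/* Number of bytes one read() call may copy when starting at file
 * position pos: limited by the caller's count, by the end of the
 * current 4 KB page, and by the end of vram. */
size_t read_chunk_len(uint64_t pos, size_t count, uint64_t vram_size)
{
    if (pos >= vram_size)
        return 0;                          /* at or past end-of-file */

    uint64_t to_page_end = VRAM_PAGE_SIZE - (pos % VRAM_PAGE_SIZE);
    uint64_t to_vram_end = vram_size - pos;
    uint64_t n = count;

    if (n > to_page_end)
        n = to_page_end;
    if (n > to_vram_end)
        n = to_vram_end;
    return (size_t)n;
}
```

Under this assumption an application that wants more than one page must call read() in a loop until all bytes have arrived, which is the usual way to handle short reads anyway.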

36 of 43

I/O ‘memcpy()’ functions

  • Linux provides a ‘platform-independent’ way to do copying from an i/o-device’s memory into an application’s buffer (or vice-versa):
    • A ‘read’ copies from vram to a user’s buffer

memcpy_fromio( buf, vaddr, len );

    • A ‘write’ copies to vram from a user’s buffer

memcpy_toio( vaddr, buf, len );

37 of 43

‘mmap()’

  • This is a standard UNIX system-call that lets an application ‘map’ a file into its virtual address-space, where it can then treat the file as if it were an ordinary array
  • See the man-page: $ man mmap
  • This same system-call can also work on a device-file if that device’s driver provides ‘mmap()’ among its file-operations

38 of 43

The user-role

  • In the application-program, six arguments get passed to the ‘mmap()’ library-function

void *mmap( void *baseaddress,
            size_t memorysize,
            int accessattributes,
            int flags,
            int filehandle,
            off_t offset );
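
A minimal user-space sketch of the call sequence follows. It takes the path as a parameter so the same code works for any file or device node that supports mapping; `map_device` is an illustrative helper name, not part of the course code.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first `length` bytes of the file or device at `path` for
 * reading and writing, shared with other mappers.
 * Returns NULL on failure. */
void *map_device(const char *path, size_t length)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    /* NULL base address lets the kernel choose where to place the mapping */
    void *p = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    close(fd);   /* the mapping stays valid after the descriptor is closed */
    return (p == MAP_FAILED) ? NULL : p;
}
```

Assuming the ‘/dev/vram’ node exists and the frame-buffer size is known, a program could call `map_device("/dev/vram", size)`, treat the result as an array of 32-bit pixels, and release it with `munmap()` when done.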

39 of 43

The driver-role

  • In the kernel, those six arguments will get validated and processed, then the driver’s ‘mmap()’ callback-function will be invoked to supply missing information and perform further sanity-checks and do appropriate page-mapping actions:

int mmap( struct file *file,

struct vm_area_struct *vma );

40 of 43

Our driver’s code

int mmap( struct file *file, struct vm_area_struct *vma )
{
    // extract the parameters we will need from the 'vm_area_struct'
    unsigned long region_length = vma->vm_end - vma->vm_start;
    unsigned long region_origin = vma->vm_pgoff * PAGE_SIZE;
    unsigned long physical_addr = fb_base + region_origin;
    unsigned long user_virtaddr = vma->vm_start;

    // sanity check: mapped region cannot extend past end of vram
    if ( region_origin + region_length > fb_size ) return -EINVAL;

    // tell the kernel not to try 'swapping out' this region to the disk
    // (VM_RESERVED was removed in Linux 3.7; later kernels use
    // VM_DONTEXPAND | VM_DONTDUMP instead)
    vma->vm_flags |= VM_RESERVED;

    // mark the region as memory-mapped i/o, which also excludes it
    // from any core dumps
    vma->vm_flags |= VM_IO;

41 of 43

Driver’s code continued

    // invoke a helper-function that will set up the page-table entries
    if ( remap_pfn_range( vma, user_virtaddr, physical_addr >> PAGE_SHIFT,
                          region_length, vma->vm_page_prot ) ) return -EAGAIN;

    return 0;   // SUCCESS
}
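
The driver's bounds check adds two caller-influenced values, so a very large page offset could in principle wrap around. The user-space model below is a sketch of the same validation written so that no intermediate result can overflow; it assumes 4 KB pages, and every name in it is illustrative rather than taken from ‘vram.c’.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1ull << MODEL_PAGE_SHIFT)

/* Model of the mmap() sanity check. Returns 0 and stores the first
 * page-frame number in *pfn when the requested region lies entirely
 * inside the frame buffer; returns -1 otherwise. */
int model_mmap_check(uint64_t vm_start, uint64_t vm_end, uint64_t vm_pgoff,
                     uint64_t fb_base, uint64_t fb_size, uint64_t *pfn)
{
    if (vm_end <= vm_start)
        return -1;                              /* empty or inverted range */
    uint64_t length = vm_end - vm_start;

    /* reject offsets beyond vram before multiplying, so the product
     * below cannot overflow */
    if (vm_pgoff > fb_size / MODEL_PAGE_SIZE)
        return -1;
    uint64_t origin = vm_pgoff * MODEL_PAGE_SIZE;

    /* origin <= fb_size here, so the subtraction cannot underflow */
    if (length > fb_size - origin)
        return -1;

    *pfn = (fb_base + origin) >> MODEL_PAGE_SHIFT;
    return 0;
}
```

With a 16 MB frame buffer at 0xD0000000, mapping two pages at page offset 1 yields page-frame number 0xD0001, while asking for the full 16 MB starting at page offset 1 is rejected.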

42 of 43

Demo: ‘rotation.cpp’

  • This application-program will demonstrate use of our ‘vram.c’ device-driver’s ‘read()’, ‘write()’ and ‘llseek()’ methods (i.e., device-file operations)
  • It will perform a rotation of the color-components (R,G,B) in every displayed ‘truecolor’ pixel: R → G, G → B, B → R
  • After three rotations the screen will look normal again
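
The per-pixel operation can be sketched on an ordinary in-memory buffer. This is not the demo's source: it assumes the 32-bit layout with alpha in the top byte and blue in the bottom byte, leaves alpha untouched, and uses illustrative function names. The real program would move the pixels through the device-file with ‘read()’, ‘write()’ and ‘llseek()’.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate the color components of one truecolor pixel:
 * the red value moves to green, green to blue, and blue to red. */
uint32_t rotate_rgb(uint32_t pixel)
{
    uint32_t a = pixel & 0xFF000000u;
    uint32_t r = (pixel >> 16) & 0xFFu;
    uint32_t g = (pixel >> 8) & 0xFFu;
    uint32_t b = pixel & 0xFFu;

    /* new red = old blue, new green = old red, new blue = old green */
    return a | (b << 16) | (r << 8) | g;
}

/* Apply the rotation to every pixel in a buffer. */
void rotate_buffer(uint32_t *pixels, size_t count)
{
    for (size_t i = 0; i < count; i++)
        pixels[i] = rotate_rgb(pixels[i]);
}
```

Applying `rotate_rgb` three times returns every pixel to its original value, which is why the screen looks normal again after the third pass.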

43 of 43

Thanks
