## **DRAM Technology**

- Data stored as a charge in a capacitor
  - Single transistor used to access the charge
  - Must periodically be refreshed
    - Read contents and write back
    - Performed on a DRAM "row"





## **Advanced DRAM Organization**

- Bits in a DRAM are organized as a rectangular array
  - DRAM accesses an entire row
  - Burst mode: supply successive words from a row with reduced latency
- Double data rate (DDR) DRAM
  - Transfer on both the rising and falling edges of the externally supplied clock (SDRAM)
- Quad data rate (QDR) DRAM
  - Separate DDR inputs and outputs



### **DRAM Generations**

| Year | Capacity     | \$/GB     |
|------|--------------|-----------|
| 1980 | 64Kbit       | \$1500000 |
| 1983 | 256Kbit      | \$500000  |
| 1985 | 1Mbit        | \$200000  |
| 1989 | 4Mbit        | \$50000   |
| 1992 | 16Mbit       | \$15000   |
| 1996 | 64Mbit       | \$10000   |
| 1998 | 128Mbit      | \$4000    |
| 2000 | 256Mbit      | \$1000    |
| 2004 | 512Mbit      | \$250     |
| 2007 | 1Gbit        | \$50      |



#### **DRAM Performance Factors**

- Row buffer
  - Allows several words to be read and refreshed in parallel
- Synchronous DRAM (SDRAM): operates in synchrony with an external clock

#### **SDRAM**

- Allows for consecutive accesses in bursts without needing to send each address
- Improves bandwidth
- DRAM banking (interleaved memory banks)
  - Allows simultaneous access to multiple DRAMs
  - Improves bandwidth

## **Increasing Memory Bandwidth**



a. One-word-wide memory organization

- 4-word wide memory
  - Miss penalty = 1 + 15 + 1 = 17 bus cycles
  - Bandwidth = 16 bytes / 17 cycles = 0.94 B/cycle
- 4-bank interleaved memory
  - Miss penalty = 1 + 15 + 4×1 = 20 bus cycles
  - Bandwidth = 16 bytes / 20 cycles = 0.8 B/cycle
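
The miss penalties above can be reproduced with a small sketch. It assumes the usual textbook cost model, which the bullets imply but do not spell out: 1 bus cycle to send the address, 15 cycles per DRAM access, 1 cycle per bus transfer, and a 4-word (16-byte) cache block. The function name and the organization labels are illustrative only.

```python
def miss_penalty(words, addr_cycles, dram_cycles, xfer_cycles, organization):
    """Bus cycles needed to fetch one cache block (assumed cost model)."""
    if organization == "one-word-wide":
        # Every word pays for its own DRAM access and its own bus transfer.
        return addr_cycles + words * dram_cycles + words * xfer_cycles
    if organization == "wide":
        # One DRAM access and one wide transfer move the whole block.
        return addr_cycles + dram_cycles + xfer_cycles
    if organization == "interleaved":
        # Bank accesses overlap; words still cross a one-word bus one by one.
        return addr_cycles + dram_cycles + words * xfer_cycles
    raise ValueError(f"unknown organization: {organization}")


BLOCK_WORDS, WORD_BYTES = 4, 4  # assumed block and word sizes

for org in ("one-word-wide", "wide", "interleaved"):
    cycles = miss_penalty(BLOCK_WORDS, 1, 15, 1, org)
    bandwidth = BLOCK_WORDS * WORD_BYTES / cycles
    print(f"{org}: {cycles} cycles, {bandwidth:.2f} B/cycle")
```

Under these assumptions the one-word-wide baseline costs 65 cycles (0.25 B/cycle), against 17 cycles for the 4-word-wide memory and 20 cycles for the 4-bank interleaved memory.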

Interleaved memory banks (figure): requests (addresses) pass through optional per-bank request queues to the memory banks (Bank 0, Bank 1, ...). Each bank returns its data and accepts one new request per bank access (e.g. ~100 ns per bank access). Two requests to the same bank cause a bank conflict.

Split transactions on the request/reply "bus" (pipelining): multiple transactions are interleaved in time.

- Do NOT hold the bus exclusively between request and reply: release it for other transactions.
- Replies may arrive out of order.
- Need a transaction ID with every request and its corresponding reply, especially in case of out-of-order replies.
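
As a rough illustration of bank conflicts, the sketch below assumes low-order word interleaving across four banks with 4-byte words; the notes do not state the mapping, so `NUM_BANKS`, `WORD_BYTES`, and both helper functions are hypothetical.

```python
NUM_BANKS = 4   # assumption: four interleaved banks
WORD_BYTES = 4  # assumption: 4-byte words


def bank_of(addr):
    """Bank holding the word at byte address addr (word interleaving)."""
    return (addr // WORD_BYTES) % NUM_BANKS


def conflicts(addrs):
    """Pairs of back-to-back requests that target the same bank."""
    return [(a, b) for a, b in zip(addrs, addrs[1:]) if bank_of(a) == bank_of(b)]


# Sequential words spread over all banks, so no request waits for another.
print(conflicts([0x00, 0x04, 0x08, 0x0C]))
# A stride of 16 bytes keeps hitting bank 0, so every request conflicts.
print(conflicts([0x00, 0x10, 0x20]))
```

With this mapping, a stride equal to the number of banks times the word size is the worst case: every access lands on the same bank and the interleaving gives no overlap.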

# **Disk Storage**

Nonvolatile, rotating magnetic storage





#### **Disk Sectors and Access**

- Each sector records
  - Sector ID
  - Data (512 bytes, 4096 bytes proposed)
  - Error correcting code (ECC)
    - Used to hide defects and recording errors
  - Synchronization fields and gaps
- Access to a sector involves
  - Queuing delay if other accesses are pending
  - Seek: move the heads
  - Rotational latency
  - Data transfer
  - Controller overhead

## Disk Access Example

- Given
  - 512B sector, 15,000rpm, 4ms average seek time, 100MB/s transfer rate, 0.2ms controller overhead, idle disk
- Average read time
  - 4ms seek time
    - + ½ / (15,000/60) = 2ms rotational latency
    - + 512 / 100MB/s = 0.005ms transfer time
    - + 0.2ms controller delay
    - = 6.2 ms
- If actual average seek time is 1ms
  - Average read time = 3.2ms
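
The arithmetic of this example can be checked with a short sketch. The function name and parameters are illustrative, and 1 MB is taken as 10^6 bytes, which is an assumption about the units used in the example.

```python
def avg_read_time_ms(sector_bytes, rpm, seek_ms, transfer_mb_per_s, controller_ms):
    """Average time to read one sector from an idle disk, in milliseconds."""
    # On average the platter must turn half a rotation to reach the sector.
    rotational_ms = 0.5 / (rpm / 60) * 1000
    # Assumption: 1 MB = 10**6 bytes.
    transfer_ms = sector_bytes / (transfer_mb_per_s * 1e6) * 1000
    return seek_ms + rotational_ms + transfer_ms + controller_ms


quoted = avg_read_time_ms(512, 15_000, seek_ms=4, transfer_mb_per_s=100, controller_ms=0.2)
actual = avg_read_time_ms(512, 15_000, seek_ms=1, transfer_mb_per_s=100, controller_ms=0.2)
print(f"quoted seek: {quoted:.1f} ms, measured seek: {actual:.1f} ms")
```

The transfer term is only about 0.005 ms, so seek time and rotational latency dominate the total for a single sector.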

#### **Disk Performance Issues**

- Manufacturers quote average seek time
  - Based on all possible seeks
  - Locality and OS scheduling lead to smaller actual average seek times
- Smart disk controllers allocate physical sectors on disk
  - Present logical sector interface to host
  - SCSI, ATA, SATA
- Disk drives include caches
  - Prefetch sectors in anticipation of access
  - Avoid seek and rotational delay



§13.2 Startup Cost vs. Throughput

⇒ Amortize the startup cost over large data blocks
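
A minimal sketch of this amortization, reusing the numbers from the disk access example as assumptions: a fixed startup cost of 6.2 ms per access (seek, rotational latency, controller overhead) and a 100 MB/s transfer rate with 1 MB = 10^6 bytes. The function name and defaults are illustrative.

```python
def effective_mb_per_s(block_bytes, startup_s=6.2e-3, rate_bytes_per_s=100e6):
    """Throughput actually achieved when each block pays one startup cost."""
    total_s = startup_s + block_bytes / rate_bytes_per_s
    return block_bytes / total_s / 1e6


for size in (512, 64_000, 1_000_000, 100_000_000):
    print(f"{size:>11} B per access -> {effective_mb_per_s(size):8.3f} MB/s")
```

With these numbers a single 512-byte sector achieves well under 1 MB/s, while a 100 MB transfer comes close to the full 100 MB/s, because the same startup cost is spread over far more data.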



#### **Instruction Set Architecture for I/O**

- Some machines have special input and output instructions
- Alternative model (used by MIPS):
  - Input ≈ reading a sequence of bytes
  - Output ≈ writing a sequence of bytes
- Memory is also a sequence of bytes, so use loads for input, stores for output
  - Called "Memory Mapped Input/Output"
  - A portion of the address space dedicated to communication paths to Input or Output devices (no memory there)

#### **Memory Mapped I/O**

- Certain addresses are not regular memory
- Instead, they correspond to registers in I/O devices



#### Example: keyboard... if only a Data Register:






#### Processor Checks Status before Acting

- Path to device generally has 2 registers:
  - 1 register says it's OK to read/write (I/O ready), often called Control Register
  - 1 register that contains data, often called <u>Data Register</u>
- Processor reads from the Control Register in a loop, waiting for the device to set the Ready bit in the Control Register (0 → 1) to say it's OK
- Processor then loads from (input) or writes to (output) the Data Register
  - Load from device/Store into Data Register resets the Ready bit (1 → 0) of the Control Register

"Busy wait" if done continuously; else, poll multiple devices on every interrupt from the real-time clock (usually 50-120 Hz)

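The polling protocol described above can be sketched with a simulated device. Everything here is hypothetical: `SimulatedKeyboard` stands in for a real memory-mapped keyboard, and its `delay` parameter only models how many polls pass before a key arrives. On real hardware the two registers would be fixed, non-cacheable addresses accessed with loads and stores.

```python
class SimulatedKeyboard:
    """Hypothetical input device with one Control and one Data register."""

    def __init__(self, keys, delay=3):
        self._keys = list(keys)
        self._delay = delay      # polls that pass before the next key arrives
        self._countdown = delay
        self._ready = 0          # Ready bit of the Control Register
        self._data = 0           # contents of the Data Register

    def read_control(self):
        # Device side: after some time, latch a key and set Ready (0 -> 1).
        if not self._ready and self._keys:
            self._countdown -= 1
            if self._countdown <= 0:
                self._data = ord(self._keys.pop(0))
                self._ready = 1
                self._countdown = self._delay
        return self._ready

    def read_data(self):
        # Loading from the Data Register resets Ready (1 -> 0).
        self._ready = 0
        return self._data


def read_char(dev):
    """Busy-wait on the Ready bit, then load from the Data Register."""
    polls = 0
    while dev.read_control() == 0:
        polls += 1               # wasted work: the processor just spins
    return chr(dev.read_data()), polls


kbd = SimulatedKeyboard("hi")
print(read_char(kbd), read_char(kbd))
```

In this sketch `read_char` spins forever if no key ever arrives, which is the cost of busy waiting that polling from a periodic timer interrupt avoids.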
# I/O Address Pages must be non-cacheable!



- Traditional ("non-coherent") caching does <u>NOT</u> work when other devices (I/O, other processor cores) access memory independently
- Note: write-through is a "half-solution": it works for output, but not for input

I/O / Communication Registers vs. Normal Memory Semantics (non-shared)

## Normal Memory:

Same word, accessed over time: a read always yields the last written value.

## I/O / Communication Registers:

Same word, read twice: the two values are potentially different (≠).

- Successive reads from the same location (without any intervening writes from the processor) may yield different values (data, status).
- "Side effects" exist: an access may change other words too!

Memory Consistency: a writer (an input device, or another communicating processor) writes data 1, data 2, data 3 through the network and then sets a flag; the reader waits to see the flag become 1 (new) before reading the data. Is waiting for the flag too slow? Is delivery in-order or out-of-order? The data and the flag may reside on different memory banks in an interleaved memory!

#### What is the alternative to polling?

- Wasteful to have processor spend most of its time "spin-waiting" for I/O to be ready
- Wish we could have an unplanned procedure call that would be invoked only when I/O device is ready
- Solution: use exception mechanism to help I/O. Interrupt program when I/O ready, return when done with data transfer

#### **I/O Interrupt**

- An I/O interrupt is like an overflow exception except:
  - An I/O interrupt is "asynchronous"
  - More information needs to be conveyed
- An I/O interrupt is asynchronous with respect to instruction execution:
  - I/O interrupt is not associated with any instruction, but it can happen in the middle of any given instruction
  - I/O interrupt does not prevent any instruction from completion

#### **Definitions for Clarification**

- Exception: signal marking that something "out of the ordinary" has happened and needs to be handled
- Interrupt: asynchronous exception
- Trap: synchronous exception
- Note: These are different from the book's definitions.

#### **Interrupt Driven Data Transfer**



#### **Questions Raised about Interrupts**

- Which I/O device caused the exception?
  - Needs to convey the identity of the device generating the interrupt
  - Cause register, or vectored interrupts
- Can we avoid interrupts during the interrupt routine?
  - What if a more important interrupt occurs while servicing this interrupt?
  - Allow the interrupt routine to be entered again?
- Who keeps track of the status of all the devices, handles errors, and knows where to put/supply the I/O data?

Fast devices need an I/O buffer, not just a register: amortize the cost of an interrupt over many data words in the I/O buffer.

- e.g. fill one buffer by I/O while the other is being serviced by the processor
- Example: (just) 1 Gbit/s = 1 bit every 1 ns = 32 bits (one 4-byte word) every 32 ns
- This may still be a problem: too slow to transfer word by word by the processor (load-store loop)
  - Cost to poll the status register (non-cacheable, off-chip): usually ≥ a DRAM access, usually ~100 ns (or more)
  - Then read the data register
  - Cost of interrupt + kernel interrupt handler: usually ~1 µs (1000 ns)!
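
The numbers in these notes can be put side by side in a short sketch. The link rate, word size, poll cost, and interrupt cost come from the notes; the 4 KB buffer size is an assumption chosen for illustration, and the function names are hypothetical.

```python
LINK_BITS_PER_S = 1e9   # 1 Gbit/s link
WORD_BITS = 32          # one 4-byte word
POLL_NS = 100           # ~ one off-chip, non-cacheable status read
INTERRUPT_NS = 1000     # ~ interrupt + kernel interrupt handler


def word_interval_ns():
    """Time between successive words arriving on the link."""
    return WORD_BITS / LINK_BITS_PER_S * 1e9


def interrupt_overhead(buffer_bytes):
    """Cost of one interrupt relative to the time the device needs to fill the buffer."""
    fill_ns = buffer_bytes * 8 / LINK_BITS_PER_S * 1e9
    return INTERRUPT_NS / fill_ns


print(f"a new word arrives every {word_interval_ns():.0f} ns; one poll costs {POLL_NS} ns")
print(f"interrupt per 4-byte word:  {interrupt_overhead(4):.1f}x the arrival time")
print(f"interrupt per 4 KB buffer: {interrupt_overhead(4096):.3f}x the fill time")
```

Under these assumptions a per-word interrupt costs about 31 times the word arrival interval, so the processor cannot keep up, while one interrupt per 4 KB buffer costs about 3% of the buffer fill time.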

# Direct Memory Access (DMA)



- DMA onto non-cacheable memory pages: too slow when the processor then processes the I/O data, or
- Flush the cache before/after I/O DMA: quite an expensive operation (total flush? scan the entire cache), or
- Cache-coherent DMA (good!): next chapter