Advanced Computer Architecture
22-01-2019

A) A processor embeds two cores that have private L1 and L2 caches, L3 is shared. The caches obey the MESI protocol and have the following structure: 32KB, 4-way L1 I-cache and D-cache, each 32-byte block; 1024KB, 8-way associative, 64-byte block L2 cache; 4MB, 8-way associative, 64-byte block L3 cache. The latencies (disregarding virtual memory TLB) expressed in clock cycles are: 2 in L1, 6 in L2, 12 in L3. Addresses are 48-bit long. Write operations are managed with a write-back policy. Assuming initially empty and invalidated cache lines throughout the hierarchy, consider the following memory accesses:

core 1) ST F1, 0000FFFFA000hex; (8-byte store)

core 1) LD F2, 0000FFFFA0A0hex; (8-byte load)

core 2) ST F1, 0000FFFFA000hex; (8-byte store)

core 2) ST F3, 0000FFFFA0A0hex; (8-byte store)

a1) show the blocks involved in each cache, and the associated MESI state after each instruction.

L1: disp 5; index 8; tag 35

L2: disp 6; index 11; tag 31
L3: disp 6; index 13; tag 29
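
The field widths above can be checked mechanically. The following is a minimal sketch, assuming power-of-two cache sizes and the 48-bit addresses stated in A); the helper names `cache_fields` and `split_address` are illustrative and not part of the exam text.

```python
def cache_fields(size_bytes, ways, block_bytes, addr_bits=48):
    """Return (offset_bits, index_bits, tag_bits) for a set-associative cache."""
    sets = size_bytes // (ways * block_bytes)
    offset_bits = block_bytes.bit_length() - 1  # log2 of a power of two
    index_bits = sets.bit_length() - 1
    return offset_bits, index_bits, addr_bits - offset_bits - index_bits


def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, index, offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


KB, MB = 1024, 1024 * 1024
l1 = cache_fields(32 * KB, 4, 32)     # (5, 8, 35)
l2 = cache_fields(1024 * KB, 8, 64)   # (6, 11, 31)
l3 = cache_fields(4 * MB, 8, 64)      # (6, 13, 29)

# The two addresses of the exercise fall into different L1 sets (0 and 5).
for addr in (0x0000FFFFA000, 0x0000FFFFA0A0):
    tag, index, offset = split_address(addr, l1[0], l1[1])
    print(f"{addr:012X}: L1 tag={tag:X} index={index} offset={offset}")
```

Since 0000FFFFA000hex and 0000FFFFA0A0hex differ in bits above the block offset at every level, they belong to distinct blocks in L1, L2 and L3.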

core 1) ST F1, 0000FFFFA000hex; (8-byte store)

 core1 core2

L1 000000000000000011111111111111111010000000000000 miss M -

L2 000000000000000011111111111111111010000000000000 miss I -

L3 000000000000000011111111111111111010000000000000 miss I

core 1) LD F2, 0000FFFFA0A0hex; (8-byte load)

 core1 core2

L1 000000000000000011111111111111111010000010100000 miss E -

L2 000000000000000011111111111111111010000010100000 miss E -

L3 000000000000000011111111111111111010000010100000 miss E

core 2) ST F1, 0000FFFFA000hex; (8-byte store)

 core1 core2

L1 000000000000000011111111111111111010000000000000 miss I M

L2 000000000000000011111111111111111010000000000000 miss I I

L3 000000000000000011111111111111111010000000000000 miss I
first block in cache L3 is sent to memory, then cache blocks L1(1) and L2(1) are invalidated,
then L1(2) is set to M, and L2(2) is set to I, finally L3 is set to I

core 2) ST F3, 0000FFFFA0A0hex; (8-byte store)

 core1 core2

L1 000000000000000011111111111111111010000010100000 miss I M

L2 000000000000000011111111111111111010000010100000 miss I I

L3 000000000000000011111111111111111010000010100000 hit I

cache blocks L1(1) and L2(1) are invalidated,
then L1(2) is set to M, and L2(2) is set to I, L3 is set to I

B) Let us consider the following code fragment of compiler-unscheduled instructions:

 ADDI R1,R0,0000FFFFA000hex -- R1 set to base address 0000FFFFA000hex

loop: LD F1,0(R1)

 LD F3,8(R1)

 MULTF F2,F1,F0 -- F0 contains a pre-loaded float constant

 MULTF F2,F3,F2

 ADD R1,R1,16

 LD F1,0(R1)

 LD F3,-8(R1)

 ADDF F6,F3,F1

 MULTF F5,F2,F6

 ST F5,-16(R1)

 BLI R1, 000100000000hex,loop -- compare immediate, branch on less

assuming this code fragment is executed in a statically scheduled pipeline with stages

IF|ID|INT1|INT2| |ME1|ME2|WB

 |A1 |A2 |A3|

 |M1 |M2 |M3|M4|

 |Div1-Div8|

(the Div unit is blocking) and proper forwarding units, assuming that branch instructions take a decision in stage ME2, assuming that all LD hit in L1 D-cache during ME1 (but still go through ME2), and that all ST hit during ME2

b1) show a POE (compiler schedule) that minimizes the clock cycles required to complete execution;

Producer/Consumer table

| CONSUMER (rows) / PRODUCER (columns) | INT | LD | FA | FM | FDIV |
| --- | --- | --- | --- | --- | --- |
| INT | 1 | 2 | - | - | - |
| LD | 1 | 2 | - | - | - |
| FA | - | 2 | 2 | 3 | 7 |
| FM | - | 2 | 2 | 3 | 7 |
| FDIV | - | 2 | 2 | 3 | 7 |
| ST | 1 | 2 | 2 | 3 | 7 |

| Unscheduled | Scheduled – no register renaming | Scheduled – register renaming |
| --- | --- | --- |
| ADDI R1,R0,0000FFFFA000hex | ADDI R1,R0,0000FFFFA000hex | ADDI R1,R0,0000FFFFA000hex |
| *nop* | *nop* | *nop* |
| loop: LD F1,0(R1) | loop: LD F1,0(R1) | loop: LD F1,0(R1) |
| LD F3,8(R1) | LD F3,8(R1) | LD F3,8(R1) |
| *nop* | ADD R1,R1,16 | ADD R1,R1,16 |
| MULTF F2,F1,F0 | MULTF F2,F1,F0 | MULTF F2,F1,F0 |
| *nop* | LD F1,0(R1) | LD F1,0(R1) |
| *nop* | *nop* | LD F7,-8(R1) |
| *nop* | *nop* | *nop* |
| MULTF F2,F3,F2 | MULTF F2,F3,F2 | MULTF F2,F3,F2 |
| ADD R1,R1,16 | LD F3,-8(R1) | ADDF F6,F7,F1 |
| *nop* | *nop* | *nop* |
| LD F1,0(R1) | *nop* | *nop* |
| LD F3,-8(R1) | ADDF F6,F3,F1 | MULTF F5,F2,F6 |
| *nop* | *nop* | BLI R1, 000100000000hex,loop |
| *nop* | *nop* | *nop* |
| ADDF F6,F3,F1 | MULTF F5,F2,F6 | *nop* |
| *nop* | BLI R1, 000100000000hex,loop | ST F5,-16(R1) |
| *nop* | *nop* |  |
| MULTF F5,F2,F6 | *nop* |  |
| *nop* | ST F5,-16(R1) |  |
| *nop* |  |  |
| *nop* |  |  |
| ST F5,-16(R1) |  |  |
| BLI R1, 000100000000hex,loop |  |  |
| DELAY-1 |  |  |
| DELAY-2 |  |  |
| DELAY-3 |  |  |

Scheduled – no register renaming: branch delay slots scheduled.

Scheduled – register renaming: removed the name dependency on F3 between MULTF F2,F3,F2 and LD F3,-8(R1).
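
The nop placement of the unscheduled listing can be reproduced mechanically from the Producer/Consumer table. The sketch below assumes that each table entry is the number of instruction slots that must separate a producer from a dependent consumer; this reading is inferred from the listing and is not stated explicitly in the text. The names `GAP`, `PROGRAM` and `insert_nops` are illustrative, and only true (RAW) dependencies are tracked.

```python
# Slots required between a producer and a dependent consumer, keyed as
# (producer, consumer). Values copied from the Producer/Consumer table;
# the FDIV consumer row is omitted because the kernel has no division.
GAP = {
    ("INT", "INT"): 1, ("LD", "INT"): 2,
    ("INT", "LD"): 1, ("LD", "LD"): 2,
    ("LD", "FA"): 2, ("FA", "FA"): 2, ("FM", "FA"): 3, ("FDIV", "FA"): 7,
    ("LD", "FM"): 2, ("FA", "FM"): 2, ("FM", "FM"): 3, ("FDIV", "FM"): 7,
    ("INT", "ST"): 1, ("LD", "ST"): 2, ("FA", "ST"): 2, ("FM", "ST"): 3,
    ("FDIV", "ST"): 7,
}

# (text, class, destination register or None, source registers)
PROGRAM = [
    ("ADDI R1,R0,0000FFFFA000hex", "INT", "R1", ["R0"]),
    ("LD F1,0(R1)", "LD", "F1", ["R1"]),
    ("LD F3,8(R1)", "LD", "F3", ["R1"]),
    ("MULTF F2,F1,F0", "FM", "F2", ["F1", "F0"]),
    ("MULTF F2,F3,F2", "FM", "F2", ["F3", "F2"]),
    ("ADD R1,R1,16", "INT", "R1", ["R1"]),
    ("LD F1,0(R1)", "LD", "F1", ["R1"]),
    ("LD F3,-8(R1)", "LD", "F3", ["R1"]),
    ("ADDF F6,F3,F1", "FA", "F6", ["F3", "F1"]),
    ("MULTF F5,F2,F6", "FM", "F5", ["F2", "F6"]),
    ("ST F5,-16(R1)", "ST", None, ["F5", "R1"]),
    ("BLI R1,000100000000hex,loop", "INT", None, ["R1"]),  # branch: INT consumer
]


def insert_nops(program):
    """Emit the program in order, padding with nops to respect RAW gaps."""
    out = []
    last_writer = {}  # register -> (slot index, producer class)
    for text, kind, dest, sources in program:
        earliest = len(out)
        for reg in sources:
            if reg in last_writer:
                pos, producer = last_writer[reg]
                earliest = max(earliest, pos + 1 + GAP[(producer, kind)])
        out.extend(["nop"] * (earliest - len(out)))
        out.append(text)
        if dest is not None:
            last_writer[dest] = (len(out) - 1, kind)
    return out


schedule = insert_nops(PROGRAM)
print("\n".join(schedule))
print("nops:", schedule.count("nop"))
```

Under this assumption the output has 13 nops in the same positions as the unscheduled column (branch delay slots excluded); the scheduled columns fill some of those slots with independent instructions.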

b2) show a ROE for the first iteration of the optimized POE and determine the CPI of the execution of this optimized kernel.

The chart shows the ROE of a scheduled (no register renaming) algorithm, where nops are not included, and the hardware takes care of conflicts.

The number of iterations is (000100000000hex - 0000FFFFA000hex)/16dec = 6000hex/10hex = 600hex = 6\*16² = 1536dec

Each iteration loop executes 11 instructions.

Successive iterations start (23-2+1)=22 clock cycles apart, the prologue is 1 clock cycle, the epilogue is 4 clock cycles, so the CPI is (1+1536\*22+4)/(1+1536\*11) = 33797/16897 ≈ 2.0002
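
The arithmetic above can be replayed in a few lines. This is a minimal sketch; the function name `kernel_cpi` and its parameter names are illustrative, while the numeric values are the ones used in the solution.

```python
START = 0x0000FFFFA000   # initial value of R1
END = 0x000100000000     # BLI comparison bound
STRIDE = 16              # R1 increment per iteration

iterations = (END - START) // STRIDE


def kernel_cpi(iterations, cycles_per_iter=22, instr_per_iter=11,
               prologue=1, epilogue=4):
    """CPI of the loop: total clock cycles over total executed instructions."""
    cycles = prologue + iterations * cycles_per_iter + epilogue
    instructions = 1 + iterations * instr_per_iter  # +1 for the initial ADDI
    return cycles / instructions


print(iterations)              # 1536
print(kernel_cpi(iterations))  # about 2.0002
```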

|  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | **1** | **2** | **3** | **4** | **5** | **6** | **7** | **8** | **9** | **10** | **11** | **12** | **13** | **14** | **15** | **16** | **17** | **18** | **19** | **20** | **21** | **22** | **23** | **24** | **25** | **26** | **27** |
| ADDI R1,R0,0000FFFFA000hex | IF | ID | INT1 | INT2 | MEM1 | MEM2 | WB |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| loop: LD F1,0(R1) |  | IF | ID | ID | INT1 | INT2 | MEM1 | MEM2 | WB |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| LD F3,8(R1) |  |  | IF | IF | ID | INT1 | INT2 | MEM1 | MEM2 | WB |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| ADD R1,R1,16 |  |  |  |  | IF | ID | INT1 | INT2 | MEM1 | MEM2 | WB |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| MULTF F2,F1,F0 |  |  |  |  |  | IF | ID | M1 | M2 | M3 | M4 | MEM1 | MEM2 | WB |  |  |  |  |  |  |  |  |  |  |  |  |  |
| LD F1,0(R1) |  |  |  |  |  |  | IF | ID | INT1 | INT2 | MEM1 | MEM2 | WB |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| MULTF F2,F3,F2 |  |  |  |  |  |  |  | IF | ID | ID | ID | M1 | M2 | M3 | M4 | MEM1 | MEM2 | WB |  |  |  |  |  |  |  |  |  |
| LD F3,-8(R1) |  |  |  |  |  |  |  |  | IF | IF | IF | ID | INT1 | INT2 | MEM1 | MEM2 | WB |  |  |  |  |  |  |  |  |  |  |
| ADDF F6,F3,F1 |  |  |  |  |  |  |  |  |  |  |  | IF | ID | ID | ID | A1 | A2 | A3 | MEM1 | MEM2 | WB |  |  |  |  |  |  |
| MULTF F5,F2,F6 |  |  |  |  |  |  |  |  |  |  |  |  | IF | IF | IF | ID | ID | ID | M1 | M2 | M3 | M4 | MEM1 | MEM2 | WB |  |  |
| BLI R1, 000100000000hex,loop |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | IF | IF | IF | ID | INT1 | INT2 | MEM1 | MEM2 | WB |  |  |  |
| ST F5,-16(R1) |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | IF | ID | ID | ID | INT1 | INT2 | MEM1 | MEM2 | WB |
| INSTR A |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | IF | IF | IF | ID | canc |  |  |  |
| INSTR B |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | IF | canc |  |  |  |
| LD F1,0(R1) |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | IF | ID | INT1 | INT2 |

C) The processor runs at 2.6GHz, and has a 64-bit external bus, driven by a memory interface module capable of sustaining 2.2 GT/sec. The external RAM is realized with DDR4 chips (see attached table) and is logically organized with 2 banks, each capable of delivering a 16-bit word. Addressing the memory subsystems requires two bus cycles, and activating a memory row requires 2 bus clock cycles.

c1) Choose a DDR4 module that can sustain a burst transfer mode from DDRAM compatible with the memory interface of the processor, and estimate the cost of a miss for the cache hierarchy described above;
c2) discuss possible improvements to the memory subsystem organization to reduce the cost of the miss.

C1) The bus interface is capable of 2.2 GT/s, which translates to a maximum bus frequency of 1.1GHz, in the DDR mode. So, DDR4 modules compatible with this constraint are those up to DDR4-2133 included. Using DDR4-2133, the external bus will run at 1066.67MHz.
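
The selection rule can be written as a small lookup. The sketch below assumes that a module is compatible when its I/O bus clock does not exceed half the transfer rate of the memory interface (two transfers per clock in DDR mode); the clock values are copied from the attached DDR4 table, and the function name `fastest_compatible` is illustrative.

```python
# I/O bus clock in MHz for each DDR4 standard (from the attached table).
DDR4_IO_CLOCK_MHZ = {
    "DDR4-1600": 800.0,
    "DDR4-1866": 933.33,
    "DDR4-2133": 1066.67,
    "DDR4-2400": 1200.0,
    "DDR4-2666": 1333.33,
    "DDR4-2933": 1466.67,
    "DDR4-3200": 1600.0,
}


def fastest_compatible(interface_gt_per_s):
    """Return the fastest standard whose bus clock fits the interface."""
    max_bus_mhz = interface_gt_per_s * 1000 / 2  # DDR: 2 transfers per clock
    fitting = {name: mhz for name, mhz in DDR4_IO_CLOCK_MHZ.items()
               if mhz <= max_bus_mhz}
    return max(fitting, key=fitting.get)


print(fastest_compatible(2.2))  # DDR4-2133
```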

C2) Tmiss=Taddr+Natt\*Tatt+Ntran\*Ttrans (values are external bus clock cycles)

The L3 cache line is 64 bytes long. Each bank is capable of delivering 2 bytes, so a single activation extracts 4 bytes (2 banks × 2 bytes). Accordingly, the number of activations Natt is 64/4=16. The number of transfers Ntran is also 64/4=16, each requiring half a clock cycle (DDR mode).

So, Tmiss=2+16\*2+16\*0.5=42 (external bus clock cycles)
Since the processor clock frequency is 2.6GHz, the cost of a miss in processor clock cycles is
Tmissprocessor = ⎡2.6/1.0667 \*42⎤ = ⎡102.4⎤ = 103 (processor clock cycles)
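
The miss cost formula can be evaluated as follows. This is a minimal sketch under the assumptions of the solution (one row activation per 4-byte access, half a bus cycle per transfer) and using the 2.6 GHz processor clock stated in C) with the 1066.67 MHz bus of DDR4-2133; the function and parameter names are illustrative.

```python
from math import ceil


def miss_cost_bus_cycles(line_bytes=64, banks=2, bytes_per_bank=2,
                         t_addr=2, t_act=2, t_transfer=0.5):
    """Tmiss = Taddr + Natt*Tatt + Ntran*Ttrans, in external bus clock cycles."""
    bytes_per_activation = banks * bytes_per_bank
    n = line_bytes // bytes_per_activation  # activations == transfers
    return t_addr + n * t_act + n * t_transfer


def to_processor_cycles(bus_cycles, cpu_mhz=2600.0, bus_mhz=1066.67):
    """Scale bus cycles by the clock ratio and round up to whole cycles."""
    return ceil(bus_cycles * cpu_mhz / bus_mhz)


bus_cycles = miss_cost_bus_cycles()
print(bus_cycles)                       # 42.0
print(to_processor_cycles(bus_cycles))  # 103 with a 2.6 GHz processor clock
```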

D) Each core of the processor described in A) is organized as a superscalar, 2-way pipeline that fetches, decodes, issues and retires (commits) bundles of 2 instructions each. The front-end in-order section (fetch and decode) consists of 2 stages. The issue logic takes 1 clock cycle if the instructions in the bundle are independent, otherwise it takes 2 clock cycles. The architecture supports dynamic speculative execution, and control dependencies from branches are solved when the branch evaluates the condition, even if it is not at commit. The execution model obeys the attached state transition diagram. There is one functional unit (FU) Int1 for integer arithmetic (arithmetic and logical instructions, branches and jumps, no multiplication), one FU FAdd1 for floating point addition/subtraction, one FU FMolt1 for floating point multiplication, and one FU for division, FDiv1.

There are 12 integer (R0-R11) and 12 floating point (F0-F11) registers. Speculation is handled through an 8-entry ROB, a pool of 4 Reservation Stations (RS) Rs1-4 shared among all FUs, 2 load buffers Load1 Load2, 1 store buffer Store1 (see the attached execution model): an instruction bundle is first placed in the ROB (if two entries are available), then up to 2 instructions are dispatched to the shared RS (if available) when they are ready for execution and then executed in the proper FU. FUs are *pipelined* (except for the float division unit, which is blocking) and have the latencies quoted in the following table:

|  |  |
| --- | --- |
| Int - 2 | Fadd – 3 |
| Fmolt – 4 | Fdiv – 8 |

Further assumption

* The code is assumed to be already in the I-cache; data caches are described in point A) and are assumed empty and invalidated; the cost of a miss is 40.
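
With these assumptions, the latency of a load depends on the level that serves it. The sketch below assumes that the levels are probed serially, so that the latencies of A) and the 40-cycle miss cost simply add up; under that assumption a load missing in all three caches takes 2+6+12+40 = 60 clock cycles, which matches the 60-cycle EX interval (7-66) reported for PC02 in the solution table. The names `LATENCY`, `ORDER` and `load_latency` are illustrative.

```python
# Latencies in clock cycles: cache levels from A), memory miss cost from D).
LATENCY = {"L1": 2, "L2": 6, "L3": 12, "MEM": 40}
ORDER = ["L1", "L2", "L3", "MEM"]


def load_latency(served_by):
    """Total latency when the request is satisfied at level `served_by`."""
    levels = ORDER[: ORDER.index(served_by) + 1]
    return sum(LATENCY[level] for level in levels)


print(load_latency("L1"))   # 2, e.g. a hit on a block already loaded
print(load_latency("MEM"))  # 60, first access to a block in empty caches
```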

d1) assuming a write-back protocol for cache management, show state transitions for all instructions in the first iteration and instructions up to PC03 included in the second iteration :

PC01 ADDI R1,R0,0000FFFFA000hex

PC02 LD F1,0(R1)

PC03 LD F3,8(R1)

PC04 MULTF F2,F1,F0

PC05 MULTF F2,F3,F2

PC06 ADD R1,R1,16

PC07 LD F1,0(R1)

PC08 LD F3,-8(R1)

PC09 ADDF F6,F3,F1

PC10 MULTF F5,F2,F6

PC11 ST F5,-16(R1)

PC12 BLI R1,000100000000hex,PC02

d2) show ROB, RS and buffer status at the issue of the bundle containing PC03 in the second iteration.

Dynamic speculative execution

Decoupled ROB RS execution model

| **INSTRUCTION** | **n.ite** | **ROBpos** | **WO** | **RE** | **DI** | **EX** | **WB** | **RR** | **CO** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PC01 ADDI R1,R0,0000FFFFA000hex | **1** | **0** | **-** | **1** | **2** | **3-4** | **5** | **6-68** | **69** |
| PC02 LD F1,0(R1) | **1** | **1** | **2-4** | **5** | **6** | **7-66** | **67** | **68** | **69** |
| PC03 LD F3,8(R1) | **1** | **2** | **3-4** | **5** | **6** | **7-67** | **68** | **69-74** | **75** |
| PC04 MULTF F2,F1,F0  | **1** | **3** | **3-66** | **67** | **68** | **69-72** | **73** | **74** | **75** |
| PC05 MULTF F2,F3,F2 | **1** | **4** | **4-72** | **73** | **74** | **75-78** | **79** | **80** | **81** |
| PC06 ADD R1,R1,16 | **1** | **5** | **4** | **5-6** | **7** | **8-9** | **10** | **11-80** | **81** |
| PC07 LD F1,0(R1) | **1** | **6** | **5-9** | **10-67** | **68** | **69-70** | **71** | **72-81** | **82** |
| PC08 LD F3,-8(R1) | **1** | **7** | **6-9** | **10-68** | **69** | **70-71** | **72** | **73-81** | **82** |
| PC09 ADDF F6,F3,F1 | **1** | **0** | **70-71** | **72** | **73** | **74-76** | **77** | **78-86** | **87** |
| PC10 MULTF F5,F2,F6 | **1** | **1** | **71-78** | **79** | **80** | **81-84** | **85** | **86** | **87** |
| PC11 ST F5,-16(R1) | **1** | **2** | **76-83** | **84** | **85** | **86-87** | **-** | **88** | **89** |
| PC12 BLI R1, 000100000000hex, PC02 | **1** | **3** | **-** | **77** | **78** | **79-80** | **-** | **81-88** | **89** |
| PC02 LD F1,0(R1) | **2** | **4** | **-** | **85** | **86** | **87-88** | **89** | **90-91** | **92** |
| PC03 LD F3,8(R1) | **2** | **5** |  | **86** | **87** | **88-89** | **90** | **91** | **92** |

RAW dependencies are not highlighted; structural conflicts are shown in color (dispatch to reservation stations, load buffers, ROB).
The L1 D-cache is assumed to sustain 2 operations per cycle (either 2 LD to any addresses, or 1 LD and 1 ST, provided the addresses map to different sets).

BLI decides on 80, so NextPC updated on 81, fetch hit I-L1 82-83, decode 84, issue begins 85
No instruction in the ROB after BLI because ROB is full until clock 81 included, so any instruction in the front-end section is cancelled at clock 80.

Reservation station and load/store buffers

|  | Busy | Op | Vj | Vk | ROBj | ROBk | ROB pos | Address |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Rs1 |  |  |  |  |  |  |  |  |
| Rs2 |  |  |  |  |  |  |  |  |
| Rs3 |  |  |  |  |  |  |  |  |
| Rs4 |  |  |  |  |  |  |  |  |
| Load1 | Yes | PC02 LD F1,0(R1) |  |  |  |  | 4 | [R1] |
| Load2 |  |  |  |  |  |  |  |  |
| Store1 | Yes | PC11 ST F5,-16(R1) | [Rob1] |  |  |  | 2 | [R1]-16 |

ROBj ROBk: sources not yet available

ROB pos: ROB entry number where instruction is located

Result Register status

| Integer | R0 | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 | R11 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ROB pos |  |  |  |  |  |  |  |  |  |  |  |  |
| state |  |  |  |  |  |  |  |  |  |  |  |  |
| Float. | F0 | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 |
| ROB pos |  | 4 |  |  |  | 1 | 0 |  |  |  |  |  |
| state |  | B |  |  |  | B | B |  |  |  |  |  |

Reorder Buffer (ROB)

| ROB Entry# | Busy | Op | Status | Destination | Value |
| --- | --- | --- | --- | --- | --- |
| 0 | **0** | PC09 ADDF F6,F3,F1 | RR | F6 | [Rob6]+[Rob7] |
| 1 | **1** | PC10 MULTF F5,F2,F6 | RR | F5 | [Rob0]\*[Rob4] |
| 2 | **2** | PC11 ST F5,-16(R1) | DI | [R1]-16 | [Rob1] |
| 3 | **3** | PC12 BLI R1, 000100000000hex, PC02 | RR |  |  |
| 4 | **4** | PC02 LD F1,0(R1) | RE | F1 | Mem([R1]) |
| 6 |  |  |  |  |  |
| 7 |  |  |  |  |  |

CLOCK cycle 85

[Figure: state transition diagram of the execution model, with states QUEUE, WO, RE, DI, EX, WB, RR, CO]

**Decoupled execution model for bundled (paired) instructions**

The state diagram depicts the model for a dynamically scheduled, speculative execution microarchitecture equipped with a Reorder Buffer (ROB) and a set of Reservation Stations (RS). The ROB and RSs are allocated during the ISSUE phase, denoted as RAT (Register Alias Allocation Table) in INTEL microarchitectures, as follows: a bundle (2 instructions) is fetched from the QUEUE of decoded instructions and ISSUED if there is a pair of consecutive free entries in the ROB (head and tail of the ROB queue do not match); a maximum of two instructions are moved into the RS (if available) when all of their operands are available. Memory access instructions are allocated in the ROB and then moved to a load/store buffer (if available) when operands (address and data, if applicable) are available.

**States** are labelled as follows:

WO: Waiting for Operands (at least one of the operands is not available)

RE: Ready for Execution (all operands are available)

DI: Dispatched (posted to a free RS or load/store buffer)

EX: Execution (moved to a load/store buffer or to a matching and free FU)

WB: Write Back (result is ready and is returned to the Rob by using in exclusive mode the Common Data Bus CDB)

RR: Ready to Retire (result available or STORE has completed)

CO: Commit (result is copied to the final ISA register)

**State transitions** happen at the following events:

 *from* QUEUE *to* WO: ROB entry available, operand missing

*from* QUEUE *to* RE: ROB entry available, all operands available

*loop at* WO: waiting for operand(s)

*from* WO *to* RE: all operands available

*loop at* RE: waiting for a free RS or load/store buffer

*from* RE *to* DI: RS or load/store buffer available

*loop at* DI: waiting for a free FU

*from* DI *to* EX: FU available

*loop at* EX: multi-cycle execution in a FU, or waiting for CDB

*from* EX *to* WB: result written to the ROB with exclusive use of CDB

*from* EX *to* RR: STORE completed, branch evaluated

*loop at* RR: instruction completed, not at the head of the ROB, or bundled with a not RR instruction

*from* RR *to* CO: bundle of RR instructions at the head of the ROB, no exception raised
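
The transition list above can be encoded as a table and used to check a sequence of states. This is a minimal sketch: the names `TRANSITIONS` and `is_valid_path` are illustrative, and the WB to RR edge is an assumption, since it is not listed explicitly above but is implied by the definitions of WB and RR.

```python
# Allowed next states for each state (self-loops included where listed).
TRANSITIONS = {
    "QUEUE": {"WO", "RE"},
    "WO": {"WO", "RE"},
    "RE": {"RE", "DI"},
    "DI": {"DI", "EX"},
    "EX": {"EX", "WB", "RR"},  # EX -> RR only for stores and branches
    "WB": {"RR"},              # assumption: implied, not listed explicitly
    "RR": {"RR", "CO"},
    "CO": set(),
}


def is_valid_path(states):
    """True if the per-cycle state sequence starts at QUEUE and only
    follows transitions allowed by the model."""
    if not states or states[0] != "QUEUE":
        return False
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(states, states[1:]))


# A register-to-register instruction with a 2-cycle execution.
alu = ["QUEUE", "RE", "DI", "EX", "EX", "WB", "RR", "RR", "CO"]
# A branch skips WB and goes from EX to RR.
branch = ["QUEUE", "RE", "DI", "EX", "EX", "RR", "CO"]
# Dispatching without passing through RE is not allowed.
bad = ["QUEUE", "DI", "EX", "WB", "RR", "CO"]

print(is_valid_path(alu), is_valid_path(branch), is_valid_path(bad))
```

The check does not distinguish instruction classes, so it accepts EX to RR for any instruction; a stricter version would restrict that edge to stores and branches.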

**Resources**

*Register-to-Register* instructions hold resources as follows:

ROB: from state WO (or RE) up to CO, inclusive;

RS: state DI

FU: EX and WB

*Load/Store* instructions hold resources as follows:

ROB: from state WO (or RE) up to CO, inclusive;

Load buffer: from state DI up to WB

Store buffer: from state DI up to EX (do not use WB)

**Forwarding**: a write on the CDB (WB) makes the operand available to the consumer in the same clock cycle. If the consumer is doing a state transition from QUEUE to WO or RE, that operand is made available; if the consumer is in WO, it goes to RE in the same clock cycle of WB for the producer.

**Branches**: they compute Next-PC and the branch condition in EX and optionally forward Next-PC to the “in-order” section of the pipeline (Fetch states) in the next clock cycle. They do not enter WB and go to RR instead.

| **Standard name** | **Memory clock (MHz)** | **I/O bus clock (MHz)** | **Data rate (**[**MT/s**](https://en.wikipedia.org/wiki/Transfer_%28computing%29)**)** | **Module name** | **Peak transfer rate (MB/s)** | **Timings CL-tRCD-tRP** | **CAS latency (ns)** |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DDR4-1600J\*, DDR4-1600K, DDR4-1600L | 200 | 800 | 1600 | PC4-12800 | 12800 | 10-10-10, 11-11-11, 12-12-12 | 12.5, 13.75, 15 |
| DDR4-1866L\*, DDR4-1866M, DDR4-1866N | 233.33 | 933.33 | 1866.67 | PC4-14900 | 14933.33 | 12-12-12, 13-13-13, 14-14-14 | 12.857, 13.929, 15 |
| DDR4-2133N\*, DDR4-2133P, DDR4-2133R | 266.67 | 1066.67 | 2133.33 | PC4-17000 | 17066.67 | 14-14-14, 15-15-15, 16-16-16 | 13.125, 14.063, 15 |
| DDR4-2400P\*, DDR4-2400R, DDR4-2400T, DDR4-2400U | 300 | 1200 | 2400 | PC4-19200 | 19200 | 15-15-15, 16-16-16, 17-17-17, 18-18-18 | 12.5, 13.32, 14.16, 15 |
| DDR4-2666T, DDR4-2666U, DDR4-2666V, DDR4-2666W | 333.33 | 1333.33 | 2666.67 | PC4-21333 | 21333.33 | 17-17-17, 18-18-18, 19-19-19, 20-20-20 | 12.75, 13.50, 14.25, 15 |
| DDR4-2933V, DDR4-2933W, DDR4-2933Y, DDR4-2933AA | 366.67 | 1466.67 | 2933.33 | PC4-23466 | 23466.67 | 19-19-19, 20-20-20, 21-21-21, 22-22-22 | 12.96, 13.64, 14.32, 15 |
| DDR4-3200W, DDR4-3200AA, DDR4-3200AC | 400 | 1600 | 3200 | PC4-25600 | 25600 | 20-20-20, 22-22-22, 24-24-24 | 12.5, 13.75, 15 |

DDR4 Modules