 K8L(K10) details :)

TSgOJDO
post Feb 12 2007, 01:12 AM, updated 19y ago

New Member
*
Junior Member
13 posts

Joined: Feb 2007


user posted image

K8L(K10) Rev. B (approx 300mm^2) LARGE DIE SHOT

Quad-core
- Native quad-core design
- Redesigned and improved crossbar(northbridge)
- Improved power management
- New level of cache added, L3 VICTIM
Power management - DICE(Dynamic Independent Core Engagement)
- Supports separate CPU core and memory controller power planes to allow CPU to lower its power state while the memory controller is running full bore
- Enhanced AMD's PowerNow - allows individual core frequencies to lower while other cores may be running full bore
- Power management state invariant time stamp counter (TSC)
Virtualization improvements
- Nested Paging(NP):
* Guest and Host page tables both exist in memory (the CPU walks both page tables)
* Nested walk can have up to 24 memory accesses! (Hardware caching accelerates the walk)
* "Wire-to-wire" translations are cached in TLBs
* NP eliminates Hypervisor cycles spent managing shadow pages (as much as 75% of Hypervisor time)
- Reduced world-switch time by 25%:
* World-switch time: round-trip to Hypervisor and back
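
The "up to 24 memory accesses" figure can be reproduced by counting the references in a two-dimensional page walk. This is only a rough sketch, assuming 4-level guest and 4-level host page tables (the usual x86-64 long-mode layout) and no caching at all; the function name is made up for illustration and does not come from the AMD slides.

```python
def nested_walk_accesses(guest_levels: int = 4, host_levels: int = 4) -> int:
    """Worst-case memory references for one nested (2D) page walk, no caching.

    Every guest page-table entry lives at a guest-physical address, so
    reading it costs a full host walk (host_levels references) plus the
    read of the guest entry itself. The guest-physical address of the
    data that comes out at the end needs one more host walk.
    """
    per_guest_level = host_levels + 1   # host walk + read of the guest entry
    final_translation = host_levels     # translate the resulting guest-physical address
    return guest_levels * per_guest_level + final_translation


print(nested_walk_accesses())       # 4 * (4 + 1) + 4 = 24
print(nested_walk_accesses(4, 0))   # no nesting: a plain 4-level walk, 4 references
```

The count grows roughly with the product of the two table depths, which is why the list mentions hardware caching of the walk and caching the final translation in the TLBs.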
Dedicated L1 cache
- 256bit 128kB (64kB instruction/64kB data), 2-way associative
- 2 x 128bit loads/cycle
- lowest latency
Dedicated L2 cache
- 128bit 512kB, 16-way associative
- 128bit bus to northbridge
- reduced latency
- eliminates conflicts common in shared caches - better for virtualization
Shared L3 cache
- 128bit 2MB
- Victim-cache architecture maximizes efficiency of cache hierarchy
- Fills from L3 leave likely shared lines in the L3
- Sharing-aware replacement policy
- Expandable
Independent DRAM controllers
- Concurrency
- More DRAM banks reduces page conflicts
- Longer burst length improves command efficiency
- Dual channel unbuffered 1066 support(applies to socket AM2+ and s1207+ QFX only)
- Channel Interleaving
Optimized DRAM paging
- Increase page hits
- Decrease page conflicts
Re-architect northbridge for higher bandwidth
- Increase buffer sizes
- Optimize schedulers
- Ready to support future DRAM technologies
Write bursting
- Minimize Rd/Wr Turnaround
DRAM prefetcher
- Track positive and negative, unit and non-unit strides
- Dedicated buffer for prefetched data
- Aggressively fill idle DRAM cycles
Core prefetchers
- DC Prefetcher fills directly to L1 Cache
- IC Prefetcher more flexible
* 2 outstanding requests to any address
HyperTransport 3
- Up to three 16bit cHT links
- Up to 5200MT/s per link
- Un-ganging mode: each 16bit HT link can be divided into two 8bit virtual links
- Can dynamically adjust frequency and bit width to save power
- AC mode (higher latency mode) to allow longer communications distances
- Hot pluggable
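
For a feel of what the HyperTransport 3 numbers above mean in bandwidth terms, here is a small back-of-the-envelope calculation. It gives peak theoretical figures only, and assumes each link carries its stated width in each direction; the helper name is hypothetical.

```python
def ht_link_bandwidth_gbs(transfers_mt_s: float, width_bits: int) -> float:
    """Peak one-direction bandwidth of a link in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = width_bits / 8
    return transfers_mt_s * 1e6 * bytes_per_transfer / 1e9


per_direction = ht_link_bandwidth_gbs(5200, 16)     # one 16bit link, one direction
both_directions = 2 * per_direction                 # same link, both directions summed
unganged_sublink = ht_link_bandwidth_gbs(5200, 8)   # one 8bit half of an un-ganged link

print(f"16bit link, one direction: {per_direction:.1f} GB/s")
print(f"16bit link, both directions: {both_directions:.1f} GB/s")
print(f"8bit un-ganged sublink, one direction: {unganged_sublink:.1f} GB/s")
```

Real throughput will be lower because of packet overhead, and a link that has dynamically dropped its frequency or width to save power will scale down accordingly.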

K8L(K10) pipeline: user posted image

CPU Core IPC Enhancements:
Advanced branch prediction
- Dedicated 512-entry Indirect Predictor
- Double return stack size
- More branch history bits and improved branch hashing
History-based pattern predictor
32B instruction fetch
- Benefits integer code too
- Reduced split-fetch instruction cases
Sideband Stack Optimizer
- Perform stack adjustments for PUSH/POP operations "on the side"
- Stack adjustments don't occupy functional unit bandwidth
- Breaks serial dependence chains for consecutive PUSH/POPs
Out-of-order load execution
- New technology allows load instructions to bypass:
* Other loads
* Other stores which are known not to alias with the load
- Significantly mitigates L2 cache latency
TLB Optimisations
- Support for 1G pages
- 48bit physical address (256TB)
- Larger TLBs key for:
* Virtualized workloads
* Large-footprint databases and transaction processing
- DTLB:
* Fully-associative 48-way TLB (4K, 2M, 1G)
* Backed by L2 TLBs: 512 x 4K, 128 x 2M
- ITLB:
* 16 x 2M entries
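
To put the TLB numbers above in perspective, the sketch below works out the "reach" of each TLB (entries times page size) and checks the 48bit physical address figure. The entry counts and page sizes are the ones from the list; the helper and variable names are made up for illustration.

```python
KIB, MIB, GIB, TIB = 2**10, 2**20, 2**30, 2**40


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of memory a TLB can map at once without a miss."""
    return entries * page_size


# 48bit physical address space
print(2**48 // TIB, "TB addressable")                      # 256

# L1 DTLB: 48 fully-associative entries, any of the three page sizes
print(tlb_reach(48, 4 * KIB) // KIB, "KB with 4K pages")   # 192
print(tlb_reach(48, 2 * MIB) // MIB, "MB with 2M pages")   # 96
print(tlb_reach(48, 1 * GIB) // GIB, "GB with 1G pages")   # 48

# L2 TLBs backing the DTLB
print(tlb_reach(512, 4 * KIB) // MIB, "MB (512 x 4K)")     # 2
print(tlb_reach(128, 2 * MIB) // MIB, "MB (128 x 2M)")     # 256

# ITLB large-page entries
print(tlb_reach(16, 2 * MIB) // MIB, "MB (16 x 2M)")       # 32
```

The jump from 192 KB of reach with 4K pages to 48 GB with 1G pages is the reason large pages matter for the virtualized and database workloads mentioned in the list.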
Data-dependent divide latency
Additional fastpath instructions
- CALL and RET-Imm instructions
- Data movement between FP & INT
Bit Manipulation extensions
- LZCNT/POPCNT
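
For anyone not familiar with the two bit-manipulation instructions, this is a plain software model of what they compute. It is only meant to show the semantics, not how the hardware does it; the `width` parameter stands in for the operand size and is an assumption of this sketch.

```python
def popcnt(x: int, width: int = 64) -> int:
    """Number of bits set to 1 in the low `width` bits of x."""
    x &= (1 << width) - 1
    return bin(x).count("1")


def lzcnt(x: int, width: int = 64) -> int:
    """Number of zero bits above the highest set bit, within `width` bits.

    A zero input returns `width` (the operand size), so the result is
    defined for every input.
    """
    x &= (1 << width) - 1
    return width - x.bit_length()


print(popcnt(0b1011_0001))   # 4 bits set
print(lzcnt(1))              # 63 leading zeros in a 64bit operand
print(lzcnt(0, width=32))    # 32: zero input returns the operand size
```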
SSE extensions
- EXTRQ/INSERTQ (SSE4A)
- MOVNTSD/MOVNTSS (SSE4A)
- MWAIT/MONITOR (SSE3)
Comprehensive Upgrades for SSE
- Dual 128-bit SSE dataflow
- Up to 4 double-precision FP OPS/cycle
- Dual 128-bit loads per cycle
- New vector code, SSE128
- Can perform SSE MOVs in the FP "store" pipe
- Execute two generic SSE ops + SSE MOV each cycle (+ two 128-bit SSE loads)
- FP Scheduler can hold 36 Dedicated x 128-bit ops
- SSE Unaligned Load-Execute mode:
* Remove alignment requirements for SSE ld-op instructions
* Eliminate awkward pairs of separate load and compute instructions
* To improve instruction packing and decoding efficiency
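
The figure of 4 double-precision FP ops per cycle falls out of the dual 128-bit dataflow: two 128-bit SSE ops per cycle, each working on two 64-bit doubles. A rough sketch of that arithmetic, plus a peak number for a hypothetical quad-core at 2.5GHz (the clock is an example value, and the peak assumes every core issues two full-width SSE ops every cycle, which real code rarely sustains):

```python
def flops_per_cycle(sse_ops_per_cycle: int = 2,
                    vector_bits: int = 128,
                    element_bits: int = 64) -> int:
    """FP operations per cycle for one core, given SSE ops issued per cycle."""
    elements_per_vector = vector_bits // element_bits
    return sse_ops_per_cycle * elements_per_vector


def peak_gflops(cores: int, clock_ghz: float, per_cycle: int) -> float:
    """Theoretical peak in GFLOPS: cores x clock (GHz) x FP ops per cycle."""
    return cores * clock_ghz * per_cycle


dp = flops_per_cycle(element_bits=64)   # 2 ops x 2 doubles = 4
sp = flops_per_cycle(element_bits=32)   # 2 ops x 4 floats  = 8
print(dp, sp)
print(peak_gflops(cores=4, clock_ghz=2.5, per_cycle=dp))   # 40.0
```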

This post has been edited by gOJDO: Feb 12 2007, 01:17 AM
TSgOJDO
post Feb 12 2007, 01:54 AM



@ikanayam
Thank you. I made the list, and most (95%) of the info is from AMD presentations (most of it from Ben Sander's presentation, 10/10/2006, AMD FPF).

TSgOJDO
post Feb 12 2007, 02:13 AM



You can't find the whole presentation online, but here are the slides I have:
[10 user-posted slide images]

TSgOJDO
post Feb 12 2007, 02:14 AM



[3 user-posted images]
TSgOJDO
post Feb 12 2007, 06:19 PM



Barcelona is for DP/MP servers, Brisbane is for desktop, and the K8L variants for desktop/workstation won't be available this year. The sAM2 variants, code-named Agena, will be clocked at 2.4GHz and 2.5GHz. The dual-core desktop part, Kuma, will be clocked at 2.1GHz to 2.9GHz. There will be a QuadFX variant (2 quad-cores), known as AgenaFX, with frequencies between 2.7GHz and 2.9GHz.
So, K8L CPUs will be clocked higher than Brisbane, which by that time will be used for the next-generation Sempron successor, renamed Rana.

Yorkfield at 3.5GHz vs Agena at 2.5GHz will be a clear win for Yorkfield in every application known to mankind.

@edwin Yes, I am the same guy from THG.
