QUOTE(X.E.D @ Nov 29 2007, 08:03 AM)
A large cache would be a means, the IMC would be the end. The two overlap in the improvements they bring, so I'm not betting too much here (except in
real-world 4P-and-up server tests, where Opteron Barcelona is still competitive since Tigerton isn't the killer we wanted to see, and where Bloomfield will kill)
C2D/Qs have already pushed the FSB to hell and back, and Nehalem wouldn't have that much cache, so basically that's not where the performance gains will come from. Getting Hyperthreading "2.0" itself done right would already be quite a win - Fishy says it'd be more substantial than the Northwood/Prescott flop. That, plus negating whatever performance hits C2D/C2Q had in 64-bit execution (due to macro-op fusion not running in 64-bit mode? see the sketch after this quote), making the execution stages even wider, etc.
I won't call it a leapfrog in, say, SuperPi or single-threaded performance (I don't reckon the clock-for-clock difference will even match the Athlon X2 -> Core 2 Duo jump percentage-wise), but it should be a big improvement everywhere else, and a much bigger one in threaded core utilization.
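For reference on the macro-op point above: per Agner Fog's optimization manuals, Core 2 fuses a cmp/test with the following conditional jump into one micro-op in 32-bit mode but not in 64-bit mode, and Nehalem extended fusion to 64-bit. A minimal sketch (the function is made up for illustration) of the kind of branch pairs this affects:

```c
/* Illustrative only: with gcc -O2 the loop test and the early-exit
 * test each compile to a cmp + jcc pair. Core 2 fuses such pairs in
 * 32-bit mode but not in 64-bit mode; Nehalem fuses them in both. */
#include <stddef.h>

long sum_until(const long *a, size_t n, long limit) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {  /* cmp i,n ; jb  -> fusible pair */
        sum += a[i];
        if (sum > limit)              /* cmp sum,limit ; jg -> fusible pair */
            return sum;
    }
    return sum;
}
```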
Nehalem will still have a large cache, even with the on-die memory controller. A large cache and an ODMC are not mutually exclusive. The cache is still much faster than going out to memory, especially for working sets that fit in it completely.
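A minimal sketch of that point, assuming nothing beyond standard C (the sizes and iteration count are arbitrary): a dependent pointer chase whose average load latency jumps once the working set spills out of cache, ODMC or not.

```c
/* Walk a randomly permuted circular list of a given working-set size
 * and time the average load-to-load latency. When the set fits in
 * cache, each hop is a cache hit; once it spills, each hop pays a
 * round trip to DRAM. Rough sketch, not a tuned benchmark. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ITERS 10000000UL

static double chase(size_t bytes) {
    size_t n = bytes / sizeof(size_t *);
    size_t **ring = malloc(n * sizeof *ring);
    size_t *idx   = malloc(n * sizeof *idx);
    /* Random cyclic permutation so the hardware prefetcher can't help. */
    for (size_t i = 0; i < n; i++) idx[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = rand() % (i + 1);
        size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
    }
    for (size_t i = 0; i < n; i++)
        ring[idx[i]] = (size_t *)&ring[idx[(i + 1) % n]];

    size_t **p = &ring[idx[0]];
    clock_t t0 = clock();
    for (unsigned long i = 0; i < ITERS; i++)
        p = (size_t **)*p;            /* dependent load: latency-bound */
    clock_t t1 = clock();
    volatile size_t sink = (size_t)p; /* keep the loop from being optimized away */
    (void)sink;
    free(ring); free(idx);
    return (double)(t1 - t0) / CLOCKS_PER_SEC / ITERS * 1e9;
}

int main(void) {
    size_t kb[] = { 16, 256, 4096, 65536 };   /* L1-ish, L2-ish, big cache, DRAM */
    for (int i = 0; i < 4; i++)
        printf("%6zu KiB: %.1f ns per load\n", kb[i], chase(kb[i] * 1024));
    return 0;
}
```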
QUOTE(almostthere @ Nov 29 2007, 10:12 AM)
I can't disagree with you on that, but one has to consider that with Intel forging ahead with larger L2 caches instead of going L3, while at the same time developing tech that negates or reduces the latency associated with a large cache, it's hard not to get imaginative about what may be achievable once the integration is implemented. If it's in line with the goals of HTT 2.0, we may see greater bandwidth being made available, although by right the current micro designs Intel is churning out aren't that memory-hungry (correct me if I'm wrong, getting forgetful nowadays). With that, a substantial if not leapfrogging evolution of microarchitecture at the consumer level is possible.

As for Northwood, I can't personally agree it was a failure, since it served its purpose well even though it ran close to its design limits. Prescott and subsequently Cedar Mill should be the ones considered the real flops, as Intel chose to prolong a design that was fast running into a performance-per-watt wall. IINM, Cedar Mill's heat density scaled to the point that, per square metre, it shed waste heat on the order of a small power plant (thanks to ikanayam for pointing that out last time; rough arithmetic after this quote).
As for Bloomfield, from what I've heard on the grapevine, it seems to be a stop-gap measure, although I can't get nor divulge any further details since it's unsubstantiated and/or based on trust between friends.
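A quick sanity check on that heat-density claim, using rough figures that are assumed rather than taken from the post (a ~86 W TDP over a ~81 mm^2 Cedar Mill die):

```c
/* Back-of-envelope heat-density check. The TDP and die area below are
 * assumed round numbers, not figures from the thread. */
#include <stdio.h>

int main(void) {
    double tdp_watts = 86.0;     /* assumed Cedar Mill TDP */
    double die_m2    = 81e-6;    /* assumed 81 mm^2 die, in m^2 */
    /* prints ~1.06 MW/m^2: scaled up to a square metre, that is
     * indeed small-power-plant territory */
    printf("%.2f MW/m^2\n", tdp_watts / die_m2 / 1e6);
    return 0;
}
```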
When you have 4 huge cores (8 virtual cores) to feed, you're going to need a lot more bandwidth.
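Back-of-envelope: a 1333 MT/s FSB tops out around 10.7 GB/s shared by every core, while Bloomfield's triple-channel DDR3-1066 is good for roughly 25.6 GB/s aggregate. A minimal threaded triad sketch (array sizes and thread count are illustrative, not tuned) shows how quickly 8 threads saturate the shared memory path:

```c
/* STREAM-style triad across 8 threads: a[i] = b[i] + s*c[i].
 * One thread may be core-bound; 8 threads hammering ~64 MiB arrays
 * are memory-bound, so the printed GB/s approximates the platform's
 * usable bandwidth. Compile with: gcc -O2 -pthread triad.c */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 23)            /* 8M doubles, ~64 MiB per array */
#define NTHREADS 8

static double *a, *b, *c;

static void *triad(void *arg) {
    long id = (long)arg;
    size_t lo = (size_t)id * N / NTHREADS;
    size_t hi = (size_t)(id + 1) * N / NTHREADS;
    for (size_t i = lo; i < hi; i++)
        a[i] = b[i] + 3.0 * c[i];   /* 24 bytes of traffic per element */
    return NULL;
}

int main(void) {
    a = malloc(N * sizeof *a); b = malloc(N * sizeof *b); c = malloc(N * sizeof *c);
    for (size_t i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    pthread_t t[NTHREADS];
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < NTHREADS; i++) pthread_create(&t[i], NULL, triad, (void *)i);
    for (long i = 0; i < NTHREADS; i++) pthread_join(t[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.1f GB/s with %d threads\n",
           3.0 * N * sizeof(double) / secs / 1e9, NTHREADS);
    free(a); free(b); free(c);
    return 0;
}
```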