 Intel Penryn 40% faster?

TSedwin3210
post Apr 18 2007, 12:59 AM, updated 19y ago

lll
*****
Senior Member
808 posts

Joined: Jan 2007
Well, based on the main page of lowyat.net, it seems the 12MB of cache helps to lift performance. I believe the 40% increase is mainly due to the 2 additional cores, and it only applies to multi-threaded applications. Correct me if I'm wrong.
vailance
post Apr 18 2007, 01:16 AM

wat??
*****
Senior Member
841 posts

Joined: Feb 2005
From: Melaka>KL



It stated 4 cores inside the proc.. Penryn quad? Price sure higher than current Core 2 Quad..
salimbest83
post Apr 18 2007, 01:36 AM

♥PMS on certain day♥
*******
Senior Member
8,648 posts

Joined: Feb 2006
From: Jelutong Penang



More cores mean nothing if the software isn't written for multi-core..
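
A rough way to put numbers on this point is Amdahl's law, which nobody in the thread names explicitly. The sketch below is only an illustration: the parallel fractions are made-up values, not measurements of any real program or of Penryn.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when only `parallel_fraction`
    of the work can be spread across cores (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workloads: fully serial, half parallel, mostly parallel.
for p in (0.0, 0.5, 0.9):
    two = amdahl_speedup(p, 2)
    four = amdahl_speedup(p, 4)
    print(f"parallel fraction {p:.1f}: 2 cores -> {two:.2f}x, 4 cores -> {four:.2f}x")
```

With a parallel fraction of 0 the extra cores give no speedup at all, which is the single-threaded case described above.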
vailance
post Apr 18 2007, 02:08 AM

wat??
*****
Senior Member
841 posts

Joined: Feb 2005
From: Melaka>KL



Yup.. agreed, there are still some programs that won't utilize Pentium D and Core 2 Duo.. new ones come out so fast, software programmers also need time to learn, right? lol
Thunderbolt
post Apr 18 2007, 02:12 AM

Tonight We Dine In Penang!
******
Senior Member
1,191 posts

Joined: Jan 2007
From: Penang


The extra boost came from the switch from 65 nanometers to 45nm and from using "high-k metal gate" transistors thumbup.gif

Got nothing to do with cache, extra cores laugh.gif
ikanayam
post Apr 18 2007, 02:25 AM

there are no pacts between fish and men
********
Senior Member
10,544 posts

Joined: Jan 2003
From: GMT +8:00

40% faster in certain situations. You can bet that the 40% figure is pretty much a best case scenario. Much of it is also due to the higher clockspeeds.
c38y50y70
post Apr 18 2007, 07:12 AM

Getting Started
**
Validating
140 posts

Joined: Dec 2005
From: R&D Center & Home



The 40% gain shows up only in the areas that were tweaked compared to C2Q, such as streaming FP software.
Fyonne
post Apr 18 2007, 09:36 AM

Enthusiast
VIP
904 posts

Joined: Nov 2006
From: Penang (Mainland)
I heard that Penryn will offer stock clock speeds above 3GHz, a larger cache (6MB for dual core, 12MB for quad core), and a 1600MHz bus speed.
Radeon
post Apr 18 2007, 09:50 AM

Semi-Retired Overclocker
*******
Senior Member
2,257 posts

Joined: Jan 2003

QUOTE(vailance @ Apr 18 2007, 01:16 AM)
It stated 4 cores inside the proc.. Penryn quad? Price sure higher than current Core 2 Quad..
*
current core 2 quad is fake

this one will be the real one, let's see how it does against our long-awaited Agena
cks2k2
post Apr 18 2007, 10:14 AM

...
******
Senior Member
1,966 posts

Joined: Jan 2003
From: No longer hanging by a NUS

QUOTE(Thunderbolt @ Apr 18 2007, 02:12 AM)
The extra boost came from the switch from 65 nanometers to 45nm and from using "high-k metal gate" transistors  thumbup.gif

Got nothing to do with cache, extra cores laugh.gif
*
Hi-k affects leakage; it's the minor tweaks to the core that boost the performance.

QUOTE(Radeon @ Apr 18 2007, 09:50 AM)
current core 2 quad is fake

this one will be the real one, let's see how it does against our long-awaited Agena
*
Is there a difference between "fake" and "true" quad?
BTW the correct term is non-monolithic and monolithic.
c38y50y70
post Apr 18 2007, 10:20 AM

Getting Started
**
Validating
140 posts

Joined: Dec 2005
From: R&D Center & Home



QUOTE(Radeon @ Apr 18 2007, 10:50 AM)
current core 2 quad is fake

this one will be the real one, let's see how it does against our long-awaited Agena
*
Penryn is still a non-monolithic quad core (2x Core2 Duo in the same package).
ikanayam
post Apr 18 2007, 01:14 PM

there are no pacts between fish and men
********
Senior Member
10,544 posts

Joined: Jan 2003
From: GMT +8:00

Actual (according to intel) performance numbers down the page:

http://arstechnica.com/news.ars/post/20070...mance-sse4.html
DSC
post Apr 18 2007, 09:14 PM

Look at all my stars!!
*******
Senior Member
2,240 posts

Joined: Jan 2003
http://www.anandtech.com/cpuchipsets/intel...doc.aspx?i=2972

Anand ran some benchmarks in Beijing, looks like Penryn isn't just die shrink with minor tweaks. Can't wait for the final review of the retail shipping cpu.
c38y50y70
post Apr 18 2007, 09:42 PM

Getting Started
**
Validating
140 posts

Joined: Dec 2005
From: R&D Center & Home



It is a core with minor tweaks compared to Kentsfield. It shines only in FP and stream-related applications. Other applications might see very little performance boost, especially those that do a lot of context switching and branching. The new power management also isn't as good as Barcelona's. However, it should be able to put up a good fight against Barcelona.
TSedwin3210
post Apr 18 2007, 11:18 PM

lll
*****
Senior Member
808 posts

Joined: Jan 2007
http://arstechnica.com/news.ars/post/20070...mance-sse4.html

What the... even Penryn has some numbers for us to see, but where are the Barcelona numbers?
cks2k2
post Apr 18 2007, 11:40 PM

...
******
Senior Member
1,966 posts

Joined: Jan 2003
From: No longer hanging by a NUS

QUOTE(c38y50y70 @ Apr 18 2007, 09:42 PM)
It is a core with minor tweaks compared to Kentsfield. It shines only in FP and stream-related applications. Other applications might see very little performance boost, especially those that do a lot of context switching and branching. The new power management also isn't as good as Barcelona's. However, it should be able to put up a good fight against Barcelona.
*
I've seen some pretty interesting power management stuff on Nehalem... tongue.gif
c38y50y70
post Apr 18 2007, 11:51 PM

Getting Started
**
Validating
140 posts

Joined: Dec 2005
From: R&D Center & Home



Yea, that is Nehalem, but not Penryn.
Nehalem took a step further than Barcelona in voltage management, but I can't disclose too much here thou wink.gif
TSedwin3210
post Apr 19 2007, 02:27 AM

lll
*****
Senior Member
808 posts

Joined: Jan 2007
user posted image


more info here.

Looks like it will be a serious contender against K10, IF K10 even performs the way AMD claims now.

This post has been edited by edwin3210: Apr 19 2007, 02:27 AM
Thunderbolt
post Apr 19 2007, 04:11 AM

Tonight We Dine In Penang!
******
Senior Member
1,191 posts

Joined: Jan 2007
From: Penang


QUOTE(cks2k2 @ Apr 18 2007, 10:14 AM)
Hi-k affects leakage; it's the minor tweaks to the core that boost the performance.
Is there a difference between "fake" and "true" quad?
BTW the correct term is non-monolithic and monolithic.
*

In fact there is: True Quad doesnt share the cache

ikanayam
post Apr 19 2007, 06:06 AM

there are no pacts between fish and men
********
Senior Member
10,544 posts

Joined: Jan 2003
From: GMT +8:00

QUOTE(Thunderbolt @ Apr 17 2007, 01:12 PM)
The extra boost came from the switch from 65 nanometers to 45nm and from using "high-k metal gate" transistors  thumbup.gif

Got nothing to do with cache, extra cores laugh.gif
*
Clock for clock? no. That only helps with power/leakage/frequency scaling.


QUOTE(Thunderbolt @ Apr 18 2007, 03:11 PM)
In fact there is: True Quad doesnt share the cache
*
No. Quad core means 4 cores. Does it have 4 cores? Yes it does.

By your definition, the barcelona is not a true quad core either because it has shared cache. lol.

salimbest83
post Apr 23 2007, 02:15 AM

♥PMS on certain day♥
*******
Senior Member
8,648 posts

Joined: Feb 2006
From: Jelutong Penang



then... when will we see any benchies for Barcelona....
dunno when to sell Intel stock..
Thunderbolt
post Apr 23 2007, 02:58 AM

Tonight We Dine In Penang!
******
Senior Member
1,191 posts

Joined: Jan 2007
From: Penang


QUOTE(ikanayam @ Apr 19 2007, 06:06 AM)
Clock for clock? no. That only helps with power/leakage/frequency scaling.
No. Quad core means 4 cores. Does it have 4 cores? Yes it does.

By your definition, the barcelona is not a true quad core either because it has shared cache. lol.
*

Im not referring to Barcelona laugh.gif
ikanayam
post Apr 23 2007, 09:42 AM

there are no pacts between fish and men
********
Senior Member
10,544 posts

Joined: Jan 2003
From: GMT +8:00

QUOTE(Thunderbolt @ Apr 22 2007, 01:58 PM)
Im not referring to Barcelona laugh.gif
*
Great. That really helps prove your point. lol.
charge-n-go
post Apr 23 2007, 09:46 AM

Look at all my stars!!
*******
Senior Member
4,060 posts

Joined: Jan 2003
From: Penang / PJ

QUOTE(Thunderbolt @ Apr 23 2007, 02:58 AM)
Im not referring to Barcelona laugh.gif
*
Then what are u referring to?
It is either Kentsfield, Penryn or Barcelona only.
Thunderbolt
post Apr 23 2007, 05:08 PM

Tonight We Dine In Penang!
******
Senior Member
1,191 posts

Joined: Jan 2007
From: Penang


QUOTE(ikanayam @ Apr 23 2007, 09:42 AM)
Great. That really helps prove your point. lol.
*

Thanks laugh.gif


Added on April 23, 2007, 5:09 pm
QUOTE(charge-n-go @ Apr 23 2007, 09:46 AM)
Then what are u referring to?
It is either Kentsfield, Penryn or Barcelona only.
*

Penryn of course biggrin.gif


This post has been edited by Thunderbolt: Apr 23 2007, 05:09 PM
charge-n-go
post Apr 23 2007, 11:55 PM

Look at all my stars!!
*******
Senior Member
4,060 posts

Joined: Jan 2003
From: Penang / PJ

QUOTE(Thunderbolt @ Apr 19 2007, 04:11 AM)
In fact there is: True Quad doesnt share the cache
*
QUOTE(Thunderbolt @ Apr 23 2007, 05:08 PM)
Penryn of course  biggrin.gif
So u mean, Barcelona doesn't share the cache, and Penryn does? How about Kentsfield, isn't it the same, y Penryn only?
soulfly
post Apr 23 2007, 11:57 PM

revving towards 10,000 rpm
VIP
15,904 posts

Joined: Jan 2003
From: Miri



sharing is caring smile.gif
kapitan
post Apr 24 2007, 12:03 AM

Look at all my stars!!
*******
Senior Member
2,205 posts

Joined: Jan 2003


Stop the argument la. Barcelona share L3 cache too. Thats it.

What we want to know is numbers. Who cares if its real quad or fake quad. What we want is performance.
c38y50y70
post Apr 24 2007, 12:16 AM

Getting Started
**
Validating
140 posts

Joined: Dec 2005
From: R&D Center & Home



A monolithic quad usually has better performance because the communication and data sharing among all cores are much faster, and the power management can be done more efficiently using a centralized controller.
Thunderbolt
post Apr 24 2007, 04:04 AM

Tonight We Dine In Penang!
******
Senior Member
1,191 posts

Joined: Jan 2007
From: Penang


QUOTE(charge-n-go @ Apr 23 2007, 11:55 PM)
So u mean, Barcelona doesn't share the cache, and Penryn does? How about Kentsfield, isn't it the same, y Penryn only?
*

I think we should stop this biggrin.gif

TSedwin3210
post Apr 24 2007, 04:08 AM

lll
*****
Senior Member
808 posts

Joined: Jan 2007
QUOTE(Thunderbolt @ Apr 24 2007, 04:04 AM)
I think we should stop this  biggrin.gif
*
shuld close this thread
ikanayam
post Apr 24 2007, 04:42 AM

there are no pacts between fish and men
********
Senior Member
10,544 posts

Joined: Jan 2003
From: GMT +8:00

QUOTE(Thunderbolt @ Apr 23 2007, 03:04 PM)
I think we should stop this  biggrin.gif
*
You mean you should stop talking nonsense.
empire23
post Apr 24 2007, 05:19 AM

Team Island Hopper
Staff
9,417 posts

Joined: Jan 2003
From: Bladin Point, Northern Territory
^ and i thought comedy was dead laugh.gif
charge-n-go
post Apr 24 2007, 09:00 AM

Look at all my stars!!
*******
Senior Member
4,060 posts

Joined: Jan 2003
From: Penang / PJ

QUOTE(kapitan @ Apr 24 2007, 12:03 AM)
Who cares if its real quad or fake quad. What we want is performance.
*
There is no 'real quad' and 'fake quad'. Barcelona, Penryn and Kentsfield are all Quad Cores. If seriously wanna define 'Fake Quad', Pentium D EE with 2 physical cores and 2 logical cores is a good example.

QUOTE(Thunderbolt @ Apr 24 2007, 04:04 AM)
I think we should stop this  biggrin.gif
*
You are the one who starts : "In fact there is: True Quad doesnt share the cache". So i was really interested how True Quad can be defined this way.
In fact, your statement said : Penryn / Kentsfield is a 'true' quad because they don't share cache between sites. Barcelona shared their cache among all cores, so it is not a 'true' quad. This is a misleading info dude.
squall_12
post Apr 24 2007, 09:27 AM

Regular
******
Senior Member
1,779 posts

Joined: Jan 2003
dont let the comedy end heheh smile.gif
badguy86
post Apr 24 2007, 10:26 AM

Getting Started
**
Junior Member
292 posts

Joined: Sep 2006
From: Kuching, Sarawak, Malaysia



So, is there actually a difference between 'true quad' and 'fake quad'? blink.gif
charge-n-go
post Apr 24 2007, 10:35 AM

Look at all my stars!!
*******
Senior Member
4,060 posts

Joined: Jan 2003
From: Penang / PJ

there is no 'true' and 'fake' quad. If a CPU has 4 physical cores in the same package, it is a quad core.

Intel uses non-monolithic approach to simplify the design with some sacrifice in performance, while AMD uses monolithic approach to have higher performance but takes longer time to design.
cks2k2
post Apr 24 2007, 11:40 AM

...
******
Senior Member
1,966 posts

Joined: Jan 2003
From: No longer hanging by a NUS

QUOTE(charge-n-go @ Apr 24 2007, 10:35 AM)
there is no 'true' and 'fake' quad. If a CPU has 4 physical cores in the same package, it is a quad core.

Intel uses non-monolithic approach to simplify the design with some sacrifice in performance, while AMD uses monolithic approach to have higher performance but takes longer time to design.
*
I would say design time would be pretty much the same - it's the manufacturing that's the problem.

4-cores in 1 die == larger die size == higher defect potential.
Also larger die size == less dies per wafer == less cost effective.
Binning will be another problem -> you can only sell at the lowest common clock speed.

MCM makes sense until you move to a mature smaller process.
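
The die-size argument above can be sketched with a simple Poisson yield model. This is an illustration only: the model choice, the defect density, and the die area are assumptions picked for the example, not figures from Intel or AMD, and wafer edge loss is ignored.

```python
import math

# Assumptions for illustration only (not vendor data).
WAFER_AREA_MM2 = math.pi * (300 / 2) ** 2  # 300 mm wafer, edge loss ignored
DEFECTS_PER_MM2 = 0.002                    # hypothetical defect density


def poisson_yield(die_area_mm2: float) -> float:
    """Fraction of dies expected to have zero defects (Poisson model)."""
    return math.exp(-DEFECTS_PER_MM2 * die_area_mm2)


def good_quads_per_wafer(die_area_mm2: float, dies_per_cpu: int) -> float:
    """Expected number of working quad-core CPUs obtained from one wafer."""
    dies_on_wafer = WAFER_AREA_MM2 / die_area_mm2
    return dies_on_wafer * poisson_yield(die_area_mm2) / dies_per_cpu


dual_core_die_mm2 = 140.0  # hypothetical dual-core die area

# MCM: two small dual-core dies per package.
mcm = good_quads_per_wafer(dual_core_die_mm2, dies_per_cpu=2)
# Monolithic: one die of twice the area per package.
mono = good_quads_per_wafer(2 * dual_core_die_mm2, dies_per_cpu=1)

print(f"MCM quads per wafer:        {mcm:.0f}")
print(f"Monolithic quads per wafer: {mono:.0f}")
print(f"MCM advantage:              {mcm / mono:.2f}x")
```

Under this model the MCM advantage grows with defect density and die area, which matches the point that MCM makes sense until the process matures. Binning is not modelled here.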
charge-n-go
post Apr 24 2007, 12:26 PM

Look at all my stars!!
*******
Senior Member
4,060 posts

Joined: Jan 2003
From: Penang / PJ

QUOTE(cks2k2 @ Apr 24 2007, 11:40 AM)
I would say design time would be pretty much the same - it's the manufacturing that's the problem.

4-cores in 1 die == larger die size == higher defect potential.
Also larger die size == less dies per wafer == less cost effective.
Binning will be another problem -> you can only sell at the lowest common clock speed.

MCM makes sense until you move to a mature smaller process.
*
No, the design time will not be the same. In terms of architecture, you need to redesign the inter-core communication and cache for a monolithic approach, not to mention circuit and layout optimization. For Kentsfield, all you need is to pack 2x C2D into a package, where the architecture and circuit are almost identical. Probably the power management logic and coherency/EBL need some changes, that's all. MCM makes sense to save cost and design effort, at the expense of performance & efficiency. This is why Intel could debut the 1st quad core much sooner than AMD.

This post has been edited by charge-n-go: Apr 24 2007, 01:02 PM
Thunderbolt
post Apr 24 2007, 04:55 PM

Tonight We Dine In Penang!
******
Senior Member
1,191 posts

Joined: Jan 2007
From: Penang


To anyone who was offended by my previous statement about shared cache, I sincerely apologize for that.

It was my mistake and I understand why people reacted the way they did.
BlackThyra87
post Apr 24 2007, 05:34 PM

Can't be Mad.
******
Senior Member
1,643 posts

Joined: Jan 2003
From: Pulau Indah



QUOTE(c38y50y70 @ Apr 18 2007, 09:42 PM)
It is a core with minor tweaks compared to Kentsfield. It shines only in FP and stream-related applications. Other applications might see very little performance boost, especially those that do a lot of context switching and branching. The new power management also isn't as good as Barcelona's. However, it should be able to put up a good fight against Barcelona.
*
hell no. Penryn is not just a die shrink and minor tweaks but its got a BUNCH of tweaks to begin with. GOOGLE it.

QUOTE(charge-n-go @ Apr 24 2007, 09:00 AM)
There is no 'real quad' and 'fake quad'. Barcelona, Penryn and Kentsfield are all Quad Cores. If seriously wanna define 'Fake Quad', Pentium D EE with 2 physical cores and 2 logical cores is a good example.
You are the one who starts : "In fact there is: True Quad doesnt share the cache". So i was really interested how True Quad can be defined this way.
In fact, your statement said : Penryn / Kentsfield is a 'true' quad because they don't share cache between sites. Barcelona shared their cache among all cores, so it is not a 'true' quad. This is a misleading info dude.
*
TRUE QUAD / NATIVE QUAD core means:

there are 4 individual cores stuck together (they look like one) on one processor die, each core with its own cache. Just like Barcelona.

what Intel has done on their quad core is just stick 2x dual core in one processor die. Of course there are performance improvements in some applications, but the app must be specifically optimized for Intel's quad core paths.

The real quad core is the Barcelona. No doubt about that.

U guys can do a google search/WiKi for more info.

user posted image

see the image there:

Left = AMD's Barcelona.
Right = Intel's C2Q / Clovertown (Intel's 1st quad core processor)

can u see the difference now?

http://www.legitreviews.com/article/426/1/
http://www.xbitlabs.com/news/cpu/display/20060301233527.html
http://www.amd.com/us-en/Corporate/Virtual...~111541,00.html

This post has been edited by BlackThyra87: Apr 24 2007, 05:48 PM
c38y50y70
post Apr 24 2007, 05:50 PM

Getting Started
**
Validating
140 posts

Joined: Dec 2005
From: R&D Center & Home



QUOTE(BlackThyra87 @ Apr 24 2007, 06:34 PM)
hell no. Penryn is not just a die shrink and minor tweaks but its got a BUNCH of tweaks to begin with. GOOGLE it.
All these are minor tweaks.
1. SSE4 support. It is just a bunch of new instructions to boost FP performance.
2. Using new radix-16 divider - speed up division.
3. Big fat cache - 45nm allows this, there is no big change in cache architecture & protocol.
4. C6 power management state to save more power - only applicable to laptop
5. DAT, also called turbo mode. It is already available in Merom but not enabled, just like the 64-bit feature in P4. <-- not a tweak at all
6. speed up VMEntry and VMExit, not a big deal, just some minor tweaks too.
7. higher FSB support. This is darn easy to be achieved.

1) and 2) are the 2 key tweaks to improve FP related benchies, while 3) and 7) help to improve overall performance. 4) and 5) have nothing to do with the desktop variant, so you can forget them. 6) has nothing to do with you either unless u r using VMWare.

So tell me dude, which are the major tweaks? There is no major architectural change AT ALL. I'll consider the move from Core Duo to Core 2 Duo, or from K8 to K8L as a major tweak.
Jcsy
post Apr 24 2007, 05:51 PM

Frag First! Think Later?
*******
Senior Member
2,782 posts

Joined: Feb 2006
From: BANGSARian
WE ARE STILL BOTTLENECKED BY THE HARD DISK !!

gosh, every 6 months new SPEEDS for CPU / RAM come out

every 6 YEARS, new hard disk speed and capabilities come out

7200RPM.. Zzzzz
BlackThyra87
post Apr 24 2007, 05:56 PM

Can't be Mad.
******
Senior Member
1,643 posts

Joined: Jan 2003
From: Pulau Indah



QUOTE(c38y50y70 @ Apr 24 2007, 05:50 PM)
All these are minor tweaks.
1. SSE4 support. It is just a bunch of new instructions to boost FP performance.
2. Using new radix-16 divider - speed up division.
3. Big fat cache - 45nm allows this, there is no big change in cache architecture & protocol.
4. C6 power management state to save more power - only applicable to laptop
5. DAT, also called turbo mode. It is already available in Merom but not enabled, just like the 64-bit feature in P4. <-- not a tweak at all
6. speed up VMEntry and VMExit, not a big deal, just some minor tweaks too.
7. higher FSB support. This is darn easy to be achieved.

1) and 2) are the 2 key tweaks to improve FP related benchies, while 3) and 7) help to improve overall performance. 4) and 5) have nothing to do with the desktop variant, so you can forget them. 6) has nothing to do with you either unless u r using VMWare.

So tell me dude, which are the major tweaks? There is no major architectural change AT ALL. I'll consider the move from Core Duo to Core 2 Duo, or from K8 to K8L as a major tweak.
*
u got me wrong, dude.

im talking about numbers here, see the word BUNCH in my previous post there? cool2.gif

This post has been edited by BlackThyra87: Apr 24 2007, 05:57 PM
charge-n-go
post Apr 24 2007, 06:05 PM

Look at all my stars!!
*******
Senior Member
4,060 posts

Joined: Jan 2003
From: Penang / PJ

QUOTE(BlackThyra87 @ Apr 24 2007, 05:34 PM)
can u see the difference now?
*
As I said, Kentsfield and Barcelona are real quad cores, as they have 4 physical cores. The 4 processing cores ARE there. In engineering terms, it is called the monolithic vs non-monolithic approach. The real or fake stuff is just marketing terminology to confuse the public.

QUOTE
TRUE QUAD / NATIVE QUAD core means:
there are 4 individual cores stuck together (they look like one) on one processor die, each core with its own cache. Just like Barcelona.
The real quad core is the Barcelona. No doubt about that.

What are u trying to say? Kentsfield has its own cache in individual core also. The definition of native quad core doesnt care about the cache at all. The cache system in Barcelona is simply implementation specific but has nothing to do with Native Quad Core design. The word "true" is misleading, but "native" describes the monolithic approach best.

QUOTE
what Intel has done on their quad core is just stick 2x dual core in one processor die. Of course there are performance improvements in some applications, but the app must be specifically optimized for Intel's quad core paths.

You are wrong again. Intel doesn't stick 2x dual core into one processor die. The two dies are simply packaged together as a single CPU unit. The word 'die' means one piece of silicon --> the monolithic approach. As long as an application has multi-threading support, it will see a performance increase for sure. No special optimization is needed.

c38y50y70
post Apr 24 2007, 06:11 PM

Getting Started
**
Validating
140 posts

Joined: Dec 2005
From: R&D Center & Home



QUOTE(BlackThyra87 @ Apr 24 2007, 06:56 PM)
u got me wrong, dude.

im talking about numbers here, see the word BUNCH in my previous post there? cool2.gif
*
I believe somehow u agreed with my statements below:
"All these are minor tweaks.
1. SSE4 support. It is just a bunch of new instructions to boost FP performance.
2. Using new radix-16 divider - speed up division.
3. Big fat cache - 45nm allows this, there is no big change in cache architecture & protocol.
4. C6 power management state to save more power - only applicable to laptop
5. DAT, also called turbo mode. It is already available in Merom but not enabled, just like the 64-bit feature in P4. <-- not a tweak at all
6. speed up VMEntry and VMExit, not a big deal, just some minor tweaks too.
7. higher FSB support. This is darn easy to be achieved."

----------------------------------------------------------------------------

Let me re-quote myself again:
QUOTE(c38y50y70 @ Apr 18 2007, 10:42 PM)
It is a core with minor tweaks compared to Kentsfield. It shines only in FP and stream-related applications. <-- point 1), 2)

Other applications might see very little performance boost, especially those that do a lot of context switching and branching. <-- point 3), 7)

The new power management also isn't as good as Barcelona's. However, it should be able to put up a good fight against Barcelona. <-- point 4), 5)
*
So when u said :"hell no. Penryn is not just a die shrink and minor tweaks but its got a BUNCH of tweaks to begin with. GOOGLE it."

Doesn't that mean u r disagreeing with my previous statement? And now you are saying I am correct? doh.gif
So why didn't you say BUNCH of minor tweaks instead? Isn't a BUNCH of minor tweaks still minor tweaks?

This post has been edited by c38y50y70: Apr 24 2007, 06:20 PM
BlackThyra87
post Apr 24 2007, 06:19 PM

Can't be Mad.
******
Senior Member
1,643 posts

Joined: Jan 2003
From: Pulau Indah



QUOTE(charge-n-go @ Apr 24 2007, 06:05 PM)
As I said, Kentsfield and Barcelona are real quad cores, as they have 4 physical cores. The 4 processing cores ARE there. In engineering terms, it is called the monolithic vs non-monolithic approach. The real or fake stuff is just marketing terminology to confuse the public.
What are u trying to say? Kentsfield has its own cache in individual core also. The definition of native quad core doesnt care about the cache at all. The cache system in Barcelona is simply implementation specific but has nothing to do with Native Quad Core design. The word "true" is misleading, but "native" describes the monolithic approach best.
You are wrong again. Intel doesn't stick 2x dual core into one processor die. The two dies are simply packaged together as a single CPU unit. The word 'die' means one piece of silicon --> the monolithic approach. As long as an application has multi-threading support, it will see a performance increase for sure. No special optimization is needed.
*
i might be wrong here but ur statement is based on which article?
c38y50y70
post Apr 24 2007, 06:22 PM

Getting Started
**
Validating
140 posts

Joined: Dec 2005
From: R&D Center & Home



QUOTE(c38y50y70 @ Apr 18 2007, 10:42 PM)
It is a core with minor tweaks compared to Kentsfield. It shines only in FP and stream-related applications. Other applications might see very little performance boost, especially those that do a lot of context switching and branching. The new power management also isn't as good as Barcelona's. However, it should be able to put up a good fight against Barcelona.
*
In fact, HDD access is minimal once a program has been loaded. That is also why we have bigger caches and more memory, to compensate for the high latency of disk access.
s[H]sIkuA
post Apr 24 2007, 06:24 PM

live in the present
*******
Senior Member
2,162 posts

Joined: Sep 2004


QUOTE(BlackThyra87 @ Apr 24 2007, 06:19 PM)
i might be wrong here but ur statement is based on which article?
*
He himself is the article, he is an engineering student btw, hence some technical term used that most people not familiar with , am I right charge-n-go? tongue.gif
empire23
post Apr 24 2007, 06:25 PM

Team Island Hopper
Staff
9,417 posts

Joined: Jan 2003
From: Bladin Point, Northern Territory
QUOTE(BlackThyra87 @ Apr 24 2007, 05:34 PM)
hell no. Penryn is not just a die shrink and minor tweaks but its got a BUNCH of tweaks to begin with. GOOGLE it.
TRUE QUAD / NATIVE QUAD core means:

there are 4 individual cores stuck together (they look like one) on one processor die, each core with its own cache. Just like Barcelona.

what Intel has done on their quad core is just stick 2x dual core in one processor die. Of course there are performance improvements in some applications, but the app must be specifically optimized for Intel's quad core paths.

The real quad core is the Barcelona. No doubt about that.

U guys can do a google search/WiKi for more info.

user posted image

see the image there:

Left = AMD's Barcelona.
Right = Intel's C2Q / Clovertown (Intel's 1st quad core processor)

can u see the difference now?

http://www.legitreviews.com/article/426/1/
http://www.xbitlabs.com/news/cpu/display/20060301233527.html
http://www.amd.com/us-en/Corporate/Virtual...~111541,00.html
*
Wrong, anything that has 4 freaking cores on it, monolithic or non-monolithic, is still quad core, and that's the definition. Whether it's on the same die or not, whether it's interconnected or not, it seriously doesn't mean anything. How many cores just depends on how many there are on the same substrate packaging.

No optimizations needed, threading and scheduling are rather OS related. As for multithreading within the same program, even AMD suffers from that issue, because the instruction window which derives parallelism or ILP from code is still at the entrance of each core and is discrete from the others. So your assertion is moot.

It is proven that even AMD needs to go back to the NB for Core to Core transfers, so that crossbar seems pretty useless lol. Anyways your news links don't matter much, because until we the proc studying community get real benchmarks, we aren't keen on making assumptions. SPEC2006 plz.

At least so far Intel's Core2 is still a far more unified design than the X2, since it actually shares a cache and prefetch and heck MESI works across both IIRC. You better come up with better arguments dude.


Added on April 24, 2007, 6:26 pm
QUOTE(BlackThyra87 @ Apr 24 2007, 06:19 PM)
i might be wrong here but ur statement is based on which article?
*
The fact he works for Intel?

This post has been edited by empire23: Apr 24 2007, 06:26 PM
BlackThyra87
post Apr 24 2007, 06:31 PM

Can't be Mad.
******
Senior Member
1,643 posts

Joined: Jan 2003
From: Pulau Indah



i dont need arguments lol. since u guys corrected me, now thats fine.

looks like i need to read more about the core to core technologies wink.gif

This post has been edited by BlackThyra87: Apr 24 2007, 06:32 PM
charge-n-go
post Apr 24 2007, 06:36 PM

Look at all my stars!!
*******
Senior Member
4,060 posts

Joined: Jan 2003
From: Penang / PJ

QUOTE(s[H]sIkuA @ Apr 24 2007, 06:24 PM)
He himself is the article, he is an engineering student btw, hence some technical term used that most people not familiar with , am I right charge-n-go? tongue.gif
*
haha, don't say tat. i m also referring to some internal documentations to answer BlackThyra87

QUOTE(empire23 @ Apr 24 2007, 06:25 PM)
It is proven that even AMD needs to go back to the NB for Core to Core transfers, so that crossbar seems pretty useless lol.

The fact he works for Intel?
*
haha, although i worked in intel, but there are some amd overview classes over here.
You are right, the core to core transfer is done via crossbar. It is just like the D-Link or whatever intranet switch we use at home. The NB is the centralized controller that determines what data gets transferred via the crossbar, and serves all requests from the cores. Well, I think the fact that the NB is built into K8 means less of a penalty than on C2D / P4, where the NB is external.

from the attached image, K8's NB is called the System Request Interface.
Attached Image

^ oh shooot, this is a single core K8. A dual core K8 has the same structure btw, just adding another core side by side with the 1st core.

Image taken from Mindshare Slides (www.mindshare.com)

This post has been edited by charge-n-go: Apr 24 2007, 06:42 PM
BlackThyra87
post Apr 24 2007, 07:19 PM

Can't be Mad.
******
Senior Member
1,643 posts

Joined: Jan 2003
From: Pulau Indah



looks like we got a processor techie here biggrin.gif

how about intel's Quad core design bro? mind to explain and compare to AMD's K8L/Barcelona?
c38y50y70
post Apr 24 2007, 08:26 PM

Getting Started
**
Validating
140 posts

Joined: Dec 2005
From: R&D Center & Home



If you don't mind, i can help charge to answer your question smile.gif

For Intel
1. All cores communicate via the FSB. The communication is done using the MESI protocol.

2. Core2Duo, Core Duo, Pentium D, Pentium 4 use the MCH and FSB for communication; the cores can't contact each other directly side by side when it comes to ensuring safe data sharing.

3. The shared cache in C2D minimizes communication via FSB/MCH since it is shared between the 1st and 2nd core. However, if the data is in L1 cache, many FSB transactions will be initiated.

4. Pentium D doesn't have a shared cache, hence any data transfer between cores has to use the FSB.

5. Same goes for Kentsfield. When the 1st or 2nd core needs data from either the 3rd or 4th core, it must go through the FSB. However, if the 3rd core wants data from the 4th core, and the data is in the 2nd L2 cache, very few FSB transactions are needed.
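
Point 5 can be sketched as a tiny lookup. This is only a toy reading of the description above, assuming cores 1 and 2 sit on one die and cores 3 and 4 on the other; the names `DIE_OF_CORE` and `transfer_path` are made up for the example, and the L1 case from point 3 is not modelled.

```python
# Assumed Kentsfield layout from the description: two dual-core dies in one package.
DIE_OF_CORE = {1: "die A", 2: "die A", 3: "die B", 4: "die B"}


def transfer_path(src_core: int, dst_core: int) -> str:
    """Return the path data takes between two cores in this toy model."""
    if src_core == dst_core:
        return "local"
    if DIE_OF_CORE[src_core] == DIE_OF_CORE[dst_core]:
        # Same die: the data can be served from that die's shared L2 cache.
        return "shared L2"
    # Different dies: the request has to go out over the front-side bus.
    return "FSB"


for src, dst in [(1, 2), (3, 4), (1, 3), (2, 4)]:
    print(f"core {src} -> core {dst}: {transfer_path(src, dst)}")
```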


For AMD
1. The MOESI protocol is pretty similar to MESI. However, the communication is much faster because data transfers and *snoop* transactions are all done internally via the System Request Interface, crossbar switch and integrated memory controller. So it will be darn fast compared to the FSB.

*note*: a snoop is a process where the requesting core broadcasts to all the other cores in the system, saying: "I want this data / I want to change this piece of info / I want some data, please update yourselves accordingly". This ensures all cores have the most up-to-date data when one of the cores modifies a shared memory area.
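To make the snoop note above more concrete, here is a minimal Python sketch of MESI-style snooping. It is only a toy model under loud assumptions: the `Core` and `Bus` classes are hypothetical names made up for illustration, a cache "line" is a single value, and nothing here reflects how the FSB or the SRI/crossbar is actually wired. It only shows the idea that a miss or a write to a shared line is broadcast to the other cores, which then update their own copies.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Core:
    """One core with a private cache, modelled as address -> (state, value)."""

    def __init__(self, name):
        self.name = name
        self.lines = {}

    def state(self, addr):
        return self.lines.get(addr, (State.INVALID, None))[0]


class Bus:
    """Hypothetical shared interconnect that broadcasts snoops to the other cores."""

    def __init__(self, cores, memory):
        self.cores = cores
        self.memory = memory
        self.snoops = 0  # number of broadcast snoop transactions

    def _others(self, requester):
        return [c for c in self.cores if c is not requester]

    def read(self, requester, addr):
        if requester.state(addr) is not State.INVALID:
            return requester.lines[addr][1]  # cache hit, no snoop needed
        self.snoops += 1  # "I want this data"
        shared = False
        for other in self._others(requester):
            st = other.state(addr)
            if st is State.INVALID:
                continue
            shared = True
            value = other.lines[addr][1]
            if st is State.MODIFIED:
                self.memory[addr] = value  # dirty copy is written back first
            other.lines[addr] = (State.SHARED, value)
        value = self.memory[addr]
        new_state = State.SHARED if shared else State.EXCLUSIVE
        requester.lines[addr] = (new_state, value)
        return value

    def write(self, requester, addr, value):
        if requester.state(addr) in (State.INVALID, State.SHARED):
            self.snoops += 1  # "I want to change this, update yourselves"
            for other in self._others(requester):
                if other.state(addr) is State.MODIFIED:
                    self.memory[addr] = other.lines[addr][1]
                other.lines.pop(addr, None)  # invalidate the stale copy
        requester.lines[addr] = (State.MODIFIED, value)
```

For example, if core 0 reads an address, core 1 reads it too, core 0 then writes it and core 1 reads again, the model needs four snoops in total, and core 1 ends up with the new value because core 0's modified copy is written back before it is shared.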

If you still don't understand, please PM me.

This post has been edited by c38y50y70: Apr 24 2007, 08:31 PM
ikanayam
post Apr 24 2007, 11:09 PM

there are no pacts between fish and men
********
Senior Member
10,544 posts

Joined: Jan 2003
From: GMT +8:00

As empire said, it has been proven a while ago that the crossbar in the current x2 processors is NOT used to transfer data between cores. Which is pretty dumb IMO, but let's see if they change/fix it in Barcelona. Probably requires a change in coherency protocol to achieve, which is why they didn't do it.

This post has been edited by ikanayam: Apr 24 2007, 11:10 PM
salimbest83
post Apr 25 2007, 01:26 AM

♥PMS on certain day♥
*******
Senior Member
8,648 posts

Joined: Feb 2006
From: Jelutong Penang



with all this talk..

which one do u think is faster or better...

Intel quad core or Barcelona....


TSedwin3210
post Apr 25 2007, 01:53 AM

lll
*****
Senior Member
808 posts

Joined: Jan 2007
QUOTE(salimbest83 @ Apr 25 2007, 01:26 AM)
with all this talk..

which one do u think is faster or better...

Intel quad core or Barcelona....
*
calm down now. READ
salimbest83
post Apr 25 2007, 02:08 AM

♥PMS on certain day♥
*******
Senior Member
8,648 posts

Joined: Feb 2006
From: Jelutong Penang



QUOTE(edwin3210 @ Apr 25 2007, 01:53 AM)
calm down now. READ
*
I think that's only if Barcelona is compared to Clovertown...

and only in SPECint_rate....

but what if it's compared with Penryn.. since it will also launch this year...
how about desktop, video encoding and gaming performance....

Processors gain from 45nm high-K metal-gate silicon technology
Intel's invention of a manufacturable high-k process technology brings a 2x improvement in transistor density. Each 45nm Penryn dual-core processor has 410 million transistors; quad-core versions will have 820 million. The die size of dual-core processors is 107 square millimeters, 25 percent smaller than current 65nm products and a quarter the size of the average U.S. postage stamp.

The new processors achieve higher performance and greater energy efficiency through Intel's industry-leading 45nm high-K process technology with its hafnium-based high-K plus metal gate transistor design.

Microarchitecture enhanced for performance and energy efficiency
The Penryn processor family extends the technology leadership of the Intel(R) Core(tm) microarchitecture.

The mobile Penryn processor's new Deep Power Down Technology significantly reduces the power consumption of the processor during idle periods, which helps extend battery life in laptops. This advanced power management state is a major advancement over previous generation industry-leading Intel mobile processors.

The mobile Penryn processor has an enhanced version of the Intel(R) Dynamic Acceleration Technology that is available in current Intel(R) Core(tm)2 Duo processors. With this feature, the processor uses the power that is freed up by a core going inactive to boost the performance of a core that is still active. Imagine a shower with two powerful water shower heads: When one shower head is turned off, the water pressure (performance) in the other increases.

Other technical features improve Penryn performance
The Penryn family of products will deliver higher overall clock frequencies within existing power and thermal envelopes to further increase performance. Desktop and server products will introduce speeds greater than 3 GHz.

All members of the Penryn processor family include Intel(R) Streaming SIMD Extensions 4 (SSE4) instructions for speeding up video, photo imaging, and other high-performance software. The combination of these instructions with larger caches and a super shuffle engine and fast divider improves the performance of applications for desktop, mobile, and servers.

Cache is a memory reservoir where frequently accessed data can be stored for more rapid access. Larger and faster caches speed a computer's performance and response time. The size of the Penryn processor's L2 cache is up to 50 percent larger, with a higher degree of associativity, which improves the hit rate and maximizes cache utilization. Dual-core Penryn processors will feature up to 6MB of L2 cache; quad-core processors will have up to 12MB of L2 cache.
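As a rough illustration of why a higher degree of associativity can improve the hit rate, here is a small Python sketch of an LRU set-associative cache. It is a toy model, not Penryn's actual cache: the function name `hit_rate`, the 64-byte line size and the tiny 8-line capacity are all assumptions picked to keep the example readable.

```python
from collections import OrderedDict


def hit_rate(addresses, num_lines, ways, line_size=64):
    """Simulate an LRU set-associative cache and return the fraction of hits."""
    num_sets = num_lines // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in addresses:
        block = addr // line_size
        index = block % num_sets   # which set the block maps to
        tag = block // num_sets    # identifies the block within that set
        lines = sets[index]
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)         # mark as most recently used
        else:
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return hits / len(addresses)


# Two addresses that collide in a direct-mapped cache of 8 lines,
# accessed alternately 100 times each.
pattern = [0, 512] * 100
direct_mapped = hit_rate(pattern, num_lines=8, ways=1)
two_way = hit_rate(pattern, num_lines=8, ways=2)
```

With the same total capacity, the direct-mapped configuration keeps evicting one address to make room for the other, while the 2-way configuration holds both after the first two misses. Real workloads are far less extreme, so the gain from extra ways is usually much smaller than in this worst-case pattern.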

Penryn-based processors include a new, faster divide technique called "Radix 16," which roughly doubles the divider speed over previous generations for computations used in nearly all applications, especially for scientific computing and 3D content creation.
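A rough way to see where the "roughly doubles" figure can come from: a radix-16 divider retires 4 quotient bits per step, whereas a radix-4 divider retires 2, so the same quotient needs half as many steps. The Python sketch below is only a simplified digit-by-digit division under that assumption (the quoted text does not say what radix the previous generation used, and real hardware dividers pick each digit with dedicated logic rather than a loop). The function name `divide` and the 32-bit width are made up for illustration.

```python
def divide(dividend, divisor, radix_bits, width=32):
    """Unsigned digit-by-digit division, `radix_bits` quotient bits per step.

    Returns (quotient, remainder, steps).
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if width % radix_bits != 0:
        raise ValueError("width must be a multiple of radix_bits")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in width bits")

    mask = (1 << radix_bits) - 1
    steps = width // radix_bits
    quotient = 0
    remainder = 0
    for step in range(steps):
        # Bring down the next group of dividend bits.
        shift = width - (step + 1) * radix_bits
        remainder = (remainder << radix_bits) | ((dividend >> shift) & mask)
        # Select the quotient digit: the largest multiple of the divisor that fits.
        # (Done by a simple loop here; hardware would do this selection in one step.)
        digit = 0
        while (digit + 1) * divisor <= remainder:
            digit += 1
        remainder -= digit * divisor
        quotient = (quotient << radix_bits) | digit
    return quotient, remainder, steps
```

Calling `divide(1000003, 97, radix_bits=2)` and `divide(1000003, 97, radix_bits=4)` gives the same quotient and remainder, but the second call takes 8 steps instead of 16.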

A set of microarchitecture optimizations delivers more instruction executions per clock cycle, which results in more performance and quicker PC responsiveness.

With enhanced Intel(R) Virtualization Technology, Penryn processors speed up virtual machine transition (entry/exit) times by an average of 25 to 75 percent, all through microarchitecture improvements and without virtual machine software changes. Virtualization compartmentalizes a single computer so that it can run separate operating systems and software. By letting a single machine act as many virtual "mini" computers, customers get better leverage of their multi-core processing power, increased efficiency, and lower costs.


This post has been edited by salimbest83: Apr 25 2007, 02:09 AM
empire23
post Apr 25 2007, 02:11 AM

Team Island Hopper
Group Icon
Staff
9,417 posts

Joined: Jan 2003
From: Bladin Point, Northern Territory
QUOTE(salimbest83 @ Apr 25 2007, 01:26 AM)
with all this talk..

which one do u think is faster or better...

Intel quad core or Barcelona....
*
In the arena of good processors, no one can be certain. For example, Barcelona's FP performance might be higher, but it might lag behind when it comes to ALU-intensive programs. It's up to you to see what the processor offers, and then make a choice.

Anyway, SPECint is a very general benchmark, since it tries to emulate probable loads from general high-performance use. Any lead in it would indicate quite a lot, although other benchmarks should at least be present for comparison.

This post has been edited by empire23: Apr 25 2007, 02:17 AM
ikanayam
post Apr 25 2007, 05:15 AM

there are no pacts between fish and men
********
Senior Member
10,544 posts

Joined: Jan 2003
From: GMT +8:00

they're not giving spec base scores either... that would probably be more useful.
salimbest83
post Apr 25 2007, 06:21 AM

♥PMS on certain day♥
*******
Senior Member
8,648 posts

Joined: Feb 2006
From: Jelutong Penang



if Barcelona is really good.... AMD will surely show us all the other benchmarks...

but it seems Barcelona can only win in certain suites ....

not like when C2D beat AMD
charge-n-go
post Apr 25 2007, 08:16 AM

Look at all my stars!!
*******
Senior Member
4,060 posts

Joined: Jan 2003
From: Penang / PJ

QUOTE(ikanayam @ Apr 24 2007, 11:09 PM)
As empire said, it has been proven a while ago that the crossbar in the current x2 processors is NOT used to transfer data between cores. Which is pretty dumb IMO, but let's see if they change/fix it in Barcelona. Probably requires a change in coherency protocol to achieve, which is why they didn't do it.
*
It does go through the crossbar when transferring data between cores, unless the course given here is misleading tongue.gif
empire23
post Apr 25 2007, 02:03 PM

Team Island Hopper
Group Icon
Staff
9,417 posts

Joined: Jan 2003
From: Bladin Point, Northern Territory
QUOTE(charge-n-go @ Apr 25 2007, 08:16 AM)
It does go through the crossbar when transferring data between cores, unless the course given here is misleading tongue.gif
*
Xbitlabs did some testing and they proved it went through the FSB by looking at the latency numbers IIRC.
ikanayam
post Apr 25 2007, 02:13 PM

there are no pacts between fish and men
********
Senior Member
10,544 posts

Joined: Jan 2003
From: GMT +8:00

Chargey and I had a chat about that, and either there was a problem with the testing methods (unlikely), or it really is that way. Seems that it needs either a unified cache at some level or a smarter coherence protocol controller on the xbar in order to make that happen.

The original paper detailing the MOESI protocol (for the Piranha chip, which is like the father of the A64) used the unified L2 controller to handle coherence between the cores on the chip.
charge-n-go
post Apr 25 2007, 03:19 PM

Look at all my stars!!
*******
Senior Member
4,060 posts

Joined: Jan 2003
From: Penang / PJ

QUOTE(empire23 @ Apr 25 2007, 02:03 PM)
Xbitlabs did some testing and they proved it went through the FSB by looking at the latency numbers IIRC.
*
Hey dude, still FSB? I think you Intel too much lately laugh.gif

Anyway, even if the data is taken from RAM, it still goes through the crossbar in this fashion: RAM -> DRAM controller -> mem controller -> crossbar -> SRI -> cache. MOESI / MESI is so stupid that it won't transfer data from core 0 to core 1 in a more direct way (when going from 'E' state to 'S' state).

This post has been edited by charge-n-go: Apr 25 2007, 03:19 PM
ikanayam
post Apr 25 2007, 03:30 PM

there are no pacts between fish and men
********
Senior Member
10,544 posts

Joined: Jan 2003
From: GMT +8:00

^ it's more like the coherency controller wasn't smart enough to take advantage of MOESI than a weakness in the protocol itself. It can be done with MOESI given the right implementation (i.e. how it's done in Piranha).

Hopefully Barcelona takes care of this. It does have a shared L3 after all, which should make things a lot easier if they do it right.
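For anyone following along, the protocol-level difference being discussed can be sketched in a few lines of Python. This is a simplified state model only: `share_dirty_line` is a hypothetical helper made up for illustration, and it says nothing about which physical path (crossbar, SRI, memory controller) the data actually takes on a real chip, which is exactly the implementation question debated here. What it does show is that MOESI's Owned state lets a dirty line be shared without first writing it back to memory.

```python
def share_dirty_line(protocol, readers):
    """Core 0 holds a dirty (Modified) line; `readers` other cores then read it.

    Returns (final state of core 0's copy, number of write-backs to memory).
    """
    if protocol not in ("MESI", "MOESI"):
        raise ValueError("protocol must be 'MESI' or 'MOESI'")
    owner_state = "M"
    writebacks = 0
    for _ in range(readers):
        if owner_state == "M":
            if protocol == "MOESI":
                # Keep the dirty data and supply it cache-to-cache.
                owner_state = "O"
            else:
                # MESI has no Owned state: memory is updated before sharing.
                writebacks += 1
                owner_state = "S"
        # Once in O or S, further reads need no new write-back in this model.
    return owner_state, writebacks
```

So `share_dirty_line("MESI", 3)` ends with the line Shared after one write-back, while `share_dirty_line("MOESI", 3)` ends with core 0 as Owner and no write-back at all. Whether the silicon actually exploits that is up to the coherency controller.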

This post has been edited by ikanayam: Apr 25 2007, 03:32 PM
charge-n-go
post Apr 25 2007, 03:52 PM

Look at all my stars!!
*******
Senior Member
4,060 posts

Joined: Jan 2003
From: Penang / PJ

QUOTE(ikanayam @ Apr 25 2007, 03:30 PM)
^ it's more like the coherency controller wasn't smart enough to take advantage of MOESI than a weakness in the protocol itself. It can be done with MOESI given the right implementation (i.e. how it's done in Piranha).

Hopefully Barcelona takes care of this. It does have a shared L3 after all, which should make things a lot easier if they do it right.
*
yeah, you got the point. The states can actually remain the same, but the implementation could be different.

Hmm.. mind posting the Piranha details in the engineering thread? biggrin.gif

ikanayam
post Apr 25 2007, 04:02 PM

there are no pacts between fish and men
********
Senior Member
10,544 posts

Joined: Jan 2003
From: GMT +8:00

QUOTE(charge-n-go @ Apr 25 2007, 02:52 AM)
yeah, you got the point. The states can actually remain the same, but the implementation could be different.

Hmm.. mind posting the Piranha details in the engineering thread? biggrin.gif
*
I don't think I can make the paper public, however you know where to find me.... tongue.gif

I have to look for it first. I have it in print, but I have to find the digital copy.
gtoforce
post Apr 30 2007, 03:33 PM

SPAM AND BECOME A SENIOR MEMBER
*******
Senior Member
2,967 posts

Joined: May 2006



Intel has both Penryn and Nehalem

now AMD is pinning so much hope on Barcelona, which I think is a bit of a rush, right?
the 65nm processors from AMD suck big time
I dunno where they got the idea to spend all that money on worthless R&D
anyways, I've always been an AMD fanboy
if and when Barcelona comes out (Q4 2007 right?), I'd say Intel would still be king the next day with their better processors
salimbest83
post May 3 2007, 12:14 PM

♥PMS on certain day♥
*******
Senior Member
8,648 posts

Joined: Feb 2006
From: Jelutong Penang



barcelona cannot beat penryn..
like X2900 XTX cannot beat 8800 GTX.......
dailytech
arjuna_mfna
post May 3 2007, 12:45 PM

**Towards Justice World**
******
Senior Member
1,496 posts

Joined: Jan 2006
From: Baling, Kedah



QUOTE(salimbest83 @ May 3 2007, 12:14 PM)
barcelona cannot beat penryn..
like X2900 XTX cannot beat 8800 GTX.......
dailytech
*
can u provide the link...

This post has been edited by arjuna_mfna: May 3 2007, 12:47 PM
raymond5105
post May 3 2007, 08:19 PM

Newbie
*******
Senior Member
5,341 posts

Joined: Jan 2003
QUOTE(arjuna_mfna @ May 3 2007, 12:45 PM)
can u provide the link...
*
http://dailytech.com/ATI+Radeon+HD+2900+XT...article7052.htm

I think he is referring to this.

This post has been edited by raymond5105: May 3 2007, 08:19 PM
arjuna_mfna
post May 4 2007, 09:25 AM

**Towards Justice World**
******
Senior Member
1,496 posts

Joined: Jan 2006
From: Baling, Kedah



QUOTE(raymond5105 @ May 3 2007, 08:19 PM)
that old thing.. it's the OEM card and runs on GDDR3, btw the retail one will come with GDDR4
salimbest83
post May 6 2007, 03:18 PM

♥PMS on certain day♥
*******
Senior Member
8,648 posts

Joined: Feb 2006
From: Jelutong Penang



QUOTE(arjuna_mfna @ May 4 2007, 09:25 AM)
that old thing.. it's the OEM card and runs on GDDR3, btw the retail one will come with GDDR4
*
then X2900 XTX cant beat 8800 ultra
TSedwin3210
post May 7 2007, 10:24 AM

lll
*****
Senior Member
808 posts

Joined: Jan 2007
QUOTE(salimbest83 @ May 6 2007, 03:18 PM)
then X2900 XTX cant beat 8800 ultra
*
why did this Penryn thread morph into a graphics card vs graphics card thread doh.gif

anyway, it seems that SSE4 can boost performance by up to 50%. I dunno how true that is, but it seems that previous SSE versions didn't boost performance that much. anyone have more info on this? notworthy.gif

This post has been edited by edwin3210: May 7 2007, 10:25 AM
ikanayam
post May 7 2007, 10:32 AM

there are no pacts between fish and men
********
Senior Member
10,544 posts

Joined: Jan 2003
From: GMT +8:00

QUOTE(edwin3210 @ May 6 2007, 09:24 PM)
why did this Penryn thread morph into a graphics card vs graphics card thread doh.gif

anyway, it seems that SSE4 can boost performance by up to 50%. I dunno how true that is, but it seems that previous SSE versions didn't boost performance that much. anyone have more info on this? notworthy.gif
*
The key words are "for certain apps". And yes, previous SSE versions could also boost performance "for certain apps" by that much or even more.
toughnut
post May 7 2007, 10:50 AM

Look at all my stars!!
*******
Senior Member
3,239 posts

Joined: Jun 2005
for SSE, it's more about software optimization. software needs to be coded to support it, right?
c38y50y70
post May 7 2007, 02:00 PM

Getting Started
**
Validating
140 posts

Joined: Dec 2005
From: R&D Center & Home



Any SSE needs software support.
salimbest83
post May 8 2007, 04:36 PM

♥PMS on certain day♥
*******
Senior Member
8,648 posts

Joined: Feb 2006
From: Jelutong Penang



I thought SSE is some kind of shortcut the CPU uses to execute a process... like combining the same kind of operations..
charge-n-go
post May 8 2007, 06:44 PM

Look at all my stars!!
*******
Senior Member
4,060 posts

Joined: Jan 2003
From: Penang / PJ

SSE is a set of instruction set extensions, which means software has to actually use those instructions (written in assembly, or emitted by a compiler) in order to use the features SSE provides.
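To illustrate why the software has to be written (or compiled) for it, here is a small Python model of the idea behind a packed SSE instruction such as ADDPS, which adds four single-precision floats held in one 128-bit register at once. This is only a conceptual sketch: Python does not issue SSE instructions this way, and the function names and the "instruction" counter are made up for illustration.

```python
def scalar_add(a, b):
    """Element-by-element add: one 'instruction' per pair of values."""
    out = []
    instructions = 0
    for x, y in zip(a, b):
        out.append(x + y)
        instructions += 1
    return out, instructions


def packed_add(a, b, lanes=4):
    """Model of a packed add: one 'instruction' per group of `lanes` values."""
    out = []
    instructions = 0
    for i in range(0, len(a), lanes):
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return out, instructions
```

Adding two 8-element lists gives the same result either way, but the scalar version counts 8 "instructions" and the packed version counts 2. The CPU cannot regroup scalar code into packed form by itself, so a program that only contains scalar adds gets no benefit from the packed instruction being available.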

 
