http://www.ps3grill.com/
Don't take this seriously. Laugh about it.
Next Gen Console: PS3 vs XBOX 360 vs. Wii, Next Gen speculation discussion
May 24 2005, 10:24 AM
#1
Elite
169 posts Joined: Mar 2005 From: Wallowing in my Pool of Ignorance (splat..splat..)
May 25 2005, 12:48 PM
#2
QUOTE(Matrix @ May 25 2005, 11:42 AM)
IGN is really a truckload of shit. That looks more like a marketing campaign written by some 10-year-old moron at the MS marketing department. They keep blabbering about the 256GB/sec, which is the bandwidth between the EDRAM and/or between the main GPU and the daughter GPU (the XBOX360 GPU is split in two). Anyone who has half a brain can see through the bullshit. Really, they needn't resort to such low tactics... and I guess IGN was paid handsomely by MS.

I don't see what the fuss is about. That was a very balanced analysis based on speculative data. Based on the speculative information, the XBox 360 does seem to be a more powerful beast compared to the PS3 or the Revolution.

QUOTE
There's a much fairer and more realistic (and neutral) comparison of the specs at Gamespot. At least they don't take sides with no real information.

I don't own an XBox and am myself a huge fan of Nintendo. But I can't deny the blatant fact slapped right in my face. But again this is speculative, so please hold your horses with titanium reins.
May 25 2005, 01:06 PM
#3
QUOTE(Matrix @ May 25 2005, 12:54 PM)
Yes, it's all speculation. But there was NOTHING balanced about the analysis. In fact, it wasn't an analysis at all. If it's from anandtech or tomshardware or ars-technica, then it's an analysis. This is just pure marketing crap.

It's an analysis. Look at the charts; those are what you would find in an analysis.

QUOTE
And how can you even SPECULATE that XBOX360 is more powerful than Revolution when there is ZERO info on the Revolution hardware specs available?????

Regarding the speculation on the Revolution spec, Nintendo did say that it's 2 or 3 times more powerful than the GC. Again, the analysis is based on speculative data. Don't grill the presenter (IGN) if the data is inaccurate, grill Microsoft.
May 25 2005, 01:10 PM
#4
May 25 2005, 01:24 PM
#5
QUOTE(ikanayam @ May 25 2005, 01:19 PM)
Well i think the purpose of this thread is not to run on hype

I couldn't agree more. It really gets me when people get offended by an analysis skewed towards a particular console. There are tonnes of them out there; one just needs to take an unbiased stance and go through these articles objectively.

So far the details have been coming out slowly; there will be more revelations over the next few months or so.
May 25 2005, 02:29 PM
#6
I seriously doubt Sony's approach to the PS3 core. 7 DSPs (SPEs) without L1 cache is unthinkable.
DSPs are good at crunching data at high MIPS, in this case floating-point numbers. I would think that it would be prudent to at least have a data cache (D-cache). Without cache the bottleneck would likely be the access latency to the DRAM (whichever type it is). It just makes more sense to have cache. Well, we'll just have to wait and see how it goes.

EDIT: And no DMA too!!
This post has been edited by ray_: May 25 2005, 02:33 PM
May 26 2005, 09:51 AM
#7
QUOTE(ikanayam @ May 26 2005, 09:00 AM)
I would have quoted from the same place.

I don't see why you need to do this. You should discuss the topic at hand knowing full well whether the source is valid or invalid, instead of having someone quote something that you've already read somewhere. You could have provided the link instead. Shame on you.

Please don't patronize.

EDIT: By the way, nothing is carved in stone currently. The thread description reads "speculation discussion", and that is what this is: speculative. So please stop the "your source is wrong and my source is correct" argument unless you work for MS or Sony on a next gen. console or have irrefutable proof that your source is correct. Otherwise just leave it at that, speculative, and enjoy the rebuttal.
This post has been edited by ray_: May 26 2005, 10:14 AM
May 26 2005, 10:46 AM
#8
QUOTE(silkworm @ May 26 2005, 08:59 AM)
Straight from the horse's mouth: IBM Cell project page

Thanks for the link.

QUOTE
Furthermore, each SPE has 256K of "Local Storage", which is effectively a Von Neumann style L1 cache.

I'm not sure you could qualify the "Local Storage" as L1 cache yet. I would think that L1 cache access should be synchronized to the core's frequency. We'll need to see whether "Local Storage" is running at core frequency or at some slower bus frequency.

But you got one thing right. The SPE does indeed have DMA. There is no description, however, of whether the SMFs are DMAs. It would be nice to know if there is a dedicated DMA channel for each SPU or virtual DMA channels supported by a few physical DMAs.
May 26 2005, 12:29 PM
#9
QUOTE(ikanayam @ May 26 2005, 11:57 AM)
the local storage is on die, therefore it is almost certain that it is running at core frequency. Especially since the SPEs are so data hungry, it would be practically a design flaw if they didn't have such fast cache. And i don't think we have seen any recent CPU with a L1/L2 cache running slower than core freq.

You'll need to think SoC (System-on-Chip). With SoC, every friggin' peripheral is now on the die, so being on die does not by itself mean running at core frequency. But I do agree with your argument; as I've commented earlier, I don't see how this would work without L1 cache. But again I stress, until we see some clock distribution diagram, this is all speculative.

By the way, L2 cache does not necessarily run at core frequency. There are L2 caches that run at lower frequencies, such as that of the FSB. But L1 cache always runs at core frequency.

EDIT: I've just read that you said "recent". So yes, most recent L2 caches run at core speed. But the question is, is the L2 cache in the PS3 doing the same?
This post has been edited by ray_: May 26 2005, 12:37 PM
May 26 2005, 01:11 PM
#10
QUOTE(ikanayam @ May 26 2005, 12:52 PM)
I am 99% certain that it runs at the same speed as the core. There really is no reason for it not to. I don't think it's even a big design challenge anymore to get cache memory cells to work at those frequencies.

I'm an ignorant fool. Exactly what frequency is the PS3 core capable of?
May 26 2005, 05:26 PM
#11
QUOTE(silkworm @ May 26 2005, 04:56 PM)
Actually, after reading the arstechnica article on the X360 CPU, I was reminded of something. The targeted application of the SPUs is mainly processing of streaming data, which is inherently cache-unfriendly. Cache in the memory hierarchy alleviates the memory-to-CPU bottleneck by leveraging temporal and spatial locality. 3D scene data and video decoding have neither kind of locality. However, code that acts on this data has high locality. Hence, the SPE local store looks like it is more useful as I-cache + deep buffer for streaming data.

Ah... I've forgotten my facts. Most recent DSP iterations have I-cache, not D-cache. I apologize for my error.

QUOTE
This sort of lines up with the "DSP-like" tag the SPEs have been given. Real DSPs also don't feature data caches; they're operated in more of a "fire and forget" way. Which makes sense, because unless you're playing a broken record, or pointing a camera at a painting, there's no way that the data into the DSP is going to repeat itself.

One noteworthy point though. It's almost certain that the DMA of the SPE would be utilized for isochronous transfer (streaming data), thus off-loading the core. The DMA would most certainly have a reduced pipeline, removing redundant stages normally required in a CPU pipeline. There wouldn't be any need to fetch/execute any instructions at all.
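The locality argument quoted above can be illustrated with a toy simulation. This is only a sketch under made-up assumptions: a fully associative LRU cache, 64-byte lines, 256 lines, a data stream where each block is touched exactly once, and a 4 KB code loop. None of these numbers come from the Cell documents, and `hit_rate` is a hypothetical helper.

```python
from collections import OrderedDict


def hit_rate(addresses, num_lines, line_size=64):
    """Return the fraction of accesses that hit a fully associative LRU cache."""
    cache = OrderedDict()  # cache line number -> None, ordered oldest to newest
    hits = 0
    total = 0
    for addr in addresses:
        total += 1
        line = addr // line_size
        if line in cache:
            hits += 1
            cache.move_to_end(line)  # mark the line as most recently used
        else:
            cache[line] = None
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / total if total else 0.0


# Assumed streaming data: every 64-byte block is touched once and never reused.
stream = range(0, 1_000_000, 64)

# Assumed looping code: the same 4 KB of instructions is fetched 100 times over.
loop = [pc for _ in range(100) for pc in range(0, 4096, 4)]

print(f"streaming data hit rate: {hit_rate(stream, num_lines=256):.3f}")
print(f"looping code hit rate:   {hit_rate(loop, num_lines=256):.3f}")
```

With these assumed access patterns the streamed data never hits while the looping code almost always does, which is the point being made about D-cache versus I-cache.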
May 30 2005, 12:29 AM
#12
Hate to reopen the can of worms marked expired.
But just to share my thoughts on the likelihood of "local storage" being cache.

First, I do not think that IBM would call a cache "local storage". I think local storage is more of an SPE-internal RAM; an internal RAM would typically be referred to as local storage.

Secondly, with so many SPU units (8 in total based on the diagram), it would mean 256K x 8 of L1 cache. A 2MB cache would bump up the price of the PS3 to a preposterous level.

But I'm not claiming that the SPE has no L1 cache. On the contrary, I think that we might be looking at a very top-level block diagram (a macro view, if you like) and that the cache (if there is one) is embedded in one of these blocks (probably the SXU).

Again, I stress, this is all speculative. Constructive feedback welcome.

At the time of writing I've also got this from Gamespot:
1 Core, 7 x SPE 3.2GHz (256KB SRAM per SPE), 7 x 128b 128 SIMD GPRs
http://hardware.gamespot.com/Sony-PlayStation-3-15015-S-4-4

SRAM. If this is true, we could safely assert that the local storage is certainly not L1 cache. It could still be L2 cache, but I think it's most likely just that, internal RAM.

I also spotted this interesting bit after the above edit. The L2 Cache row has this bit of info:
512KB L2 cache, 256KB per SPE

Yes, it's 1am and I'm still writing this shite.
This post has been edited by ray_: May 30 2005, 01:09 AM
May 30 2005, 01:37 AM
#13
QUOTE(ikanayam @ May 30 2005, 01:32 AM)
Well here's what i think. It's a cache of some sort, call it whatever you want. The purpose of this cache is to provide high bandwidth to the SPE units because they are streaming data processors, and the initial latency (probably higher than that of a typical L1/L2 cache) will not matter that much because the SPE is a streaming data processor. Look under the definition of "cache", it does not have any specific details about how it should be implemented. Cache and local memory are loosely interchangeable terms.

It's too late to write a rebuttal. But do expect one in the morning.

QUOTE
I believe adding more cache to a CPU is a lot cheaper than adding more logic. Cache is typically densely packed and cheap to manufacture. A 2MB cache on the P4 6xx series does not bump the price up to preposterous levels at all.
May 30 2005, 10:18 AM
#14
QUOTE(ikanayam @ May 30 2005, 01:32 AM)
Well here's what i think. It's a cache of some sort, call it whatever you want. The purpose of this cache is to provide high bandwidth to the SPE units because they are streaming data processors, and the initial latency (probably higher than that of a typical L1/L2 cache) will not matter that much because the SPE is a streaming data processor. Look under the definition of "cache", it does not have any specific details about how it should be implemented. Cache and local memory are loosely interchangeable terms.

First, let me get to some points:
1) I've suspected that the local storage is indeed not L1 cache. If the tech. spec. at Gamespot is anything to go by, then I would say that my suspicion is correct. The local storage is in fact an L2 cache according to Gamespot.
2) I've suspected that the local storage is in fact internal memory. Again, if Gamespot is correct, then I would be wrong: Cell's "local storage" isn't internal RAM.

Although I agree that the line has blurred somewhat, I still think that there's a fundamental difference between RAM (memory) and cache. Cache has additional logic to invalidate, flush and enable/disable its cells. Also, the hybrid set-associative cache is only addressable by the CPU and "associates" each memory region of the cache with an equivalent region of the external RAM. Memory works differently to cache.

QUOTE
I believe adding more cache to a CPU is a lot cheaper than adding more logic. Cache is typically densely packed and cheap to manufacture. A 2MB cache on the P4 6xx series does not bump the price up to preposterous levels at all.

I do not think that cache is cheap. Cache, especially fast L1/L2 cache running at core frequency, is really expensive to make, though it does get cheaper as the technology improves. That is why there is so very little cache RAM available on all off-the-shelf processors.
If you google "cache" and click through the results, it would almost always be tagged with the word "expensive". Although it might be densely packed, cache would still take a large portion of the die. The Cell has a 512K L2 cache. Look at how much space it requires on the die layout.
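To make the "associates each region of the cache with a region of external RAM" point concrete, here is a minimal sketch of how a set-associative cache splits an address into tag, set index and line offset. The geometry (64-byte lines, 128 sets, e.g. a 32 KB 4-way cache) is an assumption picked for illustration, not a figure for the Cell, and `split_address` is a hypothetical helper.

```python
def split_address(addr, line_size=64, num_sets=128):
    """Split an address into (tag, set_index, offset) for a set-associative cache.

    line_size and num_sets are assumed to be powers of two.
    """
    offset_bits = line_size.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    offset = addr & (line_size - 1)                     # byte within the cache line
    set_index = (addr >> offset_bits) & (num_sets - 1)  # which set the line maps to
    tag = addr >> (offset_bits + index_bits)            # stored to identify the RAM region
    return tag, set_index, offset


# Two RAM addresses 8 KB apart land in the same set and differ only by tag.
print(split_address(0x2000))  # (1, 0, 0)
print(split_address(0x4000))  # (2, 0, 0)
```

The tag comparison (plus valid bits and invalidate/flush logic) is the extra machinery a cache carries; a plain directly addressed RAM would use the address as-is.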
May 30 2005, 10:44 AM
#15
QUOTE(silkworm @ May 30 2005, 10:27 AM)
First, let's share what google brought me this morning: microprocessor report article

Again, thanks for the wonderful link.

QUOTE
In that report, it is written that SPEs do not have cache, and we can infer that the LS is indeed not an L1 cache. But why do you have this fixation on L1 anyway? The LS is SRAM; it sits right next to the SPE's logic and execution units. With that kind of placement it probably runs at the same clock frequency (or perhaps matched with the pipeline latency for reads/writes to save power?). That's probably all that the LS has in common with cache. I've also mentioned earlier that cache is only effective when there is locality to be exploited. When there is no locality, cache misses are so expensive that it's better not to have a cache there in the first place. Cache adds an element of unpredictability to programming: cache misses are inevitable and the programmer has no direct control over when they happen. This sort of unpredictability is a no-no in realtime applications. Graphics rendering in games can be classified as realtime, because it has a 1/60 second deadline for all its processing. The LS of each SPE can be mapped into system memory, which means that the LS is visible to the programmer. A cache is invisible to the programmer; it has no memory address range of its own. After all this discussion on the SPE, we seem to have left out the PPE, which is a traditional 64-bit processor and has its own L1 data and instruction caches.

My fixation on the L1 cache was due to your claim that "Local storage" was in fact an L1 cache, which has now been proven incorrect. I think you should give me that much. I share some of your arguments in my most recent post (e.g. the fact that cache is not directly addressable by the user). Please read 'em and see if you'd agree.
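The 1/60 second deadline can be turned into a rough cycle budget. The sketch below assumes the 3.2GHz clock from the Gamespot spec quoted earlier in the thread; the workload size, miss counts and miss penalty are invented purely to show how unpredictable misses eat into a fixed budget, and both function names are hypothetical.

```python
def cycles_per_frame(clock_hz, frames_per_second):
    """Core clock cycles available before each frame deadline."""
    return clock_hz / frames_per_second


def frame_meets_deadline(work_cycles, miss_count, miss_penalty_cycles, budget_cycles):
    """True if the work plus all cache-miss stalls still fits in the frame budget."""
    return work_cycles + miss_count * miss_penalty_cycles <= budget_cycles


budget = cycles_per_frame(3.2e9, 60)  # roughly 53.3 million cycles per frame
print(round(budget))

# Same assumed workload, two assumed miss counts: only the miss count differs.
print(frame_meets_deadline(50_000_000, 5_000, 500, budget))
print(frame_meets_deadline(50_000_000, 10_000, 500, budget))
```

With these made-up numbers the frame fits when 5,000 misses occur and overruns when 10,000 do, which is the kind of variation a programmer cannot control directly with a hardware cache.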
May 30 2005, 11:53 AM
#16
Reading the document, it looks like the SPE has no cache at all, that the local storage is rather simplistic in design, and that Gamespot is inaccurate in labeling local storage as L2 cache.
I think here's how it would work:
- Prior to running a specific process on the SPE, all instructions and data required for the process would need to be copied to the SPE local storage via the PPE MMU. This should be done either automatically by a robust compiler (supplied with the PS3 dev. kit) or by the programmer (gasp...).

There is no mention of the speed at which the SRAM is running. But since SRAM is traditionally asynchronous, there would be wait cycles required for read and write accesses. There is also no mention of whether the SPE DMA could directly address external memory space or I/O address space to transfer instructions or data from external memory or memory-mapped I/O peripherals to the local store.
This post has been edited by ray_: May 30 2005, 11:58 AM
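The copy-in step guessed at above can be sketched as a simulation. This is not the Cell SDK or any real API: `plan_chunks` and `process_stream` are hypothetical names, the slice copy merely stands in for whatever transfer mechanism is actually used, and the 64 KB code image and two-buffer split are assumptions. Only the 256 KB local store size comes from the thread.

```python
LOCAL_STORE_BYTES = 256 * 1024  # per-SPE local store size quoted in the thread


def plan_chunks(total_bytes, code_bytes, num_buffers=2):
    """Split a data set into (offset, length) chunks that fit beside the code image."""
    free = LOCAL_STORE_BYTES - code_bytes
    chunk = free // num_buffers  # e.g. one buffer being processed, one being filled
    if chunk <= 0:
        raise ValueError("code image leaves no room for data in the local store")
    return [(off, min(chunk, total_bytes - off)) for off in range(0, total_bytes, chunk)]


def process_stream(data, code_bytes, kernel):
    """Copy each chunk into a simulated local buffer, then run the kernel on it."""
    results = []
    for off, length in plan_chunks(len(data), code_bytes):
        local_buffer = bytes(data[off:off + length])  # stands in for the copy into local store
        results.append(kernel(local_buffer))
    return results


data = bytes(range(256)) * 1000  # 256,000 bytes of assumed input
partial_sums = process_stream(data, code_bytes=64 * 1024, kernel=sum)
print(len(partial_sums), sum(partial_sums) == sum(data))
```

The kernel only ever sees data that is already in the local buffer, which is the property that makes access time predictable in this model.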
Jun 1 2005, 02:02 PM
#17
QUOTE(silkworm @ May 30 2005, 11:36 AM)
So, can we safely agree that the Local Store memory of 256K on each SPE is neither L1 nor L2 cache, and doesn't need to be?

I do agree that a cache miss is expensive. But it must be said that the SPE would still benefit from some form of cache implemented sensibly.

We all agreed that there's no need for a D-cache. But adding an I-cache could enhance the throughput of the SPE, based upon these assumptions:
1) I've never implemented or seen a graphics processing algorithm before, but I would assume that it would need to process buffered real-time data (acquired either through the CPU or DMA). Since the SPE is not designed to have a resident RTOS, instead of having multiple tasks waiting on mutexes/semaphores synchronized to the acquisition of these data, it would most probably be synchronized to an IRQ. The acquired real-time data should be synchronous, thus the IRQs should fire periodically and at deterministic intervals. With 256K of internal storage, there should be minimal cache misses (cache fills) if the cache is large enough (16K?) and has a suitable block and set distribution, and the executable image has a small enough footprint.
2) Unlike a DSP used for network protocol processing, which branches frequently (to service hardware interrupts, software interrupts or task switches), the access pattern of a graphics processing application should be more predictable. Thus it could be sensibly distributed and optimized to minimize cache misses. The CPM module of the PowerPC would be used for protocol handling.

I've done I-cache evaluations on certain DSPs. And I must say, even with a large executable image and an unpredictable access pattern, we could still register some throughput improvement compared to accessing plain old RAM (although it must be said that as the size increases and the pattern becomes unpredictable, the access time grows drastically).
Now, compare running code at RAM speed most of the time with running code at core speed some of the time and RAM speed the rest; it's not hard to see which is better. My 2 cents.
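The comparison in the last sentence is a weighted average. A minimal sketch, assuming a hit costs 1 cycle, a fetch from RAM costs 10 cycles, and a miss costs no more than a plain RAM fetch (all three are assumptions chosen for illustration; `mean_fetch_cycles` is a hypothetical helper):

```python
def mean_fetch_cycles(hit_rate, cache_cycles, ram_cycles):
    """Average cycles per instruction fetch for a given I-cache hit rate."""
    return hit_rate * cache_cycles + (1.0 - hit_rate) * ram_cycles


# A hit rate of 0.0 models running entirely from RAM.
for rate in (0.0, 0.5, 0.9):
    print(rate, mean_fetch_cycles(rate, cache_cycles=1, ram_cycles=10))
```

Under these assumptions any non-zero hit rate lowers the average; a real miss may cost more than a plain RAM access, which would shrink the gain at low hit rates.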
Jun 1 2005, 11:26 PM
#18
QUOTE(silkworm @ Jun 1 2005, 06:25 PM)
First off, some more nice nice PDF files. And another one.. These are the papers that were being analysed separately by Real World Technologies and ArsTechnica. Both authors were probably also privy to other material, having attended the ISSCC talks in person. The arstechnica article, in particular, covers what we had been debating about previously, namely the role of the LS in replacing L1.

Sorry, but I could only skim through the ars-technica article. As far as I can see, this could be a non-issue if the 256K LS is synchronized to core frequency but has none of the typical cache functions, as stated in the article, such as tag and cache coherency/snooping logic (i.e. having the speed of an L1 cache but not the redundancies associated with it). Can we assume that much?

QUOTE
I was going to write a rebuttal to your points, but I've had a long day at work and I badly need a shower. The documents should keep you occupied for the rest of the time though. I'll get round to rebutting tomorrow, if still necessary.

EDIT: Never mind. Got my answer. No assumption required. The 256K is indeed synchronized at core frequency. Now, that's one %@^&#^ SRAM.
EDIT2: There's a fundamental flaw in my argument. Nobody caches internal RAM. Now let me wallow in my pool of ignorance. (* splash..splash)
This post has been edited by ray_: Jun 2 2005, 04:06 PM
Jun 3 2005, 12:50 PM
#19
QUOTE(H@H@ @ Jun 3 2005, 11:22 AM)
Ars Technica has another in-depth article on the Xbox 360, this time focusing on the CPU. Thanks to Evil Avatar for the link.

Great read and good post. Now for the customary bait to lure massive tech. flames. Comments on my part:
1) If I'm not mistaken, branch prediction would never work for IRQs. Thus with the PPE's deep 21-stage pipeline, there should be a lot of pipeline stalls (pipeline flushes) if the PPE were to run interrupt-prone tasks (e.g. protocol related, TDMA... etc.). The XBox would need to handle a couple of network-related processing tasks such as WiFi and Ethernet, presumably using dedicated hardware on the CPM module.
2) Another one of those single-threaded tasks would be this:
while(1) { printf("Hello 4TW\n"); }
But I do not think noobs would be programming for large gaming companies.

Anyone know if the MPx or the 60x bus is used as the inter-processor bus? And if there's an internal RAM on each PPE like the DPRAM in previous PowerPCs?
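Point 1 can be put into rough numbers. This sketch assumes each interrupt throws away exactly one full pipeline's worth of cycles (21, from the post), ignores the cost of the handler itself, and uses an assumed 3.2GHz clock and an invented interrupt rate; `flush_overhead` is a hypothetical helper.

```python
def flush_overhead(events_per_second, pipeline_stages, clock_hz):
    """Fraction of all cycles lost if each event discards a full pipeline."""
    return events_per_second * pipeline_stages / clock_hz


# Assumed: 100,000 interrupts per second on a 21-stage pipeline at 3.2GHz.
lost = flush_overhead(100_000, 21, 3.2e9)
print(f"{lost:.6%} of cycles lost to flushes")
```

The result scales linearly with the interrupt rate, so how much the deep pipeline hurts in this model depends on how interrupt-heavy the workload really is.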
Jun 3 2005, 01:02 PM
#20