Next Gen Console: PS3 vs. Xbox 360 vs. Wii, Next Gen speculation discussion

silkworm
post Mar 12 2005, 08:19 AM

I wouldn't get my hopes up about going to E3. The show isn't open to the general public. sad.gif
QUOTE(from www.e3expo.com official website of e3 2005)
Frequently Asked Questions

Is the show open to the public?

No, E3 is not open to the public. E3 is a trade event and only professionals from the industry will be allowed to attend. Individuals who are not able to document their direct and current professional affiliation to the interactive entertainment industry are not qualified to attend E3. All E3 attendees are required to show government-issued photo I.D. (such as a driver's license or passport) upon request.
silkworm
post May 17 2005, 11:14 AM

QUOTE(PrivateJohn @ May 17 2005, 09:35 AM)
....and the PS3 turned out to be bulky... and a ninja weapon as a controller.
*
Bulky? Looking at the photos in the PS3 thread, we can roughly judge the dimensions of the console:
The width is roughly twice that of the DVD/CD/BD slot. To fit a 12cm-diameter disc, the slot is usually a bit wider, let's say about 13cm. So PS3 width ~ 2 x 13cm = 26cm.

The height/thickness is roughly three and a half times that of the IEC power connector. The power connector is roughly 2cm high. So PS3 thickness is approximately 3.5 x 2cm = 7cm. Give or take 1cm if I'm cock-eyed.

The depth is about 1.5 times the diameter of a DVD, so about 18cm.
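If anyone wants to fiddle with the guesses, here's the same back-of-the-envelope arithmetic as a throwaway C snippet (all the reference measurements are eyeballed from the photos, nothing official):

CODE
#include <stdio.h>

int main(void)
{
    /* Eyeballed reference features from the photos -- all of these are guesses. */
    double slot_w = 13.0;  /* cm, disc slot sized for a 12cm disc */
    double iec_h  =  2.0;  /* cm, height of the IEC power inlet   */
    double disc_d = 12.0;  /* cm, DVD diameter                    */

    printf("width  ~ %.0f cm\n", 2.0 * slot_w);  /* ~26 cm */
    printf("height ~ %.1f cm\n", 3.5 * iec_h);   /* ~7 cm  */
    printf("depth  ~ %.0f cm\n", 1.5 * disc_d);  /* ~18 cm */
    return 0;
}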

Somebody should shoot the designer of that awful controller though... at least I hope third parties like Logitech will adopt a more DS-like shape.
silkworm
post May 26 2005, 08:59 AM

QUOTE(ray_ @ May 25 2005, 02:29 PM)
I seriously doubt Sony's approach to the PS3 core. 7 DSPs (SPEs) without L1 cache is unthinkable.

DSPs are good at crunching data at high MIPS, in this case floating-point numbers. I would think that it would be prudent to at least have a data cache (D-cache). Without cache, the bottleneck would likely be the access latency to the DRAM (whichever type it is). It just makes more sense to have cache.

Well, we'll just have to wait and see how it goes.

EDIT: And no DMA too!!
*
Straight from the horse's mouth: IBM Cell project page
QUOTE
Memory access is performed via a DMA-based interface using copy-in/copy-out semantics, and data transfers can be initiated by either the Power processor or an SPU. The DMA-based interface uses the Power page protection model, giving a consistent interface to the system storage map for all processor structures despite its heterogeneous instruction set architecture structure. A high-performance on-chip bus connects the SPU and Power computing elements.


Furthermore, each SPE has 256K of "Local Storage", which is effectively a Von Neumann style L1 cache.
silkworm
post May 26 2005, 04:56 PM

QUOTE(ray_ @ May 26 2005, 10:46 AM)
Thanks for the link.

I'm not sure you could qualify the "Local Storage" as L1 cache yet. I would think that L1 cache access should be synchronized to the core's frequency. We'll need to see if the "Local Storage" is running at core frequency or at some slower bus frequency.

But you got one thing right. The SPE does indeed have DMA. There is no description, however, of whether the SMFs are DMA engines. It would be nice to know if they have a dedicated DMA channel for each SPU or virtual DMA channels backed by a few physical DMA engines.
*

Actually, after reading the Ars Technica article on the X360 CPU, I was reminded of something. The targeted application of the SPUs is mainly the processing of streaming data, which is inherently cache-unfriendly. Cache in the memory hierarchy alleviates the memory-to-CPU bottleneck by exploiting temporal and spatial locality. 3D scene data and video decoding exhibit neither kind of locality. However, the code that acts on this data has high locality. Hence, the SPE local store looks like it is more useful as an I-cache plus a deep buffer for streaming data.

This sort of lines up with the "DSP-like" tag the SPEs have been given. Real DSPs also don't feature much in the way of data caches, since they're operated in more of a "fire and forget" way. Which makes sense, because unless you're playing a broken record, or pointing a camera at a painting, there's no way the data going into the DSP is going to repeat itself.
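To make the "deep buffer" idea concrete, here's a rough sketch of what a double-buffered streaming loop on an SPE could look like, written against the MFC DMA intrinsics in spu_mfcio.h. The 16KB chunk size and the process() kernel are made up for illustration; this isn't lifted from any real code:

CODE
#include <spu_mfcio.h>

#define CHUNK 16384   /* 16 KB per DMA transfer; two buffers fit easily in the 256 KB LS */

static volatile char buf[2][CHUNK] __attribute__((aligned(128)));

extern void process(volatile char *data, int n);   /* hypothetical compute kernel */

void stream(unsigned long long ea, int nchunks)
{
    int cur = 0, nxt = 1;

    /* Prime the pump: start fetching the first chunk into buffer 0 (tag 0). */
    mfc_get(buf[cur], ea, CHUNK, cur, 0, 0);

    for (int i = 0; i < nchunks; i++) {
        /* Kick off the DMA for the *next* chunk into the other buffer... */
        if (i + 1 < nchunks)
            mfc_get(buf[nxt], ea + (unsigned long long)(i + 1) * CHUNK, CHUNK, nxt, 0, 0);

        /* ...then wait only for the current buffer's tag group and crunch it.
           The next transfer stays in flight while we compute, hiding DMA latency. */
        mfc_write_tag_mask(1 << cur);
        mfc_read_tag_status_all();
        process(buf[cur], CHUNK);

        cur ^= 1;
        nxt ^= 1;
    }
}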

This post has been edited by silkworm: May 26 2005, 05:09 PM
silkworm
post May 30 2005, 10:27 AM

QUOTE(ray_ @ May 30 2005, 12:29 AM)
Hate to reopen the can of worms marked "expired".

But just to share my thoughts on the likelihood of "local storage" being cache. First, I do not think that IBM would call a cache "local storage". I think the local storage is more of an SPE-internal RAM. An internal RAM would typically be referred to as local storage.

Secondly, having so many SPU units (8 in total based on the diagram) would mean 256K x 8 of L1 cache. A 2MB L1 cache would bump up the price of the PS3 to a preposterous level.

But I'm not claiming that the SPE has no L1 cache. On the contrary, I think that we might be looking at a very top-level block diagram (a macro view, if you like) and that the cache (if there is one) is embedded in one of these blocks (probably the SXU).

Again, I stress, this is all speculative. Constructive feedback welcomed.

At the time of writing I've also got this from gamespot:

1 Core, 7 x SPE 3.2GHz (256KB SRAM per SPE), 7 x 128b 128 SIMD GPRs
http://hardware.gamespot.com/Sony-PlayStation-3-15015-S-4-4

SRAM. If this is true, we can safely assert that the local storage is certainly not L1 cache. It could still be L2 cache, but I think it's most likely just that: internal RAM.


Also spotted this interesting bit after the above edit. The L2 cache row has this bit of info:
512KB L2 cache, 256KB per SPE

Yes, it's 1am and I'm still writing this shite.

*
First, let's share what Google brought me this morning: a Microprocessor Report article

In that report, it is written that SPEs do not have cache, and we can infer that the LS is indeed not an L1 cache. But why do you have this fixation on L1 anyway? wink.gif

The LS is SRAM; it sits right next to the SPE's logic and execution units. With that kind of placement it probably runs at the same clock frequency (or is perhaps matched to the pipeline latency for reads/writes to save power?). That's probably all the LS has in common with a cache.

I've also mentioned earlier that a cache is only effective when there is locality to be exploited. When there is no locality, cache misses are so expensive that it's better not to have the cache there in the first place. A cache adds an element of unpredictability to programming: cache misses are inevitable, and the programmer has no direct control over when they happen. That sort of unpredictability is a no-no in realtime applications. Graphics rendering in games can be classified as realtime, because it has a 1/60-second deadline for all its processing.

The LS of each SPE can be mapped into system memory, which means the LS is visible to the programmer. A cache is invisible to the programmer; it has no address range of its own.

After all this discussion of the SPEs, we seem to have left out the PPE, which is a traditional 64-bit processor and has its own L1 data and instruction caches.
silkworm
post May 30 2005, 11:36 AM

QUOTE(ray_ @ May 30 2005, 10:44 AM)
Again, thanks for the wonderful link.  thumbup.gif

My fixation on the L1 cache was due to your claim that "Local Storage" was in fact an L1 cache, which has now been shown to be incorrect. I think you should give me that much.  smile.gif

I shared some of your arguments in my most recent post (e.g. the fact that a cache is not directly addressable by the user). Please read 'em and see if you'd agree.
*
For context:
QUOTE(me)
Furthermore, each SPE has 256K of "Local Storage", which is effectively a Von Neumann style L1 cache.

Fed my own medicine, gah! sweat.gif In hindsight that was a rather naive conclusion, based on the observations that the LS consists of SRAM (as do all low-level caches), is geographically close to the SPU, and (maybe) runs at core frequency. If I wanted to cover my backside, I could retort that by using the word "effectively" I wasn't too convinced of it being L1 cache even back then, but that would make me look small, so I won't. laugh.gif

So, can we safely agree that the Local Store memory of 256K on each SPE is neither L1 nor L2 cache, and doesn't need to be?

WRT cache implementation and cost:

The fastest caches are implemented in SRAM, which, IIRC, takes 6 transistors per cell. SRAM is certainly not as dense as DRAM. A common convention for estimating gate counts is to use the transistor count of a NAND gate, which in a CMOS process is 4 transistors. That gives us a 3:2 transistor ratio, i.e. for every 2 SRAM cells we could have had 3 logic gates. Let's see you guys bake your noodles on that tid-bit. smile.gif
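Scaling that ratio up to one SPE's local store gives a feel for what the SRAM costs in logic terms. Rough arithmetic only, counting the cell array and ignoring decoders, sense amps and the rest of the overhead:

CODE
#include <stdio.h>

int main(void)
{
    /* 256 KB of local store, 6 transistors per SRAM bit,
       4 transistors per 2-input CMOS NAND gate. Cell array only,
       so this understates the real cost a little. */
    long bits       = 256L * 1024 * 8;
    long sram_xtors = bits * 6;        /* ~12.6 million transistors     */
    long nand_equiv = sram_xtors / 4;  /* ~3.1 million gate equivalents */

    printf("SRAM transistors per 256 KB LS: %ld\n", sram_xtors);
    printf("Equivalent NAND gates         : %ld\n", nand_equiv);
    return 0;
}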

I'd like to add that n-way set associativity isn't the only way to build a cache. Direct mapping does not require the sort of logic complexity that we see around the L2 in the die photos. Sure, nobody wants to use direct-mapped caches today, but computer architecture trends tend to come full circle every couple of decades. We're seeing stack machines and VMs making a reappearance lately even though they were supposed to have died off in the 70s and 80s. So we should not write off any technique just because it looks sub-optimal by today's performance metrics/requirements.
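For anyone wondering why direct mapping is cheap: a lookup is a single index plus a single tag compare, with none of the way-selection logic an n-way cache needs. A toy sketch, with the sizes picked out of thin air:

CODE
#include <stdint.h>
#include <stdbool.h>

/* Toy direct-mapped cache: every address maps to exactly one line. */
#define LINE_SIZE 64                     /* bytes per line           */
#define NUM_LINES 512                    /* 512 x 64 B = 32 KB cache */

struct line {
    bool     valid;
    uint32_t tag;
    uint8_t  data[LINE_SIZE];
};

static struct line cache[NUM_LINES];

bool cache_lookup(uint32_t addr, uint8_t *out)
{
    uint32_t index = (addr / LINE_SIZE) % NUM_LINES;    /* which line          */
    uint32_t tag   = addr / (LINE_SIZE * NUM_LINES);    /* rest of the address */
    struct line *l = &cache[index];

    if (l->valid && l->tag == tag) {                    /* hit                 */
        *out = l->data[addr % LINE_SIZE];
        return true;
    }
    return false;                                       /* miss: go to the next level */
}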

silkworm
post Jun 1 2005, 06:25 PM

First off, some more nice PDF files. And another one. These are the papers that were analysed separately by Real World Technologies and Ars Technica. Both authors were probably also privy to other material, having attended the ISSCC talks in person. The Ars Technica article, in particular, covers what we had been debating previously, namely the role of the LS in replacing L1.

I was going to write a rebuttal to your points, but I've had a long day at work and I badly need a shower. The documents should keep you occupied for the rest of the time though. I'll get round to rebutting tomorrow, if still necessary.

silkworm
post Jun 3 2005, 02:46 PM

QUOTE(ray_ @ Jun 3 2005, 01:59 PM)
Top of the list are some of the architectures that use internal RAM:
1) OMAP
2) MCore
3) 68k
4) variants of PowerPCs
*
Sorry, those are pretty bad examples. Looks to me like you're mixing processor cores up with microcontrollers, which are really systems-on-a-chip.

OMAP
Uses ARM7/9/11 cores. ARMs are traditionally cacheless.

MCORE and MC68HCxx
Again, the RAM is not internal to the actual processing core itself. The 68HCxx are 8-bit ('05, '08) or 16-bit ('11, '12) parts.

Same goes for PowerPC: the PPC4xx cores were licensed out by IBM to other SoC houses, including Freescale, for use in embedded systems.

There are several pretty obvious differences between these chips and the PPE that's being discussed. Most of the time these ICs are self-sufficient: program code is stored on, and run directly from, ROM/EEPROM. Unless they are performing exceptionally data-hungry tasks, the internal SRAM is all the memory they will ever use. In such cases, a cache is largely unnecessary.

Embedded processors have not breached the 1GHz operating frequency barrier just yet. The PPE core may be integrated into a microcontroller someday (if ever the need arises for a >1GHz MCU, [shudder]).

-Edit- Rather than leave it hanging just like that, a conclusion is in order:
The PPE is a processor core. Microcontrollers like the examples given are built around cores.

This post has been edited by silkworm: Jun 3 2005, 02:59 PM
silkworm
post Jun 3 2005, 04:39 PM

QUOTE(ray_ @ Jun 3 2005, 03:29 PM)
Bait works. smile.gif
Augh!

QUOTE
Last I checked, the 68k is a 32-bit microcontroller.
The last of the 68K series was the 68332, after which Motorola/Freescale evolved the architecture into the ColdFire series. You just used "68HC". The 8- and 16-bit Freescale MCU families are commonly referred to as "HCxx", e.g. HC05, HC12, etc.

QUOTE
I don't think cell-phones or PDAs are running from just ROM/EEPROM or internal RAM. A modern RTOS necessitates some form of external RAM.
An RTOS only needs as much RAM as is necessary for a process table and for storing context information (instruction pointer, stack pointer, flags, GP registers) for each process. Application code can still be loaded directly from non-volatile storage like Flash ROM. Application heap space may be in external RAM. I admit we are seeing more lower-end phones able to run user-installed programs, namely Java games, but for the bare-essential workings of a phone, a large RAM is unnecessary.
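To put a rough number on "as much RAM as necessary", here's an illustrative per-task context structure for a 32-bit core with 16 general-purpose registers. The layout is made up; the order of magnitude is the point:

CODE
#include <stdint.h>

/* Illustrative only: roughly what a small RTOS keeps per task. */
struct task_context {
    uint32_t pc;        /* instruction pointer       */
    uint32_t sp;        /* stack pointer             */
    uint32_t psr;       /* status / flags register   */
    uint32_t gpr[16];   /* general-purpose registers */
};                      /* 76 bytes per task         */

/* A 16-entry process table costs well under 2 KB of RAM
   (task stacks and heap space are extra, of course). */
static struct task_context process_table[16];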

QUOTE
Now you could argue that the PPE processor core is different from a microcontroller in an embedded system. But you must realize that early iterations of the computer were in fact embedded systems. They are intricately linked, as demonstrated by the use of internal RAM on the SPE.
True enough, early personal computers and embedded systems were indistinguishable from each other. But as soon as miniaturisation kicked in, their paths diverged. Microcontrollers packed all their peripherals, like ADCs, DACs, timers, counters, RAM and ROM, into a single package, while general-purpose processors today are just cores with one or two levels of cache and buttloads of address/data/IO pins.

The word "embedded" projects an image of permanence, i.e. the function of the system is unchanged throughout its working life. A general-purpose computer, on the other hand, is transient: the user chooses what he wants to do with the computer by running different application programs. Embedded systems run solely on firmware. A PC BIOS is indeed a form of firmware, but it only bootstraps the system and the rest is up to the user (yes, even an OS is optional).
silkworm
post Jun 4 2005, 12:17 AM

QUOTE(ray_ @ Jun 3 2005, 05:27 PM)
Even with everything statically bound, modern function-packed tech gadgets (which qualify as embedded systems) would still require some sort of external RAM. You do not need a cell-phone that runs Java games to need external RAM; the cell-phone's core functionality alone is enough to require one.

In a typical cell-phone, you would have an RTOS or some form of scheduler, a disproportionate number of interrupt sources with a corresponding number of service routines, code to handle signalling, code to handle ergonomics, low-level hardware drivers, numerous levels of abstraction interfaces (POSIX, etc.), DSP algorithms, and a huge amount of temporary storage for real-time data. These translate to a large amount of volatile memory real estate.

I think you would find most of this paraphernalia running on a 32-bit core and requiring external RAM. Now that's lots.
*

OK, I may have underestimated the RAM usage required in a cellphone design, particularly the consumption during baseband processing and in the protocol stack. Probably has to do with me playing with 8-bitters for too long; I'm too used to offloading the heavy stuff onto ASICs. whistling.gif

QUOTE(ray_ @ Jun 3 2005, 06:59 PM)
Here's the bomb.

Consoles are traditionally embedded systems. With the introduction of the PS3 and the cell technology, Sony wants them to be called "computers":
*

Sony makes no secret of wanting the PlayStation platform to be seen as "computers"; after all, the division that makes PlayStations is called Sony Computer Entertainment Inc.

QUOTE
Now there are numerous definitions of "embedded system". Just googling it returns this:
http://www.google.co.uk/search?hl=en&lr=&c...Embedded+system

Some of the descriptions are ridiculous ("four-digit dates or leap days??"). But some do represent the concept of an embedded system and even fit the bill for our next generation consoles.

There isn't any hard definition of what makes an embedded system. If it has a processor in it, it could be an embedded system. Even a regular PC could be pressed into service as an embedded system, which happens quite often in industrial environments such as production lines.

My personal definition of an embedded system is a system that performs a function, and from the moment it turns on until it turns off, that function does not change fundamentally. A cellphone is an embedded system, because once you turn it on you expect it to function as a phone. Sure you may play games on a cellphone, use it as an organizer or alarm clock, even browse the internet and send e-mails. But when a call comes in, you can still answer it.

QUOTE(ray_ @ Jun 3 2005, 06:59 PM)
But to me, I've always identified a system as an embedded system if there's a need for a cross-compiler to build an application that runs on that system (tool association??). It's a weird definition, but I'll stick to it. A computer lets you write and run applications locally in its native language, but an embedded system requires a cross-compiler to translate them into target-readable instructions, which are then transferred and loaded into the target's memory space using specialized tools.
Quite a logical definition, but perhaps a bit narrow. For physically small or resource-limited systems, this rings true. However, once certain criteria of performance (CPU/RAM) and user interface are met, an embedded system is entirely capable of self-hosting its development environment. All one needs is a decent method of character input, a screen that displays a reasonable amount of text, and storage for the source files and compilation tools. My example of the "embedded PC" above could be one such system.

QUOTE
With that said, it would qualify consoles as embedded systems. Unless of course, Sony lets its tool chains run natively on the PS3.
PS2 development is done on what is basically a beefed-up version of the retail PS2, running Linux. I expect the production versions of the PS3 development kit to be more of the same.

QUOTE
The line has blurred so much that it's almost indistinguishable. Who knows, H@H@ might need to move "Console Couch" back to "Computer" and back again several times. Now, would you call these next generation consoles, computers or embedded systems?
*

Phew, at last we're getting back on topic. I'll call them "computer-based entertainment systems". This is not a cop-out. The purpose of a console is to entertain you. For the moment that means games, but the scope has recently expanded to cover other forms of electronic entertainment, chiefly movies and music. Stuff like Xbox Live expands it some more by adding a social element. But the one thing you will not find running on a console is a spreadsheet or word processor, so that strikes it out as a "computer" in the classic sense. You're not going to be calculating the effects of a nuclear explosion or predicting the weather (no matter how much Sony insists you could), so that doesn't make it a "supercomputer" either. You will be having a good time (at least the game developers and console manufacturers hope so), or otherwise be generally entertained. That's what matters.

Those of you who are still with me up to this point will note that this conforms with my definition of an embedded system too. So what. Bring on the games.
silkworm
post Jun 14 2005, 05:38 PM

QUOTE(matrix)
Still, currently, the Cell as it is sounds good (the multiple SPEs), but at the same time it's an IN-ORDER CPU (like the Xbox 360 PPC CPU), which is much less complicated than the Intel/AMD CPUs for PCs, which are OUT-OF-ORDER CPUs. I think an OoO CPU is more powerful as it can predict instructions and act accordingly, while an in-order CPU is simplistic and just acts on whatever it's fed.
You are confused. The purpose of OoO is not to predict, but rather to extract more instruction-level parallelism from a pipeline. There are basically two major drawbacks to a pipelined structure: data dependencies/hazards, and pipeline flushes due to branches. Pipeline flushes affect the whole pipe from the point where the branch is taken and are generally more "expensive" in terms of lost performance. They are countered by branch prediction, which the PPE and SPEs of Cell do have.

Data hazards such as Read-after-Write (RAW), Write-after-Read (WAR), and Write-after-Write (WAW) stall a pipeline by inserting "bubbles" until the dependencies have been resolved, at the cost of a few cycles. Out-of-order execution may help in cases where there are enough operands available in the instruction stream to feed an execution unit, so that the processor can go ahead and execute that instruction while another (earlier) instruction is still waiting for its data to be ready. Sort of like in a bank: if a customer is at the counter and he's busy filling out a form, the teller will ask the next customer to come to the counter, provided that this next customer doesn't need to fill out any forms.

Even without OoO, a pipeline is still equipped with data-forwarding buffers to minimize the length of pipeline stalls. A forwarding buffer "forwards" the results of the execution units to earlier stages of the pipeline so that those results may be used by following instructions that depend on them, instead of having to wait for them to be written back to the register file and read out again. Furthermore, instruction re-ordering and scheduling are also within the capabilities of modern optimizing compilers. Put these two together and pipeline stalls are a non-issue with careful programming.
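A crude example of the compiler-scheduling point, assuming the load of *a takes several cycles to come back:

CODE
/* Illustrative only: how static scheduling hides latency on an in-order pipe. */

/* Naive order: the add needs x immediately, so the pipe stalls on a bubble. */
int naive(volatile int *a, int b, int c)
{
    int x = *a;     /* long-latency load               */
    int y = x + 1;  /* depends on x -> stall           */
    int z = b * c;  /* independent work, done too late */
    return y + z;
}

/* Scheduled order: the independent multiply is moved in between the load and
   its first use, filling the slot the in-order core would have wasted. */
int scheduled(volatile int *a, int b, int c)
{
    int x = *a;     /* start the load                          */
    int z = b * c;  /* useful work while the load is in flight */
    int y = x + 1;  /* x is (more likely) ready by now         */
    return y + z;
}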

OoO is "expensive" to implement in terms of complexity and silicon area. In order (pun not intended) to find a suitable instruction to execute out of order, the decoded instruction buffer has to be quite large. Look at the P4's NetBurst microarchitecture: its "L1" trace cache is filled entirely with decoded micro-ops, which the OoO engine can search through to find a suitable candidate for scheduling. The further ahead the OoO engine grabs instructions from, the larger the re-ordering buffer needs to be, since instructions still need to be retired in order. The silicon area that takes up could be put to different use, and in the case of the PPE in the Cell CPU, it looks like it's been taken up by Simultaneous Multi-Threading (SMT).

SMT is another strategy for making efficient use of the execution units in a CPU, at the cost of doubling the units needed for thread context (instruction pointer, stack pointer, flags, GPRs): when one thread is using the adder, another can be using the load-store unit or the FP unit.

In summary, OoO is not a magic bullet that promises huge performance gains just by being featured in a CPU's architecture. It's not a new idea either; ironically, it was first used by IBM in the 1960s, as shown in this research paper. The designers of Cell have traded off this feature against other architectural features like SMT, dual-issue pipelines, etc., to attain a balance that is acceptable to them and, hopefully, suitable for the intended application, namely console gaming.
silkworm
post Jun 16 2005, 02:14 PM

QUOTE(ray_ @ Jun 16 2005, 01:17 PM)
Actually, I would anticipate a few scenarios that could be implemented in the memory-management portion for the SPEs in the RTOS:
1) Each SPE's task footprint would be made small enough to fit into the 256K LS. Tasks would be queued in a shared virtual task pipe and fed into the SPEs on a first-come-first-served, round-robin or priority-scheduling basis. At any rate, there would be frequent memory swaps accommodated by the PPE MMU.
2) Each SPE would be assigned its own dedicated virtual task pipe and fed on a first-come-first-served, round-robin or priority-scheduling basis. Memory swaps are accommodated by the PPE MMU.
3) There aren't enough tasks to load all 7 SPEs, and the redundant SPE(s) could be programmed to power the PS3 grill as required. (*yum...)
*
We can have a peek at the Linux "model" of programming the SPEs in these kernel patches released a couple of months ago. SPEs exist as "files" in the filesystem, and that's how program code and data are transferred onto them. The SPEs are idle until a DMA "kick" command is sent, after which it looks like they're controlled through the standard POSIX threading API. At least, that's what I've managed to glean from the raw patch data. I'd need to actually patch the full kernel source to see the context of some parts, like the interrupt controller and the memory management unit.

One might also gain insight into the programming model of Cell from the archived presentation/webcast linked from Power.org, from the Barcelona Power conference held last week. I haven't had a chance to view it yet, and the webcast isn't downloadable for offline viewing. sad.gif

QUOTE(ikanayam @ Jun 16 2005, 01:27 PM)
IIRC... the only reason the "redundant" SPE is there is to increase yields. The Cell is a pretty big chip, so it's very likely that there will be some imperfections in many of the chips. Since the SPEs take up most of the space, it's likely that a fault is on one of the SPEs. In that case, they can just disable the SPE and still get a useable chip out of it. All chips that have fully functional SPEs will also have one disabled to maintain consistency. It's the same thing with the XB360 GPU. There are more than 48 ALUs but the faulty ones are disabled to increase yields. I think a lot of modern chips have some sort of redundancy for this purpose.
*
Right-o, it's all a matter of statistics. Right now fabs are using 300mm-diameter wafers, which gives about 70,686 mm² of area. The Cell processor weighs in at 235 mm². Divide and you get about 300 dies; take away another 20% or so because you're fitting rectangles into a circle, and that leaves you with about 240 dies per wafer. The SPEs cover around 2/3, or 66%, of a single Cell die, so a defect is more likely to land on an SPE than anywhere else. On new processes, the initial yield is usually below 50%. By making one SPE redundant, they may push the yield up to 60-70%, which is important if they want to hit the volumes they expect for the PS3.
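The same back-of-the-envelope sums, for anyone who wants to tweak the assumptions (the 20% edge loss and the 235 mm² die size are just the figures quoted above, nothing more precise):

CODE
#include <stdio.h>

int main(void)
{
    double pi      = 3.141592653589793;
    double wafer_d = 300.0;                              /* wafer diameter, mm  */
    double wafer_a = pi * (wafer_d / 2) * (wafer_d / 2); /* ~70,686 mm^2        */
    double die_a   = 235.0;                              /* Cell die area, mm^2 */
    double gross   = wafer_a / die_a;                    /* ~300 die candidates */
    double usable  = gross * 0.80;                       /* knock off ~20% for rectangles-in-a-circle edge loss */

    printf("wafer area : %.0f mm^2\n", wafer_a);
    printf("gross dies : %.0f\n", gross);
    printf("usable dies: %.0f per wafer, give or take\n", usable);
    return 0;
}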

An SPE may be deactivated by firmware, by putting it in a power-saving state, or by modifying the metal layers, thereby cutting off power to it at the IC level. If the first option is used, we might see something like what PC overclockers have been doing for a while now: tweaking firmware to unlock deactivated hardware. But that would probably be pointless, because all PS3 software would be targeted at only 7 SPEs and the reactivated one wouldn't be used anyway.
silkworm
post Jun 16 2005, 04:39 PM

Take it easy; the host thread is stalled, not the entire PPE. The SPU/SPE is going to be encapsulated in a pthread, as you will see in the "bpathread.c" module.
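In plain POSIX terms, the model in the patches looks roughly like the sketch below. spe_run() is a made-up stand-in for whatever call actually loads and blocks on an SPE program (the real interface is in the patches, not here); the point is just that it's the host pthread that blocks while the PPE keeps running the other threads.

CODE
#include <pthread.h>
#include <stdio.h>

/* Made-up placeholder: load an SPE program and block until it finishes. */
static void spe_run(long spe_id)
{
    /* ...in the real patches this is where the SPE context gets kicked off... */
    (void)spe_id;
}

static void *spe_host_thread(void *arg)
{
    long id = (long)arg;
    printf("host thread for SPE %ld: starting SPE program\n", id);
    spe_run(id);   /* only this host thread stalls here, not the whole PPE */
    return NULL;
}

int main(void)
{
    pthread_t t[7];

    for (long i = 0; i < 7; i++)
        pthread_create(&t[i], NULL, spe_host_thread, (void *)i);
    for (int i = 0; i < 7; i++)
        pthread_join(t[i], NULL);   /* other host threads keep the PPE busy meanwhile */
    return 0;
}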

Edit - oops, in my haste I grabbed the Google Groups link off my browser history list, and apparently that thread didn't have the whole set of patches. No worries, the complete set of 8 patch files is available in the Linux kernel mailing list archive; a search for "ppc64 bpa" should get you the right hits.

This post has been edited by silkworm: Jun 16 2005, 05:12 PM
silkworm
post Aug 9 2005, 08:37 AM

QUOTE(ikanayam @ Aug 9 2005, 04:33 AM)
Oh noes this is turning into a spam thread.

Btw the Cell's PPE just got beefed up in the latest revision. Double the size of the 1st revision. Perhaps they realized they needed more general purpose processing power.
*
The PPE got a bump up in size in the "DD2" revision of the Cell, but there hasn't been any concrete explanation of what was changed, just more speculation. Anyway, more goodies for the technically minded: papers at IBM Research

The TRE demo paper is a good one. smile.gif
silkworm
post Aug 12 2005, 10:20 PM

QUOTE(ray_ @ Aug 12 2005, 11:44 AM)
"Each SPE is responsible for four regions of the screen and the vertical cuts are processed in a round robin fashion, one vertical cut per region, and left to right within a
each region, so no synchronization is need on the output as no two SPEs will ever attempt to modify the same locations in the accumulation buffer even with two vertical cuts in flight (double buffering) per SPE."

I wonder if the vertical cut processing on the SPE, done in a round-robin fashion, is due to the limitation of the DMA. Since SPE basically screams parallelism, the bottleneck would be the result of DMAs prioritized in a way such that each SPE has equal rights to the DMA thus resulting in a round-robin stalemate.

Paralellism lost in translation. You'll make it big if you could patent something that makes DMA bus arbitration and memory coherency a thing of the past.
*
I think you might have gone off-track somewhere. The round-robin scheduling is applied to the four regions being processed within each SPE, not SPE-to-SPE. There is a significant delay in DMA fetches, but that is hidden by the double-buffering of input and output per SPE. As they explain further down, at any one time an SPE is both calculating the ray intersections for one cut and downloading the data for the next cut into LS. I believe they do it this way to leverage the fact that the SPE is capable of issuing a SIMD arithmetic op and a load/store/DMA-channel op per clock.
silkworm
post Oct 3 2005, 08:21 AM

QUOTE(DeaDLocK @ Oct 2 2005, 09:52 AM)
How soon after launch do consoles tend to start appearing on Low Yat shelves, and are they NTSC consoles? Are the prices on par with the foreign equivalents?
*
New consoles that are released in Japan usually get here in the first week of release, if not on day zero. However, be prepared for a serious markup. Going by the PSP pricing trend, I'd expect the prices of next-generation consoles to reach parity with the overseas "suggested retail price" within 3-6 months of release.

PS: Low Yat Plaza isn't really a haven for console gaming; you'd be better off looking for console-related things across the street in Sg. Wang or Imbi Plaza.
silkworm
post Jan 26 2006, 08:39 AM

No, the "+" D-pad harks all the way back to the Nintendo Game & Watch. laugh.gif

 
