86 Mac Plus Vs 07 AMD DualCore! Unbelievable!
charge-n-go · Jun 3 2007, 01:43 PM
QUOTE(ikanayam @ Jun 3 2007, 11:09 AM) 4. I don't think memory bandwidth is a limitation in the vast majority of applications. L1 cache hit rates are >90% (>95% even, I think) in modern CPUs. L2 will catch much of the rest. Only in certain massively streaming parallel processing applications will you be memory bandwidth limited, but those tend to be processing heavy as well, so you are probably processing power limited before that.

Yes, it's definitely not a memory bandwidth limitation. Smart memory prefetch has eliminated a lot of memory transactions on the FSB. Actually, there are just too many branches and too much serialization in modern software, where SMT is completely useless. Other factors like cache and page table coherency will also slow down multithreaded performance drastically as we put more and more cores in the system.

Of course, these days it is impossible to write programs in only assembly or C, so the optimization won't be as good as in the old days (like on the Mac Plus). Heck, even C programming is a lot slower than ASM programming! (from my observation of internal test suites).

This post has been edited by charge-n-go: Jun 3 2007, 01:48 PM
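The "too many branches" point can be sketched in C (both function names here are illustrative, not from any real codebase): a data-dependent branch on unpredictable inputs costs pipeline flushes on mispredicts, while an equivalent branchless form typically compiles to a conditional move or pure arithmetic with no branch at all.

```c
#include <stdint.h>

/* Branchy max: the `if` is a data-dependent branch; on random inputs
   the predictor misses often and the pipeline stalls. */
int32_t max_branchy(int32_t a, int32_t b) {
    if (a > b)
        return a;
    return b;
}

/* Branchless max: arithmetic only, no branch to mispredict.
   Assumes arithmetic right shift for signed ints (true on all
   mainstream compilers, but implementation-defined per the C standard)
   and that a - b does not overflow. */
int32_t max_branchless(int32_t a, int32_t b) {
    int32_t diff = a - b;
    int32_t mask = diff >> 31;      /* all 1s if a < b, else all 0s */
    return b + (diff & ~mask);      /* adds diff only when a >= b */
}
```

Whether the branchless version actually wins depends on how predictable the data is; on well-predicted branches, the branchy form can be just as fast.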
charge-n-go · Jun 3 2007, 02:16 PM
QUOTE(ikanayam @ Jun 3 2007, 01:57 PM) Prefetching tries to hide latency; it can't hide a lack of memory bandwidth. It increases memory transactions because you don't always prefetch the right data.

LoL, my mistake. What was I thinking just now?

QUOTE(ikanayam @ Jun 3 2007, 02:09 PM) Programming wise, it's a matter of where you spend your time optimizing, of course. Good programming practices coupled with a good compiler can do very well. For your super-critical stuff you will want to go down to ASM, but otherwise you can probably get better improvements by simply spending your time improving your algorithm rather than hacking at ASM code. Same thing with hardware design: the biggest gains come from improvements in algorithms, and the gain typically gets smaller as you move down to a lower level. It is in software with millions of lines of code (such as an OS) that you typically see a lot of hand-tuned ASM. You don't have to do it all in ASM from scratch; you can compile it and then hand-optimize the ASM for certain critical stuff. Obviously it makes sense to do for a function you call a billion times, or for an important inner loop.

Well said, the algorithm is very important. C/C++ compilers should not have too much of a problem, as Intel is actually providing them to customers, but I dunno how efficient other high-level languages will be.

From my experience in test coding, the most important part is the algorithm within a FOR loop. If one iteration is a bit slower, the end result will be painfully slow, considering there are millions of iterations (streaming, for example). Actually, some critical functions which access hardware directly (usually APIs) are written in ASM or C.
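The inner-loop point can be illustrated with a classic C example (the function names are hypothetical): calling strlen() in the loop condition makes a linear scan quadratic, because the string is rescanned every iteration. Hoisting it out is a pure algorithmic fix that no amount of ASM tuning of the loop body would match.

```c
#include <string.h>
#include <ctype.h>
#include <stddef.h>

/* O(n^2): strlen() walks the whole string on every loop test. */
size_t count_upper_slow(const char *s) {
    size_t count = 0;
    for (size_t i = 0; i < strlen(s); i++)
        if (isupper((unsigned char)s[i]))
            count++;
    return count;
}

/* O(n): the loop-invariant length is computed once, outside the loop. */
size_t count_upper_fast(const char *s) {
    size_t count = 0;
    size_t len = strlen(s);          /* hoisted invariant */
    for (size_t i = 0; i < len; i++)
        if (isupper((unsigned char)s[i]))
            count++;
    return count;
}
```

For a short string the difference is invisible, but over millions of characters the slow version degrades quadratically, which is exactly the "one slow loop, millions of iterations" effect described above.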