QUOTE(Demonic Wrath @ May 20 2016, 08:36 PM)
I don't understand why it is regarded as "brute force" when it is performing better using fewer resources? I'm sure they're actually very efficient, since they can perform better than AMD with a lower core count and lower power consumption. You wouldn't say an Intel CPU (having fewer cores) is inefficient compared to AMD's CPU, right?
i did read, and they did explain the differences. you'll just have to google the sources like i did. but the bottom line is the async on/off performance result. this picture pretty much sums up why it matters

but at the end of the day, if product a's fps is still higher despite being less efficient than product b with its better async compute method, guess who wins?
but... if amd can achieve fps results quite close to pascal, which costs a lot more... then people may go for team red if you get more bang for your buck (assuming they aren't already locked down with a gsync monitor). though they're gonna have to wait for amd's new cards to come out, so nvidia has the better timing
*update
found someone who explained some of the more techie stuff, if you're interested
QUOTE
May 17, 2016 | 02:43 PM - Posted by Anonymous (not verified)
"Similarly for compute tasks, Pascal integrates thread level preemption. If you happen to be running CUDA code, Pascal can support preemption down the instruction level!"
So what they may be saying is that it's improved, but that it's not fully hardware based, and that single-instruction preemption needs CUDA code to be of any help for debugging at the single-instruction level (AKA single-stepping through code in debugging mode)! Most certainly Nvidia has improved on some thread-level graphics/compute scheduling, partially in hardware, and that will result in better utilization of GPU hardware execution resources than in previous Nvidia generations.
I do not like the sound of that "happen to be running CUDA code", as it smacks of a vendor-specific proprietary solution that forces others into the CUDA ecosystem in order to obtain the ability to look at things at the instruction level. How is this going to play out for Vulkan/other API debugging, as well as OpenCL or other cross-platform open code/graphics APIs that may not be using CUDA?
There is going to have to be a serious comparison and contrast of the in-hardware async compute features of both Polaris/Vega and Pascal/Volta, and it cannot wait for the Hot Chips Symposium white papers and other professional trade events.
Any GPU thread scheduling/dispatch done in software is just not going to be as responsive to sudden asynchronous events at the hardware/instruction level as scheduling done fully in hardware by a dedicated thread/instruction scheduler/dispatch and context-switching unit. No amount of trying to hide latencies for asynchronous events in software can respond as efficiently or as rapidly to an asynchronous GPU thread event as something fully implemented in the GPU's (or any processor's) hardware! Without fully in-hardware asynchronous compute thread scheduling/dispatch and context switching there will be idle execution resources, even with work backed up in the processor's thread scheduler queues! Software-based scheduling, lacking fully in-hardware units, has an intrinsic inability to respond at the sub-instruction level to changing events in a GPU's execution pipelines (FP, INT, and others) the way fully in-hardware async compute units can.
Read up on Intel's version of SMT (Hyper-Threading) to see how async compute can be done fully in hardware. Async compute done fully in a GPU's thread dispatch/scheduling/context-switching units has a large advantage over any software, or partially software, dispatch/scheduling/context switching for asynchronous compute. Fully hardware-based asynchronous compute has the fastest response to asynchronous events and the best possible utilization of the processor's execution resources!
"Similarly for compute tasks, Pascal integrates thread level preemption. If you happen to be running CUDA code, Pascal can support preemption down the instruction level!"
So what they may be saying is that its improved, but that it's not fully hardware based, and that single instruction preemption needs CUDA code to be of any help for debugging at the single instruction level(AKA single stepping through code in debugging mode)! Most certainly Nvidia has improved on some thread level graphics/compute partally in hardware scheduling and that will result in better GPU hardware execution resources utilization than the previous Nvidia generations.
I do not like the sounds of that “happen to be running CUDA code” as that smacks of a vendor specific proprietary solution that forces others into the CUDA ecosystem in order to obtain the ability to look at things at the instruction level. How is this going to play out for Vulkan/other API debugging, as well as OpenCL, or other cross platform open code/graphics APIs/other code that may not be using CUDA.
There is going to have to be a serious comparison and contrast of the in hardware async-compute features of both Polaris/Vega, and Pascal/Volta and it cannot wait for the Hot Chips Symposium white papers and other professional trade events.
Any GPU processor thread scheduling/dispatch done in software is just not going to be as responsive to any sudden asynchronous events that might occur at the hardware/instruction level as that which is done fully in the hardware buy specialized in hardware GPU processing by a hardware based thread/instruction scheduler/dispatch and context switching unit. No amount of trying to hide latencies for asynchronous events in software can result in as efficient and as rapid of as response to an asynchronous GPU processing thread event as that which in fully implemented in GPU's/any processor's hardware! Without the fully in hardware asynchronous compute processor thread scheduling/dispatch and context switching there will be idle execution resources, even with work backed up in the processor’s thread scheduler queues! Most software based scheduling, for lack of fully in hardware based units, has an intrinsic deficiency in the software's ability to respond at the sub-single instruction level to any changing event in a GPU processing units execution pipelines(FP, INT, and others) like having the fully in hardware async-compute units does.
Read up on Intel's version of SMT(HyperThreading) to see how async compute is done fully in hardware, and async compute done fully in a GPUs processor thread dispatch/scheduling/context switching units has a large advantage over any software, or partially in software processor dispatch/scheduling/context switching for asynchronous compute. The fully in hardware based asynchronous compute has the fastest response to any asynchronous events, and the best processor execution resources utilization possible!
QUOTE
P.S. True hardware-based asynchronous compute is fully transparent to any software (except ring 0 of the OS kernel, mostly for paging/page-fault events and other preemptive multitasking context-switching/hardware-interrupt handling) and is fully implemented in the processor's hardware for CPU/GPU thread scheduling/dispatch/context switching!
For a discrete GPU, the OS is (mostly) in the card's firmware and GPU drivers, and runs under the control of the system's main OS/driver/OS driver API (WDDM for Windows, kernel drivers for Linux) software stack.
source:
https://www.pcper.com/reviews/Graphics-Card...al-Gamers/GPU-B
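to make the async compute idea a bit more concrete, here's a rough cuda sketch. this is my own illustration, not from the pcper article, and the kernel/stream names (busyKernel, gfxStream, computeStream) are made up. the point is that the app just hands the gpu independent streams of work, one with higher priority; whether they actually overlap, and how quickly one can preempt the other, is entirely down to the gpu's scheduler, which is the hardware-vs-software point the comment above is arguing about.
CODE
// minimal sketch (assumed names, not from the article): two independent
// streams of GPU work, the "compute" one given higher priority, so the
// GPU's scheduler is free to overlap or preempt as its hardware allows.
#include <cstdio>
#include <cuda_runtime.h>

// trivial kernel standing in for real graphics or compute work
__global__ void busyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * 2.0f + 1.0f;
}

int main()
{
    const int n = 1 << 20;
    float *bufA = nullptr, *bufB = nullptr;
    cudaMalloc(&bufA, n * sizeof(float));
    cudaMalloc(&bufB, n * sizeof(float));

    // ask the device what stream priorities it supports
    // (numerically lower value = higher priority)
    int leastPrio = 0, greatestPrio = 0;
    cudaDeviceGetStreamPriorityRange(&leastPrio, &greatestPrio);

    // two independent queues of work: a low-priority "graphics-like" stream
    // and a high-priority "compute" stream the GPU may service preemptively
    cudaStream_t gfxStream, computeStream;
    cudaStreamCreateWithPriority(&gfxStream, cudaStreamNonBlocking, leastPrio);
    cudaStreamCreateWithPriority(&computeStream, cudaStreamNonBlocking, greatestPrio);

    dim3 block(256), grid((n + block.x - 1) / block.x);
    busyKernel<<<grid, block, 0, gfxStream>>>(bufA, n);     // long-running work
    busyKernel<<<grid, block, 0, computeStream>>>(bufB, n); // independent work that can overlap

    // how well those two actually overlapped is decided by the GPU's
    // scheduler (hardware or software), not by anything in this code
    cudaDeviceSynchronize();

    cudaStreamDestroy(gfxStream);
    cudaStreamDestroy(computeStream);
    cudaFree(bufA);
    cudaFree(bufB);
    printf("done\n");
    return 0;
}
as far as i understand, the d3d12/vulkan equivalent is submitting command lists to a separate compute queue alongside the graphics queue, which is what the "async compute" option in those benchmarks is flipping.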
so in layman's terms, what does this all mean?
1. amd's hardware async compute is the right way to do it.
2. the performance chart clearly shows the difference between async compute enabled and disabled. see which card has the bigger gains when the feature is enabled? that alone goes to show amd did it right. (see the sketch after this list for roughly what that on/off toggle means on the programming side.)
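here's roughly what that on/off toggle corresponds to, again as my own cuda-streams analogy (real games do this with d3d12/vulkan compute queues, and the names fakeGraphics/fakeCompute/runFrame are made up): "off" means the extra compute work is queued behind the graphics work on one stream, "on" means it gets its own stream so the gpu is allowed to overlap the two. whether you actually see a gain depends on whether the first workload leaves execution units idle and how good the gpu's scheduler is at filling them, which is exactly the scheduling argument quoted above.
CODE
// rough analogy of a game's "async compute on/off" toggle, done with CUDA
// streams instead of D3D12/Vulkan queues (names here are made up):
//   off -> compute work serialized behind the "graphics" work on one stream
//   on  -> compute work gets its own stream, so the GPU may overlap the two
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fakeGraphics(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = d[i] * 0.5f + 2.0f;
}

__global__ void fakeCompute(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = d[i] * d[i];
}

// time one "frame" with the compute work either on its own stream or not
static float runFrame(bool asyncOn, float *gfxBuf, float *cmpBuf, int n,
                      cudaStream_t gfx, cudaStream_t cmp)
{
    dim3 block(256), grid((n + block.x - 1) / block.x);
    cudaStream_t cmpStream = asyncOn ? cmp : gfx;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    fakeGraphics<<<grid, block, 0, gfx>>>(gfxBuf, n);
    fakeCompute<<<grid, block, 0, cmpStream>>>(cmpBuf, n);
    cudaEventRecord(stop);        // recorded on the default stream, so it
    cudaEventSynchronize(stop);   // lands only after both kernels finish

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int n = 1 << 22;
    float *gfxBuf, *cmpBuf;
    cudaMalloc(&gfxBuf, n * sizeof(float));
    cudaMalloc(&cmpBuf, n * sizeof(float));

    cudaStream_t gfx, cmp;
    cudaStreamCreate(&gfx);
    cudaStreamCreate(&cmp);

    printf("async off: %.3f ms\n", runFrame(false, gfxBuf, cmpBuf, n, gfx, cmp));
    printf("async on : %.3f ms\n", runFrame(true,  gfxBuf, cmpBuf, n, gfx, cmp));

    cudaStreamDestroy(gfx);
    cudaStreamDestroy(cmp);
    cudaFree(gfxBuf);
    cudaFree(cmpBuf);
    return 0;
}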
that said, it still seems that even with its inefficiencies, the 1080 is doing more fps regardless. so i'd still get a pascal when upgrading from my 680. but the question here is, when amd's new card comes out, and if it's cheaper, will it be able to outperform or get close to the fps of a 1080 at a cheaper price point? that's the million dollar question
if you had a 980 or 980ti, it's probably better to skip over pascal and just wait for a volta imho. but for me with a 680 upgrading now is fine.
This post has been edited by Moogle Stiltzkin: May 20 2016, 11:21 PM