QUOTE(Demonic Wrath @ Jul 16 2016, 06:52 PM)
Yes, and NVIDIA already presented on Async Compute on Maxwell and Pascal. Static Partitioning and Dynamic Partitioning solution for Asynchronous Computing.
As for the "slow" context switch: a context switch takes about 20 microseconds. The issue on Maxwell is not the context switch time for Async Compute. The issue is when the workload is imbalanced, i.e. compute workloads take longer to complete. Those workloads need to finish before a context switch can happen. This is potentially hazardous for Maxwell and can make it perform slower. This is also why I think NVIDIA won't ever enable async queues on Maxwell.
Thirdly, my post is about the so-called "dedicated async shader" on AMD cards. The way those redditors describe it, AMD has a "dedicated async shader" unit that allows compute to be offloaded. This is simply not true of the ACEs. It's almost like saying there is a "dedicated PhysX card" that allows PhysX calculation to be offloaded.
If NVIDIA won't enable async queues on Maxwell, then Maxwell owners deserve to know (especially those who kept the 970 despite the 3.5GB issue) whether NVIDIA will ever enable or support it in the coming months, or whether they've done all they can.
NVIDIA only claims support in its white papers so far. So you could say that Maxwell does support Async Compute (as in the white paper), but it can't use its 32 queues for that due to workload imbalance, and it doesn't bring much performance benefit because it has to rely on a slow context switch, using a single queue for both graphics and compute. That is what Nvidia's document means by calling it a 'heavyweight switch'.
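To put rough numbers on the imbalance point, here is a toy Python sketch. It is not how any real driver or GPU schedules work; the function names, the 8/8 split of 16 execution units and the work amounts are all made up for illustration. The only figure taken from this thread is the roughly 20 microsecond (0.02 ms) context switch.

```python
# Toy model only: work is measured in "unit-milliseconds", so work / units = time in ms.
# All names and numbers here are hypothetical, chosen just to show the effect.

def static_partition_ms(gfx_work, compute_work, gfx_units, compute_units):
    """Graphics and compute run side by side on fixed partitions.

    The frame is done only when the slower partition finishes, so the
    faster partition sits idle if the split does not match the workload.
    """
    return max(gfx_work / gfx_units, compute_work / compute_units)


def single_queue_ms(gfx_work, compute_work, total_units, switch_ms=0.02):
    """Graphics, then compute, each using every unit, with one context switch."""
    return (gfx_work + compute_work) / total_units + switch_ms


# Same total work (320) in both cases, on 16 units split 8/8.
balanced = static_partition_ms(160, 160, 8, 8)
imbalanced = static_partition_ms(80, 240, 8, 8)
serial = single_queue_ms(80, 240, 16)

print(f"balanced static split:   {balanced:.2f} ms")
print(f"imbalanced static split: {imbalanced:.2f} ms")
print(f"single queue + switch:   {serial:.2f} ms")
```

In this made-up case the imbalanced static split ends up slower than just running everything through one queue, even with the switch cost included, which is the kind of hazard described in the quoted post.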
Since everyone here is a third party analyzing sources provided by Nvidia, AMD or other third parties on the internet, we could be completely misguided too. We can only go by what Nvidia has claimed or rely on actual benchmarks.
And about ACEs: it's not the case that Nvidia GPUs don't need Async Compute just because they already have high shader utilization in DX11. That undersells the importance of a Multi-Engine API like DX12 or Vulkan.
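A quick way to see why a Multi-Engine API matters even with busy shaders is a toy timeline. The Python sketch below is only an illustration under big assumptions: the engine names, task durations and the idea that tasks on different engines have no dependencies on each other are all hypothetical, and real frames are nowhere near this simple.

```python
from collections import defaultdict

# Toy model only: each task is (engine, duration in ms). Values are invented.

def serial_time(tasks):
    """Serial API model: only one engine works at a time, so durations add up."""
    return sum(duration for _, duration in tasks)


def multi_engine_time(tasks):
    """Multi-engine model: engines run in parallel and each engine processes
    its own tasks back to back. Assumes no cross-engine dependencies."""
    per_engine = defaultdict(float)
    for engine, duration in tasks:
        per_engine[engine] += duration
    return max(per_engine.values(), default=0.0)


frame = [
    ("shader", 6.0),
    ("rop", 3.0),
    ("dma", 2.0),
    ("shader", 2.0),
]

print("serial:      ", serial_time(frame), "ms")
print("multi-engine:", multi_engine_time(frame), "ms")
```

Note the shader engine is busy for the same 8 ms in both models. The saving comes entirely from the other engines no longer waiting their turn.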
"A misconception is that Nvidia sees small gains with Async because their shaders are already fully utilized which is not true. Console shaders are already fully utilized, yet Async Compute still enhances performance. This is because Async is tapping into other GPU engines that normally idle in a serial APIs like dx11."
"DX12/Vulkan as a Multi-Engine API that fully access GPUs sub-units like Shaders, Rasterizers (ROPs) and DMAs (Direct Memory Access units to stream data). While in serial API like DX11 and older, if you're already running work on Shaders you cannot run work on Rasterizers or DMA at the same time."
Anyone interested can read more in depth here:
http://www.overclock.net/t/1572716/directx...d-sourcing/0_30
This post has been edited by svfn: Jul 16 2016, 08:36 PM