QUOTE(svfn @ Jul 13 2016, 11:47 PM)
Yes, AMD has fewer resources to devote to their drivers compared to Nvidia's excellent driver team; you can see their DX11/OpenGL performance is not as good as Nvidia's. With the newer APIs, optimization falls more into developers' hands. There will be bad DX12 implementations as well, since there isn't a true DX12 game built from the ground up yet. Pascal will have driver-side support for async compute, so owners don't really have to worry much; for Maxwell I just can't say the same. We also don't know if async compute will even be widely used in the future.
Here's a post about it from the Nvidia subreddit:
In both Maxwell and Pascal, it is the same at the SM level: each SM can work on a different workload; it is not separated at the GPC level. The changes from Maxwell to Pascal are within the SM and in the GigaThread Engine. I'd guess the SMs now have a buffer to store their "partial" results when context switching between graphics and compute workloads.
In Maxwell, a context switch can only be made at drawcall boundaries. Imagine: first drawcall, 10 SMs dedicated to graphics work and 6 SMs to compute; second drawcall, 14 SMs to graphics and 2 SMs to compute. It will be bad if, in the first drawcall, the compute work takes longer to finish: the 10 graphics SMs will be idling and go to Hawaii for vacation.
In Pascal, a context switch can be made at the instruction/pixel level. Imagine: it can draw pixels halfway, switch to the compute load, finish it, then resume the drawing.
Regardless of GCN or SM architecture, a context switch is required to change from a graphics to a compute workload. Within an AMD CU (64 cores), it can't do graphics and compute concurrently (i.e. 40 cores on graphics, 24 on compute). It can, however, do graphics halfway, pause, switch to the compute load, finish it, then resume the graphics.
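To make the "idle SMs go to Hawaii" point concrete, here's a tiny sketch of the static-partition problem described above. All numbers (the 10/6 SM split, the amounts of work, the idea that every SM processes work at the same unit rate) are made up purely for illustration; this is not how a real GPU is modeled.

```python
# Toy model of Maxwell-style static SM partitioning within one drawcall.
# The split is fixed for the whole drawcall, so whichever side finishes
# first just idles until the other side is done.

def static_partition_time(gfx_work, cmp_work, gfx_sms, cmp_sms):
    """Return (total drawcall time, idle time of the faster partition).

    Assumes every SM retires one unit of work per unit of time
    (a deliberately idealized, hypothetical rate)."""
    gfx_time = gfx_work / gfx_sms
    cmp_time = cmp_work / cmp_sms
    return max(gfx_time, cmp_time), abs(gfx_time - cmp_time)

# First drawcall from the post: 10 SMs on graphics, 6 on compute.
total, idle = static_partition_time(gfx_work=100, cmp_work=90,
                                    gfx_sms=10, cmp_sms=6)
print(total, idle)  # compute side dominates; the graphics SMs sit idle
```

With these made-up numbers, graphics finishes at t=10 but compute runs until t=15, so the 10 graphics SMs idle for 5 time units — exactly the "vacation" scenario.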
In this regard, Pascal and GCN work very similarly. In DX12/Vulkan with async compute enabled but without a compute-only queue, NVIDIA GPUs will idle more because there are now fences (a fence basically says "hey, don't submit any more work until you have the result from the previous work"); this is bad and reduces the work reaching the GPU. Hence, performance drops.
What NVIDIA prefers is to just keep submitting work to the GPU and let its hardware scheduler figure out how to distribute and dispatch the workload (this is also how it keeps the GPU busy in DX11).
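The effect a fence has on total time can be sketched in two lines of arithmetic. This is a deliberately crude model, not real D3D12/Vulkan code: it only captures the claim that a fence between graphics and compute serializes them, while a separate compute queue lets them overlap.

```python
# Toy timing model: a fence between two chunks of work on one queue
# forces them to run back to back; on separate queues with no fence
# between them, they can overlap.

def serialized_ms(gfx_ms, cmp_ms):
    # Fence between graphics and compute: durations add.
    return gfx_ms + cmp_ms

def concurrent_ms(gfx_ms, cmp_ms):
    # Dedicated compute queue, no cross-queue fence: they overlap.
    return max(gfx_ms, cmp_ms)

print(serialized_ms(5, 8))  # 13 -- the GPU idles while waiting on the fence
print(concurrent_ms(5, 8))  # 8
```

With the post's 5ms/8ms workloads, fencing costs 13ms versus 8ms overlapped — which is why the poster says fences "reduce the work reaching the GPU."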
TL;DR summary: as long as the code is not written in a way that stalls the SMs, Maxwell should still perform well.
In practice, however, graphics and compute workloads don't always finish together; there will be some difference in how long each takes. For example, the graphics workload might take 5ms to complete while the compute workload takes 8ms, so some SMs will idle for 3ms. Total job duration: 8ms.
Pascal handles this better than Maxwell. In the same example, the graphics workload finishes in 5ms while the compute workload is still running; the idle SMs can be retasked to finish the compute workload faster, so instead of 8ms it takes only, say, 6ms. Total job duration: 6ms.
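The rough arithmetic behind that 8ms-vs-6ms claim can be checked. The SM split below (10 graphics / 6 compute, reusing the earlier hypothetical example) and the assumption that retasked graphics SMs absorb compute work at the same rate are both idealizations I'm adding, not figures from the post.

```python
# Rough arithmetic for the 5ms/8ms example.
GFX_MS, CMP_MS = 5.0, 8.0
SMS_GFX, SMS_CMP = 10, 6   # hypothetical split, as in the earlier example

# Maxwell: the partition is fixed for the drawcall, so the total is
# simply the slower side, with the graphics SMs idle for the difference.
maxwell_total = max(GFX_MS, CMP_MS)            # 8.0 ms, 3 ms of idle SMs

# Pascal: at t = 5 ms the graphics SMs are retasked. The compute work
# still outstanding is 3 ms on 6 SMs = 18 SM-ms, now spread over all 16.
remaining_sm_ms = (CMP_MS - GFX_MS) * SMS_CMP  # 18 SM-ms left at t = 5 ms
pascal_total = GFX_MS + remaining_sm_ms / (SMS_GFX + SMS_CMP)
print(maxwell_total, pascal_total)             # 8.0 vs about 6.1 ms
```

Under these assumptions the dynamically balanced total comes out near 6ms, consistent with the post's ballpark figure.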
It is also more resilient to badly written code. On Maxwell, the driver will just crash when it detects that something is wrong and the SMs are stalling.