QUOTE(adilz @ Jun 4 2016, 08:26 PM)
Bro, sorry, had to correct you here. Async compute is one of the new features available in DX12, but not in DX11. Nvidia's previous Maxwell GPUs do not support async compute in DX12, unlike AMD's Fiji GPU. In the case of Ashes of the Singularity, it can run in DX11 or DX12, and in DX12, async compute can be enabled or disabled. It was the AotS benchmark which highlighted the Maxwell async compute issues. There are quite a number of analyses, but here are a few, generally comparing the GTX 980 Ti and the Fury X.
There seems to be some misunderstanding about what enabling or disabling DX12 async compute actually means.
Most of the "async compute" discussion is really about the scheduling method, not whether the GPU can process graphics and compute tasks at the same time. What DX12 adds is a way for work to be dispatched concurrently; if no work is dispatched to a compute unit, it simply idles.
Traditionally, DX11 exposes a single hardware work queue: the CPU sees only one queue to submit tasks to. A queue is basically a list of pending tasks waiting to be sent to the GPU's compute units for processing.
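To make the single-queue point concrete, here's a minimal C++ sketch of the D3D11 side (variable names are mine, error checking omitted): the application gets one immediate context, and every draw and dispatch it issues goes into that single command stream.
CODE
#include <windows.h>
#include <d3d11.h>
#include <wrl/client.h>
#pragma comment(lib, "d3d11.lib")

using Microsoft::WRL::ComPtr;

int main()
{
    ComPtr<ID3D11Device> device;
    ComPtr<ID3D11DeviceContext> context;

    // D3D11 hands back a single immediate context. All graphics (Draw*) and
    // compute (Dispatch*) calls are recorded into this one stream, which the
    // driver feeds to the GPU's single hardware work queue.
    D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
                      nullptr, 0, D3D11_SDK_VERSION,
                      &device, nullptr, &context);

    // ... context->Draw(...) and context->Dispatch(...) calls would all go
    // here, one after another, on the same context ...
    return 0;
}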
Say you have 3 streams of tasks.
CODE
Stream ABC - graphic
Stream DEF - compute
Stream GHI - compute
Stream ABC is independent of DEF and GHI, so in theory they can run in parallel on different compute units.
In DX11 with a single work queue,
Step 1: The CPU submits ABC | DEF | GHI to this work queue sequentially. The GPU only knows whether tasks can be processed concurrently once they reach the scheduler.
CODE
DX11 CPU to GPU Hardware queue: ABC | DEF | GHI
Step 2: The GPU scheduler dispatches the tasks to idle compute units (from top to bottom):
CODE
A
B
C and D concurrently, since the scheduler knows they are independent of each other
E
F and G concurrently, since the scheduler knows they are independent of each other
H
I
This leaves some of the compute units under-occupied.
In DX12, there are now three types of hardware queue (graphics, compute, copy).
Step 1: The CPU sends each task to its respective queue (graphics tasks to the graphics queue, and so on). Remember, the tasks need to be independent of each other, i.e. they must not rely on each other's data. (A code sketch of creating these queues follows after this walkthrough.)
Now the queues become:
CODE
Graphic hardware queue: ABC
Compute hardware queue 0: DEF
Compute hardware queue 1: GHI
Step 2: The GPU scheduler then dispatches the tasks to idle compute units (from top to bottom):
CODE
A, D, G concurrently
B, E, H concurrently
C, F, I concurrently
This improves GPU utilization since there is more work available to feed the compute units.
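For reference, here is a minimal C++ sketch of how an application creates these three queue types with the D3D12 API (variable names are mine, error checking omitted):
CODE
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#pragma comment(lib, "d3d12.lib")

using Microsoft::WRL::ComPtr;

int main()
{
    // Create a device on the default adapter.
    ComPtr<ID3D12Device> device;
    D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device));

    D3D12_COMMAND_QUEUE_DESC desc = {};

    // "Graphics" queue: a DIRECT queue accepts graphics, compute and copy work.
    ComPtr<ID3D12CommandQueue> gfxQueue;
    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&gfxQueue));

    // Compute queue: compute and copy work only.
    ComPtr<ID3D12CommandQueue> computeQueue;
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));

    // Copy queue: copy (DMA) work only.
    ComPtr<ID3D12CommandQueue> copyQueue;
    desc.Type = D3D12_COMMAND_LIST_TYPE_COPY;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&copyQueue));

    return 0;
}
Whether the GPU actually overlaps the work submitted to these queues is up to the driver and hardware scheduler, which is exactly what the rest of this post is about.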
On NVIDIA
However, NVIDIA has a more intelligent way to handle the work queue.
Again, we'll use the example from above: streams ABC, DEF and GHI.
In DX11,
Step 1: The CPU still submits the work to the queue sequentially.
CODE
DX11 CPU to GPU queue: ABC | DEF | GHI
Step 2: Once the GPU has received the task list, the GMU (Grid Management Unit) checks for dependencies and distributes the work across the hardware queues.
CODE
GMU to GPU Hardware queue 0: ABC
GMU to GPU Hardware queue 1: DEF
GMU to GPU Hardware queue 2: GHI
Step 3: The GPU scheduler then dispatches the tasks to idle compute units (from top to bottom):
CODE
A, D, G concurrently
B, E, H concurrently
C, F, I concurrently
This improves GPU utilization since there is more work available to feed the compute units.
However, in DX12 with "async compute" enabled,
Step 1: The CPU sends each task to its respective queue (graphics tasks to the graphics queue, and so on; see the submission sketch after this walkthrough).
CODE
Graphic hardware queue: ABC
Compute hardware queue 0: DEF
Compute hardware queue 1: GHI
Step 2: The NVIDIA driver doesn't really care about the different hardware queues submitted from the CPU, so it still uses its own scheduling method. As in DX11, once the NVIDIA GPU has received the task list, it checks for dependencies and distributes the work across its internal hardware queues.
CODE
GMU to GPU Hardware queue 0: ABC
GMU to GPU Hardware queue 1: DEF
GMU to GPU Hardware queue 2: GHI
Step 3: The GPU scheduler then dispatches the tasks to idle compute units (from top to bottom):
CODE
A, D, G concurrently
B, E, H concurrently
C, F, I concurrently
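Note that on the application side the submission looks identical regardless of vendor; what differs is what the driver and hardware do with the queues afterwards. Here is a hedged sketch of submitting independent work to the graphics and compute queues (the function name is mine; device, gfxQueue and computeQueue are assumed to come from the queue-creation sketch above, and the command lists are left empty for brevity):
CODE
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Submit independent graphics and compute work on separate queues so the
// GPU is free to overlap them. The queues are assumed to have been created
// as in the earlier sketch.
void SubmitIndependentWork(ID3D12Device* device,
                           ID3D12CommandQueue* gfxQueue,
                           ID3D12CommandQueue* computeQueue)
{
    // One allocator and one command list per queue type.
    ComPtr<ID3D12CommandAllocator> gfxAlloc, compAlloc;
    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&gfxAlloc));
    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_COMPUTE, IID_PPV_ARGS(&compAlloc));

    ComPtr<ID3D12GraphicsCommandList> gfxList, compList;
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT, gfxAlloc.Get(), nullptr, IID_PPV_ARGS(&gfxList));
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_COMPUTE, compAlloc.Get(), nullptr, IID_PPV_ARGS(&compList));

    // ... record draw calls (stream ABC) on gfxList and Dispatch() calls
    // (streams DEF/GHI) on compList here ...
    gfxList->Close();
    compList->Close();

    // Submit to the two queues with no fence between them, so the GPU is
    // free to run the graphics and compute work concurrently if it can.
    ID3D12CommandList* gfx[]  = { gfxList.Get() };
    ID3D12CommandList* comp[] = { compList.Get() };
    gfxQueue->ExecuteCommandLists(1, gfx);
    computeQueue->ExecuteCommandLists(1, comp);
}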
Summary
For NVIDIA, separate hardware queues from CPU to GPU don't do much to improve performance, since it can already distribute work efficiently. If you're seeing a slight performance drop when async compute is enabled, it is probably because of the redundant overhead created in Step 2 (checking for dependencies and distributing to another hardware queue). NVIDIA could probably change the scheduler so the GPU skips this step and works similarly to AMD's, but the performance gain would be minimal, so NVIDIA prefers to work on other parts of the architecture to improve performance.
For AMD, separate hardware queues from CPU to GPU do improve performance, i.e. more work can be fed into the compute units.
In diagrams:
[Diagram: DX11]
[Diagram: DX12]
If you look at AotS performance, AMD's numbers should be higher than NVIDIA's due to its higher raw compute throughput. As I mentioned earlier, the Fury X best case (100% compute) could almost rival even the GTX 1080 best case (100% compute). The problem is that games are not 100% compute bound; performance depends on various things such as ROPs, geometry hardware, memory bandwidth, etc.
Finally, in DX12 AMD's GCN architecture is more efficient than Maxwell at async compute. Why? Because it can assign work to the compute units dynamically, and CU context switching is independent of draw calls.
Maxwell SMs can only context switch at draw-call boundaries. If either the graphics or the compute work stalls (i.e. the SM cannot finish the allocated work within a single draw call), it has to wait until the next draw call.
Pascal, however, doesn't need to wait for a draw call to context switch. If any workload stalls, the GPU can dynamically allocate more processors to the task, independent of draw-call timing.
So in summary, GCN, Maxwell and Pascal can all work on graphics and compute at the same time; the difference lies in whether the scheduler can dispatch work concurrently. NVIDIA GPUs can already dispatch work concurrently in DX11, while AMD GPUs' ability to do so is limited in DX11; in DX12, however, AMD can fully dispatch work concurrently.