QUOTE(svfn @ Jul 13 2016, 11:47 PM)
Yes, AMD has fewer resources to devote to their drivers compared to Nvidia's excellent driver team; you can see their DX11/OpenGL performance is not as good as Nvidia's. With the newer APIs, optimization falls more into developers' hands. There will be bad DX12 implementations as well, since there isn't a true DX12 game built from the ground up yet. Pascal will have driver-side support for async compute, so owners don't really have to worry much; for Maxwell I just can't say the same. We also don't know if async compute will even be widely used in the future.
Here's a post about it from the Nvidia subreddit:
In both Maxwell and Pascal, it is the same at the SM level: each SM can work on a different workload; it is not separated at the GPC level. The changes from Maxwell to Pascal are within the SM and in the GigaThread Engine. I'd guess the SMs now have a buffer to store their "partial" results when context switching between graphics and compute workloads.
In Maxwell, a context switch can only be made at drawcall boundaries. Imagine: first drawcall, 10 SMs dedicated to graphics work and 6 SMs to compute; second drawcall, 14 SMs to graphics and 2 SMs to compute. It will be bad if, in the first drawcall, the compute work takes longer to finish: the 10 graphics SMs will be idling and go to Hawaii for vacation.
In Pascal, a context switch can be made at the instruction/pixel level. Imagine: it can draw pixels halfway, switch to the compute load, finish it, then resume the drawing.
Regardless of GCN or SM architecture, a context switch is required to change from a graphics to a compute workload. Within an AMD CU (64 cores), it can't do graphics and compute concurrently (i.e. 40 cores on graphics, 24 on compute). It can, however, do graphics halfway, pause, switch to the compute load, finish it, then resume the graphics.
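To make the "idle SMs go to Hawaii" point concrete, here's a tiny sketch of the static-partition problem described above. All numbers (the 10/6 SM split, the amounts of work, the idea that every SM processes work at the same unit rate) are made up purely for illustration; this is not how a real GPU is modeled.

```python
# Toy model of Maxwell-style static SM partitioning within one drawcall.
# The split is fixed for the whole drawcall, so whichever side finishes
# first just idles until the other side is done.

def static_partition_time(gfx_work, cmp_work, gfx_sms, cmp_sms):
    """Return (total drawcall time, idle time of the faster partition).

    Assumes every SM retires one unit of work per unit of time
    (a deliberately idealized, hypothetical rate)."""
    gfx_time = gfx_work / gfx_sms
    cmp_time = cmp_work / cmp_sms
    return max(gfx_time, cmp_time), abs(gfx_time - cmp_time)

# First drawcall from the post: 10 SMs on graphics, 6 on compute.
total, idle = static_partition_time(gfx_work=100, cmp_work=90,
                                    gfx_sms=10, cmp_sms=6)
print(total, idle)  # compute side dominates; the graphics SMs sit idle
```

With these made-up numbers, graphics finishes at t=10 but compute runs until t=15, so the 10 graphics SMs idle for 5 time units — exactly the "vacation" scenario.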
In this regard, Pascal and GCN work very similarly. In DX12/Vulkan with async compute enabled but without a compute-only queue, NVIDIA GPUs will idle more because there are now fences (a fence basically says "hey, don't submit any more work until you have the result from the previous work"); this is bad and reduces the work reaching the GPU. Hence, performance drops.
What NVIDIA prefers is to just keep submitting work to the GPU and let its hardware scheduler figure out how to distribute and dispatch the workload (this is also how it keeps the GPU busy in DX11).
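The effect a fence has on total time can be sketched in two lines of arithmetic. This is a deliberately crude model, not real D3D12/Vulkan code: it only captures the claim that a fence between graphics and compute serializes them, while a separate compute queue lets them overlap.

```python
# Toy timing model: a fence between two chunks of work on one queue
# forces them to run back to back; on separate queues with no fence
# between them, they can overlap.

def serialized_ms(gfx_ms, cmp_ms):
    # Fence between graphics and compute: durations add.
    return gfx_ms + cmp_ms

def concurrent_ms(gfx_ms, cmp_ms):
    # Dedicated compute queue, no cross-queue fence: they overlap.
    return max(gfx_ms, cmp_ms)

print(serialized_ms(5, 8))  # 13 -- the GPU idles while waiting on the fence
print(concurrent_ms(5, 8))  # 8
```

With the post's 5ms/8ms workloads, fencing costs 13ms versus 8ms overlapped — which is why the poster says fences "reduce the work reaching the GPU."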
TL;DR summary: as long as the code is not written in a way that stalls the SMs, Maxwell should still perform well.
In practice, however, graphics and compute workloads don't always finish together; there will be some difference in how long each takes. For example, the graphics workload might take 5ms to complete while the compute workload takes 8ms, so some SMs will idle for 3ms. Total job duration: 8ms.
Pascal handles this better than Maxwell. In the same example, the graphics workload finishes in 5ms while the compute workload is still running; the idle SMs can be retasked to finish the compute workload faster, so instead of 8ms it takes only, say, 6ms. Total job duration: 6ms.
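The rough arithmetic behind that 8ms-vs-6ms claim can be checked. The SM split below (10 graphics / 6 compute, reusing the earlier hypothetical example) and the assumption that retasked graphics SMs absorb compute work at the same rate are both idealizations I'm adding, not figures from the post.

```python
# Rough arithmetic for the 5ms/8ms example.
GFX_MS, CMP_MS = 5.0, 8.0
SMS_GFX, SMS_CMP = 10, 6   # hypothetical split, as in the earlier example

# Maxwell: the partition is fixed for the drawcall, so the total is
# simply the slower side, with the graphics SMs idle for the difference.
maxwell_total = max(GFX_MS, CMP_MS)            # 8.0 ms, 3 ms of idle SMs

# Pascal: at t = 5 ms the graphics SMs are retasked. The compute work
# still outstanding is 3 ms on 6 SMs = 18 SM-ms, now spread over all 16.
remaining_sm_ms = (CMP_MS - GFX_MS) * SMS_CMP  # 18 SM-ms left at t = 5 ms
pascal_total = GFX_MS + remaining_sm_ms / (SMS_GFX + SMS_CMP)
print(maxwell_total, pascal_total)             # 8.0 vs about 6.1 ms
```

Under these assumptions the dynamically balanced total comes out near 6ms, consistent with the post's ballpark figure.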
It is also more resilient to badly written code. On Maxwell, the driver will just crash when it detects that something is wrong and the SMs are stalling.