 NVIDIA GeForce Community V16 (welcum pascal), ALL HAIL NEW PASCAL KING GTX1080 out now

Demonic Wrath
post Jul 14 2016, 06:19 PM

My name so cool
******
Senior Member
1,667 posts

Joined: Jan 2003
From: The Cool Name Place

QUOTE(svfn @ Jul 14 2016, 05:29 PM)
DX12 Multi engine capabilties of recent AMD and Nvidia hardware (Kepler, Maxwell v1 (750 Series) and Maxwell v2 (900 Series))
http://ext3h.makegames.de/DX12_Compute.html

Kepler removed the hardware scheduler, so there is no hardware scheduler on die. Since Fermi they have also had the GigaThread engine, but that is one GigaThread engine that splits workloads, compared to...

The GMU has 32 truly async compute queues, but it is incompatible with DX12 for unknown reasons:
http://www.extremetech.com/extreme/213519-...-we-know-so-far

Demonic Wrath, just sharing a bit here. I suggest not worrying about it too much, because in the end only actual benchmarks / in-game FPS matter.
*
Err... Kepler simplified the hardware scheduler, it didn't remove it... there still needs to be a scheduler in the hardware to keep track of which SM is idle, which SM can be retasked, and so on. It would not be reasonable for these tasks to go back to the CPU, because of the latency.

From their Kepler whitepaper:
QUOTE
We also looked for opportunities to optimize the power in the SMX warp scheduler logic. For example,
both Kepler and Fermi schedulers contain similar hardware units to handle the scheduling function,
including:
a) Register scoreboarding for long latency operations (texture and load)
b) Inter‐warp scheduling decisions (e.g., pick the best warp to go next among eligible candidates)
c) Thread block level scheduling (e.g., the GigaThread engine)


As far as the GigaThread engine is concerned, it has 32 hardware-managed queues that can support graphics/compute tasks. It seems they can be repurposed by the driver.
GTX970: http://vulkan.gpuinfo.org/displayreport.ph...7#queuefamilies
R9 200 series: http://vulkan.gpuinfo.org/displayreport.ph...4#queuefamilies

If you notice:
GTX970: 16 queues that support GRAPHICS/COMPUTE/TRANSFER, 1 queue that supports TRANSFER
R9 200 series: 1 queue that supports GRAPHICS/COMPUTE/TRANSFER, 7 queues that support COMPUTE/TRANSFER, 2 queues that support TRANSFER
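
If anyone wants to check this on their own card, below is a minimal sketch (my own, not taken from the gpuinfo.org reports) that enumerates and prints the same queue family info using the Vulkan API. It assumes the Vulkan SDK and a Vulkan-capable driver are installed; error handling is kept to a bare minimum.
CODE
// Minimal sketch: enumerate Vulkan queue families and print their capabilities.
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

int main() {
    VkInstanceCreateInfo ici = {VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO};
    VkInstance instance;
    if (vkCreateInstance(&ici, nullptr, &instance) != VK_SUCCESS) return 1;

    uint32_t gpuCount = 0;
    vkEnumeratePhysicalDevices(instance, &gpuCount, nullptr);
    std::vector<VkPhysicalDevice> gpus(gpuCount);
    vkEnumeratePhysicalDevices(instance, &gpuCount, gpus.data());

    for (VkPhysicalDevice gpu : gpus) {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(gpu, &props);
        printf("%s\n", props.deviceName);

        uint32_t familyCount = 0;
        vkGetPhysicalDeviceQueueFamilyProperties(gpu, &familyCount, nullptr);
        std::vector<VkQueueFamilyProperties> families(familyCount);
        vkGetPhysicalDeviceQueueFamilyProperties(gpu, &familyCount, families.data());

        for (uint32_t i = 0; i < familyCount; ++i) {
            printf("  family %u: %u queue(s) -%s%s%s\n", i, families[i].queueCount,
                   (families[i].queueFlags & VK_QUEUE_GRAPHICS_BIT) ? " GRAPHICS" : "",
                   (families[i].queueFlags & VK_QUEUE_COMPUTE_BIT)  ? " COMPUTE"  : "",
                   (families[i].queueFlags & VK_QUEUE_TRANSFER_BIT) ? " TRANSFER" : "");
        }
    }
    vkDestroyInstance(instance, nullptr);
    return 0;
}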

Demonic Wrath
post Jul 14 2016, 06:52 PM


QUOTE(svfn @ Jul 14 2016, 06:20 PM)
then how do you explain the small gains on Vulkan api in DOOM with Maxwell 2 (900 series)?

user posted image
*
A few reasons:
1) The NVIDIA Vulkan driver has not exposed compute-only queues yet. It probably won't make much of a difference.
2) It's not just Maxwell 2 showing small gains; Pascal shows small gains too. But a gain is a gain. As you can see in the graph you posted, the GTX980Ti shows higher gains than the GTX1070. This video shows it too: https://www.youtube.com/watch?v=ZCHmV3c7H1Q
3) NVIDIA GPUs are already close to peak utilization on average.
4) Some scenes show large gains on NV hardware too, especially CPU-limited scenes.
5) This game uses AMD shader intrinsic functions (specific to AMD). There is no equivalent NVIDIA shader extension exposed in the current Vulkan driver.
6) The AMD Fury X has about 23% more compute performance than the GTX1070, so at peak it will perform about 23% faster. It is performing as it should.
7) As mentioned before, the AMD OpenGL driver has a high-overhead issue. Once that issue is gone, it will perform as it should. If NVIDIA crippled their OpenGL driver, you'd see significant gains too going to Vulkan (would people prefer that?). There's obviously something wrong if a GTX970 can perform similar to a Fury X in OpenGL mode... If the Fury X could outperform the GTX1080 in Vulkan, that would mean NVIDIA is not performing as well as AMD in Vulkan. But that is not the case here; the GTX1080 is still leading.
Demonic Wrath
post Jul 14 2016, 07:12 PM


QUOTE(svfn @ Jul 14 2016, 06:20 PM)

if you see the Kepler whitepaper, they mention replacing the complex one with a simple software scheduler.. on the die itself.

from what i understand, the GigaThread Engine is a serial engine compared to the 8 hardware ACE units found on AMD. 8 engines each giving 8 queues = 64, instead of a single engine with 1x1x32 threads.
*
What is meant by a simpler scheduler is this:
In a hardware/dynamic scheduler, there is hardware to re-order instructions at runtime for performance reasons. In other words, there's hardware in the scheduler to optimize the instruction order.

In a compiler/static scheduler, this re-ordering/optimization of instructions is done during code compilation (this is possible because instruction latencies are largely predictable, e.g. an add operation always takes the same path). The compiler is used to create the best optimized instruction order, hence why it is labelled a software scheduler.

You can read more on static vs dynamic scheduling for more info.
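
Purely as an illustration of the difference (my own sketch, not actual Kepler/Fermi ISA), here's what a static scheduler effectively does at compile time: it moves independent work into a long-latency operation's shadow, so no runtime re-ordering hardware is needed. Assume the load below takes several cycles before its result is usable.
CODE
// Illustrative only: static (compile-time) vs dynamic (runtime) scheduling.

float naive_order(const float* a, const float* b, const float* c, float k, int i) {
    float r0 = a[i];        // long-latency load
    float r1 = r0 * k;      // depends on r0 -> a dynamic scheduler would reorder
                            //   at runtime to avoid stalling here
    float r2 = b[i] + c[i]; // independent work, issued too late to hide the load
    return r1 + r2;
}

float compiler_scheduled(const float* a, const float* b, const float* c, float k, int i) {
    float r0 = a[i];        // long-latency load
    float r2 = b[i] + c[i]; // the compiler hoisted this independent add into the
                            //   load's latency window at compile time
    float r1 = r0 * k;      // r0 is ready (or nearly ready) by the time it's used
    return r1 + r2;
}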

But the hardware that performs task allocation is still there. Don't worry biggrin.gif Even the warp scheduler in each SM is considered a scheduler. It still needs some hardware to decode instructions and dispatch them to the SM to crunch.

The GigaThread engine is a well-kept secret on NVIDIA's side. No one (except NVIDIA) knows how it works internally. They don't clearly label it as 32 graphics processor engines or anything like that in any presentation. It is one of the reasons why their GPUs can crunch DX11 games so efficiently too.


Demonic Wrath
post Jul 15 2016, 08:07 AM


http://www.pcper.com/reviews/Graphics-Card...ute-Performance

http://www.guru3d.com/articles-pages/futur...k-review,1.html

With GTX1080

Fury X performance is... lacking... it should be able to zip past the GTX1070 with its shader perf..

This post has been edited by Demonic Wrath: Jul 15 2016, 08:14 AM
Demonic Wrath
post Jul 15 2016, 10:20 AM


QUOTE(adilz @ Jul 15 2016, 09:26 AM)
---
*
Hype or rumors (is there even hype or a rumor about this lol?).. doesn't matter.. it's a fact that the Furies have higher shader performance.. just saying..

I was also comparing it against its own 28nm brethren, the R9 390X. It is weird that Fury, with much higher shader perf (1.27x higher), doesn't perform much better than the R9 390X.. the mainstream cards are performing as expected though.. if I'm not wrong, it should be a tessellation bottleneck..

There's also this.. well, AMD cannot push their shader intrinsic stuff into this benchmark as it is AMD-specific code.
QUOTE(Anandtech)
Under the hood, the engine only makes use of FL 11_0 features, which means it can run on video cards as far back as GeForce GTX 680 and Radeon HD 7970. At the same time it doesn't use any of the features from the newer feature levels, so while it ensures a consistent test between all cards, it doesn't push the very newest graphics features such as conservative rasterization.


This post has been edited by Demonic Wrath: Jul 15 2016, 10:23 AM
Demonic Wrath
post Jul 15 2016, 11:23 AM


QUOTE(adilz @ Jul 15 2016, 10:47 AM)
It's also a fact that my i7-4930K Ivy Bridge-E CPU has 2 extra cores / 4 extra threads compared to the i7-6700K Skylake. But I would be really disappointed if I expected my CPU to beat the Skylake in the Firestrike or Time Spy bench just because of its higher core count. Don't expect too much from old tech compared to the newer ones.
*
I'm not sure what you're getting at here doh.gif . Higher core counts and higher clocks will benefit heavily threaded apps if the difference in architecture is not much.

In heavily threaded apps, your i7-4930K will beat the i7-6700K Skylake. Skylake is faster in games because it has higher IPC and clock speed (games prefer faster cores over more cores). In the Firestrike Physics score, the 4930K will score higher than the 6700K. I'm pretty sure video encoding will be faster on your 4930K too. The keyword here is heavily threaded apps. Please don't be disappointed smile.gif

First, going from Ivy Bridge to Skylake, Intel increased the clock speed and IPC. The R9 390X and Fury have the same IPC and similar clock speeds, so the difference is core count and raw shader performance (TFLOPS). Yet Fury is only about 5% higher in the Time Spy benchmark. Clearly something is bottlenecking the Fury series. You wouldn't expect a GTX1060 to beat a GTX980Ti just because it is a newer architecture, right?
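
"Raw shader performance" here is just peak FP32 throughput: shader count x 2 ops per clock (FMA) x clock speed. A quick back-of-envelope sketch of that math below; the clock and shader figures are approximate public specs used only for illustration, not numbers from this thread.
CODE
// Back-of-envelope peak FP32 throughput: shaders * 2 ops (FMA) * clock.
#include <cstdio>

double tflops(int shaders, double clock_ghz) {
    return shaders * 2.0 * clock_ghz / 1000.0;   // GFLOPS -> TFLOPS
}

int main() {
    double r9_390x = tflops(2816, 1.05);  // ~5.9 TFLOPS (approx. public spec)
    double fury_x  = tflops(4096, 1.05);  // ~8.6 TFLOPS (approx. public spec)
    printf("R9 390X ~%.1f TFLOPS, Fury X ~%.1f TFLOPS (%.2fx)\n",
           r9_390x, fury_x, fury_x / r9_390x);
    return 0;
}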

Secondly, graphics is a heavily threaded workload, so more cores and higher clocks will beat fewer cores and lower clocks on the same architecture. The R9 390X and Fury use the same GCN architecture, and AMD even improved Fury's tessellation efficiency. So Fury should show a bigger performance difference over the 390X.

Cheers.

This post has been edited by Demonic Wrath: Jul 15 2016, 11:24 AM
Demonic Wrath
post Jul 15 2016, 01:09 PM


QUOTE(adilz @ Jul 15 2016, 01:00 PM)
They want to maintain a certain price/performance tiering so that they can milk our money as much as they possibly can.  cry.gif
*
Haha, its ok. biggrin.gif

My next upgrade will need to be at least 4x GTX970 perf, so I'll just wait. Right now the GTX1080 is only 2x GTX970 perf. Even a GTX1080Ti with 3584 cores would only be about 1.4x the GTX1080, so only about 2.8x the GTX970.

For now I need to upgrade my monitor first.
Demonic Wrath
post Jul 15 2016, 04:06 PM


QUOTE(Jet23sky @ Jul 15 2016, 03:28 PM)
Then how about my scenario? from GTX670 jump to GTX1070?  biggrin.gif
*
Just right thumbup.gif I think it's almost a 4x jump? Last time I jumped from a GTX460 to a GTX970 lol.

QUOTE(skylinelover @ Jul 15 2016, 03:35 PM)
Haha, what a loss lorh

Should have bought a bigger chunk of GPU first, and only then jumped to 4K laugh.gif rclxms.gif
Haha, happens all the time laugh.gif doh.gif
*
Monitors last longer ma.. GPUs update too fast..

Waiting for a 27" QHD or 3440x1440(?) G-Sync HDR 144Hz IPS. Sure to be expensive lol.

This post has been edited by Demonic Wrath: Jul 15 2016, 04:08 PM
Demonic Wrath
post Jul 15 2016, 04:36 PM


QUOTE(Lurker @ Jul 15 2016, 04:23 PM)
Acer xb271hu

RM 3k
*
I might be wrong, but it's not an HDR monitor, right? I'd agree it's a very good monitor for gaming though.
Demonic Wrath
post Jul 15 2016, 08:39 PM


QUOTE(adilz @ Jul 15 2016, 08:20 PM)
Can anyone tell how to disable DX12 Async Compute? Wanna try another run with AC off and see the difference.
*
You need to go to the Custom Run settings to disable it.
Demonic Wrath
post Jul 16 2016, 10:07 AM


QUOTE(NUR_VER.3 @ Jul 16 2016, 01:30 AM)
Since the RX480 is the only recent AMD card right now but with poor mid-range performance, I only have NVIDIA to vote for, but I'm seriously hoping that NVIDIA can rectify the issue.
*
The RX480 doesn't have poor mid-range performance... if someday AMD manages to fix their DX11 and OpenGL drivers, it will perform like a GTX980 consistently. It is also limited by the reference cooler and power delivery. Aftermarket RX480s should be good, but the prices might not be.

QUOTE(svfn @ Jul 16 2016, 02:16 AM)
the gains that you see in DOOM for AMD also come from the Vulkan API (AMD perf is bad in OpenGL) and not only from Async; Vulkan also utilizes the dedicated Async shaders which are a feature of GCN.

Pascal has a new dynamic load balancer and improved pre-emption. They don't have Async shaders, but that isn't necessary for Async Compute; they just used a different approach. From the benchmarks in Time Spy or DOOM Vulkan, you can see that it still yields decent gains when Async is turned on.

if you're interested you can read more here: https://www.reddit.com/r/nvidia/comments/4s...shaders_nvidia/
*
NV drivers already have a compute-only queue that supports DX12 on Pascal GPUs (it can be seen using GPUView; there's an additional COMPUTE_1 queue). Their Vulkan driver hasn't exposed it yet (this can be confirmed in VulkanCapsViewer under queue families).
Demonic Wrath
post Jul 16 2016, 12:17 PM


QUOTE(scchan107 @ Jul 16 2016, 11:26 AM)
Interesting read from reddit.

Basically it's the red team that has been underutilised, aka the AE65 with new engine but low rpm meter, vulkan is just changing to the suitable rpm meter, while the green team has been running at their max potential.
*
"Asynchronous Shaders" and "Asynchronous Compute" is the same thing lol. Async Shaders are just an alternative term AMD use for Async Compute. Async Computing term is used even in their "Async Shader" whitepaper. The reason why it is termed as "Async Shader" is because the compute shader can be run async with the graphic pipeline.

There is no such thing as "dedicated async shaders".

A shader, in computing, is defined as "a computer program that is used to do shading". So having "dedicated async shaders" literally means having "a dedicated computer program that is used to do shading"..

Unless that term is for dedicated ACEs.

ACEs on AMD are dynamic schedulers that reorder and issue compute workloads to the CUs. ACEs don't do the calculations; the calculations are still done on the Compute Units..

ACEs, however, can schedule compute tasks independently of the Graphics Command Processor.

ACEs are used for scheduling compute tasks, not for performing compute.

Having more ACEs ≠ faster compute performance
Scheduling tasks ≠ performing compute

The reason AMD is slow in DX11 is very likely that the AMD driver only talks to the Graphics Command Processor, so the Graphics Command Processor can't schedule tasks fast enough to feed the massive number of compute units. Imagine having 1 company manager issuing tasks to 100 employees.

With DX12, the AMD driver is mapped correctly to the GCP + ACEs. Then you have more managers issuing tasks to the 100 employees.
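
In API terms, the extra "managers" only come into play if the application actually submits work on more than one queue. Here's a minimal DX12 sketch of that (my own illustration, assuming a D3D12-capable adapter; link against d3d12.lib; error handling omitted):
CODE
// Minimal sketch: a DX12 app creating a graphics (direct) queue plus a separate
// compute queue, so the driver can feed the GCP and the ACEs independently.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

int main() {
    ComPtr<ID3D12Device> device;
    D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device));

    // Direct queue: graphics + compute + copy (maps to the Graphics Command Processor).
    D3D12_COMMAND_QUEUE_DESC directDesc = {};
    directDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    ComPtr<ID3D12CommandQueue> directQueue;
    device->CreateCommandQueue(&directDesc, IID_PPV_ARGS(&directQueue));

    // Compute queue: compute + copy only (on GCN this is where the ACEs come in).
    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ComPtr<ID3D12CommandQueue> computeQueue;
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

    // Work submitted on computeQueue can overlap with graphics work on directQueue;
    // synchronization between the two is done with ID3D12Fence objects.
    return 0;
}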

--
Maxwell is partitioned at the SM level too, not per GPC. Each SM can run a different kernel, independent of the other SMs.
Demonic Wrath
post Jul 16 2016, 06:52 PM


QUOTE(svfn @ Jul 16 2016, 05:51 PM)
http://www.pcper.com/reviews/Editorial/Hot...hronous-Shaders
you can see more third party theories, but still no actual confirmation from Nvidia about Maxwell:
https://www.reddit.com/r/nvidia/comments/3j...nvidia_cant_do/

i'm no engineer, and people claim different things, it's still best to hear it straight from Nvidia.
*
Yes, and NVIDIA has already presented on Async Compute for Maxwell and Pascal: static partitioning and dynamic partitioning as their solutions for asynchronous compute.

As for the "slow" context switch: a context switch takes about 20 microseconds. The issue on Maxwell is not the context switch time for Async Compute; the issue is when the workload is imbalanced, i.e. compute workloads take longer to complete. Those workloads need to finish before a context switch can happen. This is potentially hazardous for Maxwell and can make it perform slower. It is also why I think NVIDIA won't ever enable async queues on Maxwell.

Thirdly, my post was about the so-called "dedicated async shaders" on AMD cards. The way those redditors put it makes it sound like AMD has a "dedicated async shader" unit that compute can be offloaded to. That is simply not what the ACEs are. It's almost like claiming there is a "dedicated PhysX card" that PhysX calculations are offloaded to.
Demonic Wrath
post Jul 16 2016, 10:57 PM


QUOTE(svfn @ Jul 16 2016, 07:41 PM)
NVIDIA only claims support in their white papers so far. So you could say that Maxwell does indeed support Async Compute (as in the white paper), but it is incapable of using 32 queues for that due to workload imbalance, and it does not bring much performance benefit as they have to rely on a slow context switch, using a single queue for both. That is what Nvidia's document refers to by calling it a 'heavyweight switch'.
*
Just FYI, even AMD's GCN optimization guidelines mention:
CODE
GCN Performance Tip 50: Avoid heavy switching between compute and rendering jobs. Jobs of the same type should be done consecutively.
Notes: GCN drivers have to perform surface synchronization tasks when switching between compute and rendering tasks. Heavy back-and-forth switching may therefore increase synchronization overhead and reduce performance.

If the application is switching between graphics + compute too often while drawing a frame, that means something is already wrong with the app.

But yes, theoretically, if you account for 20 µs context switches throughout each frame, you can end up with around 5-8% idle time on average (i.e. a 5-8% performance loss). Oh wait, that's about the perf gain from enabling Async Compute on Pascal lol. biggrin.gif
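
For a rough sense of scale (my own back-of-envelope, assuming a 60 fps frame budget and the ~20 µs figure above; the per-frame switch counts are hypothetical), the 5-8% range corresponds to on the order of 40-70 graphics-to-compute switches per frame:
CODE
// Back-of-envelope: how much of a 60 fps frame budget goes to context switches.
// 20 us per switch is the figure quoted above; switch counts are hypothetical.
#include <cstdio>

int main() {
    const double frame_us  = 1e6 / 60.0;   // ~16,667 us per frame at 60 fps
    const double switch_us = 20.0;          // quoted graphics<->compute switch cost

    const int switch_counts[] = {1, 40, 70};
    for (int switches : switch_counts) {
        double pct = 100.0 * switches * switch_us / frame_us;
        printf("%2d switches/frame -> ~%.1f%% of the frame\n", switches, pct);
    }
    return 0;   // prints ~0.1%, ~4.8%, ~8.4%
}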
Demonic Wrath
post Jul 17 2016, 09:57 AM


QUOTE(svfn @ Jul 16 2016, 11:51 PM)
simple look at the difference in architecture (i quote):
QUOTE
Maxwell 2 (900 series): Queues in Software, work distributor in software (context switching), Asynchronous Warps in hardware, DMA Engines in hardware, CUDA cores in hardware.

GCN: Queues/Work distributor/Asynchronous Compute engines (ACEs/Graphic Command Processor) in hardware, Copy (DMA Engines) in hardware, CUs in hardware.


here's the architecture difference when it comes to command handling:
https://forum.beyond3d.com/threads/dx12-per...36#post-1872750
*
doh.gif Again, that simple look at the architecture difference is misleading.

The queues and the Work Distributor are in hardware. The purpose of the Work Distributor is, as its name indicates, to distribute work between SMs. If an SM had to send its results back to the CPU to redistribute work between SMs, the latency cost would be too high. In a graphics pipeline, the data and instructions need to go through the Work Distributor and the SMs multiple times; they wouldn't be able to get this kind of performance if that round trip went through the CPU.

The driver queue, however, is in software. This is the same for both AMD and NVIDIA. The driver queue sends its data to the hardware queue for processing.

Even the simplified comparison and the Beyond3D post (by Ext3h) that you linked show different things: Ext3h's post indicates the queues and work distributor are in hardware, while the simplified comparison says they are in software.

I would suggest you go through the sources you link before posting them here, to avoid misleading people..
Demonic Wrath
post Jul 17 2016, 04:44 PM


QUOTE(svfn @ Jul 17 2016, 02:55 PM)
doh.gif dude, even tech sites like pcper are for PR only; the people on Beyond3D are devs that program and test async on GPUs. i'm not saying they are entirely correct, nor that you should take them as fact, and nobody should rely on what people say on the internet lol. did you read in the disclaimer that they also mention it cannot be 100% correct, because all this info is gathered from limited testing/white papers/third-party sources only, unless you know more than Nvidia engineers?

i just hope you stop misleading people with your own understanding, and saying that Maxwell v2 Async works just the same as the ACEs, because apparently it is not the same thing.
*
Fun fact: my post wasn't about whether Maxwell can or cannot do Async. That post was about whether the unit is in hardware or in software.

I didn't say in my post that Maxwell v2 Async works the same as the ACEs. My point was that there's a hardware unit on NVIDIA that performs the equivalent task of (GCP + ACEs); it might not have the same capabilities, but its purpose is the same, that is, to schedule and distribute tasks to the SMs/CUs. This unit is in hardware. You were saying this unit is actually in software.

Even the sources that you posted contradict one another.
Demonic Wrath
post Jul 17 2016, 05:27 PM


QUOTE(davidletterboyz @ Jul 17 2016, 05:06 PM)
For GTX970? Needs HSBC/citibank card for 20% discount (RM200 cap). 20CITI
*
Wow, RM200 discount.. that's a lot.
Demonic Wrath
post Jul 17 2016, 10:44 PM


Seems the GTX1080 can get more perf from a faster memory overclock.



This post has been edited by Demonic Wrath: Jul 17 2016, 11:38 PM
Demonic Wrath
post Jul 18 2016, 09:27 AM


QUOTE(sonicstream @ Jul 17 2016, 03:49 PM)
Made the switch from a Sapphire 6850 to this ASUS ROG STRIX GTX 1080.

Only complaint is that it barely fits into my PC case. This thing is too large and pushes against the SATA cables etc.

The case is a Cooler Master Elite 430. This is a PC from five years ago with an i7-2600K.

user posted image
*
Wow, that cable routing is really something.. You need to do something about that cable routing...
Demonic Wrath
post Jul 18 2016, 12:25 PM


https://www.chiphell.com/thread-1618219-1-1.html

Time Spy: GTX1060 - 4100 score only biggrin.gif
