As gimmicky as it sounds, I've been studying graphics cards for a while now, and have found proof that a higher bit rate (i.e. 256-bit) and a faster memory interface (i.e. GDDR5) do not add to graphics card performance. They are just there for marketing, to make the cards look fancier and to make you believe they perform better. |
Lies. The advantages of GDDR5 go beyond just a benchmark fps result; for example, DDR3 runs at a higher voltage than GDDR5 (typically 1.25-1.65V versus ~1V).
There are loads of documents on the net about GDDR3 vs GDDR5; I suggest you have a read. |
Call me crazy, but the memory on a video card doesn't interact directly with the card's interface into the computer.
video memory -> GPU -> PCIe interface -> rest of computer. So why are you making a link between the two? Things like textures and other assets are stored in video memory to be accessed quickly by the GPU. The more memory, the more you can store; the faster the memory, the quicker the GPU can access it. The GPU then outputs to the screen, not via PCIe back to the rest of the computer. |
This is all wrong. It's way too much to bother getting into on a Saturday, but there's nothing gimmicky on late model graphics cards. GDDR5 is needed, and on my 680s I max out the memory interface. It's what limits the performance of my whole system. Soon enough we're going to need to be using a minimum 512-bit bus to 7GHz GDDR5 RAM just to get decent performance onto a 4K panel. Bare minimum. One day in the near future we'll probably have a 1024-bit wide bus.
The GTX 1000 series will use stacked DRAM to increase bandwidth from roughly 250GB/s to over 1000GB/s. This is all needed, and companies are investing millions upon millions into the tech to move us forward.

----------

I think the part you need to remember is that it's during the loading of a level/map/round in a game that the PCIe bus is used to transfer texture/geometry information into the graphics memory. You can physically see this by watching software like Precision or Afterburner: the VRAM usage rises dramatically during a map load. Once the map has loaded, you could be sitting on 2-3GB of data in VRAM. From here the CPU is telling the GPU where the camera is, triggering physics effects and so forth. But all the beefy work has already been transferred onto the card, and hence all the work is done on the card. The PCIe bus will load more data on the fly (eg: games with seamless levels, etc) but it's certainly not loading 200GB/s worth of data onto the cards.

This is why maps load in the first place. 1: To get the information from a hard drive into system RAM, and 2: to get all the textures loaded into the graphics card. That's the whole point of all of this. Years ago the CPU did this work with its own motherboard RAM. Then the GPU came along to offload the work from the CPU.

As for how much work a memory interface does per second - that is a product of how much VRAM is currently being used vs how many frames per second you want it to display at. Eg: 2GB worth of used VRAM @ 60fps requires ~120GB/s of VRAM bandwidth. The 2GB is used by textures. The bandwidth is increased by expanding the width of the memory interface (ie: 256 > 384 bit) and/or raising the memory clock speeds (eg: 6GHz GDDR5 to 7GHz GDDR5).

^ This is also why memory overclocking 'generally' achieves nothing. If you're playing a game at 60fps and using 2GB of VRAM (meaning you're running the bus at ~120GB/s), then why would overclocking from 200GB/s to 250GB/s do anything? HOWEVER: if you're running a game at 4GB of VRAM, then overclocking from 200GB/s to 250GB/s is the difference between getting below or above 60fps. It's the difference between being able to run V-Sync or not without dropping to 30fps, etc. These things matter, but they only matter when you understand the complete graphics subsystem architecture.

Everyone will say "don't overclock your RAM, it does nothing" but that's merely an ignorant blanket statement. The real fact is they don't have a clue themselves and they are following the sheep. It's totally dependent on the system you have, your monitor, your refresh rate, the game, and the settings in the game. Most people who could really give you an answer to that would need to sit in front of the system themselves to see exactly where the bottleneck is occurring.

As a counter question: how come frame rates didn't double when PCIe 3 came out at twice the speed of PCIe 2? The answer is because the PCIe bus isn't a bottleneck as you describe above.

Keep learning though - there's too many loud-mouthed cowboys out there in this industry and not enough people who actually have a real understanding. When it comes to building bleeding edge computers for speed, it's handy to know this stuff.

------------

Now.. 10 pack of bundy: Check. - Freshly chopped 50: Check. Drink Red Bull: Check. Time for D3. |
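For what it's worth, here's a quick sketch of the arithmetic in that post. The "VRAM in use x fps" figure is a rule of thumb rather than a hard formula (real bandwidth demand depends on overdraw, caching and how often each byte is actually touched per frame), while the interface figure is the usual width x data-rate calculation:

```python
# Sketch of the bandwidth arithmetic above. Rule-of-thumb only, not a measurement.

def required_bandwidth_gbs(vram_in_use_gb: float, target_fps: int) -> float:
    """Rule of thumb from the post: touch every resident byte once per frame."""
    return vram_in_use_gb * target_fps

def interface_bandwidth_gbs(bus_width_bits: int, effective_gbps_per_pin: float) -> float:
    """Peak memory-interface bandwidth: bus width (bits -> bytes) x data rate per pin."""
    return (bus_width_bits / 8) * effective_gbps_per_pin

print(required_bandwidth_gbs(2, 60))        # ~120 GB/s for 2 GB of VRAM in use at 60 fps
print(interface_bandwidth_gbs(256, 6.0))    # ~192 GB/s, e.g. a stock GTX 680 (256-bit, 6 GHz effective)
print(interface_bandwidth_gbs(384, 7.0))    # ~336 GB/s, e.g. a 384-bit card at 7 GHz effective
```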
It's way too much to bother getting into on a Saturday, Geez, I'm glad you didn't get into it then :-) |
You put up a good argument, but the memory interface is a minuscule part of a card's performance. Your 680s that you speak of are capped at 16GB/s each if they are running at 2.0 x16 or 3.0 x8 in SLI, or, if your motherboard is top of the line, at 3.0 x16, which is a maximum of 32GB/s per card. Stock 680s have very high TMU and ROP counts, as well as a pixel rate of 32.2 GPixel/s, a texture rate of 129 GTexel/s and floating-point performance of 3,090 GFLOPS. It's not just your memory interface that can hold up to 32GB/s on a single card; it's your combination of top of the line specs that matters the most when you look at the texture rate, the pixel rate and the floating-point performance numbers, as they don't have any restrictions or logical limitations - which is why people overclock cards, as they realise that a card has more potential than what the vendors have put in a box for you.
Another thing that professional gamers and tech heads have worked out over time through benchmarking cards together is that cards have logical and theoretical transfer rates. For example, a GTX 780 might have a theoretical transfer rate of 330GB/s, but in reality - and the bottom line is - it's still capped at the PCIe 3.0 limitation of 32GB/s. If cards were to have a transfer rate of 1000GB/s, it would be decades away; all cards have a logical bandwidth transfer rate. It doesn't matter if it's a s***** GT 640 or a high end GTX Titan Z black edition ultra, both will still be capped at 32GB/s as they are both PCIe 3.0 cards. As for 4K, an itty bitty GT 520 can support YouTube video playback at 4K resolutions and has a logical transfer rate of 14.4GB/s, nearly 2GB/s under its full given potential |
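For context, here's roughly where those GTX 680 rate figures come from, assuming the published specs of 32 ROPs, 128 TMUs, 1536 CUDA cores and a ~1006MHz base clock:

```python
# Sketch of the GTX 680 theoretical rates quoted above, derived from its
# published specs. These are per-clock peak figures, not measurements.

CORE_CLOCK_GHZ = 1.006     # base clock, ~1006 MHz
ROPS, TMUS, CUDA_CORES = 32, 128, 1536

pixel_rate_gpixels   = ROPS * CORE_CLOCK_GHZ             # ~32.2 GPixel/s
texture_rate_gtexels = TMUS * CORE_CLOCK_GHZ             # ~129 GTexel/s
fp32_gflops          = CUDA_CORES * CORE_CLOCK_GHZ * 2   # ~3090 GFLOPS (one FMA = 2 FLOPs per core per clock)

print(round(pixel_rate_gpixels, 1), round(texture_rate_gtexels, 1), round(fp32_gflops))
```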
I think it's better if I ask this way:
What do you think is being transferred through the PCIe bus which is causing a bottleneck in performance? |
I think it's better if I ask this way: Give up Ph33x, he doesn't get it. |
It seems he is saying that data is loaded from the HDD to the CPU and is constantly being fed into the GPU at the PCIe 2.0 or 3.0 rate, and is therefore capping GDDR5 speeds. Which is wrong, as you said.
|
Still capped at the PCIe 3.0 limitation of 32GB/s???
Man, you just don't get it. Even a Titan doesn't come close to saturating PCIe 2.0, let alone 3.0. Just forget about it man. |
It's ok. There was a guy recently saying if you lower your graphics settings, you won't lag as much because your internet doesn't have to stream the higher quality graphics.
Besides, who wants to play at high frames? The game runs way too fast and everything is hard to control. |
To be fair, that does sometimes happen with those old skool games. |
The truth might shock me except you posted a load of horses***.
http://strategicedge.ca/backend/wp-content/uploads/2012/05/Apu-Thank-you-come-again.png |
It seems he is saying that data is loaded from the HDD to the CPU and is constantly being fed into the GPU at the PCIe 2.0 or 3.0 rate, and is therefore capping GDDR5 speeds. Which is wrong, as you said. Who is? Lol, a 32GB/s cap on PCIe 3.0 has nothing to do with the CPU or the hard drive; it's purely a feature of the PCIe bus on the motherboard. But I do understand, ph33x, that GDDR5 and a high bit rate contribute to more than just the bus bandwidth; the truth still stands that it has a bandwidth limitation. |
Who is? Lol, a 32GB/s cap on PCIe 3.0 has nothing to do with the CPU or the hard drive; it's purely a feature of the PCIe bus on the motherboard. PCIe has plenty to do with the CPU, as the PCIe controller is on the CPU. Eg: the 3930K was rated at PCIe 2.0 speeds (5GT/s) while the 4930K, which fits in the same socket on the same motherboard, runs at PCIe 3.0 speeds (8GT/s). In the case of PCIe slots, the 'bus' on the motherboard is just copper tracks linking the CPU to the slots and nothing more nowadays. The controller itself is built into the CPU.

As for transferring data between cards in an SLI configuration - this is achieved through the use of an SLI bridge. If you don't use an SLI bridge with high performance cards, THIS is when the bus will limit speeds, and quite severely. However, running the system with the correct hardware completely alleviates this limitation. All of these things should signal to you that graphics cards don't need to wait for the bus in a properly built and balanced system.

-----------

The library example is always handy for things like this. Think of the PCIe bus as the aisle running through a library. The bookshelves are your hard drives. The books are your texture files. You are person number 1, the CPU. You have 4 helpers today, your GPUs.

Now, you need to research something, and you already know which books to find the information in. Let's say you are getting the data from Wiki citations. So you get your 4 workers to grab the 20 books you need today. They walk up and down the aisle (PCIe bus) picking up all the books (texture data) you've listed for them. The aisle (PCIe bus) at this stage is full of workers (max bandwidth); you can't fit any more in. So anyway, they are now transferring the data from the shelf (hard drive) down the aisle (PCIe bus) to a table (graphics card memory).

Finally, the table (graphics card memory) has all 20 books (texture data) sitting on it. Now your 4 workers (GPUs) are at one end of the aisle (bus) waiting for your next instruction. You are at the other end of the aisle, so occasionally you walk down the aisle to them, giving them a list of instructions to perform. "Go to book 5, chapter 7, section 4, look for data and perform this task with it." So your workers are all at a table, and the books (texture data) have already been taken down the aisle (bus) to that same table (graphics card memory), so your workers (GPUs) can now perform the task faster - without having to walk up and down the aisle (bus) to get more information on the fly. From here it's simply you walking up and down (light traffic on the bus) giving your workers more instructions.

As for adding SLI to this example: simply put your 4 workers at separate tables. The same work needs to be on all 4 tables for them all to work really fast together. So let's assume your first worker is photocopying pages from the book and now he needs to walk up and down the aisle (bus) passing this information around to the 3 other workers. This is inefficient because you use up space in the aisle (bus) and it's not designed to have people racing up and down at 20 times the speed of a normal person. The aisle (bus) will slow the process down in this instance. So what we do is build a bridge that connects the workers' tables together. Now the worker can photocopy, and use this 'run as fast as you want' bridge to get the copies to the other workers. Since he's using the 'bridge' - he's not in your aisles, aislin' ur aislin'. 
This means the aisle (PCIe bus) is now free for you to walk up and down giving instructions as before. :)

Now the question is: would making that aisle wider help you get your instructions to your workers any faster? I mean, the aisle is empty, it's got room for 4 people to walk past each other, but you are only one person. Light traffic. Another question: would making that aisle wider make your workers perform the task any faster, since everything they need is now already on the table? Sure, it may have made it faster to GET the information in the first place (textures, etc), but once the work is on the work table (graphics card memory) then the width of the aisle (PCIe bus) has no bearing on how quickly they can perform their task.

-----------

So basically, once you can explain exactly how (and why) the PCIe bus limits the overall system performance in a game, including what data is being passed through at the time (preferably with citations), and you can also clearly answer the question at the end of my first post - I'll stick with what is correct for now..

Edit: I'm doing this to try and educate you as to the way data is passed along the bus. I'm happy to entertain the idea that you actually want to learn about this stuff. I'm going on a whim here. I await your answer while I have a few cones and a shower. |
But I do understand, ph33x, that GDDR5 and a high bit rate contribute to more than just the bus bandwidth; the truth still stands that it has a bandwidth limitation. Of course there is a bandwidth limitation, as it's a finite resource and always will be. Video memory actually exists to get around this limitation, so textures don't need to be constantly loaded across the bus during general gameplay. |
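As a rough illustration of that last point (a generic 256-bit, 7Gbps-effective GDDR5 card versus a PCIe 3.0 x16 link; the PCIe figure is per direction):

```python
# Minimal sketch: once assets are resident in VRAM, the GPU reads them at the
# memory-interface rate, which dwarfs the PCIe link they originally arrived over.

PCIE3_X16_GBS = 15.75                 # ~16 GB/s per direction for PCIe 3.0 x16
gddr5_gbs     = (256 / 8) * 7.0       # generic 256-bit bus at 7 Gbps effective -> 224 GB/s

print(f"On-card GDDR5 here is ~{gddr5_gbs / PCIE3_X16_GBS:.0f}x the expansion-bus bandwidth.")
```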
misrepresentation and misinformation - thread removed imo
|
misrepresentation and misinformation - thread removed imo this |
^^No way. I wish political threads went down with this much decorum. Top posts are the healthiest debate I've seen on qgl in years.
|
|
That post of mine gets a D- for failed tags.
|
Hey, that library analogy is actually pretty good; it just needs to address the OP's concern that GDDR5 is useless over slower GPU memory.
|
Hey, that library analogy is actually pretty good; it just needs to address the OP's concern that GDDR5 is useless over slower GPU memory. Churs. That's only because he is operating on a whim that the GPU memory bandwidth is ultimately limited by the bandwidth of the PCIe bus. The sooner he lets that slide, the sooner he can realise that GDDR5 > GDDR3 - the same way he already knows that PCIe 3 > PCIe 2. |
F*** it I quit. Good idea - you were never cut out for the gimmicky field of GPU study. |
What an ignorant, assholish way to be. Well, I hope my waste of time helped someone who was keen on this stuff.
|
I had a feeling he posted what he did in an attempt to build himself up as some sort of technical authority. So of course when confronted with obviously superior knowledge he just folded. That's my armchair analysis anyway.
|
Oh, but my chair has no armrests. :/
|
misrepresentation and misinformation - thread removed imo

Nah, maybe the thread just needs a title change from one BuzzFeed style headline to another:

Gamer Posts Misinformation About Graphics Memory - You'll Never Believe What Happens Next!
Top 10 Ways OP Is Wrong About GDDR5 - ph33x's Reply Will Break His Heart
Understanding PC Architecture Before Reviewing Graphics Cards - You Need To See This!

..etc |
What an ignorant, assholish way to be. Well, I hope my waste of time helped someone who was keen on this stuff. I found it helpful! Though it made me wonder what the CPU and system RAM are doing while the game is running for the PCIe bandwidth to not be a factor... Is the data loaded into the gfx memory mirrored in the system memory, so that the CPU does all the complex calculations and then shows the GPU how to do it? Or is there some sort of large-scale processing or compression of data into the form of a simple instruction that's then passed from CPU to GPU?... Or is that the same thing... The mind boggles! |
I found it helpful!

The only mirroring of data is between the VRAM of video cards in a multiple card setup. This is because each card renders a new frame one after the other in a striped configuration, just like RAID 0 for hard drives. With 3 cards at 60fps, each card individually produces only 20fps worth of visuals over that time, but ultimately they are working with the same pool of data. This is also why running 2x 4GB cards in SLI doesn't mean you have 8GB of VRAM total, since the data is mirrored. (Again, like RAID 1: you could boast your computer has "10TB worth of space", but if it's in RAID 1, then you have only 5TB worth of usable space, making your peen ever so much smaller.)

On a properly optimised game, the CPU generally deals with things such as input/output with peripherals, sound (to an extent), basic physics, AI, networking (MP), etc - and passes relevant information to the video card using commands, not so much visual data like textures. It does in some instances assist the graphics card, such as the terrain rendering described below:

A typical terrain rendering application consists of a terrain database, a central processing unit (CPU), a dedicated graphics processing unit (GPU), and a display. A software application is configured to start at an initial location in the world space. The output of the application is a screen space representation of the real world on a display. The software application uses the CPU to identify and load terrain data corresponding to the initial location from the terrain database, then applies the required transformations to build a mesh of points that can be rendered by the GPU, which completes geometrical transformations, creating screen space objects (such as polygons) that create a picture closely resembling the location in the real world.

The CPU tells the GPU where you (the camera) are. It's telling the GPU that 3 of your mates are walking past at xyz coordinates, travelling at x speed, in x direction. The card basically says "Yep" and produces it on the screen. That's the type of data typically being transferred 'while' gaming. All of the major data (data which doesn't change during that map/round/session) was copied during the map load.

---------

RAM will contain things like cached sounds, player positional information, and yes, it can also hold data that doesn't need to be on the card all the time. This lines up with developers who have been talking about how they want more control over memory - where the game stores its information, etc - so they can make the game more efficient, using less VRAM but still maintaining high performance. |
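A minimal sketch of the AFR/mirroring arithmetic described above, with illustrative numbers:

```python
# Sketch of the SLI mirroring / alternate-frame-rendering arithmetic above.
# Numbers are illustrative, not measurements.

def fps_per_card(total_fps: float, num_cards: int) -> float:
    """With alternate-frame rendering, each card produces 1/n of the frames."""
    return total_fps / num_cards

def usable_vram_gb(vram_per_card_gb: float) -> float:
    """Assets are mirrored on every card, so usable VRAM does not add up."""
    return vram_per_card_gb      # NOT vram_per_card_gb * num_cards

print(fps_per_card(60, 3))       # 20.0 -> each of 3 cards renders 20 fps worth
print(usable_vram_gb(4))         # 4    -> two 4 GB cards still give a 4 GB working set
```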
It is the textures that get pushed into the GPU.
Other game assets like maps, actors, voice - all that stuff goes into system memory. Pretty sure texture data probably goes into system memory first, then when the level/game is loaded it gets pushed into GPU memory. When streaming data, textures would be sent over the PCIe bus as needed, which shouldn't require all the bandwidth - though a poorly coded game engine might make it happen. The texture data in the GPU is also compressed, and the decompression gets done on the GPU. |
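As a back-of-envelope illustration of why streaming textures on the fly "shouldn't require all the bandwidth" - the 50MB/s streaming rate below is an arbitrary assumption for the sake of the example, not a figure from any real engine:

```python
# Hypothetical steady-state asset streaming versus the expansion-bus capacity.

PCIE3_X16_GBS = 15.75                  # ~GB/s per direction for PCIe 3.0 x16
streaming_mb_per_s = 50                # assumed on-the-fly texture streaming rate

fraction_of_bus = (streaming_mb_per_s / 1024) / PCIE3_X16_GBS
print(f"Streaming uses ~{fraction_of_bus:.2%} of the link")   # well under 1%
```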
I found your information helpful, ph33x! You always have quality posts from a technical standpoint.
I have nothing really to add except that when referring to PCIe bandwidth, make sure you account for the 20% overhead of 8b/10b encoding on PCIe 1.x/2.x. PCIe 3.x does away with most of this overhead with 128b/130b encoding.

Here is a wild thought though: as time goes on, enthusiast GPUs will become less and less common as people become more and more content with low power portable devices, driving up the price of niche enthusiast GPU equipment. In the end, most of the processing for AAA titles will happen in DCs and be streamed to your client device. I think nVidia's GRID/Citrix XenDesktop/XenApp with vGPU, or VMware's PCoIP with vSGA, shows that this is possible, at least in PoC. As internet infrastructure gets better it'll just be more and more attractive to use a subscription based service as opposed to outlaying a thousand for a nice video card. I'm probably way off the mark though.

Oh yes: OnLive might have been too early to market / the tech wasn't mature enough. Gaikai/Sony's cloud gaming stuff shows promise I guess. |
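To put numbers on that encoding point, a minimal per-direction sketch (protocol and packet overhead are ignored here, so real throughput is a touch lower):

```python
# Effective PCIe bandwidth per direction, accounting only for symbol encoding.

def pcie_gbs(gt_per_s: float, lanes: int, payload_bits: int, symbol_bits: int) -> float:
    """GT/s per lane x lanes x encoding efficiency, converted from Gb/s to GB/s."""
    return gt_per_s * lanes * (payload_bits / symbol_bits) / 8

print(pcie_gbs(5, 16, 8, 10))       # PCIe 2.0 x16 with 8b/10b    -> 8.0 GB/s per direction
print(pcie_gbs(8, 16, 128, 130))    # PCIe 3.0 x16 with 128b/130b -> ~15.75 GB/s per direction
```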
so he got one thing right with this thread at least;
the truth shocked him |
The datacentre streaming of display data has to:
- Receive input data from the player
- Calculate everything
- Render it
- Send it
- Dummy computer display it

All within about 17ms, to give 60fps. So I can't see DC-streamed games being useful for the hardcore gamer. |
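To put rough numbers on that list at 60fps (every per-stage time below is a guess for illustration only; in practice stages can overlap via pipelining, but the player still feels the summed input-to-photon delay):

```python
# Rough frame-budget sketch for datacentre-rendered gaming at 60 fps.
# All per-stage times are illustrative guesses, not measurements.

frame_budget_ms = 1000 / 60          # ~16.7 ms per displayed frame

stage_ms = {
    "receive input from player": 5,  # one-way network latency (optimistic)
    "calculate + render":        10,
    "encode + send frame":       5,  # H.264 encode plus the return network hop
    "client decode + display":   3,
}

total = sum(stage_ms.values())
print(f"budget {frame_budget_ms:.1f} ms, pipeline {total} ms -> "
      f"{'fits' if total <= frame_budget_ms else 'over budget'}")
```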
The server itself can probably do what you describe above in less than 4ms, but the idea is not to have the graphics card running 100% for just one user. Limit it to 17ms so it acts like Vsync. Then you can have more users on the one server.
The thing it doesn't take into account is the network lag (if they do account for it, they're most certainly rating it on a local network). It's like screen manufacturers who claim their screen is 1ms GTG. This may be true, but the measurement they don't throw on a spec sheet is the input lag, which on some screens is up in the 40-50ms range. "Game" presets in screens are typically there to lower the input lag as much as possible. On my U2410 it's about a 30ms monitor (poor) but in game mode it's somewhere around 8ms (much better, but not the best).

This type of service will take years to become mainstream. Enthusiasts know it's not as good as dedicated hardware. Non-enthusiasts don't know what it is, or what it does, or how it works with their Xbox One. If Steam boxes don't take off, it's an indication that this mightn't take off either. The hard part is convincing someone it's better than what they have without confusing them. Also, we get s***** about games becoming unplayable with network lag; just wait till the entire game is rendered over the network. I could imagine on day one there would be so many pissed off people.

It's going to be interesting watching what happens over the next few years. The tech roadmap for now still has plenty of awesome s*** in the pipeline for enthusiasts.

For those not familiar with GRID, it's a service offered by nVidia where the rendering is done at their end and sent to you over the network. There are certain advantages beyond gaming, such as using the service to do your rendering/compute. You have a supercomputer in your laptop. Here's an image below of one. Dual Xeon CPUs, 256GB (or so, from memory) of RAM, and 16 GPUs that are even better than a Titan. Many users would share this resource. http://www.pto.hu/wp-content/uploads/2013/03/nvidia-grid-vca.jpg |
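And a tiny sketch of the time-slicing idea above - the 4ms per-frame render time is the guess from the post, not a benchmark:

```python
# If a frame takes ~4 ms of GPU time but only needs delivering every ~16.7 ms,
# the idle time can be used to serve other users on the same server GPU.

frame_budget_ms    = 1000 / 60   # ~16.7 ms at 60 fps
render_ms_per_user = 4           # assumed per-frame render cost (from the post above)

users_per_gpu = int(frame_budget_ms // render_ms_per_user)
print(users_per_gpu)             # ~4 users per GPU, each still seeing 60 fps
```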
That is why I listed 'send it' in my points.
You need a latency of, what, 5ms on average for it to be worthwhile, assuming the datacentre renders in 10ms. Even then, you risk input lag having a negative effect. |
Yeah I figured you were including the network latency as you had "Dummy computer display it" in your list. ;)
Then you have the issue of image compression. I can see the difference between FXAA/MSAA, etc, let alone the artefacts I see using tech such as ShadowPlay.

I don't think it'll kill the video card industry though. Things like this usually help new tech proliferate. Eg: the Titan/780 as we know it today may not have even existed if it wasn't for the Titan supercomputer. Poor yields were plaguing that project, and the excess of chips they had with dead SMXs meant they had something good for the domestic market at the time. Now that the GK110 is on its last legs (and the manufacturing process has been refined to give much higher yields), they couldn't care less what card they throw it in - which is why the full 2880-core 780 Ti was the last chip to come out, way later than the same chip was seen on the Titan and the original 780. The chip itself is so old that I remember back in the day everyone thought it was going to appear on the GTX 680.

GTX 580: GF110 - Full chip. (Fermi)
GTX 680: GK104 - Half chip. (Kepler)
GTX 780: GK110 - Full chip. (Kepler)
GTX 880: GM104 - Half chip. (Maxwell) (Rumour)
GTX 980: GM110 - Full chip. (Maxwell) (My guess) |
I think Steam streaming uses H.264.
There is no way cloud based gaming will be useful for medium-to-hardcore gamers within the next five years, not unless they live in the world's fastest internet locations - and even then.. |
Yeah, H.264 straight off Kepler's hardware H.264 encoder. The same encoder is used locally for ShadowPlay and storing the files on the HDD. If you can see artefacts using this, then you can't expect much better through GRID. Mind you, the other day I tried some software called SplashTop (low latency remote desktop) and I was comfortably able to play D3 on my tablet.
Ok, it wasn't comfortable lol, but it did work rather smoothly. |
What is everyone's experience on the Phi as opposed to GPU-centric compute? I've never had the opportunity to play with one of these. At the same time though, I have nothing to compute. IIRC these were used in China's latest supercomputer? Also, IIRC, these are integer based units and not FP? Apparently that supercomputer they built is great at brute forcing encryption. ;) |
rofl |
Top thread, I remember this one well...
|