Vertex Performance of Xenos and RSX+EDGE Comparison
mokmok
05-10-2007, 05:38 AM
I've been reading a couple of threads on both B3D and NeoGAF regarding the Vertex processing capabilities of the 360's Xenos GPU compared to the RSX's combined with the PS Edge Tools.
A few posts on both forums claim the following:
1. The Xenos Vertex Performance is up to 6x greater than the RSX's.
2. The use of the Edge Tools and SPEs brings the Vertex Performance of the PS3 on par with the 360 but prevents the SPEs from being used for Physics, AI etc.
Just looking at the raw specs of the Xenos GPU it does seem to have a Vertex processing advantage.
My questions: first, are the above claims correct, and if so, what impact will this have on PS3 games (e.g. no 1080p)?
Diresu
05-10-2007, 05:40 AM
Ask CPI?
Kabbage
05-10-2007, 05:43 AM
If I'm not mistaken it is because the X360 GPU has the ability to dedicate more shader pipelines to vertex work, or something like that.
And it obviously doesn't have an effect seeing as how there are several 1080p games on the market and more coming. Also, if I'm not mistaken Edge is software, which has no bearing on how much the hardware is ultimately capable of. So adding RSX+Edge doesn't matter when Edge adds nothing to the total amount of vertex processing. Or something like that.
BahnNZ
05-10-2007, 05:52 AM
Yup, clear cut advantage if you take the GPUs as isolated parts in vertex shading for Xenos. Thought everybody knew this...? :)
It's the magic of Unified Shaders...
frosty
05-10-2007, 05:56 AM
Xenos holds the vertex advantage for sure, however it can't hold a candle to RSX's pixel shading capabilities. So, in the end both sort of balance out, save for the fact that Cell can also help, which from what I've seen so far can give PS3 a slight edge visually (Lair is one of the best looking games I've seen on any system, and it runs at more than 2x the resolution 360 outputs. If that's not a graphical advantage nothing is. Also note GT:HD/5 which has the same advantages. more than 2x the res and still better looking than forza 2 or PGR4).
rog27
05-10-2007, 06:09 AM
That statement is just silly because it's not in a real-world context. Theoretically, you can dedicate all the GPU's resources to vertex shading...but that leaves 0 for pixel shading. That's something that never happens in the course of real-world gaming. Unified shaders are good for load balancing. PS3 just does load balancing in a different way...by CELL's SPUs back-culling geometry prior to it being processed by the RSX, thus cutting the number of visible (on-screen) vertices.
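To make the culling idea above concrete, here is a minimal sketch of back-face culling done on the CPU side before the index buffer is handed to the GPU. It is only an illustration: the function names, the counter-clockwise-is-front convention and the sample data are all assumptions, and nothing here is taken from the actual Edge tools.

```python
def signed_area_2d(a, b, c):
    """Twice the signed area of screen-space triangle abc (positive = counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def cull_back_faces(vertices, indices):
    """Return only the index triples whose triangles face the viewer.

    Assumes counter-clockwise winding means front-facing; that convention is
    an assumption of this sketch, not something stated in the thread.
    """
    kept = []
    for i in range(0, len(indices), 3):
        i0, i1, i2 = indices[i:i + 3]
        if signed_area_2d(vertices[i0], vertices[i1], vertices[i2]) > 0:
            kept.extend((i0, i1, i2))
    return kept


# Hypothetical data: one counter-clockwise and one clockwise triangle.
screen_verts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
index_buffer = [0, 1, 2,   # counter-clockwise, kept
                1, 0, 3]   # clockwise, culled
visible = cull_back_faces(screen_verts, index_buffer)
```

The GPU then only sees the shorter index list, which is the load-balancing effect the post describes.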
mario25
05-10-2007, 06:10 AM
If you dedicate all Xenos shaders to vertex processing, yes, it has an advantage. But what game dedicates all shaders to vertex processing?
BahnNZ
05-10-2007, 06:16 AM
I've certainly seen 360 do some very impressive tricks I've put down to those Vertex Shaders, like the mall full of zombies in Dead Rising, and wondered if PS3 can do this.
You know how it is: 360, Wii, and PS3 each have a list of advantages over the others, and they're well known. I've always been of the opinion the PS3's primary advantage is its Blu-ray, and the 360's primary advantage is that fancy GPU.
frosty
05-10-2007, 06:17 AM
well, I've seen an equal number of characters in heavenly sword, if not more.
bigwig
05-10-2007, 06:22 AM
proof should be in the games...now both systems are out
mokmok
05-10-2007, 06:25 AM
I know there is probably a complicated answer to this question but, when would you use a vertex shader as opposed to a pixel shader? My understanding is that pixel shaders are mainly used for bump mapping etc., whereas vertex shaders are used to render complex lighting effects etc. - I may be totally wrong though!
cpiasminc
05-10-2007, 06:26 AM
1. The Xenos Vertex Performance is up to 6x greater than the RSX's.
So I've talked about this one a few times before, but it is a flat-out absurdity in the sense that it ignores all other limitations of the respective hardware. The assumption is that the smallest possible vertex shader is 4 dot products (basically, transform the vertex). Since you've got 8 vertex shader pipes in RSX, that's 8/4 = 2 verts per clock, and 2 * 500 MHz = 1 billion verts per second.
On Xenos, you've got 48 ALUs which, if you assume they are all dedicated to vertex processing (this is actually impossible, but for the sake of theory we'll ignore that), gives you 48/4 = 12 verts per clock, and 12 * 500 MHz = 6 billion verts per second.
Sounds that way, but unfortunately, it's completely untrue. The thing is vertices do not get moved in at unlimited speed. You can only move vertex attributes at a fixed number of attributes per clock cycle, and that means in 99% of all *major* render passes, that a single vertex takes more than one clock cycle to get in. So no matter what, it doesn't matter how much you can theoretically process because the data doesn't move through the system fast enough. The real theoretical advantage is still there for Xenos, but it is by no means 6:1. In reality, they both suck pretty bad. RSX simply sucks a little worse.
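The back-of-envelope numbers behind the 6x claim, and why a fetch limit makes them moot, can be sketched roughly like this. The pipe counts and the 500 MHz clock come from the post; the attribute counts in the second half are purely hypothetical placeholders, since no real fetch rates are given.

```python
def peak_verts_per_second(alus, clock_hz, ops_per_vertex=4):
    """Theoretical vertex rate if every ALU did nothing but the minimal transform."""
    return (alus / ops_per_vertex) * clock_hz


def fetch_limited_verts_per_second(clock_hz, attributes_per_vertex, attributes_per_clock):
    """Vertex rate when attribute fetch, not ALU count, is the bottleneck."""
    return clock_hz * attributes_per_clock / attributes_per_vertex


CLOCK_HZ = 500e6  # figure used in the post for both GPUs

rsx_peak = peak_verts_per_second(8, CLOCK_HZ)     # 8 vertex pipes
xenos_peak = peak_verts_per_second(48, CLOCK_HZ)  # all 48 ALUs on vertex work
ratio = xenos_peak / rsx_peak

# Hypothetical numbers: 4 attributes per vertex, fetched at 1 attribute per clock.
fetch_bound = fetch_limited_verts_per_second(CLOCK_HZ, 4, 1)
effective_xenos = min(xenos_peak, fetch_bound)
```

With those made-up fetch numbers the effective rate is set entirely by the fetch limit, regardless of how many ALUs are available, which is the point being made above.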
In the end, Xenos can only set up one triangle per cycle, while RSX can set up 1 every two cycles. It should be noted, though, that because of things like a post-transform cache, if you're smart, you can actually exceed the theoretical limits. And since RSX's post-transform cache is about 8x larger than Xenos', it has more potential for gain. To be fair, though, RSX needs it far more badly than Xenos does. The vertex attribute read rate on RSX is incredibly god-awful, but it's not an insurmountable wall. Xenos simply hits fewer internal limits.
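For anyone wondering why a post-transform cache lets you beat the nominal limits, here is a toy simulation. It assumes a simple FIFO cache and a regular grid mesh; the cache sizes of 4 and 32 entries are hypothetical and were picked only to mirror the "about 8x larger" ratio mentioned above, not the real hardware sizes.

```python
from collections import deque


def count_vertex_shader_runs(indices, cache_size):
    """Count vertex shader invocations with a FIFO post-transform cache."""
    cache = deque(maxlen=cache_size)
    runs = 0
    for index in indices:
        if index not in cache:
            runs += 1
            cache.append(index)  # oldest entry is evicted once the cache is full
    return runs


def grid_indices(width, height):
    """Index buffer for a width x height grid of quads, two triangles each."""
    stride = width + 1
    indices = []
    for y in range(height):
        for x in range(width):
            v0 = y * stride + x
            v1 = v0 + 1
            v2 = v0 + stride
            v3 = v2 + 1
            indices += [v0, v1, v2, v1, v3, v2]
    return indices


mesh = grid_indices(8, 8)
unique_vertices = len(set(mesh))
small_cache_runs = count_vertex_shader_runs(mesh, 4)   # hypothetical small cache
large_cache_runs = count_vertex_shader_runs(mesh, 32)  # hypothetical cache, 8x larger
```

On this particular mesh the larger cache only ever transforms each vertex once, while the small one keeps re-transforming vertices shared between rows. Real meshes and real replacement policies will behave differently, so treat it as an illustration of the effect rather than a measurement.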
BTW, about the 6 billion verts figure... that kinda ignores a little detail. This may come as a shock to a lot of people, but vertices consist of this thing called DATA. If you take a pretty average-sized vertex, 6 billion vertices per second requires more than double the bandwidth that the entire Xbox 360 has... and that's including the totally internal busses which don't actually connect any two separate devices (you know how people like to pretend that the 256 GB/sec on the eDRAM die can be treated like a point-to-point link). If you want to move that much data over a main memory bus (which is the real bus of concern for this purpose), that's not going to happen within the next 3 or 4 console generations. Memory architectures simply don't grow that quickly. Currently you can't move 6 billion of even the smallest possible vert (per second) over the main memory bus, and I don't see that happening on Xbox720 or PS4 either.
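The main-memory part of that argument is easy to check with a few lines. The vertex layouts below are hypothetical examples, and the 22.4 GB/s main memory figure is the commonly cited Xbox 360 number rather than anything stated in this thread, so take the output as a rough sanity check only.

```python
def required_bandwidth_gb_per_s(verts_per_second, bytes_per_vertex):
    """Bandwidth needed just to stream the vertex data, in GB/s (1 GB = 1e9 bytes)."""
    return verts_per_second * bytes_per_vertex / 1e9


CLAIMED_VERTS_PER_SECOND = 6e9
MAIN_MEMORY_GB_PER_S = 22.4  # commonly cited Xbox 360 figure; an assumption here

# Hypothetical vertex layouts (sizes in bytes).
vertex_sizes = {
    "position only, 3 floats": 12,
    "position + normal + UV, floats": 32,
}

for label, size in vertex_sizes.items():
    needed = required_bandwidth_gb_per_s(CLAIMED_VERTS_PER_SECOND, size)
    print(f"{label}: {needed:.0f} GB/s needed vs {MAIN_MEMORY_GB_PER_S} GB/s available")
```

Even the bare position-only layout needs several times what the main memory bus is usually quoted at, before any textures or framebuffer traffic are counted.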
2. The use of the Edge Tools and SPEs brings the Vertex Performance of the PS3 on par with the 360 but prevents the SPEs from being used for Physics, AI etc.
They're kind of assuming a lot of things because the demos, which were meant for a technical audience, used all the available SPEs in order to demonstrate the concept and showcase techniques that can keep all the SPEs busy. If you actually did it like that in a real game, yeah, you'd certainly tie up all the SPEs for that period of time within the frame. Something that I think nobody outside the industry actually realizes is that the CPU side of rendering does NOT take up a huge portion of the time between frames. Physics, AI, etc. take up much more time than rendering. It's a little hard to see that with the PC as a reference point, of course, because Windows and the API layer rob you of so much.
That aside, the point of Edge is not to fill up all the SPEs. You certainly don't NEED to use more than 1 or 2 SPEs in order to get a huge gain out of it. More importantly, while Edge was specific to graphics, a lot of the same principles can be applied to physics (Havok's tech talks demonstrated that quite handily and nobody talks of Havok precluding the use of Edge) and AI and so on.
Just looking at the raw specs of the Xenos GPU it does seem to have a Vertex processing advantage.
For all you might say about the dynamic allocation of vertex pipes, you end up limited by a lot more external things than anything internal to the GPU. Also, no matter what, on major passes, you're going to end up spending more effort on pixels anyway, and RSX has a moderate advantage over Xenos in that area. All the same, getting a billion triangles per second to the GPU in the first place is basically impossible. It doesn't matter how much power the GPU has to work with them because it can't get to that point. In general, the challenge in getting 100-150 million tris per second moved through the pipe is hard enough whether you're on PS3 or 360, and it's not the GPU itself that's the problem.
I look back at how things looked when the 360 was still a little while shy of release, and back then, the notion of even drawing a scene of up to 750,000 polygons per frame at 30 fps was looking pretty much impossible to almost every developer out there. Nowadays we talk of nearly double that pretty freely. It's certainly not because the GPU suddenly got more powerful or we learned how to dedicate more ALUs to vertex processing. It's because we're doing better on the *CPU* side that we're able to keep that push buffer pushing more often.
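To put the per-frame figures on the same scale as the per-second ones, a quick conversion helps. The 750,000 polygons, 30 fps and 100-150 million figures are from the post; everything else is just arithmetic, and "nearly double" is rounded up to exactly double for simplicity.

```python
def to_tris_per_second(tris_per_frame, fps):
    """Convert a per-frame triangle count into a per-second rate."""
    return tris_per_frame * fps


def to_tris_per_frame(tris_per_second, fps):
    """Convert a per-second triangle rate into a per-frame budget."""
    return tris_per_second / fps


FPS = 30

pre_launch_rate = to_tris_per_second(750_000, FPS)      # what looked impossible back then
current_rate = to_tris_per_second(2 * 750_000, FPS)     # "nearly double that", rounded up
hard_target_low = to_tris_per_frame(100e6, FPS)         # low end of the 100-150 million range
theoretical_gap = 1e9 / current_rate                    # 1 billion peak vs. what gets pushed
```

So even the doubled figure works out to tens of millions of triangles per second, well under the 100-150 million range described as hard, and a small fraction of the theoretical billion.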
My understanding is that pixel shaders are mainly used for bump mapping etc., whereas vertex shaders are used to render complex lighting effects etc.
Vertex shaders are simply for things you would do that operate at the level of a single vertex (transformation, positioning, and more often than not, setting up data for the pixel shaders to use). Pixel shaders are for things you would do at the level of a single pixel (all texturing, all lighting, etc). On hardware prior to programmable shaders, of course, you'd probably do just about everything at the vertex level because that's what you have available to manipulate both at the hardware and software level.
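As a rough illustration of that split, here is a CPU-side sketch in plain Python rather than a real shading language. The function names, the matrix and the Lambert lighting model are all assumptions chosen for the example; the only thing taken from the thread is that the minimal vertex transform is four dot products.

```python
def dot(a, b):
    """Dot product of two equal-length tuples."""
    return sum(x * y for x, y in zip(a, b))


def vertex_shader(position, matrix_rows):
    """Per-vertex work: transform a 4-component position with four dot products."""
    return tuple(dot(row, position) for row in matrix_rows)


def pixel_shader(normal, light_dir, albedo):
    """Per-pixel work: simple Lambert diffuse lighting (an assumed lighting model)."""
    intensity = max(0.0, dot(normal, light_dir))
    return tuple(channel * intensity for channel in albedo)


# Hypothetical transform: identity plus a translation of 2 units along x.
TRANSLATE_X = [
    (1.0, 0.0, 0.0, 2.0),
    (0.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0),
    (0.0, 0.0, 0.0, 1.0),
]

transformed = vertex_shader((1.0, 1.0, 1.0, 1.0), TRANSLATE_X)
lit_color = pixel_shader((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.5, 0.25))
```

The vertex function runs once per vertex and the pixel function once per covered pixel, which is why the pixel side usually dominates on major passes.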
mokmok
05-10-2007, 06:56 AM
CPI thanks for spending the time to walk me through this, if I knew how to +rep you I would do it!
OmniStalgic
05-10-2007, 07:08 AM
Just click on his "bar" icon at the right of his name mate....
This is what some of you guys miss about the "good ol days eh?":huh:
Well, nice to see a tech thread pop-up every now and then...
This is what some of you guys miss about the "good ol days eh?":huh:
lol. you should've seen us back then. :stirpot:
Segitz
05-10-2007, 10:48 AM
Finally, someone who talks sense into all this stuff :D
But in the end, I couldn't care less, as good games are coming for both consoles at a steady rate (none ATM^^), but come Christmas, we are in for a hell of a lot of good games!
False_Messiah
05-10-2007, 11:02 AM
+rep cpi, omg that was awesome!!
Do you develop for PS3? Where do you think EDGE will help most in the creative process? What are the biggest difficulties in PS3 development? Like you mentioned with the 360, is it the CPU? Sorry for my English :look:
makeitlookreal
05-10-2007, 11:09 AM
Thanks CPI.
+REP
ddaryl
05-10-2007, 01:49 PM
I've certainly seen 360 do some very impressive tricks I've put down to those Vertex Shaders, like the mall full of zombies in Dead Rising, and wondered if PS3 can do this.
You know how it is: 360, Wii, and PS3 each have a list of advantages over the others, and they're well known. I've always been of the opinion the PS3's primary advantage is its Blu-ray, and the 360's primary advantage is that fancy GPU.
The Cell is the true advantage of the PS3... Blu-ray helps, the standard HDD is definitely very much part of the equation and the RSX is no slouch.
BUT the Cell is what truly separates the PS3...
jaxmkii
05-10-2007, 01:54 PM
Xenos holds the vertex advantage for sure, however it can't hold a candle to RSX's pixel shading capabilities. So, in the end both sort of balance out, save for the fact that Cell can also help, which from what I've seen so far can give PS3 a slight edge visually (Lair is one of the best looking games I've seen on any system, and it runs at more than 2x the resolution 360 outputs. If that's not a graphical advantage nothing is. Also note GT:HD/5 which has the same advantages. more than 2x the res and still better looking than forza 2 or PGR4).
and its only running on 2 SPEs and the PPE
BTW... thanks CPI for the nostalgic post from the PSI days.
This thread is asking for a title change and/or lockage.
BruceWayneIII
05-10-2007, 02:29 PM
How about a lock down? Things are settled and the FUD was - once again - disproved.
Thanks, CPI
Sephiroth_VII
05-10-2007, 02:39 PM
Seeing that this is one of the few new tech threads, I think it should stay. Just my personal opinion, though.
BahnNZ
05-10-2007, 03:12 PM
We got so few tech threads, don't lock it.
*pokes cpi with a stick so he keeps talking*
To me, Cell has all this crazy floating point, but the 360 CPU has three PPE units, so you can't really wave a stick at the CPU and say "This CPU does everything better than that CPU".
But Blu-ray is a clear-cut advantage: 7x the storage available to your game. It's difficult to argue that's not something that clearly separates the machines.
VonGak
05-10-2007, 03:25 PM
^ Don't read too much into the letter Major PR Nelson sent to IGN. MS spread a lot of false info claiming that the CELL SPEs couldn't do integers (people actually believed that FPUs couldn't do ints), couldn't do branching (unrolling loops, branch hints and compares solve most of the issues; cache flush time is a bigger concern than the clock cycle penalty from a missed branch) and lacked L2 cache (forgetting to tell them about the faster registers and main RAM). MS generally took advantage of people being uninformed.
And the Xenon cores aren't PPEs. :) Also the CELL has a higher transistor budget.
Wow, reading this thread feels like going to the museum.
BahnNZ
05-10-2007, 03:34 PM
No, I try not to take my tech info from fake military people.:)
So is information about the 360 CPU cores out now? They used to be shrouded in mystery. Are these cores identical to the Cell PPE, only with beefier floating point on each core? I think J Allard or somebody said they were similar to the G5 architecture and I remember going *cough* BS *cough*.
I always assumed that, ignoring floating point, the 360 has the same weak puppy PPE processing core Cell has, only three of them. About a year ago all the tech articles were based on assumptions; I wonder if the waters have unmuddied a bit.
frosty
05-10-2007, 03:42 PM
Man it's nice to have you around CPI.
VonGak
05-10-2007, 04:11 PM
No, I try not to take my tech info from fake military people.:)
So is information about the 360 CPU cores out now? They used to be shrouded in mystery. Are these cores identical to the Cell PPE, only with beefier floating point on each core? I think J Allard or somebody said they were similar to the G5 architecture and I remember going *cough* BS *cough*.
I always assumed that, ignoring floating point, the 360 has the same weak puppy PPE processing core Cell has, only three of them. About a year ago all the tech articles were based on assumptions; I wonder if the waters have unmuddied a bit.
Nope, to my knowledge there's no public info on the Xenon cores other than that IBM designed them, but it was a different team, and according to the devs at B3D neither the CELL PPE nor the Xenon cores are like the Mac G5 chips or each other (minus some core details).
And you are right that there was a lot of assumption back in '05 and '06, when the SPUs were only seen as FPUs and the PPE was supposed to handle all the branch-heavy code.
But that was only until more info was released, such as that the PPE's focus will be to divide the workload and place data in the SPU Local Store, where the compiler and SPU will handle the rest.
In the initial CELL (dd1) presented by STI the PPE wasn't even included in CELL's performance number because of this, but to ensure backward compatibility with G5 chips the PPE was enhanced, and it should also make it a little easier for devs to make the initial PS3 games (until better compilers for the SPUs come along. EDIT: for things like splitting up the code/data in chunks for the SPU LS and solving branch issues).
xbdestroya
05-10-2007, 04:16 PM
This is what some of you guys miss about the "good ol days eh?":huh:
Well, describes me at least. ;) But c'mon now, there have been tech threads on this order in the last two years Omni... keep perspective people!
BahnNZ
05-10-2007, 04:28 PM
I think this is the problem, Cell is a multimedia CPU that's used in Blade servers, Toshiba TVs, PS3 and everywhere so almost everything is known about it.
The 360 CPU is for a single application, so it can be shrouded in NDAs and mystery. Frustrating if you're nosey like me.
xbdestroya
05-10-2007, 04:33 PM
I think this is the problem, Cell is a multimedia CPU that's used in Blade servers, Toshiba TVs, PS3 and everywhere so almost everything is known about it.
The 360 CPU is for a single application, so it can be shrouded in NDAs and mystery. Frustrating if you're nosey like me.
The 360 CPU isn't nearly as mysterious as many believe; take the Cell's PPE, add some 'enhanced' VMX, 512KB more cache, and some additional game-centric instructions (such as the vaunted dot product) and there you have it. The mystery surrounding this chip exists more in people's minds than in reality.
Ars Technica, Anandtech, and Major Nelson all have confused the situation way beyond what need be the case unfortunately. Even worse is that a lot of people aren't in a situation to question the results of such 'luminaries' to begin with, and accept as fact conclusions that are 'confirmed' by other folk they consider to be knowledgeable. And so goes the Internet. The XeCPU does have advantages over the Cell, but those advantages stem from somewhere else entirely than the red herring arguments of "general purpose" computing and "DSPs." Truly, the advantage of the XeCPU doesn't lie in any strength of the architecture, but rather in its ease (relative to learning how to use the SPEs). On top of this, MS has had better tool support up until now. So, it's all relative. As time goes on people will begin to get more out of Cell than they do at present, and when that starts to happen you'll see a tangible effect. The question thus remains how much of Cell will folk be able to tap before the advantages it holds over the XeCPU are made irrelevant by the launch of a new console generation?
The difference will be felt this gen as time goes on, but again, the strength of the XeCPU is the ease (when compared to Cell) of development. Hell, even the word "ease" has to be qualified, as some programmers are well suited to the mindset required to approach it. But they nevertheless represent a minority at the close of 2006. This is why tools like EDGE - or anything that helps developers port to the SPEs in a turn-key/lazy manner - are so crucially important for Sony to work on; because the thinking is that a lot of devs aren't going to pursue an architecture they feel to be daunting when they could do quite well on something more familiar (and in that sense, more robust in the ways it is familiar).
VonGak
05-10-2007, 04:40 PM
I think this is the problem, Cell is a multimedia CPU that's used in Blade servers, Toshiba TVs, PS3 and everywhere so almost everything is known about it.
The 360 CPU is for a single application, so it can be shrouded in NDAs and mystery. Frustrating if you're nosey like me.
Indeed, I have heard that even the public XNA is at a fairly high abstraction level.
As a poor person I do not own a 360, so I can't say if one gets to know the different operation penalties when "buying" the XNA tool set.
BahnNZ
05-10-2007, 04:40 PM
So my assumptions are correct. In an integer non game based benchmark in identical conditions, say zipping a large file, the Cell PPE and a 360 CPU core would come out the same.
I remember a report that dot product had been added to Cell, was that real or not?
xbdestroya
05-10-2007, 04:47 PM
So my assumptions are correct. In an integer non game based benchmark in identical conditions, say zipping a large file, the Cell PPE and a 360 CPU core would come out the same.
I remember a report that dot product had been added to Cell, was that real or not?
Reports of dot product being added? They're different things - with the XeCPU it's an explicit addition to its instruction set. With Cell, you can just do it on the SPEs (and do it better I might add). But what's people's obsession with dot-products anyway? Believe me, it's smoke and mirrors. Neither console will find itself desperately constrained in this regard.
SleazyBig slim
05-10-2007, 04:54 PM
I've been reading a couple of threads on both B3D and NeoGAF regarding the Vertex processing capabilities of the 360's Xenos GPU compared to the RSX's combined with the PS Edge Tools.
A few posts on both forums claim the following:
1. The Xenos Vertex Performance is up to 6x greater than the RSX's.
2. The use of the Edge Tools and SPEs brings the Vertex Performance of the PS3 on par with the 360 but prevents the SPEs from being used for Physics, AI etc.
Just looking at the raw specs of the Xenos GPU it does seem to have a Vertex processing advantage.
My questions: first, are the above claims correct, and if so, what impact will this have on PS3 games (e.g. no 1080p)?
You've been reading posts from that Joker guy on the B3D boards then? That guy is a biased developer. According to him, basically the PS3 should not be able to do anything the 360 can't, and no PS3 games should look better than 360 games. I don't see any 360 racers looking as good as MS, or any 360 games with the vast draw distance and detail of Lair; the guy is full of it. Just because someone is a developer doesn't mean they are right; that's the problem with this industry. The fans take a developer's word nearly as seriously as the word of GOD. It's affecting the press and the fans.
BahnNZ
05-10-2007, 04:58 PM
Yup xb, makes sense.
So to spell it out, the 360 CPU core is a dual issue 64 bit PowerPC unit just like the Cell PPE.
VonGak
05-10-2007, 05:00 PM
The 360 CPU isn't nearly as mysterious as many believe; take the Cell's PPE, add some 'enhanced' VMX, 512KB more cache, and some additional game-centric instructions (such as the vaunted dot product) and there you have it. The mystery surrounding this chip exists more in people's minds than in reality.
Ars Technica, Anandtech, and Major Nelson all have confused the situation way beyond what need be the case unfortunately. Even worse is that a lot of people aren't in a situation to question the results of such 'luminaries' to begin with, and accept as fact conclusions that are 'confirmed' by other folk they consider to be knowledgeable. And so goes the Internet. The XeCPU does have advantages over the Cell, but those advantages stem from somewhere else entirely than the red herring arguments of "general purpose" computing and "DSPs." Truly, the advantage of the XeCPU doesn't lie in any strength of the architecture, but rather in its ease (relative to learning how to use the SPEs). On top of this, MS has had better tool support up until now. So, it's all relative. As time goes on people will begin to get more out of Cell than they do at present, and when that starts to happen you'll see a tangible effect. The question thus remains how much of Cell will folk be able to tap before the advantages it holds over the XeCPU are made irrelevant by the launch of a new console generation?
The difference will be felt this gen as time goes on, but again, the strength of the XeCPU is the ease (when compared to Cell) of development. Hell, even the word "ease" has to be qualified, as some programmers are well suited to the mindset required to approach it. But they nevertheless represent a minority at the close of 2006.
I just want to add that the code gaining from Xenon's enhanced VMX registers and instruction set is where CELL uses an SPE (sorry if I sometimes write PPE and SPU instead of SPE :shrug: can't help it) instead.
And the bigger L2 cache is more due to the main RAM latency than the performance of Xenon.
Oh and it really hurts seeing Ars Technica and Anandtech being way off time after time, shame on them for falling down to the lowest level of Internet journalism.
xbdestroya
05-10-2007, 05:03 PM
Yup xb, makes sense.
So to spell it out, the 360 CPU core is a dual issue 64 bit PowerPC unit just like the Cell PPE.
Uh, well no not "just" like. ;) I mean there are material differences, but in the case where code is being properly offloaded to the SPEs, there are no material advantages to those differences - does that make sense? But they are very very similar - an XeCPU core is essentially like a PPE+, vs some other sort of beast.
Oh and it really hurts seeing Ars Technica and Anandtech being way off time after time, shame on them for falling down to the lowest level of Internet journalism.
It's not "falling to a low level" so much as it simply being the case that their expertise lies more with the traditional desktop paradigm. Since many trust them for all matters computing, in those particular instances it led to inadvertent misinformation.
BahnNZ
05-10-2007, 05:12 PM
Gotcha.
cpiasminc
05-10-2007, 05:15 PM
MS spread a lot of false info claiming that the CELL SPEs couldn't do integers (people actually believed that FPUs couldn't do ints), couldn't do branching (unrolling loops, branch hints and compares solve most of the issues; cache flush time is a bigger concern than the clock cycle penalty from a missed branch) and lacked L2 cache (forgetting to tell them about the faster registers and main RAM).
Beyond just things like calling it a DSP because it has DSP instructions, they kind of acted like it just doesn't have certain hardware because it's not the focal point. I mean, it's not that FPUs do integer work, it's that the SPEs have integer units. There are some oddities to its integer units of course, like a lot of instructions that work on all sizes of integers *except* 32-bit, which really makes for some weird workarounds. They also played up really silly ones like the "no built-in dot product instruction" bit, which is true, but it leaves out the fact that the XeCPU's dot product instruction is ass-slow (and that it's always faster not to use it).
But yeah, cache misses are basically the death of you on any CPU. You can beat the world on everything else, but if your memory accesses cost you thousands of cycles, it pretty much amounts to nothing. In that sense, the fact that 360 has a shared memory pool would never have worked out positively if Xenos didn't put its working framebuffer into eDRAM. The GPU would have ended up hogging all the bandwidth and cache misses on the CPU would have gone from hundreds of cycles to hundreds of thousands of cycles.
MS generally took advantage of people being uninformed.
Well, yeah, but that's what marketing is supposed to do.
So is information about the 360 CPU cores out now? They used to be shrouded in mystery. Are these cores identical to the Cell PPE, only with beefier floating point on each core?
I wouldn't say "beefier" per se. The scalar pipes are spec-wise pretty much the same (though there are a handful of scalar ops the Cell PPE does faster)... VMX-128 has more registers and a bunch of handy-dandy instructions that make life a little easier (though not all of them are practical). Really, the things that make for the bigger part of the performance differences between the XeCPU cores and the Cell PPE are a large collection of little details that are lower level than just number of units and number of registers and special instructions. Instruction issuing, SMT balancing, load/store ordering, what's fully pipelined and what isn't, etc... I can't get into specifics, but that's largely how it is.
That's why it often pays off to use SPEs as much as possible on PS3. I've seen cases where I can take pretty lousy code and move it onto an SPE (assuming the code is compatible with the SPE compiler) and 1 SPE can easily outperform the PPE by a good 20%... without even a change to the code.
You've been reading posts from that Joker guy on the B3D boards then? That guy is a biased developer.
Well, he falls into a trap that most all multi-platform developers fell into before the PS3 was available. The biggest weakness the PS3 has over the 360 from a development standpoint is that it came out later. What that means is that you try everything to hammer out all the issues you get on 360 and assume this is kind of the model to follow because there are so many similarities in what you *shouldn't* do on both platforms. And since the 360 is a lot more forgiving than the PS3, you end up writing things a certain way and then the PS3 comes along and you realize "Oh crap, we actually needed to be more stringent than that." And it comes across as if the PS3 is a dog.
Really, it's more like the 360 has a middle ground and the PS3 doesn't. You do something stupid on the PS3, you end up hurting reeeeally bad, but if you do something smart, it pays biiiig dividends. The 360 has more of a spectrum of possibilities.
In our case, we're sort of falling into this because the PS3 codebase is really not very far along right now. But we deliberately staved off of certain things until after the PS3 codebase is up to par (for instance, no multicore-ifying until then). So we save ourselves a little bit of pain in the process, but it does mean that some people are taken offline to work specifically on PS3 for a while.
VonGak
05-10-2007, 05:20 PM
Yup xb, makes sense.
So to spell it out, the 360 CPU core is a dual issue 64 bit PowerPC unit just like the Cell PPE.
Well yes and no.
Dunno if this is a good example, but PC CPUs all share the same x86 instruction set, yet the CPUs themselves differ in how the transistor (building blocks of all components) budget is spent; likewise, the elements in the PPE/Xenon cores are organized differently and have different execution units.
And they are also quite different from the G5 CPUs because neither of the console CPUs have OOOE logic. Could say it's like comparing Athlon with Intel.
*coughmisleadingthreadtitlecough*
BahnNZ
05-10-2007, 05:25 PM
Damn NDAs, so close but so far. :)
Thanks cpi, this has cleared up a lot for me.
xbdestroya
05-10-2007, 05:27 PM
Believe me, you're not missing out on anything; you know all you need to in order to understand how these two architectures compare to one another.
VonGak
05-10-2007, 05:29 PM
Thanks for the good read CPI :D
EDIT:
"Well, yeah, but that's what marketing is supposed to do."
True, it's like when SONY presents the processing power of the RSX in FLOPs without mentioning how many are programmable, or back in the days when the PS2 performance was presented in unrendered polygons.
Maybe the presentation of PS4 will be done like a washing powder commercial.
cpiasminc
05-10-2007, 06:03 PM
Maybe the presentation of PS4 will be done like a washing powder commercial.
Intel are masters of that, and they love to laugh at themselves about it. They actually get a lot of their internal marketing terms from washing powder. It's funny when they speak of how their marketing dept. will go to the architects and say "we need more blue crystals.*"
* for those unaware, "blue crystals" refers to not-so-valuable features that look important. A reference to the "blue crystals" in laundry detergent, which are actually meaningless, but look like they do something special.
wotter
05-10-2007, 06:26 PM
+rep for CPI, nicely written text.
gozirah
05-10-2007, 07:43 PM
What that means is that you try everything to hammer out all the issues you get on 360 and assume this is kind of the model to follow because there are so many similarities in what you *shouldn't* do on both platforms.
Interesting, cpi. So I guess you are saying that while there is a large overlap of don'ts, there are even more don'ts for the PS3. It sounds like programming for the Cell is asking for a specialized methodology so that the do's are a short list? Or is it a large list that's just going to be completely different?
dnpmakkah
05-10-2007, 07:47 PM
wow this thread is moving fast.
cpiasminc
05-10-2007, 07:58 PM
So I guess you are saying that while there is a large overlap of don'ts, there are even more don'ts for the PS3. It sounds like programming for the Cell is asking for a specialized methodology so that the do's are a short list? Or is it a large list that's just going to be completely different?
I wouldn't say that the "do" list for Cell is any shorter or there is that big a difference from XeCPU's. It's just that whereas an item on the "DO" list for XeCPU may look like
"DO blah blah blah, and be careful of blah when you do..."
the same item for Cell might look like
"DO blah blah blah, but watch out for blah, blah, and blah. Remember to account for blah when blah'ing over blah blah."
If that makes sense. There's specialization in the sense that you can give it certain things that would bring a XeCPU to its knees simply because it doesn't have as many execution resources as a whole set of SPEs, but while these are things you might not mind in an HPC environment, you'd probably avoid them like the plague in a game where CPU time is crucial.
At the same time, there are some "do's" that are specifically more necessary for Cell (but aren't too bad for XeCPU) because you have more computational resources and keeping all of them fed is exactly what makes multicore programming difficult.
It turns out that the job queue model that seems so perfect for Cell works out pretty well on XeCPU, which is kind of counterintuitive on the face of it. However, the peer thread model that seems so obvious for a CPU of 3 identical cores doesn't work too well for Cell. As it so happens, while it works out initially, it is every bit as difficult to get it to scale on XeCPU as it is on Cell. Not so much because of the processor itself, but because of all the cross-context contention and locks and resource wars and the waiting and waiting.
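To make the "job queue model" concrete, here is a minimal sketch in Python, assuming the jobs are fully independent of each other. The names `run_jobs`, `pending` and `worker` are made up for illustration, and real console code would be dispatching to SPEs or hardware threads rather than Python threads. The point is only the shape: workers pull small self-contained jobs from one shared queue instead of each thread owning a long-lived role.

```python
import queue
import threading

def run_jobs(jobs, num_workers=3):
    """Run independent jobs on a fixed pool of workers fed from one queue."""
    pending = queue.Queue()
    for index, job in enumerate(jobs):
        pending.put((index, job))

    results = [None] * len(jobs)

    def worker():
        while True:
            try:
                index, job = pending.get_nowait()
            except queue.Empty:
                return  # no work left, so this worker retires
            # Each job writes only to its own slot, so the jobs never contend.
            results[index] = job()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Eight tiny jobs shared among three workers.
squares = run_jobs([lambda n=n: n * n for n in range(8)])
```

The only shared state is the queue itself, which is roughly why this model sidesteps the lock and contention problems described for peer threads.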
BruceWayneIII
05-10-2007, 08:08 PM
cpiasminc, how do you view the descriptions we have seen lately, that XeCPU and the GPU act more like two big blocks of functionality - compared to Cell+RSX, that some have called a more 'unified pipeline' as a whole?
Did I communicate this in an understandable way?
woundingchaney
05-10-2007, 08:18 PM
Do we see a split in what one would offload onto the CPU and the GPU depending on the architecture? That is to say, would one assign different computational work to different hardware on each platform (will the traditional GPU and CPU split remain on each console)?
With the computational power of the Cell, how advantageous is it (or can it be) to move GPU-related processes over (for example, vertex mesh skinning)?
cpiasminc
05-10-2007, 08:20 PM
cpiasminc, how do you view the descriptions we have seen lately, that XeCPU and the GPU act more like two big blocks of functionality - compared to Cell+RSX, that some have called a more 'unified pipeline' as a whole?
I'd say that describes the normal M.O. when talking exclusively about graphics, anyway. No real restriction one way or another, but that's pretty much how it tends to be done.
Of course, for non-graphics-related stuff, the GPUs don't really exist.
Do we see a split in what one would offload onto the CPU and the GPU depending on the architecture? That is to say, would one assign different computational work to different hardware on each platform (will the traditional GPU and CPU split remain on each console)?
With the computational power of the Cell, how advantageous is it (or can it be) to move GPU-related processes over (for example, vertex mesh skinning)?
In some ways, I think a number of things you do first on PS3 may propagate back to 360 and maybe even the PC. The example you bring up of vertex mesh skinning in software -- it's definitely a big win to do it in software on Cell, but there's a double whammy there in that in addition to simplifying your vertex shaders, you end up sending simpler and smaller vertices and fewer state variables to the GPU for those meshes, so it's a benefit to any platform if you've got the FPU power to handle the job (which even a PC has enough for it to be a win). Additionally, you have the opportunity when doing it in software to include a few more variables into the game and make for a more robust skinning solution.
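As a rough illustration of skinning in software, here is a minimal linear-blend skinning sketch in Python. It assumes 3x4 row-major affine bone matrices and a list of (bone index, weight) pairs per vertex; the names `transform_point` and `skin_vertex` are invented for this example and are not from any SDK. Note that what comes out is just a position, so the bone indices and weights never have to be sent to the GPU.

```python
def transform_point(matrix, point):
    """Apply a 3x4 row-major affine matrix (rotation + translation) to a point."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix)

def skin_vertex(position, influences, bone_matrices):
    """Linear blend skinning: weighted sum of the position under each bone."""
    out = [0.0, 0.0, 0.0]
    for bone_index, weight in influences:
        moved = transform_point(bone_matrices[bone_index], position)
        for axis in range(3):
            out[axis] += weight * moved[axis]
    return tuple(out)

IDENTITY = ((1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0))
SHIFT_X2 = ((1.0, 0.0, 0.0, 2.0), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0))

# A vertex influenced equally by a resting bone and a bone moved 2 units in x.
skinned = skin_vertex((1.0, 0.0, 0.0), [(0, 0.5), (1, 0.5)], [IDENTITY, SHIFT_X2])
```

Because this runs as ordinary code, extra per-vertex inputs can be folded into `skin_vertex` without touching the vertex shader, which is the "more robust skinning solution" angle.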
I can also see, for instance, shadow volume extrusion being another example. You can have a more robust solution when you do it in software as opposed to hardware because you can just store information about more scissor planes and other such details.
It's more like it may happen on Cell first because it forces you to think about these things whereas you might not have otherwise. But ultimately, I think the traditional split between CPU and GPU is already getting blurred on both consoles.
Sounds that way, but unfortunately, it's completely untrue. The thing is vertices do not get moved in at unlimited speed. You can only move vertex attributes at a fixed number of attributes per clock cycle, and that means in 99% of all *major* render passes, that a single vertex takes more than one clock cycle to get in. So no matter what, it doesn't matter how much you can theoretically process because the data doesn't move through the system fast enough. The real theoretical advantage is still there for Xenos, but it is by no means 6:1. In reality, they both suck pretty bad. RSX simply sucks a little worse.
In the end, Xenos can only set up one triangle per cycle, while RSX can set up 1 every two cycles. It should be noted, though, that because of things like a post-transform cache, if you're smart, you can actually exceed the theoretical limits. And since RSX's post-transform cache is about 8x larger than Xenos', it has more potential for gain. To be fair, though, RSX needs it far more badly than Xenos does. The vertex attribute read rate on RSX is incredibly god-awful, but it's not an insurmountable wall. Xenos simply hits fewer internal limits.
Sorry to pick your post apart, but I thought the prevailing wisdom was ATI cache architecture could load more than your bit bandwidth times your gpu cache frequency. If the X1950 XT is cpu cycle limited then it would only read verts at 256 bits x 670 MHz, equaling only 21.44 gigabytes a second. If the chip could only read and write once a cycle, you would only use 42.88 GB/s of its RAM bandwidth of 67.2 GB/s.
Also we should note that even one triangle a clock is a bit optimistic as a triangle is really 384 bits in size (I know you were thinking in verts, but displaying one vert a cycle doesn't do very much, if you know what I mean!!)
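The figures in the post above can be checked with a few lines of arithmetic. This is only a sanity check in Python; the helper name `bytes_per_second` is made up, and the 384-bit triangle is assumed here to mean three vertices of four 32-bit components each, which the post does not spell out.

```python
def bytes_per_second(bus_width_bits, clock_hz, transfers_per_clock=1):
    """Peak transfer rate for a bus of the given width and clock."""
    return bus_width_bits // 8 * clock_hz * transfers_per_clock

CLOCK_HZ = 670_000_000  # 670 MHz, as quoted in the post

one_way = bytes_per_second(256, CLOCK_HZ)        # one read per cycle
both_ways = bytes_per_second(256, CLOCK_HZ, 2)   # one read and one write per cycle

# Assumed layout: 3 vertices x 4 components x 32 bits per component.
triangle_bits = 3 * 4 * 32

print(one_way / 1e9, both_ways / 1e9, triangle_bits)
```

Under that assumption, one 256-bit transfer carries two thirds of a triangle, which is the sense in which "one triangle a clock" would be optimistic for raw data movement.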
cpiasminc
05-10-2007, 09:32 PM
Sorry to pick your post apart, but I thought the prevailing wisdom was ATI cache architecture could load more than your bit bandwidth times your gpu cache frequency.
It can, and that's pretty much where all the practical vertex processing advantage really comes from. RSX being limited to 1 attribute per cycle (which has already been mentioned in a similar thread on B3D) in normal mode, while Xenos isn't. I didn't really want to get into the actual numbers because that's sort of a dance around NDA limits. But yeah, you can read more than once per cycle on Xenos so that you can get more than one attribute per cycle in the end, and the theoretical advantage of Xenos as well as the need for things like culling verts on the SPEs on the PS3 is almost entirely attributable to that one factor alone.
It's not 6:1, of course, even in a best-case scenario.
Other things you can do besides the software side of things which are typically a win on both platforms (if not a win, it's otherwise harmless) is just a shift towards more shader instructions to acquire data rather than trying to send them all explicitly. Packing basis vectors into a single attribute, or packing multiple UV channels into one or two attributes, using color as a source of parameters for various operations, and so on.
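As an illustration of packing multiple UV channels into a single attribute, here is a small Python sketch. It assumes UVs in the [0, 1] range quantized to 16 bits per component, so two UV pairs fit in one 4 x 16-bit attribute (8 bytes) instead of two separate float attributes. The function names are hypothetical, and the unpack step stands in for the couple of extra shader instructions mentioned above.

```python
import struct

def pack_uv_pairs(uv0, uv1):
    """Quantize two UV pairs in [0, 1] to 16 bits each; pack into 8 bytes."""
    quantized = [round(c * 65535) for c in (*uv0, *uv1)]
    return struct.pack("<4H", *quantized)

def unpack_uv_pairs(blob):
    """What the vertex shader would do: split one attribute back into two UVs."""
    u0, v0, u1, v1 = (c / 65535 for c in struct.unpack("<4H", blob))
    return (u0, v0), (u1, v1)

packed = pack_uv_pairs((0.25, 0.5), (1.0, 0.0))
uv0, uv1 = unpack_uv_pairs(packed)
```

The trade is a small quantization error (at most half of 1/65535 per component here) in exchange for reading one attribute rather than two.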
Also we should note that even one triangle a clock is a bit optimistic as a triangle is really 384 bits in size (I know you were thinking in verts, but displaying one vert a cycle doesn't do very much, if you know what I mean!!)
I was referring to a different component, actually -- probably should have clarified that. Specifically, I was referring to the triangle setup rate which is hard limited to n tris per m cycles in both GPUs. Actually being able to transfer or render that much is a different matter entirely. The vertex moving rate and the triangle setup rate are two different things.
CrumCon
05-10-2007, 10:24 PM
I've been reading a couple of threads on both B3D and NeoGAF regarding the Vertex processing capabilities of the 360's Xenos GPU compared to the RSX's combined with the PS Edge Tools.
A few posts on both forums claim the following:
1. The Xenos Vertex Performance is up to 6x greater than the RSX's.
2. The use of the Edge Tools and SPEs brings the Vertex Performance of the PS3 on par with the 360 but prevents the SPEs from being used for Physics, AI etc.
Just looking at the raw specs of the Xenos GPU it does seem to have a Vertex processing advantage.
My questions, firstly are the above claims correct and if so what impact will this have on PS3 games i.e.. No 1080p etc.
you're not making friends for posting such thing in here.
they'll flame your azz
you're not making friends for posting such thing in here.
they'll flame your azz
If you had read the thread, you'd know all those things have been explained and debunked where they weren't true.
Still, the thread title hasn't been changed accordingly.
woundingchaney
05-10-2007, 10:33 PM
If you had read the thread, you'd know all those things have been explained and debunked where they weren't true.
Still, the thread title hasn't been changed accordingly.
Hang on, I'll change the title to something more appropriate.
you're not making friends for posting such thing in here.
they'll flame your azz
The topic has led to a very good and detailed discussion; there is no reason why he or anyone else should be flamed for introducing a subject, be it controversial or otherwise. I don't feel as if there was anything vindictive or malicious in his original statement/question.
cpiasminc
05-10-2007, 11:28 PM
The topic has led to a very good and detailed discussion; there is no reason why he or anyone else should be flamed for introducing a subject, be it controversial or otherwise. I don't feel as if there was anything vindictive or malicious in his original statement/question.
More importantly, there was the fact that he actually phrased it as a serious question in the first place. He didn't put up some thread title that was punctuated by five exclamation points or start screaming out all sorts of "sky is falling" rhetoric interspersed with all nature of "oh noez!!" and "PS3 claimd to be t3h suxxorz! 360 fanbois ftl!!!" garbage.
We all know certain people aren't capable of avoiding the latter, and it is they who get flamed and have their threads locked.
cliffbo
05-10-2007, 11:56 PM
comes in... sees lots of joined up writing... feels inferior... leaves.
Despite mentioning Edge here, I don't see anyone talking too much about it and the advantages it could offer. I remember reading somewhere in B3D that one SPE could provide enough power to move an additional 0.8 million polygons per second (or per frame?) at 60 fps. Please correct me because I really don't know what I exactly read, only that it was impressive.
cpiasminc
05-11-2007, 01:02 AM
Well, not exactly *additional* triangles because if you're already limited by a slower pipe, you can't exactly get the time to process and move more of them for free. But in the context of already using SPEs to preprocess geometry and not being limited by anything else, an additional SPE means room to process an additional 750k tris.
BTW, the statement was not that you could move 750k tris per frame at 60 fps, but that you could process that many and hopefully cull around 60% of them on average. So while you process 750,000 of them, only around 300,000 are deemed visible, and only those actually make it to the GPU. Certainly, not having to move that many vertices to the GPU in the first place is a huge step forward in eking out some performance when you're otherwise attribute read-rate limited.
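The numbers in that explanation work out as follows. This is just the arithmetic from the post written in Python, with a made-up helper name; the 60% cull rate is the post's "on average" figure, not a measured one.

```python
def visible_after_culling(processed_tris, cull_percent):
    """Triangles that survive SPE-side culling and are actually sent to the GPU."""
    return processed_tris * (100 - cull_percent) // 100

processed = 750_000
visible = visible_after_culling(processed, 60)
culled = processed - visible

print(visible, culled)
```

So the SPE touches all 750,000 triangles, but the attribute-read-limited GPU only ever sees the surviving 300,000.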
venomv
05-11-2007, 02:58 AM
More importantly, there was the fact that he actually phrased it as a serious question in the first place. He didn't put up some thread title that was punctuated by five exclamation points or start screaming out all sorts of "sky is falling" rhetoric interspersed with all nature of "oh noez!!" and "PS3 claimd to be t3h suxxorz! 360 fanbois ftl!!!" garbage.
We all know certain people aren't capable of avoiding the latter, and it is they who get flamed and have their threads locked.
Which is what most people fail to see: it's not what you say, it's how you say it.......
I would comment on your other posts but you kinda lost me in spots, like usual, but I love tech threads anyway.
acousticvan
05-11-2007, 03:52 AM
I'm talking about EDGE tools.
I guess back culling doesn't help a lot in reducing the number of polygons the rsx needs to draw. Suppose you have a simple wall with four vertexes and a texture to fill that wall. Now if half of the wall is behind the main character, you can't just cull half of the wall because you need the other two vertexes in order to have the texture to fill in. So I guess back culling only helps the rsx when you have whole objects behind the main character. Is that right? cpi, xbd?
It can, and that's pretty much where all the practical vertex processing advantage really comes from. RSX being limited to 1 attribute per cycle (which has already been mentioned in a similar thread on B3D) in normal mode, while Xenos isn't. I didn't really want to get into the actual numbers because that's sort of a dance around NDA limits. But yeah, you can read more than once per cycle on Xenos so that you can get more than one attribute per cycle in the end, and the theoretical advantage of Xenos as well as the need for things like culling verts on the SPEs on the PS3 is almost entirely attributable to that one factor alone.
It's not 6:1, of course, even in a best-case scenario.
Other things you can do besides the software side of things which are typically a win on both platforms (if not a win, it's otherwise harmless) is just a shift towards more shader instructions to acquire data rather than trying to send them all explicitly. Packing basis vectors into a single attribute, or packing multiple UV channels into one or two attributes, using color as a source of parameters for various operations, and so on.
I was referring to a different component, actually -- probably should have clarified that. Specifically, I was referring to the triangle setup rate which is hard limited to n tris per m cycles in both GPUs. Actually being able to transfer or render that much is a different matter entirely. The vertex moving rate and the triangle setup rate are two different things.
There is definitely a power struggle, if you will, between size and speed for parts manufacturers. If you could only have both bit width and frequency, you would certainly have an advantage.
VonGak
05-11-2007, 05:45 AM
I'm talking about EDGE tools.
I guess back culling doesn't help a lot in reducing the number of polygons the rsx needs to draw. Suppose you have a simple wall with four vertexes and a texture to fill that wall. Now if half of the wall is behind the main character, you can't just cull half of the wall because you need the other two vertexes in order to have the texture to fill in. So I guess back culling only helps the rsx when you have whole objects behind the main character. Is that right? cpi, xbd?
G'morning.
To my understanding, the EDGE demo from GDC '07 using Getaway assets did culling per polygon; at least that's what they seem to be saying in the video clip.
cpiasminc
05-11-2007, 06:03 AM
I guess back culling doesn't help a lot in reducing the number of polygons the rsx needs to draw. Suppose you have a simple wall with four vertexes and a texture to fill that wall. Now if half of the wall is behind the main character, you can't just cull half of the wall because you need the other two vertexes in order to have the texture to fill in. So I guess back culling only helps the rsx when you have whole objects behind the main character. Is that right? cpi, xbd?
Yeah, a big giant polygon like you described can't really be culled away by any method save for the chance that the big polygon is not even within the camera view. Big walls aren't really the ones you have to worry too much about. They're fillrate hogs, but you end up filling a lot of screen real estate for only 4 vertices and 2 triangles.
Conversely, you've got things like the main character model itself, out of which, pretty much every triangle with a normal facing the wrong direction is not visible, so you can pretty much trim away half the tris, and in turn, just a little shy of half the vertices. When a character is some 10,000+ tris (or in our case, we've got a main character pushing 30,000 tris), that's pretty huge.
And it counts for some impact on environmental objects that are complex in shape (terrain, for instance), or just anything that's a closed manifold in general.
The combination of backface culling, occlusion culling, and LOD can reduce the issued polycount by pretty well more than half in a lot of cases. For really small meshes, it doesn't mean too much in the end as opposed to optimizing the meshes such that the vertex to tri ratio is really small, but for anything really dense, and that has really complex shaders on it, it pays off.
If you could only have both bit width and frequency, you would certainly have an advantage.
Yeah, it's often a point that programmers love to bring up that hardware designers don't give a damn what we think when actually architecting the hardware the first time around... In these cases, it's more a matter of fact that the cost restrictions placed on the darn things are just hopelessly inflexible about certain details (specifically those that cause costs to skyrocket very quickly beyond a certain point). If they were softer, Xenos would have at least 12 MB of eDRAM like it *should* have had, and RSX would have had a 256-bit bus like it *should* have had.
Frankly, I would have been able to live with a 2.4 or 2.8 GHz CPU in either machine if only we could have gotten those two extras in the respective platforms.
Segitz
05-11-2007, 12:39 PM
Thanks very much, cpi and the others for this thread.
Finally some people talk sense, instead of lame bashing of "who has the biggest e-penis".
I still have some questions, but I first need to put them into sentences that make sense (translating from German into "high-level language", so to say, is pretty tiring).
GTAce
05-11-2007, 12:42 PM
"who has the biggest e-penis".
Yeah this has no sense we all know its VG! :look:
I must spread some rep here in the thread, saw some nice posts here. 8)
Yeah this has no sense we all know its VG! :look::pleased:
Credit to Crossbar! (http://forum.beyond3d.com/showpost.php?p=984076&postcount=234) (come back, man!)
The Playstation Edge libraries are mostly focused on vertex processing in various forms (I have the presentation here) and offloading the GPU by doing a lot of these processes on the SPUs of the Cell. To have that is excellent and it is something 3rd party developers could have developed themselves (and possibly have done) if they had the manpower to do it.
Playstation Edge consists of several things.
GCM Replay (Which has been released) - A tool for capturing GPU output and for testing out various SPU optimizing routines. EXCELLENT tool and something similar is available for the Xbox 360.
Playstation Edge libraries - Various SPU jobs that can do skinning, culling, morphing, blending, compression/decompression for vertex processing. Its main focus is to offload the CPU and the GPU.
Well it's not really the same thing. The GCM Replay tool is a much, much more useful tool than the Performance Analyzer was on PS1/PS2 (and PS3).
The Performance Analyzer basically just showed you lots of bars highlighting what functions were running and how much bandwidth was used on each bus.
The PS3 performance analyzer (And various other tools) was available quite early on. But with a lot of processing having moved over to the GPU a similar and better tool is needed for the GPU.
With the GCM Replay tool you can basically be running your game, wonder how applying one sort of optimization to the scene might affect the scene and without writing a line of code just test it out. You can also do similar things to what the Performance Analyzer does but GPU specific. You can check and debug the shaders, you can examine everything that is done to render a frame. You can step back and forward in the render pipeline. All in the tool. It's an extremely useful tool for getting good performance out of the rendering on the PS3. And it's available now.
Fazares
05-11-2007, 10:29 PM
i love all this tech talk...but at the end of the day....i want a game to look like lbp or the new ratchet and clank...i m just fine with that....
Crossbar
05-11-2007, 10:51 PM
Credit to Crossbar! (http://forum.beyond3d.com/showpost.php?p=984076&postcount=234) (come back, man!)
I'd love to, but I really had to cut down on my posting, and the e-mpire was where I spent the most time. Too much stuff such as work, house, and family, and I've been picking up some sports from the past lately.
I stroll by occasionally to find some golden eggs like this thread and other "exclusives". :happy: Like so many others I guess. I love the stuff you contribute here. The video threads are great, just a suggestion, reserve the first ten posts in the next video thread for content.
CreativeWriter
05-12-2007, 12:49 AM
Could we set up a java script or something that +reps cpi every time he posts? Thanks. ;)
rog27
05-12-2007, 02:04 AM
I'm just happy that STI forced their hand with actually putting CELL into a mass-market product. The long-term benefits of the programming community coming to grips with a massively parallel, asymmetrical CPU architecture like that found in CELL today will have great implications in the future, when physics disallows us from gaining substantially in the frequency area because of problems with heat dissipation and power usage. Asymmetrical architectures make very efficient use of the transistor budget to extract more power per additional die area used: a magnitude greater than that of their symmetrical counterparts. It sucks to see the programmers struggle during the paradigm shift, but it should be worth it for everyone in the end.
LiquidEagle
05-12-2007, 02:05 AM
Could we set up a java script or something that +reps cpi every time he posts? Thanks. ;)
I'm sure cpi could set something like that up. :laugh:
VonGak
05-13-2007, 07:22 PM
I'm talking about EDGE tools.
I guess back culling doesn't help a lot in reducing the number of polygons the rsx needs to draw. Suppose you have a simple wall with four vertexes and a texture to fill that wall. Now if half of the wall is behind the main character, you can't just cull half of the wall because you need the other two vertexes in order to have the texture to fill in. So I guess back culling only helps the rsx when you have whole objects behind the main character. Is that right? cpi, xbd?
Sorry sorry sorry, I was under the impression that you were asking whether the culling in the EDGE demo was only done roughly, on the perimeter of the objects ("So I guess back culling only helps the rsx when you have whole objects behind the main character"), hence my answer:
To my understanding the EDGE demo from GDC '07 using Getaway assets did culling per polygon.
where, e.g., on a character model the polygons hidden behind an arm or a leg do not get rendered.
Anyway the big wall you talked about might "hide" other objects. :queer: look at me dancing.
cpiasminc
05-13-2007, 08:10 PM
where, e.g., on a character model the polygons hidden behind an arm or a leg do not get rendered.
Anyway the big wall you talked about might "hide" other objects. :queer: look at me dancing.
Occlusion culling is definitely a more difficult problem than simple backface culling. About the only really reliable way to occlusion cull is going to involve level designer intervention when constructing visible cells and laying out potentially-visible sets from a given cell, and so on, which in turn ties into streaming requirements and other nasty little things. Trying to do it explicitly is actually a provably unsolvable problem.
A big giant wall may be a comparably trivial case, but out in an open environment, it could still be a hairy case.
Backface culling is nothing. Check the sign of the dot product and that's it.
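For reference, the dot-product test looks roughly like this in Python. It is a minimal sketch assuming counter-clockwise winding for front faces and a known eye position in the same space as the vertices; the helper names are made up for the example.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def is_backfacing(v0, v1, v2, eye):
    """True if the counter-clockwise triangle faces away from the eye point."""
    normal = cross(sub(v1, v0), sub(v2, v0))
    to_eye = sub(eye, v0)
    return dot(normal, to_eye) <= 0.0  # only the sign matters

def cull_backfaces(triangles, eye):
    """Keep only the triangles that face the eye."""
    return [tri for tri in triangles if not is_backfacing(*tri, eye)]

# One triangle in the z = 0 plane whose normal points toward +z.
tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
seen_from_front = cull_backfaces([tri], eye=(0.0, 0.0, 5.0))
seen_from_back = cull_backfaces([tri], eye=(0.0, 0.0, -5.0))
```

The normal does not need to be normalized, since scaling it never changes the sign of the dot product.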
curryking1
05-13-2007, 08:14 PM
Dot product... lawl :D
Danji
05-14-2007, 05:12 AM
This thread made my night twice. Thanks to everyone who posted in it.
jaxmkii
05-14-2007, 10:43 AM
Occlusion culling is definitely a more difficult problem than simple backface culling. About the only really reliable way to occlusion cull is going to involve level designer intervention when constructing visible cells and laying out potentially-visible sets from a given cell, and so on, which in turn ties into streaming requirements and other nasty little things. Trying to do it explicitly is actually a provably unsolvable problem.
A big giant wall may be a comparably trivial case, but out in an open environment, it could still be a hairy case.
Backface culling is nothing. Check the sign of the dot product and that's it.
... which brings to mind a secondary problem of ray tracing.
Besides the heavy calculation load of ray tracing itself, every triangle and texture needs to be in place for ray tracing in order to figure out the correct reflection angle and color of light so that it appears appropriately.
Correct?
antuk15
05-14-2007, 12:47 PM
360's Xenos is always going to have higher vertex performance than RSX because it has a unified shader architecture. Heck, I bet in theory Xenos could have more vertex performance than an X1950 XTX if a developer chooses to use most of its unified shaders for vertex work. But what most people don't realise is that when developers use most of the unified shaders in Xenos for vertex work, they leave LESS for the pixel work. And RSX already outperforms 360 in the pixel shading department, so Xenos can't afford to lose too many of its pipes to vertex work.
cpiasminc
05-14-2007, 06:07 PM
... which brings to mind a secondary problem of ray tracing.
Besides the heavy calculation load of ray tracing itself, every triangle and texture needs to be in place for ray tracing in order to figure out the correct reflection angle and color of light so that it appears appropriately.
Correct?
That's correct. It's not what you'd call an "immediate mode" renderer in that respect. Though backface culling of tris wherever applicable is still safe with a raytracer, since those would be cases that would guaranteeably not be visible.
I should add that when I refer to occlusion culling as "unsolvable", it actually means that there does not exist a generic solution other than an exhaustive brute force search, which in the conceptual sense is a problem of infinite scale and granularity. In practice, raycasting/raytracing is that brute force search -- it's just that you do it in a finite subset (i.e. resolution is finite) of all conceivable rays and say "that's enough."
Occlusion using a Z-prepass is normal, though what's often done is testing at the whole object level. It's equivalent to having a raycast done to the first hit and storing off the distances. Using it for triangles is a little fidgety, but it can be done given a little leeway to account for unspeakably horrible lack of precision (which gets worse and worse wrt draw distance).
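A toy version of that object-level test against a Z-prepass might look like the Python sketch below. It assumes a depth buffer where smaller values are closer to the camera, a screen-space bounding rectangle per object, and a `bias` parameter standing in for the "little leeway" on precision. All names and numbers are illustrative, and real hardware would do this with occlusion queries rather than a loop.

```python
def is_occluded(depth_buffer, rect, nearest_depth, bias=0.01):
    """True if every pixel under rect already holds something clearly closer.

    rect is (x0, y0, x1, y1), half-open. nearest_depth is the closest point
    of the object being tested. Smaller depth means closer to the camera.
    """
    x0, y0, x1, y1 = rect
    for y in range(y0, y1):
        for x in range(x0, x1):
            # If the prepass depth is not clearly in front of the object here,
            # the object might show through this pixel, so it must be drawn.
            if depth_buffer[y][x] + bias >= nearest_depth:
                return False
    return True

FAR = 1.0
# 4x4 prepass result: a wall at depth 0.2 covers the two left-hand columns.
depth = [[0.2, 0.2, FAR, FAR] for _ in range(4)]

fully_behind_wall = is_occluded(depth, (0, 0, 2, 2), nearest_depth=0.6)
peeking_past_wall = is_occluded(depth, (1, 0, 3, 2), nearest_depth=0.6)
```

A larger bias makes the test more conservative: fewer objects get rejected, but precision errors at long draw distances are less likely to make something pop out of existence.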
Heinrich4
05-15-2007, 06:40 PM
Wow, incredible posts by cpiasminc! Thanks a lot!
But do we have "some chance" of seeing RC in games (in some parts of a game, or for some elements like characters, guns in a first-person shooter, cars, etc.) sometime in this generation, given that the guys at SaarCOR/OpenRT have had, since 2004, a 90 MHz single-pipe RPU capable of 8 million rays/sec at 512x384?
Segitz
05-15-2007, 10:16 PM
8 million rays is not so much for today's games^^
cpiasminc
05-15-2007, 10:36 PM
He asked that one on B3D as well, but I figure I'll say something similar here. Yeah, the Saarcor kind of stuff has the potential to scale up really well if they can get a specialized RPU going on modern process technologies with modern computational capacities and modern memory architectures. The fact that they get what they get on a few non-pipelined scalar FPUs at 90 MHz on a card with only 250 MB/sec of memory bandwidth is more than a little impressive. But how good it really is remains to be determined, once they can build the same thing on some 500 GFLOPS scale with 100 GB/sec of memory bandwidth over several independent channels.
The IBM raytracing demos pretty much showed that it's almost feasible to do raycasting on a scene in realtime on multiple Cells without running a single other operation (i.e. not at all in a game). And raycasting on a small scale is fine, but a fully software-raytraced game in a real professional title in a non-trivial way will never happen.
hmm.. seems Okay to pass on considering the complexity gpus operate today for shader tasks.
cpiasminc
05-16-2007, 12:30 AM
Computationally, yes, they're probably fine. But there's still a level of complexity that the GPUs can't quite handle since they're really not built to do certain things, and the memory architectures are very much NOT suitable for loads of random accesses -- much to the effect that beyond a certain point, even the most powerful GPU can't really do any better than a software renderer, even on a PC.
For one, putting pixels in the outer loop and geometry in the inner loop is fundamental to raytracing. You can try to just shift everything to pixel shaders, but you will run into the whole memory access problem, and the fact that raytracing is not immediate-mode means that, somehow or other, the GPU has to get ahold of every texture in the scene, which may be fine on consoles, but not necessarily so on PC.
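The loop-order difference can be shown with a deliberately tiny Python sketch. It assumes a toy scene of axis-aligned, constant-depth rectangles viewed with orthographic rays, so "intersection" is just a bounds check; the scene data and function names are invented for the example. Both renderers produce the same image, but only the raycaster needs the entire scene on hand for every single pixel.

```python
WIDTH, HEIGHT = 4, 3
BACKGROUND = 0
INFINITY = float("inf")

# (x0, y0, x1, y1, depth, color_id); half-open rects, smaller depth is closer.
SCENE = [
    (0, 0, 3, 2, 5.0, 1),
    (1, 1, 4, 3, 2.0, 2),
]

def render_raycast(scene):
    """Pixels in the outer loop, geometry in the inner loop."""
    image = [[BACKGROUND] * WIDTH for _ in range(HEIGHT)]
    for y in range(HEIGHT):
        for x in range(WIDTH):
            nearest = INFINITY
            for x0, y0, x1, y1, depth, color in scene:  # whole scene per pixel
                if x0 <= x < x1 and y0 <= y < y1 and depth < nearest:
                    nearest = depth
                    image[y][x] = color
    return image

def render_rasterize(scene):
    """Geometry in the outer loop, pixels in the inner loop, plus a Z-buffer."""
    image = [[BACKGROUND] * WIDTH for _ in range(HEIGHT)]
    zbuffer = [[INFINITY] * WIDTH for _ in range(HEIGHT)]
    for x0, y0, x1, y1, depth, color in scene:  # each primitive seen only once
        for y in range(y0, y1):
            for x in range(x0, x1):
                if depth < zbuffer[y][x]:
                    zbuffer[y][x] = depth
                    image[y][x] = color
    return image

raycast_image = render_raycast(SCENE)
raster_image = render_rasterize(SCENE)
```

In the rasterizer a primitive can be submitted and then forgotten (immediate mode), whereas the raycaster's inner loop is where the random access to all geometry and textures comes from.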
Secondly, you've got the problem of recursion and the ability to re-use the results you got out of one pixel and use it in subsequent repeat computations. With current GPUs you can only do this as iterative or multipass, both of which are not good performers by any measure.