This entire post assumes that the PS3 platform is not going to be an island unto itself. I'm assuming that the PS3 is being designed to do distributed computing, and that machines will collaborate with each other to compute subproblems individually and share answers after the fact.
1) The cost of communicating the solution must be less than the cost of computing the solution.
Corollary: The cost of scheduling, distributing the problem and receiving the results back must be less than the time to compute the answer.
There's no point in asking someone else what 1+1 is when it takes less time to figure it out than to ask the question. Are there any kind of problems that are easy to describe, require lots of computation and cost little to send the results around?
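To make rule 1 and its corollary concrete, here is a rough sketch in Python. The function name, its parameters and all the millisecond figures are made up for illustration; nothing here comes from Sony or the patent.

```python
def worth_offloading(local_compute_ms, remote_compute_ms, send_ms, receive_ms, scheduling_ms=0.0):
    """Return True when farming the work out is expected to finish sooner than computing it locally."""
    # Everything that happens between deciding to off-load and holding the answer.
    remote_total_ms = scheduling_ms + send_ms + remote_compute_ms + receive_ms
    return remote_total_ms < local_compute_ms

# Asking someone else what 1+1 is: the round trip dwarfs the computation.
print(worth_offloading(local_compute_ms=0.001, remote_compute_ms=0.001, send_ms=40, receive_ms=40))

# A heavy job with a small description and a small answer (hypothetical numbers).
print(worth_offloading(local_compute_ms=5000, remote_compute_ms=1000, send_ms=40, receive_ms=40))
```

The interesting problems are the ones where the second call is the typical case: cheap to describe, expensive to compute, cheap to return.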
2) What happens when an ISP hiccups and a block of machines becomes unreachable? Do the contents of their computation become unreachable too?
3) What about latency between machines? For a real time game, Machine X is going to have to get data from Machine Y at least 30 times a second.
4) Am I in control of what code is being run on my machine? If I'm playing Game Y will my machine be calculating results for Game Z? Can I donate my down time to a particular game?
5) Dynamically adding and losing processors that are not localized, for real-time computation and gaming: has anyone done this before? Are there any working examples, ones where the games noticeably improve with the addition of client machines?
You are correct, sir. And let me commend you for posting such an interesting question. I was tempted to answer this last night, before I went to bed, when I received notice of your post. I wound up streaming all of these answers through my head and I wanted desperately to respond; alas, I had to go to sleep, for I am a working drone. Basically I spent this last paragraph explaining just how excited I am to answer your question. GOD I NEED HELP
Re: 1) This is a very good point and I thank you for bringing it up, because all I explained in the Cell article was that there were metrics that governed how and where Cell packets were sent.
To explain in more detail I will provide metric formulas governing what data gets sent where. The actual data and code in the packet aren't too important for understanding this point; the metric formula is.
First let me point out that Cell packets would not be transmitted to other Cells unless the transmitting Cell has already reached maximum resource capacity or the Cell packet requires more computational resources than the machine has.
Next, let's present some scenarios as to why Cell packet transmission might occur:
In a Client / Application Server Cell relationship (say, for example, a SOCOM server interface), the bulk of the static code (basic environments, permanent objects, etc.) would be processed on the server and the end results transmitted to the Client system's GPU for processing, then output to VRAM for display.
Ok, so say we have a population of networked Cells (WAN scale). We have a Cell packet generated by a transmitting Cell that takes more than ( > ) 1sec of processing time on said transmitting Cell. To help understand this further, and to make it easy to put into an equation, let's assign it a Resource Unit.
This "Resource Unit" metric maps directly to the number of PEs (processing elements; in a "preferred" embodiment, as the patent outlines it, a PE comprises eight APUs) divided by the number of seconds of processing for said Cell packet (remember that Cell packets contain data and instructions for processing). Say this transmitting Cell (a PS3) contains 6 PEs. Then 6 PEs / 1sec = 6 RU. This information is appended to the Cell packet before it is processed by the routing logic and becomes part of the Cell packet.
Example Cell packet (BTW we are assuming an Ethernet network and transmission via TCP/IP; to save time I'm not going to replicate the TCP/IP or Ethernet encapsulation for now):
--- header -------------   --- RU -----   --- payload ---------   --- CRC ---
[destination / source]     [000110 (6)]   [data & instructions]   [checksum]
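For anyone who wants to poke at that layout, here is one way the packet could be packed and unpacked in Python. Every field width is my own guess (4-byte destination and source, 1-byte RU, 2-byte payload length, CRC-32 as the checksum); the patent text quoted in this thread does not give sizes, so treat this purely as a sketch.

```python
import struct
import zlib

# Hypothetical header: destination (4 bytes), source (4 bytes), RU (1 byte),
# payload length (2 bytes), all in network byte order.
HEADER = struct.Struct("!IIBH")

def pack_softcell(destination, source, ru, payload):
    """Build header + payload and append a CRC-32 over both."""
    body = HEADER.pack(destination, source, ru, len(payload)) + payload
    return body + struct.pack("!I", zlib.crc32(body))

def unpack_softcell(packet):
    """Verify the checksum, then split the packet back into its fields."""
    body = packet[:-4]
    (crc,) = struct.unpack("!I", packet[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("checksum mismatch")
    destination, source, ru, length = HEADER.unpack(body[:HEADER.size])
    payload = body[HEADER.size:HEADER.size + length]
    return destination, source, ru, payload

packet = pack_softcell(destination=2, source=1, ru=6, payload=b"data & instructions")
print(unpack_softcell(packet))
```

The RU value of 6 matches the example above; the routing logic would read it straight out of the header without touching the payload.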
Once this packet hits the routing logic, the logic decides where to forward it based on these metrics:
- Delay (hop count) in ms (milliseconds)
- Member Cell's PE load (RU)
- Number of PEs (or Cell class, e.g. 8 APUs = A class node)
For the sake of example, say we have member Cells in California, New York, Nevada, Colorado, and Florida, while the transmitting Cell resides in California.
Now let's assign an RU and delay metric to each member Cell:
Member Cell   RU load   Delay   PEs
California    0.3       73ms    6
New York      0.8       179ms   16
Nevada        1         89ms    6
Colorado      2         97ms    6
Florida       10        164ms   1
To find the best metric, we first have to resolve the RU load decimal into time. We simply solve for x, x being the milliseconds of processing / transmission time of said softcell (the figures below are consistent with x = PEs / RU load, converted to ms), and then add the delay:
Member Cell   Processing time   Delay   Total
California    20,000ms          73ms    20,073ms
New York      20,000ms          179ms   20,179ms
Nevada        6,000ms           89ms    6,089ms
Colorado      3,000ms           97ms    3,097ms
Florida       100ms             164ms   264ms
So this shows us that while the server in New York has the most processing power (currently available), right now its CPU load and delay create a prohibitive metric. Conversely, the Florida metric looks much better, producing a cost of only 264ms. However, this can be deceiving, as the task itself will add 6,000ms on the Florida system due to its single-PE configuration. So the actual best metric would be the Colorado system, which needs 3,097ms for its current processes and another 1,000ms to process this request from the transmitting member Cell, resulting in a total of roughly 4,097ms to execute this process and receive the results.
Otherwise the process would have been held in queue until the transmitting Cell completed its full cycle of processing. Off-loading onto the Cell network only makes sense because that wait would have been longer than 4,097ms.
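The selection above can be replayed in a few lines of Python. This is only a sketch of the arithmetic in this post, not anything from the patent. Two assumptions are mine: that the processing time of each member Cell is PEs / RU load (which reproduces the figures above), and that the new task scales as 6 PE-seconds divided by the member's PE count (the post only states this for Colorado and Florida).

```python
# (name, RU load, delay in ms, PEs), copied from the example figures above
MEMBER_CELLS = [
    ("California", 0.3, 73, 6),
    ("New York", 0.8, 179, 16),
    ("Nevada", 1.0, 89, 6),
    ("Colorado", 2.0, 97, 6),
    ("Florida", 10.0, 164, 1),
]

def queue_cost_ms(ru_load, delay_ms, pes):
    """Time for the member's current processes plus network delay."""
    # RU = PEs / seconds, so seconds = PEs / RU (assumed reading of the post).
    return round(pes / ru_load * 1000) + delay_ms

def total_cost_ms(ru_load, delay_ms, pes, task_pe_seconds=6.0):
    """Queue cost plus the time the new task itself needs on this member."""
    run_ms = round(task_pe_seconds / pes * 1000)
    return queue_cost_ms(ru_load, delay_ms, pes) + run_ms

def best_member(cells):
    """Pick the member Cell with the lowest total cost."""
    return min(cells, key=lambda cell: total_cost_ms(*cell[1:]))[0]

for name, ru_load, delay_ms, pes in MEMBER_CELLS:
    print(name, queue_cost_ms(ru_load, delay_ms, pes), total_cost_ms(ru_load, delay_ms, pes))
print("best:", best_member(MEMBER_CELLS))
```

Ranking on the queue cost alone would pick Florida; adding the run time of the task is what moves the choice to Colorado.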
I certainly hope I've shed some light on this question. I will explain how QOS can play a role in certain softcells later on; for now, let me get to the other questions...
Re: 2) This would then cause a check response to be sent to the unreachable Cell, which in turn would cause re-transmission of the Cell packet. So yes, the data would be lost, although this should be rare.
Re: 3) I explained this above.
Re: 4) Most likely not in the initial phases. Probably leaving your system on and connected will make you a part of the collective.
Re: 5) No, it has never been done before, which is why it's so interesting. The only examples that come close are of course distributed computing examples such as SETI@HOME, UD, etc. However, those are simply data processing and NOT real-time instruction and data execution like the Cell architecture. This is quite a different animal, with an amazingly radical new architecture. I will be back later on to provide more info.
-Long Dead-'Editor-in-Chief' of PSINext.com
Scott,
Thank you for your explanation. I'm beginning to see the metric in my head now and can understand how the load balancing will work. But there's something fundamentally bothering me about this.
One advantage of distributed computing like this is that we can reduce the number of redundant parallel computations of the same problem. Machine A doesn't have to calculate the result of problem X if Machine B has already or is currently calculating the result. The hitch is that machine A and machine B have to be working on the exact same problem. If the problem is computing an objective fact of the world, then the result is sharable. If the problem is subjective and relative in nature then the solution to the problem is not usable by anyone else.
The biggest hog of processing power is rendering graphics. But rarely, if ever, will two users be having the same view of the world. Because their perspectives on the world are different their machines cannot collaborate on rendering the different perspectives.
All that can be collaborated on are objective facts: physics modelling, AI algorithms, and anything that isn't going to change from different perspectives within the world.
I am of course assuming that there are no dormant processors lying about that could help in the rendering of one perspective.
I'm worried about people getting carried away with the hype. I really don't think that people will be seeing that great of an improvement in their games without getting higher scores on the $$$ metric. You're going to have to invest in your own processor farm to get better performance.
Here are my assumptions:
1) Rendering graphics is where most of the processing time is going to be spent. Let's say X = 3/4 of the power is put towards graphics, the subjective perspective, and Y = 1/4 is put towards calculating objective facts.
2) Each person is going to have their own individual perspective in the world.
3) Rendering of one perspective lends No Help to the rendering of another perspective.
4) If I'm not playing my PS3 then I've turned it off so it's not sucking down electricity and making noise.
5) 2-4 imply that the number of subjective perspectives is equal to the number of available PS3s for processing power.
6) Given enough machines, the cost of calculating objective facts converges to 0.
Doing the math, the performance increase in a distributed environment is (1/4)/(3/4) = 33% more performance than a standalone machine. The increase in performance is the ratio of your Objective Load Y to your Subjective Load X. My bet is that there will be a high Subjective Load and low Objective Loads.
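Spelled out, the 33% comes from comparing the work a standalone machine does per frame (X + Y) with what is left once the objective load is off-loaded for free (just X). A small Python check, using exact fractions so no rounding sneaks in; the function name is only for illustration, and the zero-cost off-load is assumption 6 taken at face value.

```python
from fractions import Fraction

def distributed_gain(subjective_load, objective_load):
    """Fractional speedup when the objective load costs nothing after off-loading."""
    standalone = subjective_load + objective_load   # work done per frame on one machine
    distributed = subjective_load                   # only the subjective part stays local
    return standalone / distributed - 1

gain = distributed_gain(Fraction(3, 4), Fraction(1, 4))
print(gain)  # 1/3, i.e. about 33% more performance
```

Algebraically (X + Y) / X - 1 = Y / X, which is why the gain is simply the ratio of objective to subjective load.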
The hype being tossed around is that the PS3 will provide me a Subjective Perspective that is light years ahead of the competition. But that is just not going to be true unless I shell out money to create a processor farm of idle PS3s or other compatible cell processors. All that I can reasonably expect to get for 300 bucks is a machine that spends less time computing AI and physics and more time pumping graphics. I can't expect to magically have a processor farm of distributed PS3s all working towards giving me real-time Shrek-quality graphics. The leap in graphical technology will be on par with other machines of the time.
Very true. Isn't this the fundamental flaw in computing? Too many calculations and not enough results stored. Then again, most of the time it's faster to recalculate than it is to house large data stores of results. However, this architecture provides a unique ability to utilize crossbar results amongst member Cells. While Sony didn't go into detail on how it would optimize multi-Cell results, we can take an educated guess as to how it could be done.
Let us consider two scenarios.
First, say you have a multi-client to server Cell relationship, for example a team of SOCOM players (8 players). The server could process the base environment (grid points, even specific polygons such as trees and buildings) and even provide the results of physics processing (a grenade exploding) to the requesting Cells on demand. So in effect all the requesting (client) Cell needs to do is take the results and send them to the GPU for effects, and on to VRAM for display. Personally I think this is where the majority of the computing will amass: at the server level. I can go into more detail if you need a further understanding.
Secondly, with shared system memory the DMAC could simply query whether the instruction is part of the local memory sandbox; if so, it simply loads it, and if not, based on metrics, it will process the data itself.
This is the fortunate part, as perspective can be re-drawn without having to completely redo the environment. Currently games are designed so that the environment is simply drawn around the perspective of the player (think of a kaleidoscope) instead of calculating the relation of the player in the game world along with all the other objects. While this shift in logic would mean a learning curve for programmers, in the long run the Cell architecture would allow for much more efficiency. Also remember that most rendering (the process-intensive filters, lighting, effects, etc.) is done on the GPU.
That I totally disagree with. Given all of the aggregate power floating around, and especially the server-to-client scenario I showed you, I think that performance will be more than adequate. BTW they are going to sell consumer electronics devices based on the Cell architecture, so you could very well have a fiber-attached Cell idling most of the time for your PS3 to utilize. And even if NO processing were off-loaded to other Cells, each Cell is more than capable of being a standalone beast of a system.
This could very well be an accurate percentage. However, with the power of the routing logic you can optimize this a bit, and based on my points above I think you'll see that more processing can be had than your estimate.
This I disagree with, for the very reasons you stated in that paragraph. Any task that can be off-loaded can facilitate better graphics and environments. Therefore (provided each Cell is at least close in equivalency to competing consoles in terms of power) a higher percentage of power can be obtained from this architecture than from others. However, I think I have resolved some of the fears that led you to this conclusion in the first place.
Please ask me to elaborate on anything that doesn't quite make sense. I had to rush this one because I'm at lunch and I'm trying to get this banged out.
Very astute observations though. Together I think we can identify all of the probable issues that the CELL architecture is to face, and provide possible solutions. Unfortunately we won't know for sure whether they are truly resolved until Sony officially announces the technology. But please let us continue; I want to see if we can work out answers for all the possible pitfalls of this grand technology.
I await your response, my friend.
-Long Dead-'Editor-in-Chief' of PSINext.com
I don't know if we're speaking the same language yet. I hope this is more clear.
"First, say you have a multi-client to server Cell relationship, for example a team of SOCOM players (8 players). The server could process the base environment (grid points, even specific polygons such as trees and buildings) and even provide the results of physics processing (a grenade exploding) to the requesting Cells on demand. So in effect all the requesting (client) Cell needs to do is take the results and send them to the GPU for effects, and on to VRAM for display. Personally I think this is where the majority of the computing will amass: at the server level. I can go into more detail if you need a further understanding."
Ah, you see, you mentioned the GPU. This is the Achilles heel of the distributed architecture. Everything up to calculating the wireframe and the skeleton of the world per time slice can easily be done in a distributed way; this is the objective data. The subjective part is with respect to a camera's point of view, which texturizes the wireframe and then skews everything to give it perspective. It's this skewing and texturizing that is the responsibility of the GPU. If the GPU itself is not also scalable over a distributed computing environment, then there is no point in increasing the number of polygons calculated by the distributed cell network. The GPU is going to have a fixed polygon throughput, and increasing the detail of the world isn't going to make the GPU process any faster. As it stands today, the maximum graphical potential of a game is determined mostly by the GPU. If the PS3 has a static GPU then that is going to be the limiting factor on graphical performance. I was actually thinking that the PS3 wouldn't have a dedicated GPU, but that the cell architecture's basic processor would be structured to tackle graphics problems well.
"This is the fortunate part, as perspective can be re-drawn without having to completely redo the environment. Currently games are designed so that the environment is simply drawn around the perspective of the player (think of a kaleidoscope) instead of calculating the relation of the player in the game world along with all the other objects. While this shift in logic would mean a learning curve for programmers, in the long run the Cell architecture would allow for much more efficiency. Also remember that most rendering (the process-intensive filters, lighting, effects, etc.) is done on the GPU."
I am a little puzzled by your statement. Any multiplayer 3D game has an objective environment; it has to be synched between all the players. Even if the objective environment were stored on a server, the cost in rendering is the transformation done to the objective environment to skew everything so it looks like I'm looking into the world and can see depth. I'm not quite understanding the advantage of the cell architecture here. The skewing and texturizing must still be done at the local level. It can't be done in a collaborative way because each machine belonging to a different player perspective will have to skew and render differently. Only an abundance of idle machines on the network would make any visible difference, and even then only if the GPUs can collaborate with each other.
"That I totally disagree with. Given all of the aggregate power floating around, and especially the server-to-client scenario I showed you, I think that performance will be more than adequate. BTW they are going to sell consumer electronics devices based on the Cell architecture, so you could very well have a fiber-attached Cell idling most of the time for your PS3 to utilize. And even if NO processing were off-loaded to other Cells, each Cell is more than capable of being a standalone beast of a system."
Ah, but this is where your metrics come in. If every machine out there is competing for time to use idle machines, there is no guarantee that your machine will actually be able to farm out any of its computation. Other machines with faster connections will always beat you out. The ONLY way to guarantee that you see benefits is to keep locally connected idle machines, which means you have to fork out the cash to buy them. Even if they are appliances you were planning to buy anyway, you'll be paying extra and will likely be forced to buy a Sony brand to get the embedded cell processors. Without the distributed network, a standalone PS3 is going to be about as beastly as the rest of the next-gen consoles.
With the distributed environment, I'm still not convinced that there will be a significant improvement over other next-gen consoles, because it's sounding like the GPU is still going to be a bottleneck in terms of graphical power. The GPU embodies all the calculations that transform the objective world into the subjective world. Even if there is no GPU and it's all distributed, there will be competition for idle processors as everyone tries to render a better subjective world. It really boils down to the cost of turning the objective into the subjective. I think that cost is high relative to the cost of calculating the state of the objective world. It would be interesting to hear from a developer who is intimately aware of this cost.
If we were both in a room looking at a 3D Picasso-like sculpture, we could help each other build the infrastructure of the room and the sculpture. But because I'm at one end of the room and you're at the other, the parts of the sculpture we each see are vastly different. In this respect we are in competition for resources to render our subjective views of the sculpture.
It's the subjectifying of the objective world that limits the graphical potential of any system. It's true now in the non-distributed case, and it's true in the distributed case. Am I making sense, or am I missing something major in my thought processes?
I apologize for not going into detail in my previous post, which is most likely why there is some confusion. First I want to finish off this metric issue, as it came up again towards your last paragraph.
"Ah, but this is where your metrics come in. If every machine out there is competing for time to use idle machines, there is no guarantee that your machine will actually be able to farm out any of its computation. Other machines with faster connections will always beat you out."
This is not necessarily true; consider a) the number of member Cells (millions) and b) that my metric can adopt additional logic for setting criticality and prioritization (QOS), which I meant to touch on but forgot. So let me explain it now.
First off, let's take QOS at two layers: the IPv4 layer and the Cell header (not to be confused with the IP header). Priorities can be set in the IP header to ensure faster routes over internet links (minimizing IP latency), and QOS can also be implemented at the Cell routing logic level. This would dictate that payload A (containing object updates or environment transforms) be processed sooner than payload B (a text message). Additionally, inside this header would be the processing resource identifier described before (RU). Based on these variables the Cell routing logic can expedite the processing and trafficking of softcells (as I like to call them, instead of packets).
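As a rough illustration of the Cell-level QOS idea (payload A before payload B), here is a small priority queue in Python. The traffic classes and their priority numbers are invented for the example; neither the patent excerpts in this thread nor my metric define them.

```python
import heapq
import itertools

# Hypothetical traffic classes; a lower number is served first.
PRIORITY = {"environment_transform": 0, "object_update": 0, "text_message": 5}
_arrival = itertools.count()  # tie-breaker so equal priorities stay in arrival order

def enqueue(queue, kind, payload):
    """Add a softcell to the queue, ordered by class priority then arrival."""
    heapq.heappush(queue, (PRIORITY[kind], next(_arrival), kind, payload))

def dequeue(queue):
    """Remove and return the most urgent softcell as (kind, payload)."""
    _, _, kind, payload = heapq.heappop(queue)
    return kind, payload

queue = []
enqueue(queue, "text_message", b"gg")
enqueue(queue, "object_update", b"grenade position")
print(dequeue(queue))  # the object update is served first even though it arrived second
```

Note this only orders work that has already reached a Cell; it does nothing about the latency of getting there.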
Now you're still wondering how Cell A won't get its process request beaten by Cell B due to Cell B having a less latent pipe. Enter: sandbox architecture. The sandbox architecture is actually described in the Cell patent, so we can take this with some validity (my metrics are only an educated guess at what I would do, although anyone with my level of knowledge in networking would think the same, so I have no doubt that some sort of routing logic similar to my metric is being planned).
Since there is an expansive plurality of member Cells, sandboxes are created to share resources amongst a specific group of closely located (based on latency) Cells. For the sake of example, and the fact that specifics are detailed, we'll say 25 Cells are members of this sandbox. So within the Cells in the sandbox you have QOS and routing metrics to handle where the process gets off-loaded.
Most importantly, you have shared memory. With shared memory a system no longer has to reprocess results. Any member of this sandbox can share its memory with any other member.
Now, I know your concern is the subjective transform of the environment for each Cell, and you seem to think that this transform is the most intensive process tree that can occur in gaming software. Let me first state that I am by no means a game programmer; I do, however, follow in great detail how games are coded. From my understanding this is how an environment is rendered:
You have a game environment in which each object is assigned a coordinate (CORD) on a grid. When an object (such as a teammate in SOCOM) changes CORDs, this reference is computed and displayed during the next frame. The environment is then displayed from your perspective, and only the data that can be seen from said perspective is rendered.
What the distributed architecture allows a developer to do is render the entire world and give everyone a CORD, including the requesting node. This information would be rendered on, say, a server Cell, or distributed amongst the sandbox Cells. The member Cell can then say: I need this grid reference data from my perspective of X,Y,Z CORDs, and only that data. Server says ok: send. The requesting member Cell then takes that data and sends it to the GPU, adds effects, etc., then sends it on to VRAM and ultimately on to the GIF renderer for display.
"The ONLY way to guarantee that you see benefits is to keep locally connected idle machines, which means you have to fork out the cash to buy them. Even if they are appliances you were planning to buy anyway, you'll be paying extra and will likely be forced to buy a Sony brand to get the embedded cell processors. Without the distributed network, a standalone PS3 is going to be about as beastly as the rest of the next-gen consoles."
I disagree. As I mentioned above, there will be more than adequate time for processes to be distributed amongst Cells. Not to mention that 30-50% off-loading of generic CPU processing is better than none.
"With the distributed environment, I'm still not convinced that there will be a significant improvement over other next-gen consoles, because it's sounding like the GPU is still going to be a bottleneck in terms of graphical power. The GPU embodies all the calculations that transform the objective world into the subjective world. Even if there is no GPU and it's all distributed, there will be competition for idle processors as everyone tries to render a better subjective world. It really boils down to the cost of turning the objective into the subjective. I think that cost is high relative to the cost of calculating the state of the objective world. It would be interesting to hear from a developer who is intimately aware of this cost."
I agree a developer's insight would be grand. However, I severely doubt that the transform of perspective is as intensive as you say. Consider that all generic variables are already done; there is nothing to really calculate anew, mostly rendering of perspective.
BTW distributed AI could be a huge leap for gaming and might be an easy aspect to implement in the infancy of the technology. Imagine what can be done with AI... hrmm...
"If we were both in a room looking at a 3D Picasso-like sculpture, we could help each other build the infrastructure of the room and the sculpture. But because I'm at one end of the room and you're at the other, the parts of the sculpture we each see are vastly different. In this respect we are in competition for resources to render our subjective views of the sculpture."
Right, but only for the perspective transform. See, you have all of the general variables calculated between each other. Then all that would be left to do is a) locally render the environment, whose base variables are already in shared DRAM, from the new CORDs, or b) share the transform between the two or, more precisely, the plurality of member Cells.
"It's the subjectifying of the objective world that limits the graphical potential of any system. It's true now in the non-distributed case, and it's true in the distributed case. Am I making sense, or am I missing something major in my thought processes?"
To be honest with you, I really think that you are overestimating the intensity of the processing for the perspective transform, especially when that is ALL an individual Cell has to accomplish. The only reason it can even be considered an intensive task in a non-distributed scenario is that the system has to handle much more than the perspective transform: AI, effects, world grids, and generic variables.
-Long Dead-'Editor-in-Chief' of PSINext.com
BTW have you had a chance to read the Cell patent yet? I think your insight would be wonderfully astute. I would certainly appreciate your posting your interpretations of it.
here is the link:
Cell Patent
-Long Dead-'Editor-in-Chief' of PSINext.com
I think my main problem here is that the perspective I am showing you is skewing way too granular. The design of the Cell architecture is simply like one large CPU split into many slices (member Cells). Think of it this way: the Cell network is simply a processor that is as powerful as there are members in the network, say 25 million Cells for example. So say every Cell can process 1 teraflop. Multiply that by 25 million and you get 25 million teraflops, for simplicity. While no single application will ever run alone on this network, it is the aggregate power that will allow for impressive new graphics and gaming methodology. Consider that even with the PS3 a lot of processing capacity will be idle 30-50% of the time, even while running a game. Combine this on an aggregate scale and allow distributed access to it, and the available processing power grows with every member Cell added.
-Long Dead-'Editor-in-Chief' of PSINext.com
"If every machine out there is competing for time to use idle machines, there is no guarantee that your machine will actually be able to farm out any of its computation. Other machines with faster connections will always beat you out."
"This is not necessarily true; consider a) the number of member Cells (millions) and b) that my metric can adopt additional logic for setting criticality and prioritization (QOS), which I meant to touch on but forgot. So let me explain it now."
I am considering the number of member cells out there. I think part of the confusion is that you're focusing on the raw processing power of this distributed network and I'm focusing on the fact that I'm not going to be the only person tapping this powerhouse of computation. You say there will always be more processors than programs; I want to know the expected ratio. I think 1 program to 5 processors is being very, very generous. And this says nothing about the quality of the connection: is that 5 full processors, or functionally 2.5 due to the overhead of distributing computational cells?
I'm getting back into this. I need to sit down and figure out what metrics I think are important. I'm reading the patent now and will be reading what more you have collected here.
"Priorities can be set in the IP header to ensure faster routes over internet links (minimizing IP latency)."
Alright, I'm going to have to complain about the obviousness of some of your suggestions, and that you're beginning to get into smoke and mirrors. Even if you minimize latency, there is still latency, and it is significant when compared to processing time. We're just throwing ideas around here, and we don't have any kind of measure of scale. They talk about an absolute timer in the patent... I'm very curious about this beastie; it's going to have to keep all the clocks in synch, and latency is going to affect the process.
"Now you're still wondering how Cell A won't get its process request beaten by Cell B due to Cell B having a less latent pipe. Enter: sandbox architecture."
Whoa, whoa, whoa. You're reading something into the sandbox architecture that I'm not. When I read the patent, the sandbox is specifically mentioned as purely memory protection, meaning an FPU zed dedicated to SoftCell A can't write in memory allocated to FPU gamma dedicated to SoftCell B. But this is only valid when the resources have already been allocated to the SoftCells. I'm talking about before resource allocation. SoftCell A and SoftCell B are going to compete for resources, and they can't share the exact same resources. This is a zero-sum game. If A gets the resources, B can't get those same resources. Suppose A and B don't know about each other and they both see that resource Beta is available. They both dispatch to lock down the resource and the faster SoftCell wins. The slower SoftCell has to stop and retest the computational waters for other free resources. I'm worried that this scenario leads to starvation of SoftCells. This is a well-known problem in multithreaded apps where the threads block waiting for a finite resource. Many common implementations have no guarantee on the order in which the threads are granted access to the resource. Considering that this is all real-time processing, SoftCells can't spend time blocking while waiting for a specific resource to free itself. I haven't seen anything that puts an upper bound on how long a SoftCell will take to reach computational completion. What is the worst-case scenario?
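Here is a toy Python model of the race I'm worried about. Each round one resource frees up, every SoftCell bids for it, and the bid that arrives first wins. The names, latencies and the whole "one resource per round" setup are invented for illustration. The aging_ms_per_round knob is my own addition (aging is a standard scheduling remedy, not something I found in the patent or in the metric described in this thread) to show what a fairness guarantee would change.

```python
def simulate_contention(latencies_ms, rounds, aging_ms_per_round=0.0):
    """Count how often each SoftCell wins the single resource offered per round."""
    wins = {name: 0 for name in latencies_ms}
    waited = {name: 0 for name in latencies_ms}  # consecutive rounds lost so far
    for _ in range(rounds):
        # Effective arrival time: raw latency minus credit earned while waiting.
        winner = min(
            latencies_ms,
            key=lambda name: latencies_ms[name] - aging_ms_per_round * waited[name],
        )
        wins[winner] += 1
        for name in waited:
            waited[name] = 0 if name == winner else waited[name] + 1
    return wins

pipes = {"A": 20, "B": 90}  # hypothetical one-way latencies in ms
print(simulate_contention(pipes, rounds=10))                         # no fairness rule
print(simulate_contention(pipes, rounds=10, aging_ms_per_round=40))  # with aging credit
```

With no fairness rule the slower SoftCell never wins a single round, which is exactly the starvation case. With the aging credit it gets served periodically, but that is a guarantee somebody has to design in.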
Careful now. This is not shared memory outside of a box. The memory is copied over the net. Shared memory only exists inside a physical box, and its the FPUs within the box that have access to it. The FPUs also have their own private memory. But physical Box A can't directly access the memory of physical box B can it? I didn't get that out of the patent. And that would require streaming over the net, which is slow as all hell.Most importantly you have shared memory. With shared memory no longer does a system have to reprocess results. Any member of this sandbox can share its memeory to any other member.
Right, and when that member Cell makes a request to the server with its coordinates and point of view, the server has to do a computation to determine what exactly is visible and what gets rendered. And if this information isn't on the client, it has to send it to the client. And once the client gets it, then it has to be rendered. This is all subjective.
What the distributed architecture allows a developer to do is to render the entire world and give everyone a coordinate, including the requesting node. This information is then rendered on, say, a server Cell or distributed amongst the sandbox Cells. The member Cell can say: I need this grid reference data from my perspective at these X,Y,Z coordinates, and only that data. Server says OK: send. Then the requesting Cell member takes that data and sends it to the GPU, adds effects, etc., then sends it on to VRAM and ultimately on to the GIF renderer for display.
To illustrate: let's say our server has 5 clients. Clients are continually asking the server to send them their subjective data. For each client, the server has to compute what data to send it. This is 5 different calculations for the 5 different clients. And then the server has to send the results of the 5 different computations to the 5 different clients. And then each client does its own separate rendering of that information.
Right now the way that MMOX games are done is that the server just keeps tabs on position information, and it broadcasts and syncs up all the position information. This is the minimal dataset that must be shared by all players. When we're talking about distributed models of the world, it sounds like you're suggesting that the server isn't just keeping track of position information, it's actually handling all the modelling information as well. That's a massive explosion in the amount of data that needs to be sent between the client and the server. And the server is now doing the perspective computation for each client. You haven't eliminated the computation based on perspective. And now there is a huge amount of data flow from server to client. All that information used to be stored locally per client; now it has to be fetched from the server.
I admit, AI and physics modelling is where this system might really shine, but we'll see.
BTW distributed AI could be a huge leap for gaming and might be an easy aspect to implement in the infancy of the technology. Imagine what can be done with AI... hrmm...
Are you kidding me? It's the graphics card that is the bottleneck. All the AI, physics and world data is handled by the main CPU. People are writing games now that the graphics cards of the future are barely going to be able to render. AI and physics don't steal cycles from GPUs. It's the GPUs that prevent us from seeing more beautiful games with higher poly counts.
To be honest with you, I really think that you are overestimating the intensity of the processing for perspective transform, especially when that is ALL an individual Cell has to accomplish. The only reason it can even be considered an intensive task in a non-distributed scenario is that the system has to handle much more than simply perspective transform: AI, effects, world grids, and generic variables.
And I didn't read about any GPU in the patent...
WOW holy CHIT!!! They went all out on that Cell Patent :shock: :shock:
Originally Posted by Viper
I was under the impression that PS3s would be connected 24/7, and that there would therefore be plenty of machines awaiting network use.
Well, you tell me. Would you be willing to leave your PS3 running 24/7 while it's making noise and heat and sucking down electricity like you wouldn't believe? The PS3 network sounds like a communistic resource: everyone pays all the time for the good of everyone else.
That aside. I'm also assuming that games aren't going to be the only thing out there that suck down the distributed resources. One question to consider is whether or not Sony can choose what applications your particular box is working on? Is Sony going to sell distributed computing power to the highest bidder? Or will you the consumer be able to choose to dedicate your idle box to helping out the server of a particular game?
Do you know what your PS3 is doing when you're not home? Are you going to let it do anything without your knowledge and consent? Are you going to pay for other people to use your machine when you're not using it?
The problem I see for real-time rendering, for now at least (and I am a supporter of the Cell concept... I love those cute little Apulets), is latency...
Each frame, at 60 fps, takes 16.67 ms and during this "frame time" we have to process inputs, process physics and A.I. and completely render our scene and have it sent to the CRTC and have the TV display it.
Think about an FPS, each movement of the player potentially changes every single pixel on screen.
In 16 ms, to share power with a connected game server, you would need to send the controller input to the server (if the server were rendering data based on the input), receive the result and finish the scene rendering...
Let's admit that there are some set-up calculations the uber-powerful Cell blade server can do before the actual rendering on the local machine would start... even admitting that the bandwidth between your machine and the server is high enough that the data can arrive before most of the frame time has elapsed, the problem will be having server farms able to manage thousands of users in the same way...
We need calculations for which sending to the server and back would take, let's say, 7-8 ms while the same calculation on your PlayStation 3 would take 9-10 ms or more; then we could do it... I am just unable to imagine, with current broadband networks, what calculations we could off-load...
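To put rough numbers on that trade-off, here is a minimal Python sketch of the break-even test described above. The function names and the sample timings are illustrative assumptions, not measurements of any real network or console.

```python
# Frame budget at 60 fps, in milliseconds (about 16.67 ms).
FRAME_BUDGET_MS = 1000.0 / 60.0


def offload_pays_off(local_ms, round_trip_ms, server_ms):
    """True if asking the server is faster than computing locally.

    round_trip_ms: assumed network time to send the request and get the reply.
    server_ms: assumed time the server spends on the calculation itself.
    """
    return round_trip_ms + server_ms < local_ms


def fits_in_frame(other_work_ms, task_ms, budget_ms=FRAME_BUDGET_MS):
    """True if the task plus everything else still fits in one frame."""
    return other_work_ms + task_ms <= budget_ms


if __name__ == "__main__":
    # Hypothetical numbers in the spirit of the post: 7.5 ms remote vs 9.5 ms local.
    print(offload_pays_off(local_ms=9.5, round_trip_ms=6.5, server_ms=1.0))
    print(fits_in_frame(other_work_ms=8.0, task_ms=7.5))
```

With figures like 7-8 ms for the whole remote trip against 9-10 ms locally, off-loading wins, but only narrowly, and only if the rest of the frame's work still fits in what is left of the 16.67 ms.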
I do not think that on current networks (a regular cable connection) we can even send, have processed and receive a decent amount of data in less than 8 ms, but I might be a bit too tired to analyze things in detail...
This would be great for load balancing on the server side, as it would increase overall efficiency by distributing the traffic across the servers... this would also be great for non-real-time distributed processing applications.
Another good use of Apulets, and of the fact that all APUs share the same basic ISA, would be easy sharing of data across devices on the same LAN segment or even across the internet; it would allow Cell-based devices to communicate and inter-operate without the need for slow and cumbersome abstraction layers.
Cell is a very scalable and modular architecture designed to fit PDAs, phones, HDTVs, servers, digital cameras, etc... by varying the number of APUs, number of PEs, etc... it would be very nice if all these devices could be plugged into your home WAN or LAN and find each other and interact/share data with each other without requiring the user or the device vendors to install particular software and configure each device... the patent seems to have such a scenario in mind, and the ability of Apulets to migrate and be processed by basically any APU makes perfect sense in that light (economically it would make sense to provide an improved experience when all your appliances are Cell compatible).
Certainly a good point. More and more, I keep having this nagging question of “acceptable latency” running through my head. I believe we need to quantify just what an acceptable latency is for “game oriented” programming. I know this will vary based upon what type of code is being executed at a particular time; however, considering the “massively parallel” nature of games, there is most likely an average metric that developers look for. I think we need to quantify this. ANYONE HAVE SUGGESTIONS????
Originally Posted by AlgebraicRing
There is some confusion about the timer, as the patent seems to contradict logic.
Alright, I'm going to have to complain about the obviousness of some of your suggestions, and also that you're beginning to get into smoke and mirrors. Even if you minimize latency, there is still latency, and it is significant when compared to processing time. We're just throwing ideas around here, and we don't have any kind of measure of scale. They talk about an absolute timer in the patent... I'm very curious about this beastie; it's going to have to keep all the clocks in sync, and latency is going to affect the process.
If it is indeed there to sync processes, then problems could arise due to latency, specifically due to the shared nature of the DRAM. While a process is awaiting space, this delay could throw off the syncing.
It has also been speculated that it is used to ensure that the overall game code does not run faster on a more powerful Cell system, ensuring a game runs at 60 Hz on all platforms.
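If the timer really is there to keep game logic at the same rate on faster hardware, one common way games do this is a fixed-timestep loop. The sketch below shows that general technique in Python, not anything taken from the patent; `simulate` and the frame-time lists are made up for illustration, and exact fractions are used so rounding does not blur the tick count.

```python
from fractions import Fraction

# One logic tick every 1/60 of a second, regardless of how fast frames render.
STEP = Fraction(1, 60)


def simulate(frame_times, step=STEP):
    """Count how many fixed logic ticks run for a sequence of frame durations."""
    accumulator = Fraction(0)
    ticks = 0
    for dt in frame_times:
        accumulator += dt
        # Run as many whole logic steps as the elapsed time allows.
        while accumulator >= step:
            accumulator -= step
            ticks += 1
    return ticks


# A hypothetical fast machine renders 120 frames in one second,
# a slow one renders 30; both advance the game logic by 60 ticks.
fast_machine = [Fraction(1, 120)] * 120
slow_machine = [Fraction(1, 30)] * 30
```

The rendering rate differs between the two machines, but the game state advances by the same number of ticks per second on both.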
However, both seem to contradict logic; we need more input before we can surmise further, I suppose. How about you, Algebraic? Any ideas?
Certainly, I was taking the idea and expanding it on a WAN scale, whatever you call it, if you were to utilize WAN processing, you would have to implement some sort of encapsulation...
Whoa, whoa, whoa. You're reading something into the sandbox architecture that I'm not. Now when I read the patent, sandbox is specifically mentioned as purely memory protection, meaning an FPU zed dedicated to SoftCell A can't write in memory allocated to FPU gamma dedicated to SoftCell B. But this is only valid when the resources have already been allocated to the SoftCells. I'm talking about before resource allocation.
Hrm... very true, this could be where that darn absolute timer comes in. They could be utilizing it as a processing-time metric for just such a problem: syncing processes.
SoftCell A and SoftCell B are going to compete for resources, and they can't share the exact same resources. This is a zero-sum game. If A gets the resources, B can't get those same resources. Suppose A and B don't know about each other and they both see that resource Beta is available. They both dispatch to lock down the resource, and the faster SoftCell wins. The slower SoftCell has to stop and retest the computational waters for other free resources. I'm worried that this scenario leads to starvation of SoftCells. This is a well-known problem in multithreaded apps where the threads block waiting for a finite resource. Many common implementations have no guarantee on the order in which the threads are granted access to the resource. Considering that this is all real-time processing, SoftCells can't spend time blocking while waiting for a specific resource to free itself. I haven't seen anything that puts an upper bound on how long a SoftCell will take to reach computational completion. What is the worst-case scenario?
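One standard way to put an upper bound on the wait is to hand resources out in strict arrival order instead of letting the fastest requester win. A minimal single-threaded Python sketch of that idea follows; `FifoAllocator` is a hypothetical name, and nothing in the patent says Cell allocates resources this way.

```python
from collections import deque


class FifoAllocator:
    """Grants resources in request order so no requester is starved."""

    def __init__(self, resources):
        self.free = deque(resources)   # resources nobody holds
        self.waiting = deque()         # cells queued in arrival order
        self.owner = {}                # resource -> cell currently holding it

    def request(self, cell):
        """Return a resource immediately if one is free, else queue the cell."""
        if self.free:
            resource = self.free.popleft()
            self.owner[resource] = cell
            return resource
        self.waiting.append(cell)
        return None

    def release(self, resource):
        """Free a resource; hand it straight to the longest-waiting cell, if any."""
        del self.owner[resource]
        if self.waiting:
            next_cell = self.waiting.popleft()
            self.owner[resource] = next_cell
            return next_cell
        self.free.append(resource)
        return None
```

With N cells ahead of you in the queue, you wait for at most N releases, which is at least a bound you can reason about.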
Certainly true, I did not get that from the patent; it was an idea I was playing around with. I should point out the parts which may be conjecture rather than specifically outlined in the patent. However, stream processing could very well be their aim.
Careful now. This is not shared memory outside of a box. The memory is copied over the net. Shared memory only exists inside a physical box, and it's the FPUs within the box that have access to it. The FPUs also have their own private memory. But physical box A can't directly access the memory of physical box B, can it? I didn't get that out of the patent. And that would require streaming over the net, which is slow as all hell.
Some links I have yet to read. Once I do I’ll come back and see if there is any added insight I can provide on the process of possibly streaming between cells.
http://graphics.stanford.edu/sss/
http://cva.stanford.edu/imagine/index.html
What the distributed architecture allows a developer to do is to render the entire world and give everyone a coordinate, including the requesting node. This information is then rendered on, say, a server Cell or distributed amongst the sandbox Cells. The member Cell can say: I need this grid reference data from my perspective at these X,Y,Z coordinates, and only that data. Server says OK: send. Then the requesting Cell member takes that data and sends it to the GPU, adds effects, etc., then sends it on to VRAM and ultimately on to the GIF renderer for display.
Good point, the bandwidth that sort of data would require would make the approach highly unfeasible.
Right and when that member cell makes a request to the server with its coordinates and point of view, the server has to do a computation to determine what exactly is visible and what gets rendered. And if this information isn't on the client, it has to send it to the client. And once the client gets it, then it has to be rendered. This is all subjective.
To illustrate: let's say our server has 5 clients. Clients are continually asking the server to send them their subjective data. For each client, the server has to compute what data to send it. This is 5 different calculations for the 5 different clients. And then the server has to send the results of the 5 different computations to the 5 different clients. And then each client does its own separate rendering of that information.
Right now the way that MMOX games are done is that the server just keeps tabs on position information, and it broadcasts and syncs up all the position information. This is the minimal dataset that must be shared by all players. When we're talking about distributed models of the world, it sounds like you're suggesting that the server isn't just keeping track of position information, it's actually handling all the modelling information as well. That's a massive explosion in the amount of data that needs to be sent between the client and the server. And the server is now doing the perspective computation for each client. You haven't eliminated the computation based on perspective. And now there is a huge amount of data flow from server to client. All that information used to be stored locally per client; now it has to be fetched from the server.
I’ll start thinking of some ways in which distributed AI would benefit games, then put a metric to it and an in-game result so that gamers could understand why this would be interesting. Be back soon to comment.
BTW distributed AI could be a huge leap for gaming and might be an easy aspect to implement in the infancy of the technology. Imagine what can be done with AI... hrmm...
I admit, AI and physics modelling is where this system might really shine, but we'll see.
Good point, I would certainly like to know how intensive perspective transform is, more for understanding than anything. Hrm... time to go find some game developers.
To be honest with you, I really think that you are overestimating the intensity of the processing for perspective transform, especially when that is ALL an individual Cell has to accomplish. The only reason it can even be considered an intensive task in a non-distributed scenario is that the system has to handle much more than simply perspective transform: AI, effects, world grids, and generic variables.
Are you kidding me? It's the graphics card that is the bottleneck. All the AI, physics and world data is handled by the main CPU. People are writing games now that the graphics cards of the future are barely going to be able to render. AI and physics don't steal cycles from GPUs. It's the GPUs that prevent us from seeing more beautiful games with higher poly counts.
And I didn't read about any GPU in the patent...
-Long Dead-'Editor-in-Chief' of PSINext.com
BTW great post, kinda sums up one of the strategic points made in this thread. Kudos. This is where I would like to focus next... what processes CAN be offloaded onto a WAN? Personally I think distributed AI might be of interest. Any thoughts?
Originally Posted by Panajev2001a
Scott,
with regards to this link: http://cva.stanford.edu/imagine/project/im_arch.html
Streaming through Kernels in this case is just another implementation of pipelining. What seems unique about this architecture is that the pipelines are reprogrammable. This is not distributed computing in the sense that machines that are physically separate are collaborating together to solve a problem. This is local machines with a multiplicity of processors and each processor is given a dedicated task. The processors act in the manner specified by their Kernels.
This is very much like an assembly line in a factory. The data is being "streamed" through each Kernel to undergo a transformation that preps it for the next Kernel. The implementation detail here is that the data isn't streamed at all. It sits in shared memory; the kernel that needs to operate on it locks the memory down, does its transformation and releases the lock, and the next kernel is notified that input is ready and waiting.
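As a rough illustration of that hand-off, here is a small Python sketch where each kernel runs in its own thread and passes work along through a queue. `queue.Queue` does the locking and the "input is ready" notification internally; the kernel functions and names are placeholders, not code from the Imagine project.

```python
import queue
import threading

STOP = object()  # sentinel telling each kernel that no more data is coming


def kernel_worker(transform, inbox, outbox):
    """One assembly-line station: take an item, transform it, pass it on."""
    while True:
        item = inbox.get()          # blocks until the previous kernel delivers
        if item is STOP:
            outbox.put(STOP)        # propagate shutdown down the line
            return
        outbox.put(transform(item))


def run_pipeline(transforms, items):
    """Push items through the kernels in order and collect the results."""
    queues = [queue.Queue() for _ in range(len(transforms) + 1)]
    threads = [
        threading.Thread(target=kernel_worker, args=(t, queues[i], queues[i + 1]))
        for i, t in enumerate(transforms)
    ]
    for thread in threads:
        thread.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(STOP)

    results = []
    while True:
        out = queues[-1].get()
        if out is STOP:
            break
        results.append(out)
    for thread in threads:
        thread.join()
    return results
```

Every kernel stays busy as long as its inbox has data, and while one item is in the second kernel the next item can already be in the first.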
This works for graphics because most of the graphics processes can be broken up into kernels of transformation. What the data looks like doesn't matter, because all start data has to go through the same transformation. This sounds like a GPU implemented as software on multiple processors that are well suited to performing the math needed in all GPU-like transformations. The hardware architecture itself reminds me of the PS3; how the software is being split up is not reminiscent of the PS3.
The critical difference here is that the PS3 has this notion of a Software Cell, while the Stanford box has no such notion. A Software Cell is the combination of both the program and the data the program operates on. This is what gets passed around in a distributed fashion for the PS3. We can think about the program as being a set of Kernels. All the Kernels and the start data get sent out in a SoftCell, the PS3 configures itself to run the Kernels on the start data, and then it returns the result data. The Kernels get thrown away, never to be used again, once the SoftCell has completed its task.
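To make the difference concrete, a Software Cell as described here could be modelled as a bundle of kernels plus start data that is run once and then discarded. This is only a Python sketch of that reading of the patent; `SoftCell` and `execute` are hypothetical names, and real Cell software would look nothing like plain Python.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class SoftCell:
    """Program and data travelling together (hypothetical model)."""
    kernels: List[Callable[[Any], Any]]  # the program, as an ordered set of kernels
    data: Any                            # the start data the kernels operate on


def execute(cell):
    """Run every kernel in order on the start data and return the result.

    Nothing from the cell is kept afterwards: the kernels are used once.
    """
    result = cell.data
    for kernel in cell.kernels:
        result = kernel(result)
    return result


# Example cell: double every value, then add them up.
example = SoftCell(kernels=[lambda xs: [x * 2 for x in xs], sum], data=[1, 2, 3])
```

The receiving machine only needs `execute`; everything specific to the job arrives inside the cell and leaves with the result.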
In the Stanford case, the Kernels never get thrown away; they only become idle when there is no more data to operate on. The assembly line never disappears; it is always ready to operate on new data. Every kernel can be busy, and when it finishes its current work it passes it off to the kernel next to it, and so every kernel stays busy.
Sure, we can write SoftCells which never get thrown away; they can do the same thing that Stanford does. All the SoftCells collaborate with each other and pass data around from one to another. They come to completion when they get told to shut down. The fundamental difference in this case is the locality of the machines.
The Stanford architecture is meant to be local. Stuff a room full of these boxes, and they're all dedicated to one task: rendering graphics. Data starts in box 1, visits each kernel in order, goes to box 2, visits each kernel in order, goes to box 3, visits each kernel in order, etc. All data follows the same path, because it all shares the same transformation process. The transfer between boxes is just data, not program information. The ordering of the boxes matters because the kernel ordering matters. The boxes must be local because the data transfer between boxes needs to be as fast as possible.
The Sony architecture does not have to be local. And initially there will not be many local boxes to tap for this kind of Kernel-Streaming. It's going to be a while before people have more than just a PS3... The cost of transferring data over the internet is going to slow the streaming process down.
Furthermore, this is still a subjective transform. Let's suppose that the internet's lag wasn't a factor. My view into the world is different than your view of the world. My start data through the pipeline is therefore different than your start data through the pipeline. A frame rendered for me is a frame that you couldn't render for you. There might be some benefit in that our machines are now dedicated and don't have to swap out memory to load paged-out kernels. This might actually mean that our machines experience a 3 times power increase instead of just the expected 2 times. We've reduced the time and power the machine spends paging in memory. But the cost of paging in memory is nothing compared to the cost of streaming data over the net from one box to another.
Stanford gets away with it because their machines are dedicated for one purpose, implementing a GPU in software, and their boxes are all locally connected in a linear fashion.
The PS3's main drawback is the internet lag. Now someone could just buy a bunch of PS3s to create their own local powerhouse GPU...
you know what? i just read that entire thread from page 1 and i didn't understand a single thing. i don't wanna understand it either, it will just get me even more confused than what i already am. anyways my suggestion to you guys is don't stress it, things will be unveiled soon enough. lol
damm still wish i could understand all that but oh well 8)
The PS3's main drawback is the internet lag. Now someone could just buy a bunch of PS3s to create their own local powerhouse GPU...
This, IMO, with my limited knowledge on the subject, is the only way the power of 2 or more PS3s could be used for realtime rendering/gaming. It just isn't going to happen over the internet next generation. Maybe not even the generation after (PS4).
I'm back! Did ya miss me? Well find someone who's a better shot then!
Okay, let's discuss "The Grid" for a bit. This is an idea some physicists had for standardising how their Beowulf cluster stuff worked together. The standards for this are OGSA and OGSI. A Grid-compliant service provides an interface which conforms to a subset of the WSDL definition. WSDL is a way of describing what inputs and outputs a web service provides.
You're probably staring confusedly at the screen now. Just hold with me a few more paragraphs and it should make sense.
Grid services normally talk SOAP, which is an XML-based protocol. This is normally sent over HTTP. So, never mind the overhead of any one layer: this is XML-based SOAP over HTTP over TCP over IP. The latency is huge in game terms (a significant fraction of a second on a LAN).
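For a feel of the overhead, this Python sketch compares a position update packed as three raw floats with the same three numbers wrapped in a SOAP 1.1 envelope. The `UpdatePosition` element is invented for the example, and the byte counts ignore the HTTP, TCP and IP headers that come on top.

```python
import struct


def binary_position(x, y, z):
    """Three little-endian 32-bit floats: 12 bytes in total."""
    return struct.pack("<3f", x, y, z)


def soap_position(x, y, z):
    """The same three numbers inside a minimal SOAP 1.1 envelope.

    UpdatePosition is a hypothetical operation name, not a real service.
    """
    return (
        '<?xml version="1.0"?>'
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        "<soap:Body><UpdatePosition>"
        f"<x>{x}</x><y>{y}</y><z>{z}</z>"
        "</UpdatePosition></soap:Body></soap:Envelope>"
    ).encode("utf-8")


if __name__ == "__main__":
    print(len(binary_position(1.0, 2.0, 3.0)), "bytes as raw floats")
    print(len(soap_position(1.0, 2.0, 3.0)), "bytes as a SOAP envelope")
```

The envelope comes out more than ten times larger than the 12-byte binary form, and it still has to be parsed as XML on the other end.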
Let's briefly look at the brightly coloured website of Butterfly.net. They're selling the idea of using the Grid for MMORPGs. Now this makes sense. The latency, while still a little high, would be tolerable. Still, I wouldn't want to try Quake over the Grid, so maybe Sony will be using something other than SOAP; the standards do provide for doing so, even if I'm unaware offhand of any implementations.
So, we can use the Grid for multiplayer gaming. Good start, but what else. Well, they're probably going to release two versions of the PS3 (I'll be buying the all singing, all dancing one for reference), and it would make sense to me that other Sony hardware will be able to talk to it in a Grid like manner.
Oh, one last thing. Scott, fiber isn't all that fast: the speed of light in fiber (as opposed to a vacuum) is about 2*10^8 m/s, and electrical signals in copper cabling travel at a comparable speed (roughly two-thirds of the 3*10^8 m/s speed of light in a vacuum, depending on the cable). The big advantage of fiber is the distance: copper network cabling can only go for about 100-200 metres per segment, while fiber can do kilometres (although I can't remember how many off-hand).
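Taking the 2*10^8 m/s figure for fiber at face value, the propagation delay alone is easy to estimate. A small Python sketch, where the 1000 km distance is just an assumed example and routing, queuing and serialization delays are ignored:

```python
# Signal speed in fiber, in metres per second (figure quoted in the post).
FIBER_SPEED_M_PER_S = 2e8


def one_way_delay_ms(distance_km, speed_m_per_s=FIBER_SPEED_M_PER_S):
    """Pure propagation delay in milliseconds over the given distance."""
    seconds = distance_km * 1000.0 / speed_m_per_s
    return seconds * 1000.0


def round_trip_ms(distance_km, speed_m_per_s=FIBER_SPEED_M_PER_S):
    """There and back again, still counting propagation only."""
    return 2.0 * one_way_delay_ms(distance_km, speed_m_per_s)


if __name__ == "__main__":
    # Hypothetical 1000 km between a console and a server.
    print(one_way_delay_ms(1000.0), "ms one way")
    print(round_trip_ms(1000.0), "ms round trip")
```

At 1000 km that is about 5 ms each way, so roughly 10 ms of a 16.67 ms frame (at 60 fps) is gone before any router or server has done anything.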
Will try to be back with more coherent messages later...
So what you're saying is that fiber optics aren't all they're hyped up to be?