Cell + RSX load balancing & raytracing mechanism?

  1. #1
    Join Date
    Jan 2006
    Location
    Finland
    Age
    35
    Posts
    2,266

    Cell + RSX load balancing & raytracing mechanism?

    For the techie buffoons/fellow nerds here this could be a treat

    Here is an Nvidia patent http://appft1.uspto.gov/netacgi/nph-...RS=20060059494 dated March 16, 2006. It vaguely describes a specific implementation in which the CPU helps the GPU with the graphics workload through load balancing. Sounds like the Cell+RSX co-operation idea ... Too bad I couldn't find a way to view the figures the patent refers to; the whole mechanism would be a lot easier to follow with them.

    A method of load balancing between a programmable GPU and a CPU comprising: forming a two-ended queue of separate work units each capable of being processed at least in part by said GPU and said CPU; and processing said work units by having said GPU and said CPU select work units from respective ends of said queue.
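    A rough sketch of that claim in Python, just to make the idea concrete. The function name split_work and the strict turn-taking are my assumptions, not anything from the patent; real hardware would pull a unit whenever it becomes free.

```python
from collections import deque

def split_work(work_units):
    """Drain one shared deque from both ends.

    The 'GPU' always takes from the left end and the 'CPU' from the
    right end, so the two never compete for the same work unit.
    Turn-taking is a simplification (an assumption of this sketch).
    """
    queue = deque(work_units)
    gpu_units, cpu_units = [], []
    while queue:
        gpu_units.append(queue.popleft())   # GPU end of the queue
        if queue:
            cpu_units.append(queue.pop())   # CPU end of the queue
    return gpu_units, cpu_units

gpu_units, cpu_units = split_work(["A", "B", "C", "D", "E"])
print(gpu_units)  # ['A', 'B', 'C']
print(cpu_units)  # ['E', 'D']
```

    The point of the two ends is that neither processor needs to know what the other is doing: they only meet when the queue is nearly empty.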
    In graphics, one typical and frequent computation is referred to as "ray tracing." Ray tracing is employed in a variety of ways in graphics, such as for simulating illumination effects, including shadows, reflection, and/or refraction, as well as other uses. In general, ray tracing refers to a process to determine the visibility of surfaces present in a particular graphical image by tracing imaginary rays of light from the viewer's eye to objects in the scene. ...
    ... Such an approach, therefore, may be time consuming and may not fully utilize the parallel processing capability available via a programmable GPU. Additional techniques for using a programmable GPU to perform ray tracing for graphics processing are, therefore, desired.
    ...

    Another aspect of this particular embodiment is employing the parallel processing capability of the GPU. In particular, intersecting a plurality of rays with a hierarchy of bounding surfaces suggests a repetitive calculation that may potentially be performed effectively on a GPU. The following discussion focuses on processing by the GPU itself and how the GPU interacts with the CPU to load balance and compute ray-primitive intersections. Thus, yet another aspect of this particular embodiment involves load balancing between the GPU and the CPU.

    Referring now to FIG. 3, block 310 depicts, for this particular embodiment, subdividing an image into work units to assist in performing ray tracing. As previously indicated, a programmable GPU is employed to compute intersections between a plurality of rays and a set of surfaces hierarchically bounding a set of graphical objects. Initially, however, the image is divided using non-overlapping surfaces that surround the objects. Thus, in this embodiment, the image is divided based at least in part on bounding objects, regardless of object shape, with a surface so that spatially the objects are separated. In this particular embodiment, an object comprises a mesh of quadrilateral primitives, although, of course, the claimed subject matter is not limited in scope in this respect. In this context, a primitive may comprise any polygon. It is noted that the shape of the bounding surfaces may take any form. For example, the shape of a bounding surface may comprise a sphere, square, rectangle, convex surface, or other types of surface. For this particular embodiment, the bounding surface comprises a box, referred to here as a bounding box....
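    The excerpt does not say how a ray is tested against a bounding box. A common way is the "slab" test, sketched below in Python as an assumption about what such a test could look like; the function name and argument layout are made up for illustration.

```python
def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: does a ray hit an axis-aligned bounding box?

    origin, direction, box_min and box_max are (x, y, z) tuples.
    Only hits in front of the ray origin (t >= 0) count.
    """
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)   # latest entry across all axes
        t_far = min(t_far, t1)     # earliest exit across all axes
        if t_near > t_far:
            return False           # the ray leaves one slab before entering another
    return True

unit_box = ((-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))
print(ray_hits_box((0.0, 0.0, -5.0), (0.0, 0.0, 1.0), *unit_box))   # True
print(ray_hits_box((0.0, 0.0, -5.0), (0.0, 1.0, 0.0), *unit_box))   # False
```

    The same few operations repeat for every ray and every box, which is the kind of repetitive calculation the patent says suits a GPU.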
    One issue when employing a GPU is determining when processing has ceased. To determine this, the CPU queries the GPU. However, such querying has some efficiency implications. Querying the GPU results in the GPU stopping its processing so that it is able to provide data to the CPU. Thus, it may be undesirable to query too frequently because this may result in processing inefficiency. However, it is, likewise, desirable to not query the GPU too infrequently because once the GPU has ceased, it may sit idle, representing wasted processing time.

    For this particular embodiment, a two-sided queue, as previously described, provides a mechanism to balance these considerations, as shall be described in more detail later. Within this context, as suggested, the frequency at which the CPU queries the GPU mechanism may affect efficiency of processing by the GPU. Thus, depending on the particular implementation or embodiment, it may be desirable to vary this frequency.

    As previously described, the GPU and the CPU begin separate work units initially, as illustrated by block 330 of FIG. 3. Thus, for this embodiment, the CPU and the GPU compute intersections between a plurality of rays and a set of surfaces bounding one or more graphical objects. It is noted, however, that in this context a graphic object comprises a set of primitives. For this particular embodiment, although the claimed subject matter is not limited in scope in this respect, and as is further illustrated by block 340 of FIG. 3, whether and when the CPU has completed its work unit is a decision point. In this context, completing a work unit refers to ceasing processing of that particular work unit and, if available, beginning processing of another work unit.

    If the CPU has not ceased or completed processing, then both the CPU and GPU continue processing, as illustrated by block 370. However, once the CPU has finished, it queries the GPU regarding whether the GPU has ceased processing, depicted by block 350 in FIG. 3. If the GPU has additional processing for the latest work unit it began, the GPU continues. If additional work units remain in the queue, then the CPU, which has completed, pulls another work unit from the end of the queue. Then, as before, the CPU continues until it has completed its work unit and then, again, it queries the GPU. If, at this point, the GPU has ceased processing, then the GPU provides information back to the CPU, such as, if a "hit" or intersection has occurred. If no hits have occurred, this indicates that none of the rays have intersected bounding boxes for the voxel or work unit processed by the GPU. Thus, this work unit is complete since no rays intersect primitives.

    If there are additional work units, the GPU and the CPU then take additional work units and the loop continues. This is illustrated in FIG. 3 by the loop that includes blocks 385, 335, 365, and 355. It is noted, of course, as illustrated, for example, by block 386, that once there are no more work units and once the CPU and GPU have no additional processing for their respective work units, then the process has completed, for this particular embodiment.
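    Here is a toy Python simulation of that loop, to show why the query timing matters. It is only a sketch under my own assumptions: every work unit costs a fixed time on each processor, hits are ignored, the query itself is free (in the patent it stalls the GPU), and the CPU queries the GPU only after finishing its own unit.

```python
from collections import deque

def simulate(units, gpu_cost, cpu_cost):
    """Return (completion log, total time, time the GPU sat idle)."""
    queue = deque(units)
    now, gpu_idle, done = 0.0, 0.0, []
    # Both processors start on separate work units.
    gpu_unit = queue.popleft() if queue else None
    gpu_finish = gpu_cost
    while queue or gpu_unit is not None:
        if queue:
            cpu_unit = queue.pop()          # CPU pulls from its end
            now += cpu_cost                 # ... and works until done
            done.append(("cpu", cpu_unit))
        else:
            now = max(now, gpu_finish)      # nothing left: wait for the GPU
        # The CPU has finished its unit, so it queries the GPU now.
        if gpu_unit is not None and gpu_finish <= now:
            gpu_idle += now - gpu_finish    # GPU waited for the query
            done.append(("gpu", gpu_unit))
            if queue:
                gpu_unit = queue.popleft()  # GPU pulls from its end
                gpu_finish = now + gpu_cost
            else:
                gpu_unit = None
    return done, now, gpu_idle

log, total, idle = simulate(list(range(6)), gpu_cost=1.0, cpu_cost=3.0)
print(total, idle)  # 9.0 6.0
```

    With a fast GPU and a slow CPU, the GPU spends most of its time waiting to be asked, which is the "queried too infrequently" case from the patent. Querying more often would cut the idle time but, per the patent, each query interrupts the GPU.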


    If, alternatively, however, the GPU has uncovered a hit, this means that some rays intersected bounding boxes for the particular voxel. The GPU, by providing data back to the CPU regarding the rays where this intersection has taken place, assists the CPU to determine the number of rays that still remain "active" for further processing. This information allows the CPU to schedule another work unit in the two-sided queue previously described. This scheduling by the CPU determines whether additional processing for this particular voxel will be performed by the GPU or the CPU.
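    The excerpt does not give the rule the CPU uses for this scheduling. One plausible guess, sketched in Python, is to send big groups of active rays to the GPU end of the queue and small groups to the CPU end. The threshold GPU_MIN_RAYS, the function name and the "GPU takes from the left" convention are all assumptions for illustration.

```python
from collections import deque

GPU_MIN_RAYS = 64  # hypothetical cut-off, not a figure from the patent

def reschedule(queue, voxel_id, active_rays, gpu_min_rays=GPU_MIN_RAYS):
    """Put a follow-up work unit at the GPU end or the CPU end of the queue.

    Assumes the GPU pulls from the left end and the CPU from the right.
    """
    unit = (voxel_id, active_rays)
    if len(active_rays) >= gpu_min_rays:
        queue.appendleft(unit)   # large batch: next in line for the GPU
        return "gpu"
    queue.append(unit)           # small batch: next in line for the CPU
    return "cpu"

queue = deque()
print(reschedule(queue, "voxel-7", list(range(200))))  # gpu
print(reschedule(queue, "voxel-9", [3, 17]))           # cpu
```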

    At some point, however, there are no additional bounding boxes in the hierarchy. Once this occurs, assuming the GPU has uncovered a "hit," it indicates that a computation be performed to determine whether the ray or rays intersect the primitives bounded by the bounding boxes. In this particular embodiment, this latter computation is performed by the CPU rather than by the GPU. Therefore, the CPU computes intersections between one or more rays and one or more graphical objects based at least in part on the computations performed by the GPU. The CPU completes such processing for a particular work unit by determining whether the ray or rays intersect any primitives. This is illustrated in FIG. 3 by blocks 375 and 371. As depicted in FIG. 3, at block 380, once the CPU has completed ray-primitive intersection calculations for the work unit, both the CPU and the GPU take another work unit, if available. As before this is depicted by the loop that includes blocks 335, 365 and 355.

    It is possible for a ray to intersect two or three objects. In order to address this, intersections between rays and primitives are cached and sorted using a z-buffer to determine which primitive is the first or closest intersection.
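    Picking the first intersection boils down to keeping the smallest distance per ray, much like a depth test. A small Python sketch, assuming each cached hit is a (ray_id, distance, primitive_id) tuple; that layout is my assumption, the patent excerpt does not specify one.

```python
def closest_hits(hits):
    """Keep only the nearest hit for each ray.

    hits: iterable of (ray_id, distance, primitive_id) tuples.
    Hits at or behind the ray origin (distance <= 0) are ignored.
    """
    nearest = {}
    for ray_id, distance, primitive_id in hits:
        if distance <= 0:
            continue
        best = nearest.get(ray_id)
        if best is None or distance < best[0]:
            nearest[ray_id] = (distance, primitive_id)
    return nearest

cached = [
    (0, 7.5, "quad_12"),
    (0, 2.0, "quad_3"),    # closer hit on the same ray wins
    (1, 4.0, "quad_8"),
    (1, -1.0, "quad_5"),   # behind the origin, ignored
]
print(closest_hits(cached))  # {0: (2.0, 'quad_3'), 1: (4.0, 'quad_8')}
```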
    It's the usual, very vague patent talk, but you can tell from it that they have really thought about these collaboration aspects at the hardware level.

    I'm no longer wondering why PS3 games are starting to show raycasting/raytracing action, which is extremely nice.

    Now that we know libgcm exists http://www.gaming-age.com/cgi-bin/im...006/ps3/51.jpg for writing commands straight into the RSX's command list, you can code "down to the metal", i.e. use the RSX's strengths in the best way possible.
    So many games so little time..

  2. #2
    Wow, very nice find m8. I'm gonna read it later cause now I'm leaving, but anyway it's a great find!

  3. #3
    Join Date
    Aug 2005
    Age
    31
    Posts
    602
    Someone should alert Nerve-Damage!

  4. #4
    Join Date
    Jan 2006
    Location
    Finland
    Age
    35
    Posts
    2,266
    Yeah, and Cpiasminc (is it written that way? ) and xbdestroya. They have been kinda absent lately...
    So many games so little time..

  5. #5
    Is that you Nerve? (J/K) Where is that man, just when we need him...
    PSN: Sephiroth_VII

  6. he is infiltrating classified R&D labs all over the world. KB smoker is the wheel man!
    "With or without religion, you would have good people doing good things and evil people doing evil things. But for good people to do evil things, that takes religion."
    - Steven Weinberg

    “If Jesus had been killed twenty years ago, Catholic school children would be wearing little electric chairs around their necks instead of crosses.”
    - Lenny Bruce

  7. It's perfect for Cell-RSX and FlexIO (35 GB/s).

  8. #8
    I could have sworn I saw this before. Interesting.
    Less invasions, more equations!

  9. #9
    Quote Originally Posted by NeoPlayStation
    It's perfect for Cell-RSX and FlexIO (35 GB/s).
    Yeah man, now I understand what this FlexIO is for. Very good, this should help the PS3 achieve unbelievable graphics once developers learn to use it correctly.

  10. I... don't understand...

    But it does sound good... I think haha.

    Would anyone be interested in dumbing this information down to a less technical level plz? I'm just assuming the patent means that the Cell and RSX will be able to shift calculations to whichever side gives more performance for a given task (that's what I got from the thread title lol, someone tell me if I'm even close haha).

  11. #11
    Join Date
    Jan 2006
    Location
    Finland
    Age
    35
    Posts
    2,266
    The basic rundown is that Cell (especially the SPEs) can assist RSX with just about any of its graphics work when it is struggling under load. The patent doesn't say, or I can't tell from it, which activities those are, but developers at B3D have spoken many times about Cell doing vertex/geometry calculations and pre- and post-processing work. There are also additional mechanisms to help the pair with raytracing.
    So many games so little time..

  12. #12
    It's a nice patent, but I have to wonder how much these ideas will play into PS3 development. The examples used in the patent itself seem to focus on bringing the GPU into the mix to assist CPUs with ray-casting, and as 'cool' as it all is, I'm not sure that would be the best use of the RSX's strengths graphically.

    The Cell's ability to assist in geometry and image processing has been discussed at length before - with some degree of confusion ever lingering - but I think we still have to view those cases as being very much different than what is being discussed here in this patent, even if it's in the same 'family' of concepts.

    I defer to GodMachine or Cpi though for any 'clear' indications here I might not be picking up on. In truth I didn't read the whole patent, but just skimmed over it.
    Respect to all those who debate their positions using facts and reason rather than rumor and passion.

  13. #13
    Join Date
    Jan 2006
    Location
    Finland
    Age
    35
    Posts
    2,266
    Heh, of course there is, as always, the aspect of "I want to believe", hence the question mark in the thread title. In this case, though, it sounds too familiar to be completely unrelated to PS3 technology. Just think how few CPUs can do the amount of geometry work that Cell's SPEs can.

    Then look at the job queue concept in this patent: in a loop, the CPU and GPU initially start working on separate work units, and the CPU polls from time to time to see whether the GPU has finished its job. If the GPU is still busy and the CPU has finished its own work unit, the CPU takes another one and starts on it. Once the GPU is ready, it tells the CPU "I'm ready" and passes back whether it found an intersection or hit within a bounding box (voxel?). "If no hits have occurred, this indicates that none of the rays have intersected bounding boxes for the voxel or work unit processed by the GPU. Thus, this work unit is complete since no rays intersect primitives."

    "If, alternatively, however, the GPU has uncovered a hit, this means that some rays intersected bounding boxes for the particular voxel. The GPU, by providing data back to the CPU regarding the rays where this intersection has taken place, assists the CPU to determine the number of rays that still remain "active" for further processing. This information allows the CPU to schedule another work unit in the two-sided queue previously described. This scheduling by the CPU determines whether additional processing for this particular voxel will be performed by the GPU or the CPU."

    It would be a lot easier to read with the figures showing the actual workload, which processor (CPU or GPU) does which work, and whether both can do both (with reasonable latency and efficiency).

    I'm still happy that some raytracing algorithms are already being used in PS3 games, even if this kind of patent has nothing to do with it.
    So many games so little time..

  14. #14
    Yeah, and Cpiasminc (is it written that way? ) and xbdestroya. They have been kinda absent lately...
    Well, I have a perfectly good reason -- I had to move about 1700 miles.

    It's a nice patent, but I have to wonder how much these ideas will play into PS3 development. The examples used in the patent itself seem to focus on bringing the GPU into the mix to assist CPUs with ray-casting, and as 'cool' as it all is, I'm not sure that would be the best use graphically of the RSXs strength.
    Pretty much. It seems like it has a lot more to do with the whole Gelato idea of accelerating offline renders. The basic concept seems to be about using the GPU and a bounding volume hierarchy to accelerate ray tests, and using occlusion queries or the render-to-texture buffers themselves to feed back information about visibility and the particular rays of concern. Most likely it's only valuable for helping to solve the first hit.

    It also seems to lead a little down the path of determining which ray groupings should have values for the first hit solved by the GPU and which by the CPU based on the number of rays that actually intersected with the bounding volumes. Doesn't seem to go into much detail on this point except to say that larger groups are better for the GPU.
    Cell phones have changed mankind. Finally, men have something they can flip out and argue "mine is smaller than yours."

  15. #15
    Join Date
    Jan 2006
    Location
    Finland
    Age
    35
    Posts
    2,266
    Quote Originally Posted by cpiasminc
    Well, I have a perfectly good reason -- I had to move about 1700 miles.
    Wow, so did you "move" or move?

    For the rest of the post, well it was a nice dream again

    Probably we should put up a new thread about what we do know about RSX and what it definitely isn't. At least it definitely hasn't got anything to do with audio processing on a hardware level, it outputs the sound for PS3 but AFAIK does nothing else audio related.
    So many games so little time..

  16. Quote Originally Posted by sct-i/on
    (the patent doesn't say, or I can't read it, which activities those are,
    It said in the patent that GPUs do various tasks such as calculating vertex points and applying shading. I'm assuming that the Cell (idle SPEs) will become active and help the RSX with calculating vertex positions and raytracing (shadows/visibility).

  17. #17
    Quote Originally Posted by BillCosby
    It said in the patent that GPUs do various tasks such as calculating vertex points and applying shading. I'm assuming that the Cell (idle SPEs) will become active and help the RSX with calculating vertex positions and raytracing (shadows/visibility).
    this is how i see things panning out.

  18. #18
    Wow, so did you "move" or move?
    I'm not sure what that question even means, but what I meant is that I packed up all my stuff and moved to a new region of the country that's about 1700 miles from where I was living (went from Dallas to Silicon Valley).
    Cell phones have changed mankind. Finally, men have something they can flip out and argue "mine is smaller than yours."

  19. #19
    Join Date
    Jan 2006
    Location
    Finland
    Age
    35
    Posts
    2,266
    Sorry, English is not my mother tongue, so to my ears "move" can mean travelling a certain distance and then coming back, or it can mean packing up your stuff and changing your home, temporarily or for good.

    So it sounds like you got another job in the industry, good for you. You can't have such technical knowledge without having worked on the "field".
    So many games so little time..
