The latest V-Ray 5 for 3ds Max hotfix makes bucket rendering faster than ever before. Find out how Chaos’ clever algorithm solved this age-old computing problem.
Bucket rendering has many well-known advantages: it makes it easy to distribute the workload, uses little memory, and produces high-quality results. But it has one equally well-known major issue: the so-called Last Bucket Syndrome.
Distributing work across threads when the size and complexity of each task is unknown is a recurring theoretical problem in modern computing, not just a common one in rendering.
Each bucket is calculated by a single CPU thread, and as soon as there are fewer buckets left than available threads, the idling threads mean the CPU is being used sub-optimally.
Moreover, the issue takes its name from the fact that one bucket must always be the last to complete, and there are cases when this last little square takes more time, alone, than the rest of the image combined, with every CPU thread but one idling in wait.
This is, unfortunately, a common scenario, as renders are uneven in complexity across the image plane: a clear sky, for example, will render much more quickly than complicated vegetation on the ground. There is often one particularly obstinate area, such as a tiny but very bright specular highlight, that needs many samples to converge to the desired noise threshold.
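The syndrome is easy to reproduce in a toy scheduling model. The sketch below uses invented bucket costs (real render times obviously depend on the scene) and greedily assigns each bucket to whichever thread frees up first, showing how a single expensive bucket can dominate the whole render:

```python
# Toy model of Last Bucket Syndrome (invented bucket costs; real render
# times depend on the scene). Greedy list scheduling: each bucket goes
# to the thread that frees up first.
import heapq

def render_time(bucket_costs, num_threads):
    """Return the makespan, i.e. the time when the last bucket finishes."""
    threads = [0.0] * num_threads         # finish time of each thread
    heapq.heapify(threads)
    for cost in bucket_costs:
        free_at = heapq.heappop(threads)  # earliest-free thread
        heapq.heappush(threads, free_at + cost)
    return max(threads)

# 63 cheap "sky" buckets and one 40x-expensive "highlight" bucket,
# unluckily scheduled last, on an 8-thread CPU.
costs = [1.0] * 63 + [40.0]
ideal = sum(costs) / 8        # perfect load balancing: 12.875 time units
print(render_time(costs, 8))  # 47.0: seven threads sit idle from t=8 to t=47
```

Under perfect load balancing the render would take 12.875 time units; with one hard bucket caught last, it takes 47, with seven of the eight threads idle for most of that time.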
Over time, a library of mitigating approaches was developed, both in scene setup (generally ways to diffuse the light energy so that it converges more easily to a clean result) and in code (such as making buckets smaller as the render nears completion). But none of these could change the fact that bucket rendering grew less efficient as the end of the render approached, and nothing could fix the single stuck bucket issue.
Both users and programmers alike have wrestled with this conundrum over the years.
Here are some workarounds users may be familiar with from the Chaos Forum.
Smaller buckets:
Making the buckets smaller would help with modern multi-core CPUs. But it also ends up costing more, as bucket edges need to be computed twice, and smaller buckets mean more edges. And it would not help at all with the last bucket being stuck.
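That edge cost is easy to estimate with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that each bucket carries a 1-pixel overlap border so the pixel filter can work across bucket seams (V-Ray's exact overlap may differ); the redundant work grows quickly as buckets shrink:

```python
# Back-of-the-envelope edge overhead (illustrative assumption: each
# bucket is padded by a 1-pixel border that overlaps its neighbours so
# the pixel filter can work across bucket seams; V-Ray's exact overlap
# may differ).
import math

def overlap_overhead(width, height, bucket, border=1):
    """Padded pixels computed, relative to the image's actual pixels."""
    nx = math.ceil(width / bucket)          # buckets per row
    ny = math.ceil(height / bucket)         # buckets per column
    padded = nx * ny * (bucket + 2 * border) ** 2
    return padded / (width * height)

# At 1920x1080: 64 px buckets cost ~7% extra work, 8 px buckets ~56%.
for b in (64, 32, 16, 8):
    print(b, round(overlap_overhead(1920, 1080, b), 3))
```

In this toy model, halving the bucket size repeatedly takes the redundant work from about 7% at 64 px buckets to over 50% at 8 px buckets, which is why simply shrinking buckets is a poor trade.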
Following the mouse:
The issue could be marginally mitigated by having the bucket rendering order follow the mouse cursor, but only if the user knew in advance which parts would take longer to compute. This isn't tenable for animations or new images where the content isn't known.
Splitting buckets near the end:
Another way to optimize the workload was to split the remaining buckets into smaller ones as fewer were left to complete the image. This helped keep core occupancy high, but it's not ideal for very uneven images, where a big initial bucket may stay stuck well after every other, smaller one has completed.
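A toy model shows why this falls short (invented costs, not Chaos' actual heuristic: 63 cheap buckets plus one 40-unit bucket). Only buckets still waiting in the queue are halved once fewer remain than threads; the expensive bucket already on a thread is never touched, so it still pins the render:

```python
# Toy model of old-style end-of-render splitting (invented costs; not
# Chaos' actual heuristic). Only *queued* buckets are halved once fewer
# remain than threads; a bucket already being rendered is never touched.
import heapq

def endgame_split_render_time(bucket_costs, num_threads, min_split=0.5):
    queue = list(bucket_costs)
    threads = [0.0] * num_threads      # finish time of each thread
    heapq.heapify(threads)
    while queue:
        # near the end, halve the queued buckets (but not in-flight ones)
        if len(queue) < num_threads and max(queue) > min_split:
            queue = [c / 2 for c in queue for _ in (0, 1)]
        free_at = heapq.heappop(threads)
        heapq.heappush(threads, free_at + queue.pop(0))
    return max(threads)

# The 40-unit bucket is picked up first; 63 one-unit buckets follow.
print(endgame_split_render_time([40.0] + [1.0] * 63, 8))  # 40.0
```

Even though every queued bucket gets split, the render still takes the full 40 units, because one thread is pinned to the big in-flight bucket; the ideal 8-thread time for this workload would be about 12.9 units.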
Using the Light Cache:
The conundrum is that rendering is, by definition, finding the solution to an unknown problem, so attempts were made to learn more about that problem in advance, with the Light Cache, for example.
Unfortunately, using the Light Cache prepass to figure out which parts of the image will take the longest is not ideal: its sampling is too coarse and isn't driven to a noise threshold, so its representation of the image isn't faithful enough to base decisions on.
Restarting stuck buckets:
In some cases, the issue became severe enough that stopping and restarting the hung buckets was considered.
This is not ideal, as already-calculated samples would be thrown away at some arbitrary point, and a wrong estimate could lead to even longer render times, especially if a bucket close to completion were stopped and restarted. Several variations on this approach were considered, and all were suboptimal in one or more respects.
Using previous frames:
In animations, it was suggested that we use the previous frames to find out which areas would take the longest to compute in the current one.
Unfortunately, there is no guarantee of continuity between frames: new slow-to-render parts can come into view in the new frame, and there might not be an animation to rely on at all.
The solution: Adaptive Bucket Splitting
To find a viable solution, Radoslav from the Chaos R&D team worked backward: rather than trying to predict in advance which parts of the image would be slow, the renderer reacts the moment threads actually run out of work.
The Adaptive Bucket Splitting algorithm works in such a way that if a thread is left without work, it will run to the aid of threads that are still working away.
A bucket will be split, and all the sampling completed by the original thread will be transferred to the new one so that no sample is lost. The new thread then continues towards convergence.
This splitting is purely governed by how many idle threads are available and, of course, how much of the image is left to render.
It can split buckets down to the size of one pixel, as the workload inside a given bucket is itself uneven, with some parts requiring less sampling and others more.
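Here is a minimal simulation of the idea as described above (a sketch with invented costs and a simplified event model, not V-Ray's actual scheduler): whenever a thread goes idle and no buckets are queued, it takes half of the remaining work of the busiest in-flight bucket, and the finished samples stay where they are, so nothing is recomputed:

```python
# Sketch of adaptive bucket splitting (illustrative; invented costs and
# a simplified event model, not V-Ray's actual scheduler).

def adaptive_render_time(bucket_costs, num_threads, min_split=1e-3):
    queue = list(bucket_costs)   # unstarted buckets (costs in time units)
    busy = []                    # finish times of in-flight buckets
    t = 0.0
    while queue or busy:
        # hand queued buckets to idle threads
        while queue and len(busy) < num_threads:
            busy.append(t + queue.pop())
        # idle threads steal half the remaining work of the busiest
        # bucket; completed samples stay put, so no work is lost
        while len(busy) < num_threads and busy:
            j = max(range(len(busy)), key=lambda i: busy[i])
            remaining = busy[j] - t
            if remaining <= min_split:   # nothing left worth splitting
                break
            busy[j] = t + remaining / 2
            busy.append(t + remaining / 2)
        # advance time to the next bucket completion
        t = min(busy)
        busy.remove(t)
    return t

# 63 cheap buckets plus one 40-unit bucket on 8 threads: without
# splitting, the hard bucket would pin the render at 40+ time units;
# with adaptive splitting it finishes at the load-balanced optimum.
print(adaptive_render_time([1.0] * 63 + [40.0], 8))  # 12.875
```

In this toy model, the same uneven workload that a plain bucket scheduler finishes in 40-plus time units completes at the ideal load-balanced time, because the hard bucket's remaining work is repeatedly halved across the threads that free up.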
The benefits of bucket splitting
As the data is never thrown away, it’s always safe to split a bucket.
As the splitting happens only when there are idle threads, it’s always happening at the ideal time.
As the splitting happens without knowledge of the rendered image, the technique naturally lends itself to being optimal in sharing the workload, regardless of scene contents, or time already spent rendering: so long as there are free threads and pixels left to converge, the free threads will be put to work.
The general result is a decrease in render times that varies with the scene and the number of cores in the CPU.
We have not yet measured a case where the new algorithm is slower than the older ones, such as no splitting, or naive splitting.
Current limitations
The technique currently leaves the individual buckets single-threaded. This means that if, by sheer bad luck, the part taking the longest is one pixel in size, only one thread will be able to work on it. The stuck bucket will then be one pixel in size, but still present, and theoretically still able to take a long time to compute.
This specific approach is currently only viable for local rendering and doesn’t apply to distributed rendering, as that requires a slightly different approach to ensure network traffic stays low.
Research is ongoing to alleviate the current limitations.
Changes in workflows
The technique is entirely transparent to the user, requiring absolutely no change to the way a scene is set up. It also works nicely with mouse follow and region rendering.
In the case of region rendering, there is no need to change bucket sizes to make them fit a small region, while mouse follow will also automatically split any remaining buckets, regardless of user interaction.
The technique does, however, allow bigger buckets to be used, and it maintains the bigger, more efficient buckets for longer than the old splitting method did.
Care should be taken not to choose too big an initial bucket size, as this may cause other drawbacks.
In essence, there is no need to choose small bucket sizes for the image to complete more efficiently: the new algorithm will perform optimally with default (or bigger) sizes.
As a corollary, it will also ameliorate the known issues with Depth Of Field and Motion Blur sampling (small buckets have a higher chance to miss samples).
Where to get it
The algorithm will be present in V-Ray 5 for 3ds Max, update 2, hotfix 3 and onwards. You can get the update here. It will later be extended to all other DCC integrations.