Sunday, April 3, 2016

Debugging by sweeping under rug

Somebody already found several errors in my new minibook (still free until Apr 5, 2016). There are some pesky black pixels in the final images.

All Monte Carlo Ray Tracers have this as a main loop:

pixel_color = average(many many samples)

If you find yourself getting some form of acne in the images, and this acne is white or black, so that one "bad" sample seems to kill the whole pixel, that sample is probably a huge number or a NaN. This particular acne is probably a NaN. Mine seems to come up once in every 10-100 million rays or so.
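To see why a single bad sample can kill a whole pixel, here is a minimal sketch of the averaging step. The helper name `average_samples` and the use of one `double` per sample are assumptions for illustration, not code from the book. Any arithmetic involving a NaN yields a NaN, so one NaN in the sum poisons the average no matter how many good samples surround it.

```cpp
#include <vector>

// Hypothetical helper: average the samples of one color channel of one pixel.
// Assumes `samples` is non-empty.
double average_samples(const std::vector<double>& samples) {
    double sum = 0.0;
    for (double s : samples) {
        sum += s;  // if s is NaN, sum becomes NaN and stays NaN
    }
    return sum / static_cast<double>(samples.size());
}
```

A single huge sample behaves similarly in practice: it drags the average far above the displayable range, which shows up as a white pixel rather than a black one.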

So big decision: track down where the NaNs come from, or sweep this bug under the rug, just kill the NaNs, and hope this doesn't come back to bite us later. I will always opt for the lazy strategy, especially when I know floating point is hard.

So I added this:
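A minimal sketch of such a check, assuming a hypothetical `Color` type of three doubles in place of the book's vector class (the names `Color` and `kill_nans` are illustrative, not the original code). It relies on the fact that NaN is the only floating-point value that compares unequal to itself, so no library call is needed.

```cpp
#include <array>

// Hypothetical stand-in for the ray tracer's color/vector type.
using Color = std::array<double, 3>;

// Replace any NaN channel with 0 so one bad sample cannot poison the pixel average.
Color kill_nans(Color c) {
    for (double& channel : c) {
        if (channel != channel) {  // true only when channel is NaN
            channel = 0.0;
        }
    }
    return c;
}
```

The sample would be passed through this filter just before it is added to the pixel sum. One caveat: aggressive floating-point compiler options such as `-ffast-math` may assume NaNs never occur and optimize the self-comparison away.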

There may be some isNaN() function supported in standard C; I don't know. But in the spirit of laziness I didn't look it up. I like to chase these with low-res images because I can see the bugs more easily. It doesn't really make it faster: you need to run enough total rays to randomly trip the bug. This worked (for now!):

Left: 50x50 image with 10k samples per pixel (not enough for bug). Middle: 100k samples per pixel. Right: with the NaN check.


Now if you are skeptical you will note that by increasing the number of samples 10X I went from 0 bugs to 20+ bugs. But I won't think about the possibly troublesome implications of that. MISSION ACCOMPLISHED!


friedlinguini said...


Peter Shirley said...

You are right-- looks like it is in the standard! Thx.

75seconds video said...

Your work is awesome; you will rock in the near future.

Light Lux said...

Assuming this is from the same framework used in "Ray Tracing: The Rest of Your Life", then I believe the NaNs are due to a division of 0 by 0. The bug is due to the value() method of the cosine_pdf class, where you have the following:
    if (cosine > 0)
        return cosine / M_PI;
    return 0;
The color function returns emitted + throughput*brdf/pdf_value. When cosine <= 0, pdf_value == 0; meanwhile the brdf function (scattering_pdf in your code) has the cos term in the numerator too, so a 0 by 0 division occurs, resulting in a NaN value. That can be easily fixed by changing the value() method to return 1 when cosine <= 0. This will not be an issue, since there will still be a 0 in the numerator from the brdf, so you'll have 0/1 (at which point the recursion can be stopped).
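The 0/0 case and the proposed fix can be reproduced in isolation. This is only a sketch under assumptions: the free functions below are hypothetical stand-ins for the value() method and scattering_pdf, and the scattering term is assumed to be cosine/pi clamped to 0 for non-positive cosines.

```cpp
const double kPi = 3.14159265358979323846;

// PDF as quoted above: returns 0 when cosine <= 0.
double cosine_pdf_value(double cosine) {
    return cosine > 0 ? cosine / kPi : 0.0;
}

// Proposed fix: return 1 instead of 0 so the later division stays defined.
double cosine_pdf_value_fixed(double cosine) {
    return cosine > 0 ? cosine / kPi : 1.0;
}

// Assumed form of the scattering term: cosine in the numerator, clamped at 0.
double scattering_pdf(double cosine) {
    return cosine > 0 ? cosine / kPi : 0.0;
}

// The per-bounce weight scattering_pdf / pdf_value for a given pdf function.
double sample_weight(double cosine, double (*pdf)(double)) {
    return scattering_pdf(cosine) / pdf(cosine);
}
```

With a slightly negative cosine, `sample_weight(-0.1, cosine_pdf_value)` evaluates 0.0/0.0 and yields NaN, while `sample_weight(-0.1, cosine_pdf_value_fixed)` evaluates 0.0/1.0 and yields 0. For positive cosines the two versions agree.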