dipityPix app

Sunday, May 15, 2016

Prototyping video processing

I got a prototype of my 360 video project done in Quartz Composer using a custom Core Image filter.   I am in love with Quartz Composer and Core Image because they make such a nice prototyping environment and because I can stay in 2D for the video.   Here is the whole program:

A cool thing is I can use a still image for debugging, where I can stick in whatever calibration points I want in Photoshop.   Then I just connect the video part and no changes are needed-- the Core Image filter takes an image or video equally happily and Billboard displays them the same.

The filter is pretty simple and is written in something approximately GLSL.

One thing to be careful about is the return range of atan (the two-argument GLSL atan is the atan2 we know and love).
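Since the two-argument atan returns phi in (-pi, pi], a direction pointing "behind" the viewer lands at negative u unless you remap. Here is a minimal Python sketch of that lookup math; the function name and the cos(theta) = 2v - 1 vertical convention are my choices for illustration, not necessarily what the filter uses:

```python
import math

def direction_to_uv(x, y, z):
    """Map a unit direction to panorama texture coordinates (u, v) in [0, 1]."""
    phi = math.atan2(y, x)        # atan2 range is (-pi, pi]
    if phi < 0.0:
        phi += 2.0 * math.pi      # remap to [0, 2*pi) so u doesn't go negative
    u = phi / (2.0 * math.pi)
    v = 0.5 * (1.0 + z)           # one common convention: cos(theta) = 2v - 1
    return u, v
```

Without the remap, half the directions would produce u in [-0.5, 0), which wraps or clamps depending on the texture sampler.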

I need to test this with some higher-res equirectangular video, preferably with a fixed viewpoint and unmodified time.   If anyone can point me to some I would appreciate it.

Saturday, May 14, 2016

What resolution is needed for 360 video?

I got my basic 360 video viewer working and was not pleased with the resolution.   I've realized that people are serious when they say they need very high res.   I was skeptical of these claims because I am not that impressed with 4K TVs relative to 2K TVs unless they are huge.   So what minimum res do we need?   Let's say I have the following 1080p TV (we'll call that 2K to conform to the 4K terminology-- 2K horizontal pixels):

Image from https://wallpaperscraft.com
If we wanted to tile the wall horizontally with that TV we would need 3-4 of them.   For a 360 surround we would need 12-20.   Let's call it 10 because we are after an approximate minimum.   So that's 20K pixels horizontally to get up to "good" surround video.   4K spread over 360 degrees is much more like NTSC.   As we know, in some circumstances that is good enough.
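As a sanity check on the arithmetic, here is the estimate in a few lines of Python; the 30-degree horizontal field of view for a comfortably viewed TV is my assumption:

```python
# Rough minimum horizontal resolution for "good" 360 surround video.
tv_pixels = 1920           # "2K" TV, horizontal pixels
tv_fov_degrees = 30        # horizontal FOV one TV covers (my assumption)
tiles = 360 / tv_fov_degrees          # TVs needed to ring the viewer
surround_pixels = tv_pixels * tiles   # ~23K, i.e. roughly "20K"
print(surround_pixels)
```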

Facebook engineers have a nice talk on some of the engineering issues these large numbers imply. 

Edit: Robert Menzel pointed out on Twitter that the same logic is why 8K does suffice for current HMDs.

Thursday, May 12, 2016

equirectangular image to spherical coords

An equirectangular image, popular in 360 video, is a projection of the sphere onto a rectangle.   Here it is for the Earth:

Equirectangular projection (source wikipedia)
This projection is much simpler than I would expect.    The area on the unit radius sphere from theta1 to theta2 (I am using the graphics convention of theta is the angle down from the pole) is:

area = 2*Pi*integral sin(theta) d_theta = 2*Pi*(cos(theta_1) - cos(theta_2))

In Cartesian coordinates this is just:

area = 2*Pi*(z_1 - z_2)

So we can just project the sphere points horizontally (parallel to the xy plane) onto the unit-radius cylinder and unwrap it!   Strictly speaking, that equal-area unwrap is the Lambert cylindrical projection; the projection usually called equirectangular instead maps theta linearly (v = theta/Pi), but the equal-area version is the nicer one algebraically.   If we have such an image with texture coordinates (u,v) in [0,1]^2, then

phi = 2*Pi*u
cos(theta) = 2*v -1

and the inverse:

u = phi / (2*Pi)
v = (1 + cos(theta)) / 2

So yes this projection has singularities at the poles, but it's pretty nice algebraically!
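The (u,v) to (phi, theta) relations above and their inverses fit in a few lines of Python; function names are mine:

```python
import math

def uv_to_angles(u, v):
    """Texture coords (u, v) in [0,1]^2 -> (phi, theta), theta down from the pole."""
    phi = 2.0 * math.pi * u
    theta = math.acos(2.0 * v - 1.0)      # cos(theta) = 2v - 1
    return phi, theta

def angles_to_uv(phi, theta):
    """Inverse: angles back to texture coordinates."""
    u = phi / (2.0 * math.pi)
    v = (1.0 + math.cos(theta)) / 2.0
    return u, v
```

The pair round-trips exactly away from the poles; at v = 0 and v = 1 every u maps to the same pole point, which is the singularity noted above.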

spherical to cartesian coords

This is probably easy to google if you use the right keywords.   Apparently I didn't.   I will derive it here for my own future use.

One of the three formulas I remember learning in the dark ages:

x = rho cos(phi) sin(theta)
y = rho sin(phi) sin(theta)
z = rho cos(theta)

We know this from geometry but we could also square everything and sum it to get:

rho = sqrt(x^2 + y^2 + z^2)

This lets us solve for theta pretty easily:

cos(theta) = z / sqrt(x^2 + y^2 + z^2)

Because sin^2 + cos^2 = 1, and sin(theta) >= 0 for theta in [0, Pi], we can take the positive root:

sin(theta) = sqrt(1 - z^2/(x^2 + y^2 + z^2))

phi we can also get from geometry using the ever useful atan2:

phi = atan2(y, x)
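Putting the pieces together, here is a small Python sketch of both conversions (function names are mine):

```python
import math

def spherical_to_cartesian(rho, phi, theta):
    """(rho, phi, theta) -> (x, y, z), theta measured down from the +z pole."""
    x = rho * math.cos(phi) * math.sin(theta)
    y = rho * math.sin(phi) * math.sin(theta)
    z = rho * math.cos(theta)
    return x, y, z

def cartesian_to_spherical(x, y, z):
    """(x, y, z) -> (rho, phi, theta); phi from atan2 is in (-pi, pi]."""
    rho = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / rho)     # cos(theta) = z / rho
    phi = math.atan2(y, x)
    return rho, phi, theta
```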

Friday, May 6, 2016

Advice sought on 360 video processing SDKs

For a demo I would like to take some 360 video (panoramic, basically a moving environment map) such as that in this image:

An image such as you might get as a frame in a 360 video (http://www.airpano.com/files/krokus_helicopter_big.jpg)
And I want to select a particular convex quad region (a rectangle will do in a pinch):

And map that to my full screen.

A canned or live source will do, but if live, the camera needs to be cheap.   macOS-friendly preferred.

I'm guessing there is some terrific infrastructure/SDK that will make this easy, but my google-fu is so far inadequate.

Tuesday, May 3, 2016

Machine learning in one weekend?

I was excited to see the title of this Quora answer: What would be your advice to a software engineer who wants to learn machine learning?   However, I was a bit intimidated by the length of the answer.

What I would love to see is Machine Learning in One Weekend.   I cannot write that book; I want to read it!   If you are a machine learning person, please write it!   If not, send this post to your machine learning friends.

For machine learning people: my Ray Tracing in One Weekend has done well and people seem to have liked it.   It finds the sweet spot between a "toy" ray tracer and a "real" ray tracer, and after a weekend people "get" what a ray tracer is and whether they like it enough to continue in the area.   Keep the real stuff that is easy, skip the worst parts, and use a real language that is used in the discipline.   Make the results satisfying in a way that is similar to really working in the field.   Please feel free to contact me about details of my experience.

Monday, April 25, 2016

Level of noise in unstratified renderers

When you get noise in a renderer, a key question, often hard to answer, is: is it a bug, or just normal outliers?   With an unstratified renderer, which I often favor, the math is more straightforward.   Don Mitchell has a nice paper on the convergence rates of stratified sampling, which are better than the inverse-square-root rate of unstratified sampling.

In a brute force ray tracer it is often true that a ray either gets the color of the light L, or a zero because it is terminated by Russian roulette.   Because we average the N samples, the actual computation looks something like:

Color = (0 + 0 + 0 + L + 0 + 0 + 0 + 0 + L + .... + 0 + L + 0 + 0) / N

Note that this assumes Russian roulette rather than downweighting.   With downweighting there are more non-zeros, and they are terms like R*R'*L.   Note this also assumes Color is a float, so pretend it's a grey scene, or think of just one component of RGB.

The expected color is just pL where p is the probability of hitting the light.    There will be noise because sometimes luck makes you miss the light a lot or hit it a lot.

The standard statistical measure of error is variance.    This is the average squared error.   Variance is used partially because it is meaningful in some important ways, but largely because it has a great math property:

The variance of a sum of two independent random quantities is the sum of the variances of the individual quantities

We will get to what makes a good intuitive error measure later.   For now let's look at the variance of our "zero or L" renderer.   For that we can use the definition of variance:

the expected (average) value of the squared deviation from the mean 

Or in math notation (where the average or expected value of a variable X is E(X)):

variance(Color) =  E[ (Color - E(Color))^2 ]

That is mildly awkward to compute so we can use the most commonly used and super convenient variance identity:

variance(X) = E(X^2) - (E(X))^2

We know E(Color) = pL.   We also know that E(Color^2) = pL^2, because Color^2 is L^2 with probability p and zero otherwise, so:

variance(Color) =  pL^2 - (pL)^2 = p(1-p)L^2
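This is easy to sanity-check with exact rational arithmetic in Python; the particular p and L values are arbitrary choices of mine:

```python
from fractions import Fraction

p = Fraction(1, 10)   # probability the ray reaches the light
L = Fraction(3)       # light color (grey scale), kept exact

E_X = p * L                     # E(Color)
E_X2 = p * L * L                # E(Color^2): Color^2 is L^2 with probability p
variance = E_X2 - E_X * E_X     # the E(X^2) - (E(X))^2 identity

assert variance == p * (1 - p) * L * L   # matches p(1-p)L^2
print(variance)
```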

So what is the variance of N samples (N is the number of rays we average)?

First, it is the sum of a bunch of these independent identical samples, so the variance is just the sum of the individual variances:

variance(Sum) = Np(1-p)L^2

But we don't sum the colors of the individual rays-- we average them by dividing by N.   Because variance is about the square of the error, we can use the identity:

variance(X / constant) = variance(X) / constant^2

So for our actual estimate of pixel color we get:

variance(Color) =   (p(1-p)L^2) / N

This gives a pretty good approximation to squared error.   But humans are more sensitive to contrast, and we can get closer to that with the relative square root of variance.   Trying to get closer to intuitive absolute error is common in many fields, and the square root of variance is called the standard deviation.   Not exactly expected absolute error, but close enough and much easier to calculate.   Let's divide by E(Color) to get our approximation to relative error:

relative_error(Color) is approximately   Q = sqrt((p(1-p)L^2) / N) / ( pL)

We can do a little algebra to get:

Q = sqrt((p(1-p)L^2) / (p^2 L^2 N) ) = sqrt( (1-p) / ( pN) )

If we assume a bright light, then p is small (for a pixel of fixed brightness pL, a bigger L means a smaller p), and

Q is approximately sqrt(1/(pN))

So the perceived error for a given N (N is the same across a given image) ought to be approximately proportional to the inverse square root of pixel brightness, so we ought to see more noise in the darks.
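A quick seeded simulation supports both the variance formula and the sqrt((1-p)/(pN)) relative-error prediction; all parameter choices here are mine:

```python
import math
import random

def estimate_pixel(p, L, N, rng):
    """Average N Russian-roulette samples: each is L with probability p, else 0."""
    hits = sum(1 for _ in range(N) if rng.random() < p)
    return hits * L / N

# Compare a "bright" pixel (large p) and a "dark" pixel (small p)
# at the same sample count N.
rng = random.Random(42)
N, L, trials = 400, 1.0, 1000
results = {}
for p in (0.5, 0.05):
    estimates = [estimate_pixel(p, L, N, rng) for _ in range(trials)]
    mean = sum(estimates) / trials
    var = sum((e - mean) ** 2 for e in estimates) / trials
    measured = math.sqrt(var) / (p * L)        # measured relative error
    predicted = math.sqrt((1 - p) / (p * N))   # sqrt((1-p)/(pN))
    results[p] = (measured, predicted)
print(results)
```

The small-p (dark) pixel should show a noticeably larger relative error than the large-p (bright) one, which is the "more noise in the darks" prediction.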

If we look at an almost-converged brute force Cornell box, we'd expect the dark areas to look a bit noisier than the bright ones.   Maybe we do.   What do you think?