If I were trapped on a desert island with only one data structure allowed I would bring my trusty BVH. I have been liking the BVH for some time in spite of all the k-d tree advocates (see the SIGGRAPH 05 notes for a discussion before ray packets were integrated) despite them being "slow". One reason is simplicity-- dividing on objects leads to cleaner implementations than dividing space, and leads to O(N) bounded memory use. My basic software rule is KISS rules! . Another reason is they handle many dynamic models well as has been shown by the collision detection community. It turns out they can also be fast.
Solomon Boulos and Ingo Wald have done a bunch of nice work to packetize a BVH (a ray packet is a set of rays that are traced together). The first optimization (and a huge one) is to descend the BVH as soon as any of the rays hit a box. Maintaining a "first not dead ray" index makes this more efficient. Another optimization is to use interval arithmetic to test a whole packet against a box for a conservative trivial reject. See Solomon's tech report for details (it is really simple even though it sounds fancy, and has a nice mapping to C++ code). Note that both of these optimizations are algorithmic and are mostly orthogonal to SSE optimizations (the descend if one hits optimization makes SSE less helpful).
Finally, the BVH traversal has a low penalty for false positives on boxes, so conservative boxes like you get for moving objects, and well as the lower intrinsic coherency of secondary packets, are not so bad for the BVH.
An important disclaimer: I have previously (in the early 90s) been an advocate of k-d, and later the grid. So maybe this is just the end of the first cycle in my opinion!
It does help if you use the surface-area heuristic to build (that was what it was first invented for in fact-- the BVH). Nobody has ever justified to my satisfaction why the SAH greedy algorithm works so well.