Monte Carlo Crash Course: Sampling (thenumb.at)
vitus 17 hours ago [-]
Perhaps the most commonly cited (but not actually all that practical) example for rejection sampling is estimating the value of pi (as hinted by the first example under uniform random sampling): generate two random numbers between 0 and 1, then check the fraction of points that satisfy x^2 + y^2 <= 1. As the number of samples tends to infinity, this fraction will converge to pi/4. (You can instead take x, y from (-1, 1) if you want to generate points on the full circle.)
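
For what it's worth, a quick Python sketch of that estimator (my own naming, nothing from the article):

    import random

    def estimate_pi(n_samples=1_000_000):
        hits = 0
        for _ in range(n_samples):
            x, y = random.random(), random.random()  # uniform on [0, 1)
            if x * x + y * y <= 1.0:                 # inside the quarter circle
                hits += 1
        return 4.0 * hits / n_samples                # hits / n_samples -> pi/4

    print(estimate_pi())  # ~3.14, converging slowly as n_samples grows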

> However, rejection sampling is only efficient when f_Omega can make use of a significant proportion of the probability density in f_D.

Perhaps a more relevant example: the unit n-sphere encloses a vanishing amount of volume as the number of dimensions increases.

https://en.wikipedia.org/wiki/Volume_of_an_n-ball

This is one of those weird consequences that gets labeled as the curse of dimensionality, especially in ML contexts.

"As the dimension d of the space increases, the hypersphere becomes an insignificant volume relative to that of the hypercube."

https://en.wikipedia.org/wiki/Curse_of_dimensionality#Distan...

jxjnskkzxxhx 11 hours ago [-]
Why does the volume of the hypersphere relative to the volume of the hypercube matter?
jdhwosnhw 11 hours ago [-]
Because if you are trying to draw samples from a hypersphere via rejection sampling from samples of the hypercube, your sampling efficiency will go down exponentially with dimensionality. A smaller and smaller percentage of your samples will go unrejected as dimensionality grows
jxjnskkzxxhx 3 hours ago [-]
Hmmm I don't think that's how it works. If you're trying to estimate the ratio of the volumes of A and B, where B is contained in A, and you randomly sample from A, the probability that a sample also belongs to B is smaller the smaller the B/A ratio; yes, that's the point of the methodology. It's not less "efficient": every single data point contributes to the estimate the same way.

If the ratio B/A is 1/10 and you sample 1M times, 100k of the samples will also be contained in B. Are you saying that the other 900k somehow go unused?!

If the ratio B/A is 1/100, and you sample the same number of times, only 10k of the samples will be contained in B. Are you saying that's less "efficient"? You need both the numerator and the denominator!

What if the ratio B/A is 1/100 and you'd say that the methodology is very inefficient here, and I tell you that actually we're estimating the ratio A/B. Does it become very efficient? Surely it can't change, as the two ratios aren't independent.

No, I think your comment misses the point (or I missed the point of your comment).

vitus 1 hours ago [-]
(note: I am going to be imprecise here and talk about the volume of the unit hypersphere, even though strictly speaking the hypersphere is only the surface and has no volume of its own. It is more correct to call it the volume of the n-ball, or the volume enclosed by the unit hypersphere, but I'm not going to use that terminology throughout, in the interest of expediency.)

> What if the ratio B/A is 1/100 and you'd say that the methodology is very inefficient here, and I tell you that actually we're estimating the ratio A/B. Does it become very efficient? Surely it can't change, as the two ratios aren't independent.

That's exactly it, though. Rejection sampling is not primarily used to estimate ratios, but rather to draw points from a distribution by sampling a broader distribution and then rejecting points that don't satisfy some criteria. If you're attempting to sample the unit hypersphere by picking N numbers between 0 and 1, and rejecting any sets where sum(x_i^2) > 1, then you're going to be throwing away most of your data points in the process.
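
To make that concrete, here's a toy sketch (my own code, not the article's) of drawing a single point from the unit ball by rejection:

    import random

    def sample_in_unit_ball(dim, max_tries=10_000_000):
        # Rejection sampling: draw from [0, 1)^dim, keep the point only if it
        # lies inside the unit ball. Returns the point and how many draws it took.
        for tries in range(1, max_tries + 1):
            point = [random.random() for _ in range(dim)]
            if sum(x * x for x in point) <= 1.0:
                return point, tries
        return None, max_tries  # gave up: the acceptance rate is too low

    for dim in (2, 5, 10, 15):
        _, tries = sample_in_unit_ball(dim)
        print(dim, tries)  # the number of tries blows up rapidly with dim

By 20 dimensions the expected number of tries is already on the order of tens of millions, and by 50 it's hopeless.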

Going back to the topic of volume ratio estimation, though: both of you are underestimating quite how quickly the volume drops off. The unit hypersphere's volume decreases super-exponentially (there's a factorial in the denominator!), so if you're dealing with a mere 50 dimensions, you're looking at a ratio of 10^-13 [0]. In that regime, adding an extra dimension shrinks that volume by a factor of about 3; by the time you get to 100, that factor increases to 4.
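
(For reference, the closed form is V_n = pi^(n/2) / Gamma(n/2 + 1), so for even n = 2k it's pi^k / k!; with k = 25 that's pi^25 / 25!, which is about 1.7 * 10^-13, and that's where the factorial bites.)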

If you're still only sampling a million points, chances are very good that you're going to decide that the ratio is 0. In order to accurately estimate the ratio, you need progressively more and more samples just to compensate for a slight increase in dimension.

I'd expect it to be much more efficient to use the change-of-coordinates approach the article also mentions, although I have personally never thought about a coordinate system well-suited to the 50-dimensional hypersphere. Probably 49 angles and a radius that's basically almost always 1, and a weird Jacobian that I don't want to compute by hand?
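
For what it's worth, you can skip the angles entirely: the usual trick (which may or may not be what the article means by a change of coordinates) is to take a standard Gaussian vector for the direction and draw the radius separately. A rough Python sketch, my own naming:

    import math, random

    def sample_unit_ball_direct(dim):
        # Direction: a normalized Gaussian vector is uniform on the unit sphere.
        # Radius: U^(1/dim) gives the correct radial density for a uniform ball.
        g = [random.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in g))
        r = random.random() ** (1.0 / dim)
        return [r * x / norm for x in g]

    print(sample_unit_ball_direct(50))  # no rejections, even in 50 dimensions

No rejections, no Jacobian by hand.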

Also, as I mentioned, using rejection sampling to estimate the value of pi is a cute exercise but quite useless in practice compared to any of the more direct methods. If you sample a million points in the unit square, you'll end up with 3.142 +/- 0.0016. Meanwhile, if you use the fourth convergent of the continued fraction for pi (355/113), you already have 3.141592(9)... for far less computation.
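
(The +/- 0.0016 is just the binomial standard error: the estimator is 4 * hits/N, so its standard deviation is 4 * sqrt(p(1-p)/N) with p = pi/4 and N = 10^6, which works out to about 0.0016.)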

[0] https://www.wolframalpha.com/input?i=volume+of+unit+49-ball

jxjnskkzxxhx 1 hours ago [-]
> That's exactly it, though.

Ok, so instead of estimating B/A=1/100 just estimate A/B=100. What's the problem with this argument?

vitus 1 hours ago [-]
My point is that we're not just talking about A/B of 100, but rather A/B of 10000000000000 or more. If you're inverting the ratio we're estimating, then instead of estimating the value at 0, we're estimating it at infinity (or, okay, if you want to use Laplace smoothing, then a million, which is far too low).

There are situations where you would use rejection sampling because generating 100x more random numbers is much cheaper than doing the complex calculations required to accurately model the space in question. Maybe 50 dimensions (and the 13 orders of magnitude difference) isn't big enough to raise those concerns. If we instead talk about 100 dimensions, then we're dealing with a difference of 40 orders of magnitude, and if you tell me that still doesn't matter, then I don't know what to say.

jxjnskkzxxhx 42 minutes ago [-]
You haven't addressed my question at all. If you think estimating 1e-1000 is a problem, then estimating 1e+1000 shouldn't be. But one is just the inverse of the other. What's the problem with this argument? You haven't answered it at all.
vitus 31 minutes ago [-]
> If you think estimating 1e-1000 is a problem, then estimating 1e+1000 shouldn't be.

They're both problems. If you want to estimate 1e1000 via sampling individual points, then you need at least that order of magnitude of samples. If all of your data points fall in one class, then it doesn't matter what you're trying to calculate from that.

As I said: "If you're inverting the ratio we're estimating, then instead of estimating the value at 0, we're estimating it at infinity (or, okay, if you want to use Laplace smoothing, then a million, which is far too low)."

jxjnskkzxxhx 20 minutes ago [-]
> If you want to estimate 1e1000 via sampling individual points, then you need at least that order of magnitude of samples.

Ok so if the ratio is 1/2 how many samples do you need?

atum47 16 hours ago [-]
Slightly related, I once created an app that calculates the volume of a random shape using Monte Carlo: https://github.com/victorqribeiro/monteCarlo