Prediction and Probability - Final

Hamze Dokoohaki
Apr 13, 2015

Hi there

Now I'm back with my last post on prediction and probability. I tried to give you a little bit of time to meditate on the topic, as Dr. Sherman usually says. What I have been trying to explain so far is that you need to be aware of the uncertainty associated with the output of your model. But I would like to take that one step further: you also need to be able to quantify that variability.

In my last post, I promised to suggest a method for quantifying the uncertainty associated with any given statistical model. If you are interested, you can also apply this method to a broader range of models, such as numerical models.

Let's get back to the Gompertz example I brought up in my previous post (Prediction and Probability 2). Basically, the whole idea is to simulate your output a huge number of times and then build its CDF (cumulative distribution function), which tells you how likely it is that the output will fall at or below any given value.
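To make the CDF idea concrete, here is a minimal sketch of an empirical CDF in Python using only the standard library. The "simulated outputs" are stand-in draws from a normal distribution, purely for illustration; in practice they would come from your own model. The helper name `empirical_cdf` is hypothetical.

```python
import bisect
import random


def empirical_cdf(samples):
    """Return a function F(y) giving the fraction of samples that are <= y."""
    ordered = sorted(samples)
    n = len(ordered)

    def F(y):
        # bisect_right counts how many sorted samples are <= y
        return bisect.bisect_right(ordered, y) / n

    return F


# Stand-in simulated outputs (hypothetical: normal with mean 50, sd 5)
random.seed(1)
outputs = [random.gauss(50, 5) for _ in range(100_000)]

F = empirical_cdf(outputs)
print(f"P(Y <= 50) ~ {F(50):.3f}")      # close to 0.5 for this stand-in
print(f"P(Y >  60) ~ {1 - F(60):.3f}")  # upper-tail probability
```

Once you have F, the chance that the output exceeds some value y is simply 1 - F(y).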

But how are we supposed to do that?

I'll suggest a couple of different ways that all lead to the same answer, and let you decide which one works better for your case.

The first thing you need to do is find out what distribution your coefficients follow. You have two ways to answer this question:

1) Look for similar studies in the literature and see whether anybody has already estimated those coefficients. You can extract your model's coefficients from the literature and look for a distribution that fits them.

2) Simulate your X and Y data a huge number of times with the same sample size, and fit the model to each simulated X, Y dataset. For example, if you have simulated your X and Y 1e6 times, you will have collected 1e6 sets of coefficients. Then you can try fitting some distribution functions to them.

The final step is to write simple code that takes random samples from the distributions of your coefficients and computes the simulated Y for each set of sampled coefficients. You end up with a nice set of simulated Y values covering different possibilities for your model's coefficients. Then you just need to build a simple CDF of those outputs, and you are good to go!
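The final step might look like the sketch below. It assumes the Gompertz form y = a*exp(-b*exp(-c*t)) and independent normal distributions for a, b and c; the means, standard deviations, the input value T and the threshold of 80 are hypothetical placeholders for whatever you obtained in the previous step. Real coefficients are often correlated, in which case you would sample them jointly rather than independently.

```python
import bisect
import math
from statistics import NormalDist


def gompertz(t, a, b, c):
    """Gompertz curve: upper asymptote a, displacement b, growth rate c."""
    return a * math.exp(-b * math.exp(-c * t))


# Hypothetical coefficient distributions (replace with your fitted ones)
a_dist = NormalDist(mu=100.0, sigma=5.0)
b_dist = NormalDist(mu=3.0, sigma=0.3)
c_dist = NormalDist(mu=0.5, sigma=0.05)

T = 6.0          # hypothetical input value at which to predict
N_SIMS = 100_000
SEED = 7

# Draw coefficient sets and compute the simulated output for each one
a_s = a_dist.samples(N_SIMS, seed=SEED)
b_s = b_dist.samples(N_SIMS, seed=SEED + 1)
c_s = c_dist.samples(N_SIMS, seed=SEED + 2)
y_sim = sorted(gompertz(T, a, b, c) for a, b, c in zip(a_s, b_s, c_s))


def cdf(y):
    """Empirical CDF of the simulated outputs: P(Y <= y)."""
    return bisect.bisect_right(y_sim, y) / N_SIMS


# Example queries: one probability and a 90% interval read off the CDF
print(f"P(Y <= 80) ~ {cdf(80):.3f}")
lo, hi = y_sim[int(0.05 * N_SIMS)], y_sim[int(0.95 * N_SIMS)]
print(f"90% of simulated outputs fall in [{lo:.1f}, {hi:.1f}]")
```

Sorting the simulated outputs once makes every CDF query a fast binary search, and percentiles can be read straight off the sorted list.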

I'd be really glad to see your comments, and don't hesitate to ask questions!

Good luck