Approximative Bayesian Computation (ABC) and nonlinear agronomic models
Hello everyone,
I'm soooo, soo glad that I have the chance to share one of my new experiences with you.
Well,... as the title suggests, I will be talking about two different Approximative Bayesian Computation (ABC) methods called importance sampling and rejection sampling for optimizing nonlinear models. Wikipedia defines ABC as the follows:
"Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. In all model-based statistical inference, thelikelihood function is of central importance, since it expresses the probability of the observed data under a particular statistical model, and thus quantifies the support data lend to particular values of parameters and to choices among different models. ABC methods bypass the evaluation of the likelihood function".
Basically, I'm using ABC methods to estimate the posterior distribution of some input parameters to a model with a given prior distribution.
These two very well known methods can be applied to any model, but since I was more comfortable with the agronomic ones, I chose to work with a coupled photosynthesis-stomatal conductance model for simulation of photosynthesis. This model is implemented within the BioCro package in R by Dr.Fernando Miguez.
The idea behind the importance sampling and rejection sampling is fairly similar. In these methods, you take a large number of samples from the prior distribution of parameters and then accept/reject them or assign them a weight based on thier likelihood.
I found the following algorithm very straightforward and easy to follow (This is taken from a poster presented by Hugo Winter - h.winter@lancaster.ac.uk ):
So as an example, I implemented the rejection sampling algorithm to optimize the the two parameters of the photosynthesis model. The photosynthesis function looks like this :
A <- c4photo(Qp, Tl, RH, vmax = 39, alpha = 0.04, kparm = 0.7) Arguments:
Qp quantum flux (direct light), (micro mol m-2 s-1). Tl temperature of the leaf (Celsius). RH relative humidity (as a fraction, i.e. 0-1). vmax maximum carboxylation of Rubisco according to the Collatz model. alpha alpha parameter according to the Collatz model. Initial slope of the response to Irradiance. kparm k parameter according to the Collatz model. Initial slope of the response to CO2.
Where I had Qp, Tl and RH measured and I was trying to optimize the vamx and alpha parameter for observed A (Co2 assimilation).
Based on the algorithm I just presented, we take a large number of samples from the prior distribution of alpha and vmax and then we feed the model with them along with the measured Qp, Tl and RH. The model then spits out a set of simulated 'A' for random vamx and alpha samples. What we need to do then is to find the likelihood of the observed 'A' given the simulated values ( P(A.observed|M(vmax,alpha)) ).
Finally, we use a threshold (h) for throwing out the set of samples which their likelihood is less than that (the likelihood also can be found using the Euclidean distance between the observed or simulated ).
Actually, since picking out the threshold may get a little tricky sometimes, I didn't throw out anything until I figure out what 'h' may work the best for me. In the figure below, you can see that I have plotted my two parameters with their likelihood and I have colored different 'h' values to find my area of interest. The light grey is the least likely set of parameters while the black is the most likely set of parameters in my case.
You can find my colpete code here:
I just hope you enjoyed reading my post .
Good luck
Hamze