Bayesian Kriging by Hand

Overview

Under a certain set of conditions, the posterior distribution of a spatial process Y0 at new locations, given observed data and known covariance function parameters, is easy to calculate in R. This section includes two data simulations, one in 1D and one in 2D, on which Bayesian kriging was performed. Traditional kriging was also done to give the Bayesian results a point of comparison. All code for this section can be found here.

Important Mathematical Definitions and Derivations

Before going on to simulate and analyze data, a few definitions are needed. The first is a Gaussian process. A process is Gaussian if all of its finite-dimensional distributions are multivariate normal (MVN). That is, for any locations (s1, s2, …, sn) we always have (Z(s1), Z(s2), …, Z(sn)) ~ MVN. The next term to define is stationarity. A process Z(·) is strongly stationary if its finite-dimensional distributions are invariant under shifts. In other words, P(Z(s1) ≤ z1, …, Z(sn) ≤ zn) = P(Z(s1 + h) ≤ z1, …, Z(sn + h) ≤ zn) for any shift h. When a process is mean zero, stationary, and Gaussian, the posterior distribution for a set of new locations has a closed form, shown in Figure 1. For this section of the project, mean zero stationary Gaussian data were used, so Bayesian kriging was straightforward.

Figure 1: Posterior of Y0 under certain conditions
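
For reference, the result displayed in Figure 1 is the standard conditional MVN formula: if Y denotes the observed data, Y0 the values at the new locations, and the joint covariance is partitioned into blocks Σ00 (new-to-new), Σ0Y (new-to-observed), and ΣYY (observed-to-observed), then

Y_0 \mid Y = y \;\sim\; \mathrm{MVN}\!\left( \Sigma_{0Y}\,\Sigma_{YY}^{-1}\,y,\;\; \Sigma_{00} - \Sigma_{0Y}\,\Sigma_{YY}^{-1}\,\Sigma_{Y0} \right).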

How does one ensure that the data are stationary and Gaussian? In 2D there is an R function called grf(), explained later, but in 1D there is a trick involving the Cholesky decomposition. If Σ is a positive definite matrix (every eigenvalue is positive), then the Cholesky decomposition is Σ = LL^T, where L is a lower triangular matrix. To simulate from an exact Gaussian spatial process, all that is needed is the Cholesky factor of a covariance matrix and a vector of iid standard normal random variables. Multiplying these together creates an exact simulation, as seen in Figure 2 for an exponential covariance function with σ^2 = a = 1.

Figure 2: Example of how to simulate a Gaussian process using the Cholesky factor
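
A minimal R sketch of this trick, assuming 100 evenly spaced locations on [0, 10] (the exact locations used are not stated in the text; the full code is linked at the top of the section):

set.seed(1)
n <- 100
s <- seq(0, 10, length.out = n)       # assumed 1D locations
D <- as.matrix(dist(s))               # pairwise distance matrix
Sigma <- exp(-D)                      # exponential covariance with sigma^2 = a = 1
L <- t(chol(Sigma))                   # chol() returns the upper factor, so transpose to get L
z <- as.vector(L %*% rnorm(n))        # exact draw: z ~ MVN(0, Sigma)
plot(s, z, type = "l")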

For both the 1D and 2D simulations below, the simulated data were mean zero Gaussian processes, and Bayesian kriging was done by simulating from the MVN distribution shown at the bottom of Figure 1.

1D Simulation

The data for the 1D simulation were generated exactly as shown in Figure 2. 100 points were simulated, and the plot of them connected by a line is seen in Figure 3. For kriging, a grid of 1,000 points on [0, 10] was created. To do Bayesian kriging as shown above, four covariance matrices needed to be constructed; what each represents can be seen in Figure 4. Once those were calculated, all that remained was to simulate from an MVN distribution, which was done with one line of code (see the sketch after the figure captions). The resulting Bayesian and traditional kriged values can be seen in Figure 5. The overall shape of the predictions from the two methods was very similar, but the traditional kriging predictions more closely resembled the realization in Figure 3.

Figure 3: 1D Gaussian Process
Figure 4: What each covariance matrix represents
Figure 5: Bayesian and traditional kriged values for the 1D simulation
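
A minimal sketch of the kriging step described above, continuing from the simulation sketch and assuming MASS::mvrnorm() for the MVN draw (variable names here are illustrative, not the original code's):

library(MASS)                                     # provides mvrnorm()
grid <- seq(0, 10, length.out = 1000)             # prediction grid on [0, 10]

# The four covariance matrices from Figure 4: grid-to-grid, grid-to-data,
# data-to-grid (the transpose), and data-to-data.
S_00 <- exp(-as.matrix(dist(grid)))               # grid to grid
S_0y <- exp(-abs(outer(grid, s, "-")))            # grid to data; t(S_0y) is data to grid
S_yy <- exp(-as.matrix(dist(s)))                  # data to data

S_inv     <- solve(S_yy)
post_mean <- drop(S_0y %*% S_inv %*% z)           # simple kriging predictor
post_cov  <- S_00 - S_0y %*% S_inv %*% t(S_0y)    # posterior covariance

# The one-line Bayesian kriging draw; a small jitter keeps post_cov numerically valid
y0 <- mvrnorm(1, post_mean, post_cov + diag(1e-8, length(grid)))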

2D Simulation

As mentioned above, the grf() function was used to generate 2D mean zero stationary Gaussian data. The covariance model was set to exponential with all parameters equal to one, and again 100 points were simulated. The simulated data can be seen in Figure 6, and the spatial correlation is clear: moving from left to right, the data values increased.

Figure 6: 2D Gaussian Process
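
The exact grf() call is in the linked code; a minimal call consistent with the description, assuming the geoR package, might look like this:

library(geoR)
set.seed(1)
sim <- grf(100, cov.model = "exponential", cov.pars = c(1, 1))  # sigma^2 = 1, range = 1
# sim$coords holds the simulated locations and sim$data the values
points(sim)                                                     # geoR's quick plot of a realization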

For the 2D predictions, the goal was to fill in the grid from Figure 6. A grid of 4,225 points was created to krige on, so after calculating the variance of the posterior distribution, its dimension was 4,225 × 4,225. Because this matrix was so large, drawing even one set of predictions from the MVN took my computer a little under two minutes. Since Bayesian kriging simulates from an MVN distribution, doing this twice with the same mean and covariance matrix can produce somewhat different results. An example is shown in Figure 7, which displays the traditional kriging estimates along with two possible Bayesian realizations. The two Bayesian predictions were similar in their pattern of colors, but the one on the bottom left predicted a larger region of blue values than the one on the bottom right. The traditional kriging estimates were also much smoother than the Bayesian ones.

Overall, both Bayesian realizations followed the structure of the traditional kriging predictions, but it is best practice to simulate from the MVN many times to get a good estimate. As stated earlier, even drawing one possible set of values took about two minutes, so it wasn't feasible to simulate thousands of draws on a grid this large. That being said, 25 values were simulated for each grid point, and the average of these samples can be seen in Figure 8. Taking the mean of many samples closely resembles the results of traditional kriging: the mean squared difference between the traditional kriging predictions and the mean of the 25 samples was about 0.003, so they were practically the same.

Figure 7: Different possibilities of kriged values
Figure 8: Mean of 25 Bayesian predictions from the posterior
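
A sketch of the averaging step, assuming post_mean and post_cov have been rebuilt from the 2D coordinates in the same way as the 1D objects above (names here are illustrative):

draws     <- mvrnorm(25, post_mean, post_cov + diag(1e-8, nrow(post_cov)))  # 25 x 4225 matrix
bayes_avg <- colMeans(draws)                # pointwise mean over the 25 realizations

# With known covariance parameters the simple kriging surface equals the
# posterior mean, so the comparison reported in the text is roughly:
mean((bayes_avg - post_mean)^2)             # mean squared difference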

Conclusions

Although the mean of multiple Bayesian kriged predictions was extremely similar to what was achieved by traditional kriging, there were some differences between the two. When performing simple kriging over a set of points, the weights on the data are always the same, so only one set of kriged values can be produced for a particular grid. These are most likely close to the true values of the process, but a single prediction surface gives no sense of the range of surfaces consistent with the data. One benefit of Bayesian kriging is that many different possible predictions can be simulated, as seen in Figure 7. One drawback is that with a large grid of prediction points, it is far more time consuming. When practicing Bayesian statistics, it is best to simulate many times from the posterior distribution and look at the mean, and even sampling only 25 times took almost an hour to compute.

One thing to note is how long it took to sample 25 times from the posterior for such a simple problem. Recall that at the beginning of the section the data were simulated to be mean zero, stationary, and Gaussian. In real-life problems, like the rainfall data analyzed in Linear Regression Parameter Estimation, these assumptions can't be made and the properties of the data are much more complex, making Bayesian kriging by hand much harder. Although uncertainty is easier to quantify and interpret with Bayesian kriging, the cost is complexity and computation time, so in some situations it may make sense to do traditional kriging. Built-in R functions and packages can help with computation time, which is explored in: Bayesian Kriging Using a Built in R Function.