Introduction

Motivation

Throughout my time studying statistics, most of my classes have focused on the frequentist approach: parameters are unknown but fixed, and probability theory describes how the data are generated. In Bayesian statistics, parameters are still unknown, but they are modeled as random variables. This has many advantages, as statisticians can use Bayes' theorem to combine their prior beliefs with the observed data to form a new probability distribution over the parameter. This is known as the posterior distribution, and point estimates such as the posterior mean or mode can be derived from it. Figure 1 shows the formula for the posterior distribution; it takes into account both prior information about the parameter and the observed data. Generally, the more data observed, the less weight is put on the prior. Different priors can lead to different posterior distributions, so it is important to be mindful in selecting them. If one is unsure about the parameters before observing data, there are standard ways to construct 'uninformative' priors. Figure 2 shows one example of how the posterior is a balance between the prior beliefs and the likelihood/data.

Figure 1: Posterior distribution formula
Figure 2: Example prior, likelihood, and posterior distribution
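
Although Figure 1 itself is not reproduced here, the posterior distribution formula it shows is, in standard notation, Bayes' theorem applied to a parameter θ and data y:

```latex
p(\theta \mid y) \;=\; \frac{p(y \mid \theta)\, p(\theta)}{\int p(y \mid \theta)\, p(\theta)\, d\theta} \;\propto\; p(y \mid \theta)\, p(\theta)
```

The likelihood p(y | θ) carries the information from the data and the prior p(θ) carries the beliefs held before observing it; as more data are observed, the likelihood term dominates the prior.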

In this class, we have learned about estimating the posterior over one- or multi-dimensional parameter spaces, estimating linear regression parameters, estimating covariance matrices, obtaining point estimates from an unknown posterior, and ways to sample from the posterior distribution when it is not a known probability distribution. Monte Carlo methods such as rejection sampling, Gibbs sampling, and Metropolis-Hastings sampling are ways to summarize entire posterior distributions.

This semester, I am also taking a class in spatial statistics. In that class we have learned how to model the covariance relationship between spatial data, how to simulate spatial processes, and how to predict values of a spatial process at locations where there is no observed data. The class has been very interesting because it is different from the average statistics class, and it seemed worthwhile to apply what was learned in this Bayesian class to some spatial methods. The approaches we learned in spatial statistics are all frequentist, so the goal is to see whether applying Bayesian methods produces different results, or at the least to compare the two approaches. Below is a brief overview of some basic spatial statistics that will be helpful for understanding much of what was done in this project. There is a great deal to spatial statistics, but only the topics relevant to this project are explained (most images/spatial definitions were taken directly from my spatial statistics class). The project is split into three modules, which can be accessed below or through the projects tab at the top of this webpage.

Spatial Statistics Overview

One thing that is helpful for understanding spatial processes is understanding stochastic processes. If Z(s) is a process (think of Z as temperature at a set of locations s), it is stochastic if Z(s) is a random variable for every s. The most important parts of a spatial process are its mean and covariance functions, which are needed to do spatial prediction, also known as kriging. These functions can be seen in Figure 3. All covariance functions considered here involve the distances between points, which is discussed in depth next.

Figure 3: Mean and covariance functions of a spatial process
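
In standard notation (the figure itself is not reproduced here), the mean and covariance functions of a process Z(s) are:

```latex
\mu(s) = \mathrm{E}\,[Z(s)], \qquad
C(s_1, s_2) = \mathrm{Cov}\big(Z(s_1), Z(s_2)\big)
            = \mathrm{E}\big[(Z(s_1) - \mu(s_1))(Z(s_2) - \mu(s_2))\big]
```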

In order to krige the process at new locations, one needs to know how the data are spatially correlated, so the covariance function must be estimated from the observed data. Before discussing estimation of the covariance function parameters, it is important to describe the nature of the function and some common examples. The general form of the covariance matrix for an observed spatial process is shown in Figure 4. Along the diagonal, the covariance function is evaluated at zero, which is the variance of the process; this is because the distance between a location and itself is zero. The covariance function is always evaluated at the distance between two points. A valid covariance function must satisfy: C(s,s) = C(0) ≥ 0 (variances are nonnegative), C(s1,s2) = C(s2,s1) (symmetry), and nonnegative definiteness; in addition, if the covariance function is isotropic, then C(0) ≥ C(r), where r is the distance between two points. Some common covariance functions are shown in Figure 5. Common models have one or two parameters that need to be estimated, and this will be explored throughout the project in both a traditional and a Bayesian sense. In addition to the covariance parameters, the variance of the process (σ^2) and the measurement error or nugget effect (τ^2) need to be estimated. In real life there is almost always a nugget effect, but in this project it will be assumed/set to be zero to simplify calculations.

Figure 4: General outline of covariance matrix
Figure 5: Common covariance functions
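
As an illustration of the structure described above (the exponential and Gaussian models here are common choices, not necessarily the exact ones in Figure 5), a minimal sketch of two isotropic covariance functions and the covariance matrix they generate for a set of locations:

```python
import numpy as np

def exponential_cov(h, sigma2=1.0, phi=1.0):
    """Exponential covariance: C(h) = sigma^2 * exp(-h / phi)."""
    return sigma2 * np.exp(-h / phi)

def gaussian_cov(h, sigma2=1.0, phi=1.0):
    """Gaussian (squared-exponential) covariance: C(h) = sigma^2 * exp(-(h/phi)^2)."""
    return sigma2 * np.exp(-(h / phi) ** 2)

def cov_matrix(locs, cov_fn, **params):
    """Build the covariance matrix for locations (an n x d array).

    Entry (i, j) is the covariance function evaluated at the distance
    between locations i and j; the diagonal is C(0) = sigma^2, the
    variance of the process, since each location is distance 0 from itself.
    """
    diff = locs[:, None, :] - locs[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise distance matrix
    return cov_fn(h, **params)

# Three example locations in 2-D (illustrative values)
locs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
C = cov_matrix(locs, exponential_cov, sigma2=2.0, phi=1.5)
```

Note that the resulting matrix is symmetric with σ^2 on the diagonal, matching the properties listed above.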

Next, it is important to understand the traditional way geostatisticians estimate the spatial structure (covariance function) of a spatial process: fitting a semivariogram to the model residuals. Given a spatial process, there is likely a mean trend. Fitting an ordinary least squares model produces a linear regression formula, known as the mean trend; subtracting this trend from the data leaves the residuals. Examining the correlation of the residuals is a way to estimate the spatial structure of the entire process. A semivariogram takes binned means over pairs of neighboring points, as shown in Figure 6, and various covariance models are fit to the resulting plot. The best-fitting curve gives the covariance function parameters. Traditionally, the best model is selected by visually comparing the fitted curves on the plot, whereas Bayesian methods estimate the parameters by placing priors on them. After estimating the parameters of the covariance function and assessing whether there is a mean trend, prediction can finally be done. The next section describes some important aspects of spatial kriging.

Figure 6: Example of a semivariogram
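
The binned-means construction behind Figure 6 can be sketched as follows (a simplified empirical semivariogram; the bin count and the placeholder residuals are illustrative assumptions):

```python
import numpy as np

def empirical_semivariogram(locs, z, n_bins=10):
    """Binned empirical semivariogram of residuals z at locations locs.

    For each pair (i, j), the semivariance (z_i - z_j)^2 / 2 is computed;
    pairs are grouped into distance bins and averaged within each bin.
    Returns the bin-center distances and the mean semivariance per bin.
    """
    i, j = np.triu_indices(len(z), k=1)          # all distinct pairs
    h = np.linalg.norm(locs[i] - locs[j], axis=1)
    gamma = 0.5 * (z[i] - z[j]) ** 2
    edges = np.linspace(0.0, h.max(), n_bins + 1)
    idx = np.clip(np.digitize(h, edges) - 1, 0, n_bins - 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    means = np.array([gamma[idx == b].mean() if np.any(idx == b) else np.nan
                      for b in range(n_bins)])
    return centers, means

# Illustrative data: random locations with uncorrelated placeholder residuals
rng = np.random.default_rng(0)
locs = rng.uniform(0.0, 10.0, size=(200, 2))
z = rng.normal(size=200)
d, g = empirical_semivariogram(locs, z, n_bins=8)
```

A parametric covariance model would then be fit to the points (d, g); with real residuals the semivariance typically rises with distance before leveling off at the sill.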

Kriging Overview

One of the main goals when working with spatial data is predicting the underlying process at locations where there is no observed data. Spatial prediction is called kriging. To understand this, here is a small example. In Figure 7 below, there are locations throughout the midwest where ozone concentration is measured. If someone works for an environmental agency or a company that cares about ozone concentration, they most likely care about the ozone concentration across the entire region, not just where there are measurement devices. After examining how the observed data are correlated and fitting a semivariogram to estimate the covariance function, one can krige to a grid of new locations. One example of ozone concentration kriged to the domain can be seen in Figure 8.

Figure 7: Observed ozone concentration at various sites
Figure 8: Kriged (predicted) ozone concentration over a domain.

There are many variations of kriging, such as simple, ordinary, and universal. Simple kriging is when the mean of the spatial process is known, and ordinary kriging is when the mean is unknown and has to be estimated. Universal kriging is the most complex type, and occurs when the mean of the process is a linear function (say, of longitude and latitude) whose parameters have to be estimated. Throughout this project, different assumptions will be made, such as a known mean or known covariance function, to more easily show the Bayesian version of spatial methods. In real life, though, it is very likely that nothing is known and everything has to be estimated.

Here is a little bit of the math behind kriging. Kriging is the best linear unbiased predictor: when performing kriging, weights are assigned to each observed data point. For example, Figure 9 shows the formula for simple kriging at location s0 when the mean is known to be zero. The covariance matrices in front of Y form the weights assigned to each observed Y.

Figure 9: Simple kriging formula when the mean of the process is zero
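
A minimal sketch of that formula (the exponential covariance and the parameter values are illustrative assumptions, not the project's choices): the zero-mean simple kriging predictor is ẑ(s0) = c0ᵀ Σ⁻¹ Y, where Σ is the covariance matrix of the observations and c0 holds the covariances between s0 and each observed location.

```python
import numpy as np

def exp_cov(h, sigma2=1.0, phi=1.0):
    """Exponential covariance, C(h) = sigma^2 * exp(-h / phi) (one common choice)."""
    return sigma2 * np.exp(-h / phi)

def simple_krige(s0, locs, y, sigma2=1.0, phi=1.0):
    """Simple kriging at s0 with a known zero mean.

    Sigma^{-1} c0 is the vector of kriging weights on the observations,
    so the predictor is a weighted sum of the observed values y.
    """
    H = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    Sigma = exp_cov(H, sigma2, phi)                       # obs-to-obs covariances
    c0 = exp_cov(np.linalg.norm(locs - s0, axis=1), sigma2, phi)  # obs-to-s0
    weights = np.linalg.solve(Sigma, c0)                  # kriging weights
    return weights @ y

# Illustrative observations at three locations
locs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0, -0.5])
pred_at_obs = simple_krige(locs[0], locs, y, sigma2=1.5, phi=2.0)
pred_new = simple_krige(np.array([0.5, 0.5]), locs, y, sigma2=1.5, phi=2.0)
```

One useful sanity check: with no nugget effect, kriging at an observed location returns the observed value exactly, since c0 is then a column of Σ and the weight vector collapses to an indicator.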

Project Layout

The original goal for this project was to do Bayesian universal kriging by hand on a complex dataset. Issues quickly arose, as this is very complicated. Looking at the kriging formula in Figure 9, note that the kriging weights involve the product of covariance matrices. One role Bayesian analysis can play in spatial statistics is estimating the parameters of the covariance functions. When the prediction space is large, like the grid in Figure 8, this can be very time-consuming and computationally expensive, as the product of large matrices must be computed many, many times. When doing universal kriging, depending on the covariance function of the data, six or more parameters (an intercept term, two or more regression parameters, the variance of the process, the range and smoothness of the covariance function, the nugget effect) may have to be estimated. Because of this, the project was split into three parts, which can be accessed below:

  • Regression Parameter Estimation: In this module, the North American Rainfall dataset from the fields package in R is analyzed. This dataset is very large and its mean is unknown. The problem is simplified by estimating the mean with a linear model and fitting a semivariogram to estimate the covariance parameters. These estimated covariance parameters are then assumed to be 'known', and standard spatial kriging is performed on the data. Instead of performing a fully Bayesian version of kriging, Bayesian techniques are used to estimate the mean of the process (the regression parameters): Metropolis-Hastings sampling simulates draws of the mean parameters, and those draws are used when kriging. The kriging predictions obtained with the ordinary least squares mean and with the Bayesian mean are then compared.
  • Bayesian Kriging by Hand: This section contains a very simple case where data are simulated so that the correlation function parameters are known. The specific simulation method used leads to a known conditional posterior distribution. Samples are then drawn from this distribution and compared to the predicted values from traditional kriging.
  • Bayesian Kriging Using a Built-in R Function: In this final module, the parameters of the covariance function are estimated using built-in functions from the spBayes package in R. This package is also used to krige at a new location. The kriged estimates from traditional and Bayesian kriging are compared, along with the uncertainty of each.
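
The Metropolis-Hastings idea used in the first module can be sketched generically (this is a hedged illustration with an assumed vague normal prior and made-up simulated data, not the module's actual code): given a known covariance matrix Σ, draws of the regression coefficients β under y ~ N(Xβ, Σ) are simulated with a random-walk proposal.

```python
import numpy as np

def exp_cov(h, sigma2=1.0, phi=1.0):
    return sigma2 * np.exp(-h / phi)

def log_post(beta, y, X, Sigma_inv, prior_sd=10.0):
    """Log posterior (up to a constant) for beta under y ~ N(X beta, Sigma)
    with a vague N(0, prior_sd^2) prior on each coefficient (an assumption)."""
    r = y - X @ beta
    return -0.5 * r @ Sigma_inv @ r - 0.5 * (beta @ beta) / prior_sd**2

def metropolis(y, X, Sigma_inv, n_iter=2000, step=0.3, seed=0):
    """Random-walk Metropolis sampler for the regression coefficients."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    lp = log_post(beta, y, X, Sigma_inv)
    draws = np.empty((n_iter, X.shape[1]))
    for t in range(n_iter):
        prop = beta + step * rng.normal(size=beta.shape)
        lp_prop = log_post(prop, y, X, Sigma_inv)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject step
            beta, lp = prop, lp_prop
        draws[t] = beta                            # keep current state
    return draws

# Simulated example: constant mean (intercept only) plus correlated noise
rng = np.random.default_rng(1)
locs = rng.uniform(0.0, 5.0, size=(30, 2))
H = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
Sigma = exp_cov(H) + 1e-8 * np.eye(30)             # jitter for stability
X = np.ones((30, 1))
y = X @ np.array([2.0]) + np.linalg.cholesky(Sigma) @ rng.normal(size=30)
draws = metropolis(y, X, np.linalg.inv(Sigma))
```

Each retained draw of β gives a mean surface, and kriging with those draws propagates the uncertainty in the mean into the predictions, which is the comparison made in the first module.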