Linear Regression Parameter Estimation

Overview

For this section, the North American Rainfall data was used to perform simple kriging. The mean of this dataset is not zero, so in order to do simple kriging the mean had to be estimated. The mean was estimated in two ways: a traditional spatial approach using ordinary least squares (OLS), and a Bayesian approach that places a prior on the β’s and combines it with the likelihood to obtain a posterior distribution. The Metropolis-Hastings algorithm is described, along with the reason it is used to sample from the posterior. The kriging results from the two methods are then compared. The code for traditional kriging can be found here, and the Bayesian version here.

Analysis of Data

As stated above, the North American Rainfall dataset from the fields package in R was used for this portion of the project. The description of the dataset from the help file is: “Average rainfall in tenths of millimeters for the months of June, July and August for the period 1950-2010. Data is based on 1720 stations located in North America.” Before analyzing the data, the response variable was converted to inches of rain so that it was easier to interpret. A sinusoidal projection of the coordinates was also performed. Because the earth is not flat, some packages in R will not accurately calculate the distance between two coordinate pairs; projecting the coordinates eliminates this issue. A plot of the average precipitation at the observed sites on the projected coordinate scale is shown in Figure 1. Looking at this plot, there are obvious spatial trends: near Florida and the Gulf Coast the average precipitation was high, and average precipitation generally increased moving from west to east. The mean of this data was also clearly not zero. Figure 2 shows a histogram of all of the recorded values of average precipitation in inches. Based on this histogram, the mean appeared to be somewhere around 10 inches.

Figure 1: Plot of the data
Figure 2: Histogram of the response values
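The original data-preparation code is not reproduced in this report, but a minimal sketch of the steps described above is given below. The component names of the NorthAmericanRainfall object (longitude, latitude, precip) and the rescaling of the projected coordinates are assumptions; the original code may differ in its details.

# Load the data and convert the response from tenths of millimeters to inches
library(fields)
data(NorthAmericanRainfall)
precip_in <- NorthAmericanRainfall$precip / 10 / 25.4

# Sinusoidal projection: shrink longitude by the cosine of latitude so that
# Euclidean distance better reflects distance on the earth's surface
lon <- NorthAmericanRainfall$longitude
lat <- NorthAmericanRainfall$latitude
x <- lon * cos(lat * pi / 180)
y <- lat

# Rescale so the spatial extent is roughly one unit across (an assumption;
# the distances quoted later are below 0.5, so some rescaling was applied)
scale_fac <- max(diff(range(x)), diff(range(y)))
x <- (x - min(x)) / scale_fac
y <- (y - min(y)) / scale_fac
coords <- cbind(x, y)

hist(precip_in)   # histogram of the response, as in Figure 2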

Traditional Simple Kriging with an Unknown Mean

In order to do simple kriging with an unknown mean, the mean first had to be estimated. Since there was evidence of a spatial trend, it made sense to model the mean as a function of longitude and latitude. To keep things simple for the later comparison with Bayesian kriging, once the mean was estimated it was treated as ‘known’. The formula for simple kriging with a known mean is shown in Figure 3. After fitting a linear model, the mean was calculated from the coefficient estimates. The code for this can be seen in Figure 4. The OLS estimates for the β’s were 15.1, 14.4, and -7.12.

Figure 3: Simple kriging with known mean formula
Figure 4: Estimating the mean of the data and calling it known
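Figure 4 contains the original code; the following is a minimal sketch of the same step, continuing from the objects defined in the previous sketch (how the trend surface was parameterized in the original code is an assumption).

# Fit the mean surface by OLS and then treat it as 'known'
ols_fit  <- lm(precip_in ~ x + y)
beta_hat <- coef(ols_fit)                 # OLS estimates of beta0, beta1, beta2

mu_obs <- c(cbind(1, x, y) %*% beta_hat)  # 'known' mean at the observed sites
resids <- precip_in - mu_obs              # residuals used for the semivariogram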

After estimating the mean, a semivariogram was fit to the residuals of the model in order to estimate the covariance parameters. Looking at Figure 5, it was clear that there was correlation in the residuals: they tended to be higher on the outskirts of the domain and lower across most of the United States. A semivariogram of the residuals can be seen in Figure 6. This plot is a way to assess the spatial correlation in the residuals. Since there are fewer pairs of points separated by large distances, it is best practice to only consider the first half of the plot (in this case, distances below 0.5). There are several ways to fit a curve to this plot to estimate the covariance parameters; here the eyefit() function was used. The best covariance model was found to be an exponential with a range of a = 0.24 and a variance of σ^2 = 12.94. These parameters were used when calculating the kriging weights.

Figure 5: OLS residuals
Figure 6: Binned semivariogram of the residuals
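A minimal sketch of the semivariogram step is shown below, using the geoR package (eyefit() comes from geoR; whether the rest of the original variogram code also used geoR is an assumption). It continues from the coords and resids objects defined above.

library(geoR)

# Binned empirical semivariogram of the OLS residuals, restricted to
# distances below 0.5 as described in the text
emp_vario <- variog(coords = coords, data = resids, max.dist = 0.5)
plot(emp_vario)

# Interactive fitting of a covariance model to the empirical semivariogram;
# in this analysis an exponential model with range a = 0.24 and
# variance sigma^2 = 12.94 was chosen
# cov_fit <- eyefit(emp_vario)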

After computing the needed covariance matrices for simple kriging, the kriged values can be seen in Figure 7. Overall, the estimates looked reasonable and filled the map out nicely. Since this is such a large dataset, it was hard to compare traditional parameter estimation/kriging and the Bayesian version across the full map. Therefore, a prediction was also made at a single point, shown in red in Figure 8. Traditional kriging estimated this point to be 1.7073, which is compared with the Bayesian prediction later on.

Figure 7: Observed data and traditional simple kriging predictions
Figure 8: Prediction at just the red point
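A minimal sketch of the simple kriging calculation at a single location is given below. It assumes the exponential covariance fitted above with no nugget effect, and reuses coords, precip_in, beta_hat, and mu_obs from the earlier sketches; the location s0 is a hypothetical point, not the red point in Figure 8.

library(fields)   # for rdist()

sigma2  <- 12.94                                # variance from the fitted semivariogram
a       <- 0.24                                 # range from the fitted semivariogram
exp_cov <- function(h) sigma2 * exp(-h / a)     # exponential covariance function

Sigma <- exp_cov(rdist(coords))                 # covariance among observed sites
s0    <- matrix(c(0.5, 0.5), nrow = 1)          # hypothetical prediction location
c0    <- exp_cov(rdist(coords, s0))             # covariance between sites and s0

mu0    <- c(cbind(1, s0) %*% beta_hat)          # 'known' mean at s0
lambda <- solve(Sigma, c0)                      # simple kriging weights
pred   <- mu0 + sum(lambda * (precip_in - mu_obs))   # kriging prediction at s0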

Bayesian Parameter Estimation Using Metropolis-Hastings Algorithm

The traditional way to estimate the mean, shown above, was OLS. The Bayesian way to estimate the β’s is to set a prior on them. The math is a bit messy, but it can be shown that if a normal prior is placed on β and the data given β are also normal, then the posterior of β given the data is also normal, with the parameters shown in Figure 9. In Bayesian statistics, it is often advised to calculate priors, likelihoods, and posteriors on the log scale for numerical stability, so that was done in the code. To find the posterior of β, Σn was set to a diagonal matrix with 9 on the diagonal (a reasonable guess, since σ^2 was estimated above to be about 13). Σp was set to the identity matrix, since there was no strong prior information about the covariance between the β parameters. The laplace() function in the LearnBayes package in R is helpful for finding the posterior mode: it takes a function for the log posterior and a vector of initial guesses for β. Using this function, the estimates for β were 13.34, 13.60, and -4.76, which differ somewhat from the OLS estimates (15.1, 14.4, and -7.12).

Figure 9: Posterior for β
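A minimal sketch of this step is shown below, assuming a N(0, Σp) prior on β and the diagonal Σn described above (so the likelihood reduces to independent normals with standard deviation 3). It continues from the x, y, and precip_in objects defined earlier; the use of the mvtnorm package for the multivariate normal prior density is an assumption.

library(LearnBayes)
library(mvtnorm)

X       <- cbind(1, x, y)
y_obs   <- precip_in
Sigma_p <- diag(1, 3)                      # identity prior covariance for beta

# Log posterior = log likelihood + log prior, computed on the log scale
log_post <- function(beta, data) {
  sum(dnorm(data$y, mean = c(data$X %*% beta), sd = 3, log = TRUE)) +
    dmvnorm(beta, mean = rep(0, 3), sigma = Sigma_p, log = TRUE)
}

# Posterior mode via the laplace() function in LearnBayes
fit <- laplace(log_post, c(0, 0, 0), list(X = X, y = y_obs))
fit$mode

# The closed-form normal posterior (as in Figure 9), for comparison:
V <- solve(solve(Sigma_p) + t(X) %*% X / 9)   # posterior covariance
m <- V %*% (t(X) %*% y_obs / 9)               # posterior mean (prior mean 0)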

Instead of using the point estimates from the laplace() function to do kriging, it would be better to generate many samples of possible β’s and calculate the mean of the process for each of these samples. One way to generate samples from a posterior distribution is the Metropolis-Hastings algorithm, outlined in Figure 10. The algorithm starts from an initial value of the β’s; at each iteration, a candidate β is drawn from a proposal distribution that is similar in shape to the posterior, and the candidate is accepted with the probability defined in bullet 3 of Figure 10. This algorithm works because the simulated chain is a Markov chain with a unique stationary distribution, and that stationary distribution is the target posterior. Metropolis-Hastings is a Markov chain Monte Carlo method, which is convenient because the mean of the chain (after convergence) is a good estimate of the posterior mean.

Figure 10: Outline of Metropolis-Hastings algorithm
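A minimal sketch of a random-walk Metropolis sampler for β is given below (with a symmetric proposal, the Hastings correction cancels). The proposal standard deviation and starting value are assumptions; the original analysis may have tuned them differently. It uses the log_post() function and objects from the previous sketch.

set.seed(1)
n_iter   <- 20000
dat      <- list(X = X, y = y_obs)
chain    <- matrix(NA_real_, nrow = n_iter, ncol = 3)
beta_cur <- c(0, 0, 0)                        # initial value for the beta's
lp_cur   <- log_post(beta_cur, dat)

for (i in seq_len(n_iter)) {
  beta_prop <- beta_cur + rnorm(3, sd = 0.5)  # symmetric random-walk proposal
  lp_prop   <- log_post(beta_prop, dat)
  # Accept with probability min(1, posterior ratio), done on the log scale
  if (log(runif(1)) < lp_prop - lp_cur) {
    beta_cur <- beta_prop
    lp_cur   <- lp_prop
  }
  chain[i, ] <- beta_cur                      # store the current state
}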

The algorithm was run for 20,000 iterations. It is best to exclude a burn-in period when looking at the samples. A burn-in period is necessary because if the starting value is not in the bulk of the posterior distribution, it takes some iterations for the chain to reach it, and the draws from before that point are not representative of the posterior. Discarding them ensures the chain has had time to converge. The burn-in period chosen here was 5,000 iterations. After excluding the burn-in, the means of the chains were 13.36, 13.60, and -4.77, very similar to the laplace() estimates above, which suggests the chains converged. Another way to assess convergence is to look at trace plots; the trace plots for this run can be seen in Figure 11. Overall, the trace plots looked acceptable. Flat stretches were evident, indicating that many proposals were rejected and the chain stayed at the same value for a while. Ideally there would not be any noticeable flat stretches, but the chains still appeared to settle around a single value. With more time, it would be worth tuning the proposal distribution to improve this.

Figure 11: Trace plots of β0 (top), β1 (middle), and β2 (bottom)
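A minimal sketch of the post-processing described above: drop the 5,000-iteration burn-in, summarize the retained draws, and produce trace plots like those in Figure 11.

burn_in <- 5000
post    <- chain[-(1:burn_in), ]   # 15,000 x 3 matrix of retained draws

colMeans(post)                     # posterior means of beta0, beta1, beta2

par(mfrow = c(3, 1))
for (j in 1:3) {
  plot(post[, j], type = "l", xlab = "Iteration",
       ylab = bquote(beta[.(j - 1)]))         # trace plot for each beta
}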

Kriging with Bayesian Parameter Estimates

Now that samples of the β’s were available, it was time to do simple kriging. This was not fully Bayesian kriging, since the traditional simple kriging formula was used; the samples of the β’s were only used to calculate the process mean. After excluding the burn-in, the chain is a 15,000 by 3 matrix. The first row of the matrix was used to calculate the mean and produce a simple kriging prediction, that prediction was stored, and then the mean was recalculated using the second row and kriging was done again. Repeating this for the entire chain produced 15,000 predictions for each point. The mean of those predictions is shown in Figure 12. This plot looks very similar to the one in Figure 7, but there are some subtle differences, which were easier to assess by focusing on a single point. The distribution of the 15,000 predictions at the single point from Figure 8 is shown in Figure 13. The mean of all of these predictions was 1.70728, extremely close to the traditional kriging prediction that used the OLS estimate of the mean (1.7073).

Figure 12: Mean of 15,000 kriged predictions
Figure 13: Distribution of predictions for a single point
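A minimal sketch of the loop described above is given below: for each retained posterior draw of β, the mean surface is recomputed and the single point is re-kriged. It reuses X, y_obs, Sigma, c0, s0, and post from the earlier sketches; note that the kriging weights do not depend on β, so they are computed once.

lambda <- solve(Sigma, c0)                 # simple kriging weights

preds <- numeric(nrow(post))
for (k in seq_len(nrow(post))) {
  beta_k   <- post[k, ]
  mu_obs_k <- c(X %*% beta_k)              # mean at the observed locations
  mu0_k    <- c(cbind(1, s0) %*% beta_k)   # mean at the prediction location
  preds[k] <- mu0_k + sum(lambda * (y_obs - mu_obs_k))
}

mean(preds)    # point summary comparable to the traditional prediction
hist(preds)    # distribution of predictions, as in Figure 13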

Conclusions

Overall, estimating the mean by OLS versus Bayesian methods produced very similar kriging results. One benefit of the Bayesian approach is that, once a posterior distribution is defined, many values of the β’s can be simulated through Metropolis-Hastings. This allowed a whole distribution of predictions to be produced for a single point, as seen in Figure 13. The trace plots in Figure 11 ideally would have shown better mixing, but judging from the kriged results, the samples of β were sufficient.