Model validation is the most important step in the model-building process; however, it is often neglected. Even when a model is validated, the validation is often inadequate, consisting of little more than plotting a few experimental data points on the same graph as the model. There are two different types of models: engineering or scientific models, and statistical models. Engineering and scientific models are often built from equations taken from the literature and from derived equations. In this type of model, a combination of equations is used to calculate the natural phenomena that are occurring, and there is often little to no data available to help build the model.
Figure 1. Comparison between fuel cell engineering model and experiments at 298 K and 1 bar.
The other model type is the statistical, or empirical, model. These models are generated from known data sets. Validation of such a model often consists of nothing more than quoting the R² statistic of the fitted line or curve. Unfortunately, a high R² value does not guarantee that the model fits the data well, and a model that does not fit the data well defeats the purpose of building the model in the first place.
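A minimal Python sketch can illustrate why a high R² alone is not enough. It assumes hypothetical data generated from y = x² (not taken from any fuel cell experiment) and fits a straight line by ordinary least squares; the helper names `fit_line` and `r_squared` are illustrative, not from any library. R² comes out at about 0.95, yet the residuals are positive at both ends and negative in the middle, a systematic pattern that a residual plot would expose immediately.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical curved data (y = x**2) deliberately fitted with a straight line.
x = list(range(1, 11))
y = [xi ** 2 for xi in x]
b0, b1 = fit_line(x, y)
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, y_hat)]

print(f"R^2 = {r_squared(y, y_hat):.3f}")                               # R^2 = 0.950
print("residual signs:", "".join("+" if e > 0 else "-" for e in residuals))  # ++------++
```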
There are many statistical tools that can be used for model validation, and one of the most useful is graphical residual analysis. Many types of residual plots can be used to evaluate model accuracy, and there are also several methods that can confirm the conclusions drawn from graphical techniques. To help interpret a borderline residual plot, a lack-of-fit test can be used to assess the correctness of the functional part of the model. The number of useful residual plots is limited when the number of parameters being estimated is close to the size of the data set, as is often the case in designed experiments; in that situation, residual plots can be difficult to interpret because of the number of unknown parameters.
The residuals from a fitted model are the differences between the observed responses at each combination of variables and the responses predicted by the regression function. The residual for the ith observation in the data set can be written as:

$$e_i = y_i - f(\vec{x}_i;\hat{\vec{\beta}})$$

with $y_i$ denoting the ith response in the data set, $\vec{x}_i$ representing the list of explanatory variables, each set at the corresponding values found in the ith observation in the data set, and $\hat{\vec{\beta}}$ denoting the estimated model parameters. The distance of a residual from the line at 0 shows how poor the prediction is for that observation. Since Residual = Observed - Predicted, positive residuals (on the y-axis) mean the prediction was too low, negative residuals mean the prediction was too high, and a residual of 0 means the prediction was exactly correct.
Figure 2. Graphs of predicted, actual values, and standardized residuals. (http://docs.statwing.com/interpreting-residual-plots-to-improve-your-regression/#the_top)
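The residual definition and its sign convention can be sketched in a few lines of Python. The observed and predicted values below are hypothetical (for instance, cell voltages in volts), and `residuals` and `interpret` are illustrative helpers; the sign convention follows Residual = Observed - Predicted as described in the text.

```python
def residuals(observed, predicted):
    """Return e_i = y_i - f(x_i) for each observation (observed minus predicted)."""
    return [y - f for y, f in zip(observed, predicted)]


def interpret(e, tol=1e-12):
    """Describe what the sign of a residual says about the prediction."""
    if e > tol:
        return "prediction too low"
    if e < -tol:
        return "prediction too high"
    return "prediction exact"


# Hypothetical measurements and model predictions (e.g., cell voltage in V).
observed = [0.82, 0.75, 0.68, 0.60]
predicted = [0.80, 0.77, 0.68, 0.55]

for e in residuals(observed, predicted):
    print(f"{e:+.2f}  {interpret(e)}")
```

The small tolerance in `interpret` keeps floating-point round-off from turning an essentially exact prediction into a "too high" or "too low" verdict.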
If a model is adequate, the residuals should have no obvious patterns or systematic structure. The primary method of determining whether the residuals have any particular pattern is to study scatterplots of them. Scatterplots of the residuals are also used to check the assumption that the random errors have a constant standard deviation. A residual plot is good if the residuals are:
(1) symmetrically distributed, tending to cluster towards the middle of the plot
(2) clustered around the lower single digits of the y-axis for standardized residuals (e.g., 0.5 or 1.5, not 30 or 150)
(3) free of clear patterns
Figure 3. Example of good residual plot. (http://docs.statwing.com/interpreting-residual-plots-to-improve-your-regression/)
A residual plot is bad if the points are not evenly distributed vertically, if it contains an outlier, or if it has a clear shape. If you can detect a clear pattern or trend in your residuals, your model has room for improvement. Even so, a decent model is usually better than no model at all: take your model, try to improve it, and then decide whether its accuracy is good enough for your purposes.
Figure 4. Examples of bad residual plots. (http://docs.statwing.com/interpreting-residual-plots-to-improve-your-regression/)
Drifts in the measurement process can be checked by creating a "run order" or "run sequence" plot of the residuals. This is a scatterplot in which each residual is plotted against an index indicating the order (in time) in which the data were collected. The plot is most useful when the data have been collected in a randomized run order, or in an order that is not increasing or decreasing in any of the predictor variables. If the run order follows increasing or decreasing values of a predictor variable, drift in the process may not be separable from the functional relationship between the predictors and the response; this is why randomization is encouraged when designing experiments.
Figure 5. Example Run Order Plot.
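One way to build a run-order plot is to pair each residual with the index recording when its run was collected and sort by that index. The Python sketch below does this for hypothetical residuals and a hypothetical randomized run order. The `drift_slope` helper, which fits a least-squares line to residual versus run index, is an added numerical summary assumed for illustration, not a method from the text; a visible trend in the sorted pairs, or a slope well away from zero, would hint at drift.

```python
def run_order_pairs(residuals, run_index):
    """Pair each residual with its run index and sort by collection order."""
    return sorted(zip(run_index, residuals))


def drift_slope(pairs):
    """Least-squares slope of residual versus run index (illustrative drift summary)."""
    t = [run for run, _ in pairs]
    e = [res for _, res in pairs]
    n = len(t)
    t_bar, e_bar = sum(t) / n, sum(e) / n
    stt = sum((ti - t_bar) ** 2 for ti in t)
    ste = sum((ti - t_bar) * (ei - e_bar) for ti, ei in zip(t, e))
    return ste / stt


# Hypothetical residuals listed in data-table order, with the randomized
# run order in which each observation was actually collected.
residuals = [0.0, -0.2, 0.1, -0.1, 0.2]
run_index = [3, 1, 4, 2, 5]

pairs = run_order_pairs(residuals, run_index)
for run, res in pairs:
    print(run, res)          # these points form the run-order plot
print(f"drift slope = {drift_slope(pairs):.3f} per run")
```

Here the residuals rise steadily with run index, which is the kind of drift the run-order plot is meant to reveal.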
A lag plot of the residuals helps to assess whether the random errors are independent from one observation to the next. If the errors are not independent, the estimate of the error standard deviation will be biased, which leads to improper inferences about the process. The lag plot is made by plotting each residual against the residual that follows it. Because the residuals are paired this way, the lag plot has one fewer point than most other types of residual plots.
If the errors are independent, the lag plot will show no pattern or structure, and the points will appear randomly scattered across the plot. If there is significant dependence between the errors, some deterministic pattern will be evident.
Figure 6. Example Lag Plot.
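The pairing behind a lag plot can be sketched as follows. `lag_pairs` returns the (e_i, e_{i+1}) points that would be plotted, one fewer than the number of residuals. `lag1_autocorrelation` is an added numerical summary (the sample lag-1 autocorrelation) assumed for illustration, not part of the text: values near zero are consistent with independent errors, while values far from zero reflect the kind of structure a lag plot would show. The alternating residuals are hypothetical.

```python
def lag_pairs(residuals):
    """Return the (e_i, e_{i+1}) points of a lag-1 plot: one fewer than the residuals."""
    return list(zip(residuals[:-1], residuals[1:]))


def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation of the residuals (illustrative summary)."""
    n = len(residuals)
    mean = sum(residuals) / n
    num = sum((residuals[i] - mean) * (residuals[i + 1] - mean) for i in range(n - 1))
    den = sum((e - mean) ** 2 for e in residuals)
    return num / den


# Hypothetical residuals that alternate in sign: a clearly dependent pattern.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(lag_pairs(alternating))
print(f"lag-1 autocorrelation = {lag1_autocorrelation(alternating):.3f}")
```

On a lag plot, these residuals would fall on just two points, (1, -1) and (-1, 1), a deterministic pattern rather than a random scatter.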
Figure 7. Potential Model Issues Exposed by Residuals.
When we fit a model to a particular data set, one or more problems may occur. Most common among these are the following:
1. Non-linearity of the response-predictor relationships.
2. Correlation of error terms.
3. Non-constant variance of error terms.
4. Outliers.
5. High-leverage points.
An assumption of many models is that the error terms have constant variance. However, the variances are often not constant and may increase with the value of the response. A common sign of this is a funnel shape in the residual plot, which indicates that the spread of the residuals increases with the fitted values.
Figure 8. Non-constant Variance of Error Terms.
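A rough numerical companion to a visual funnel check is sketched below. It is a heuristic assumed for illustration, not a formal test from the text: it sorts the residuals by fitted value, splits them into lower and upper halves, and compares the sample standard deviations of the two halves. A ratio well above 1 is consistent with variance that grows with the fitted value. The data and the helper name `spread_ratio` are hypothetical.

```python
import statistics


def spread_ratio(fitted, residuals):
    """Heuristic funnel check (not a formal test): sample standard deviation of
    the residuals for the upper half of fitted values divided by that for the
    lower half. With an odd count, the middle point is left out."""
    pairs = sorted(zip(fitted, residuals))
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]
    high = [e for _, e in pairs[-half:]]
    return statistics.stdev(high) / statistics.stdev(low)


# Hypothetical fitted values and residuals whose spread widens like a funnel.
fitted = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]

print(f"spread ratio = {spread_ratio(fitted, residuals):.1f}")
```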
Residual plots are the most valuable tool for assessing whether variables are missing from the functional part of the model. However, if the plots are ambiguous, it may be helpful to use statistical tests of the hypothesis that the model is adequate. One might ask whether it would be better to go straight to the statistical tests, since they are more quantitative; however, residual plots provide the best overall feedback on the model fit. These quantitative tests are termed "lack-of-fit" tests, and many of them are described in standard statistics textbooks.
The most common strategy is to compare the amount of variation in the residuals with an estimate of the random variation in the process obtained from an additional, independent data set. If the two amounts of random variation are similar, it can be assumed that no terms are missing from the model. If the random variation in the residuals is larger than the random variation estimated from the independent data set, then terms may be missing or misspecified in the functional part of the model.
Comparing the variation in the residuals with that of an independent data set is very useful; however, in many instances independent replicate measurements are not available. In that case, if the data used to fit the model contain replicate measurements, the lack-of-fit statistic can be calculated by partitioning the residual standard deviation into two independent estimators of the random variation in the process.
One estimator depends on the model and the means of the replicated sets of data (σm), and the other is the standard deviation of the variation observed within each set of replicated measurements (σr). The squares of these two estimators are often called the "mean square for lack-of-fit" and the "mean square for pure error", respectively. The model-dependent estimator can be calculated by:

$$\sigma_m = \sqrt{\frac{\sum_{i=1}^{n_u} n_i\left[\bar{y}_i - f(\vec{x}_i;\hat{\vec{\beta}})\right]^2}{n_u - p}}$$

where p is the number of unknown parameters in the model, n is the sample size of the data set used to fit the model, $n_u$ is the number of distinct combinations of predictor variable levels, $n_i$ is the number of replicated observations at the ith combination of predictor variable levels, and $\bar{y}_i$ is the mean of those replicated observations.
If the model is a good fit, the value of the function is a good estimate of the mean response at every combination of predictor variable values. If the function provides good estimates of the mean response at each combination, then σm should be close in value to σr and should also be a good estimate of σ. If the model is missing any important terms, or any of the terms are incorrectly specified, then the function will provide a poor estimate of the mean response for some combinations of predictors, and σm will probably be greater than σr.
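The model-independent estimator can be calculated using:

$$\sigma_r = \sqrt{\frac{\sum_{i=1}^{n_u}\sum_{j=1}^{n_i}\left(y_{ij} - \bar{y}_i\right)^2}{n - n_u}}$$

where $y_{ij}$ is the jth replicated observation at the ith combination of predictor variable levels.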
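The two estimators can be computed directly from replicated data. The Python sketch below assumes the data are grouped as a dictionary mapping each distinct predictor setting to its list of replicate responses, and that the fitted model is available as a function; both conventions, and the helper name `lack_of_fit_estimators`, are illustrative assumptions. The hypothetical data have replicate means 1, 4, 9 and 16 at x = 1 to 4, and the least-squares straight line through all eight points is y = -5 + 5x, so the model misses the curvature and σm comes out far larger than σr.

```python
import math


def lack_of_fit_estimators(groups, f, p):
    """Compute (sigma_m, sigma_r) from replicated data.

    groups: dict mapping each distinct predictor setting x_i to its replicate
            responses [y_i1, ..., y_in_i] (illustrative data layout)
    f:      fitted model, called as f(x_i)
    p:      number of unknown parameters in the model
    """
    n_u = len(groups)
    n = sum(len(ys) for ys in groups.values())
    ss_lack_of_fit = 0.0
    ss_pure_error = 0.0
    for x, ys in groups.items():
        y_bar = sum(ys) / len(ys)
        ss_lack_of_fit += len(ys) * (y_bar - f(x)) ** 2
        ss_pure_error += sum((y - y_bar) ** 2 for y in ys)
    sigma_m = math.sqrt(ss_lack_of_fit / (n_u - p))
    sigma_r = math.sqrt(ss_pure_error / (n - n_u))
    return sigma_m, sigma_r


# Hypothetical duplicated measurements of a curved response.
groups = {1: [1.1, 0.9], 2: [4.1, 3.9], 3: [9.1, 8.9], 4: [16.1, 15.9]}


def line(x):
    """Least-squares straight line for the data above: y = -5 + 5x."""
    return -5.0 + 5.0 * x


sigma_m, sigma_r = lack_of_fit_estimators(groups, line, p=2)
print(f"sigma_m = {sigma_m:.3f}, sigma_r = {sigma_r:.3f}")   # sigma_m = 2.000, sigma_r = 0.141
```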
Since σr depends only on the data and not on the functional part of the model, σr will be a good estimator of σ regardless of whether the model is a complete description of the process. Typically, if σm > σr, one or more parts of the model may be missing or improperly specified. However, because of random error in the data, σm will sometimes be greater than σr even when the model is accurate. To ensure that the model hypothesis is not rejected by accident, it is necessary to understand how much greater than σr the value of σm can plausibly be when the model is adequate, so that the hypothesis is rejected only when σm is significantly greater than σr. The statistic used for this comparison is the ratio:

$$L = \frac{\sigma_m^2}{\sigma_r^2}$$
The probability of rejecting the hypothesis is controlled by the probability distribution that describes the behavior of the statistic L. When the model is adequate (and the random errors are normally distributed), L follows an F distribution with $n_u - p$ and $n - n_u$ degrees of freedom, so the hypothesis of an adequate model is rejected when L is greater than the upper-tail cutoff value of that F distribution. This provides a quantitative way to decide when σm is significantly greater than σr.
The probability specified by the cutoff value from the F distribution is called the "significance level" of the test. The most commonly used significance level is α = 0.05, which means that the hypothesis of an adequate model will be rejected in only 5% of tests for which the model really is adequate. Cutoff values can be looked up in the F-distribution tables found in most statistics textbooks or computed with statistical software.
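The decision rule can be sketched in Python by forming L and comparing it with the F cutoff. The sketch plugs in hypothetical values σm = 2.0 and σr = sqrt(0.02), with n = 8 observations at n_u = 4 distinct settings and a p = 2 parameter model. In general the cutoff would come from an F table or from statistical software (for example, `scipy.stats.f.ppf(1 - alpha, df_num, df_den)` in SciPy); to keep the sketch dependency-free, it instead uses the closed-form upper-tail cutoff that holds only when the numerator degrees of freedom equal 2, namely (ν2/2)(α^(-2/ν2) - 1). The helper names are illustrative.

```python
import math


def f_cutoff_df1_is_2(alpha, df_den):
    """Upper-tail F cutoff, valid ONLY for numerator degrees of freedom = 2.

    For F(2, d2), P(F > x) = (1 + 2x/d2) ** (-d2/2); solving P = alpha gives this.
    """
    return (df_den / 2) * (alpha ** (-2 / df_den) - 1)


def lack_of_fit_test(sigma_m, sigma_r, n, n_u, p, cutoff):
    """Return L = sigma_m^2 / sigma_r^2, its degrees of freedom, and the decision."""
    L = sigma_m ** 2 / sigma_r ** 2
    df_num, df_den = n_u - p, n - n_u
    return L, (df_num, df_den), L > cutoff


# Hypothetical values: 8 observations at 4 distinct settings, 2-parameter model.
n, n_u, p, alpha = 8, 4, 2, 0.05
assert n_u - p == 2, "closed-form cutoff requires numerator df = 2"
cutoff = f_cutoff_df1_is_2(alpha, n - n_u)   # about 6.944 for F(2, 4)
L, dof, reject = lack_of_fit_test(2.0, math.sqrt(0.02), n, n_u, p, cutoff)
print(f"L = {L:.1f}, df = {dof}, cutoff = {cutoff:.3f}, reject adequacy: {reject}")
```

Because L is far above the cutoff, the hypothesis of an adequate model would be rejected for these values.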
Sometimes models fit the data very well but contain additional, unnecessary terms. Such models are said to "overfit" the data. Since the parameters of unnecessary terms usually have values near zero, it may seem harmless to leave them in the model. However, if there are many extra terms, the error in the model's predictions may be larger than necessary, which may affect the conclusions drawn from the data.
Overfitting often occurs when purely empirical models are developed for experimental data with little understanding of the total and random variation in the data. It happens when regression methods are used to fit the particular data set rather than to describe the structure in the data. Models that are made to fit very complex patterns may, on careful analysis, turn out to be fitting structure in the noise.
Statistical tests can also be used to determine whether a model has too many terms. Tests for overfitting are one area in which statistical tests are more effective than residual plots. In this case, an individual test for each parameter in the model is used rather than a single test. The test statistics for testing whether each parameter is zero are typically based on the t distribution. Each parameter estimate is measured in terms of how many standard deviations it is from its hypothesized value of zero. If the estimate is close enough to the hypothesized value that any deviation can be attributed to random error, the hypothesis that the parameter's true value is zero is not rejected. However, if the estimate is so far from the hypothesized value that the deviation cannot plausibly be explained by random error, the hypothesis that the true value of the parameter is zero is rejected.
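The test statistic for each of these tests is simply the estimated parameter value divided by its estimated standard deviation:

$$T = \frac{\hat{\beta}}{\hat{\sigma}_{\hat{\beta}}}$$

where $\hat{\beta}$ is the parameter estimate and $\hat{\sigma}_{\hat{\beta}}$ is its estimated standard deviation.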
This statistic measures the distance between the estimated and hypothesized values of the parameter in units of standard deviations. If the random errors are normally distributed and the true value of the parameter is zero, the test statistic has a Student's t distribution with n - p degrees of freedom. Therefore, cutoff values from the t distribution can be used to determine how large a deviation can be attributed to random error. Each test should use the upper-tail cutoff value corresponding to a probability of α/2, because these tests are generally two-sided: they check simultaneously whether the parameter value is greater than or less than zero. This ensures that the hypothesis that each parameter equals zero will be rejected by chance with probability α.
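A small Python sketch of these t tests for a straight-line model follows. It assumes hypothetical data, a two-parameter model y = b0 + b1*x fitted by ordinary least squares, and the usual textbook standard-error formulas for the intercept and slope; the helper name `ols_t_statistics` is illustrative. The two-sided cutoff of 3.182 is the tabulated t value for 3 degrees of freedom at α = 0.05 (SciPy users could obtain it from `scipy.stats.t.ppf(1 - alpha / 2, df)`). For these data the slope is clearly nonzero, while the intercept's t statistic is only about 0.25, so the intercept term would be a candidate for removal.

```python
import math


def ols_t_statistics(x, y):
    """Fit y = b0 + b1*x by least squares; return estimates, t statistics, and df."""
    n, p = len(x), 2
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - p))                       # residual standard deviation
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)    # standard error of intercept
    se_b1 = s / math.sqrt(sxx)                         # standard error of slope
    return {"b0": (b0, b0 / se_b0), "b1": (b1, b1 / se_b1)}, n - p


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]     # hypothetical responses
T_CUTOFF = 3.182                   # two-sided t cutoff, alpha = 0.05, 3 df (from tables)

results, df = ols_t_statistics(x, y)
for name, (estimate, t) in results.items():
    verdict = "reject zero" if abs(t) > T_CUTOFF else "consistent with zero"
    print(f"{name}: estimate = {estimate:.3f}, t = {t:.2f} ({verdict})")
```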
Validating a fuel cell model is the most important step in the model-building process. However, little attention is usually given to this step. A fast way to analyze the validity of a model is to look at plots of the residuals versus the experimental factors, run-order plots, and lag plots. These plots give a good feel for how accurately a model fits the experimental data and how dependable it is. When residual scatterplots are used together with one or more common statistical tests of fit, they can provide substantial evidence of whether a model is a good fit to the experimental data.