Forecast Uncertainty: Statistical Prediction Intervals Through Clustering Of Model Output


Abstract. This paper presents a new method to calculate variable prediction intervals (PIs) to complement numerical weather prediction (NWP) forecasts. Direct outputs of NWP models are point, deterministic predictions that provide crisp values of meteorological attributes. They can be used in a number of applications ranging from air quality and climate modeling to estimating the impact of severe weather and forecasting of energy production. However, many applications would benefit from forecasts augmented by information about their uncertainty. The most common way of describing uncertainty is the use of prediction intervals (PIs), e.g., minima, maxima, and confidence level percentages. In this paper, we apply automatic clustering of NWP model outputs to obtain conditional PIs. In particular, the described approach relies on using forecast history as a source of information for uncertainty analysis. The historical forecasts are first grouped into clusters corresponding to distinct weather situations. Each cluster is then examined to determine its specific distribution of forecast errors. This leads to forecasts with different amounts of uncertainty, depending on the forecast context. To further improve the quality of PIs, we use several derived variables and reduce the number of features using principal component analysis (PCA). We also examine several clustering algorithms to determine their suitability for PI calculation. All presented methods are empirically evaluated using a set of experiments. To establish a sound way of evaluating prediction intervals, we develop a new, improved PI quality measure. Results show that the proposed clustering-based uncertainty analysis yields prediction intervals of high resolution and accuracy.
Keywords. Statistical prediction interval, Numerical Weather Prediction (NWP), Forecast verification, Probabilistic forecast.
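The pipeline outlined in the abstract (reduce features with PCA, cluster historical forecasts, then read a prediction interval off each cluster's forecast-error distribution) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the method as implemented in the paper: the data are synthetic, k-means stands in for the unnamed clustering algorithms the paper examines, empirical error quantiles stand in for the per-cluster error distribution, and all function names are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Standardize features and keep the leading principal components."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant features
    _, _, vt = np.linalg.svd((X - mean) / std, full_matrices=False)
    return mean, std, vt[:n_components]

def apply_pca(X, pca):
    """Project rows of X into the reduced feature space."""
    mean, std, components = pca
    return ((X - mean) / std) @ components.T

def assign(Z, centers):
    """Index of the nearest cluster center for every row of Z."""
    dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)

def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means (assumed here; the paper compares several algorithms)."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(Z, centers)
        updated = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return centers

def error_bounds(errors, labels, k, level=0.9):
    """Per-cluster empirical quantiles of forecast errors (observation - forecast)."""
    alpha = (1.0 - level) / 2.0
    bounds = np.empty((k, 2))
    for j in range(k):
        e = errors[labels == j]
        if e.size == 0:  # empty cluster: fall back to the pooled errors
            e = errors
        bounds[j] = np.quantile(e, [alpha, 1.0 - alpha])
    return bounds

def prediction_interval(x, forecast, pca, centers, bounds):
    """Shift the point forecast by the error quantiles of its cluster."""
    j = assign(apply_pca(np.atleast_2d(x), pca), centers)[0]
    return forecast + bounds[j, 0], forecast + bounds[j, 1]

# Synthetic forecast history: two "weather situations" with different error spread.
rng = np.random.default_rng(42)
n = 400
X = np.vstack([rng.normal(0.0, 1.0, (n, 3)), rng.normal(5.0, 1.0, (n, 3))])
forecasts = 2.0 * X[:, 0]
errors = np.concatenate([rng.normal(0.0, 0.5, n), rng.normal(0.0, 3.0, n)])
observations = forecasts + errors

pca = fit_pca(X, n_components=2)
Z = apply_pca(X, pca)
centers = kmeans(Z, k=2)
labels = assign(Z, centers)
bounds = error_bounds(observations - forecasts, labels, k=2, level=0.9)

# Same point forecast, different context, different interval width.
calm_pi = prediction_interval(np.zeros(3), 10.0, pca, centers, bounds)
noisy_pi = prediction_interval(np.full(3, 5.0), 10.0, pca, centers, bounds)

# Fraction of historical observations that fall inside their own cluster's interval.
inside = (errors >= bounds[labels, 0]) & (errors <= bounds[labels, 1])
coverage = inside.mean()
```

In this sketch the interval attached to a forecast depends only on which cluster its features fall into, which is what makes the uncertainty conditional on the forecast context. The in-sample `coverage` value is only a rough sanity check; it is not the PI quality measure the paper develops.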
1 Introduction
Weather prediction has numerous applications in various domains. Weather forecasts are typically made and reported in the form of an expected value for the attribute of interest at a particular time and location. Numerical weather prediction (NWP) models are advanced computer simulation systems that provide expected forecasts for a number of attributes. They capture physical atmospheric processes to model atmospheric behaviour. Although the deterministic interactions of these physical simulations yield the expected values of different weather attributes with high precision, such forecasts are uncertain due to the inaccuracy of initial conditions, low spatial resolution, and various simplifying assumptions.
In many applications, it is desirable that forecasts be accompanied by the corresponding uncertainties. Information about forecast uncertainty may be as significant as the forecast itself. For instance, to predict the amount of ice likely to accrete on a power transmission line as a result of an ice storm, uncertainty analysis would allow a more...
