Ensembles Lessons

Ensemble Statistics

Normal Distribution

Histogram of New York Central Park 21 December maximum temperatures from 1878 through 2011 in 10 degree Fahrenheit increments.

The real power of ensembles in weather forecasting is that they allow us to assign a probability to a forecast. To do that, we first need to be reasonably sure that the forecasts from the 20 or more ensemble members are approximately normally distributed.

What is a normal distribution?

Let’s look at the high temperature for a given day, say the Winter Solstice on December 21, over a period of many years in Central Park, New York City. Here we have a histogram of observed frequencies for these data grouped in 10°F increments. In the plot, we see a peak around the middle of the temperature range with the count falling away symmetrically to either side.

Histogram of New York Central Park 21 December maximum temperatures from 1878 through 2011 in 10 degree Fahrenheit increments, with approximate fit to normal distribution.

If we fit a curve to the histogram, we see a nice bell shape. This is approximately a normal distribution. The peak is near the center of the range of temperatures, and is neither too sharp nor too flat.

In contrast, a non-normal distribution may show a peak that's significantly off-center from the middle of the range, a peak that's too sharp or flat, or even multiple peaks!
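One rough way to put numbers on these departures from a bell shape is to compute the sample skewness (a lopsided, off-center distribution) and the excess kurtosis (a peak and tails that differ from the normal curve). The sketch below uses only the Python standard library. The function name and the sample values are hypothetical, and neither statistic will detect multiple peaks.

```python
from statistics import fmean, pstdev

def shape_statistics(values):
    """Return (skewness, excess kurtosis); both are near 0 for normal data."""
    mean = fmean(values)
    sd = pstdev(values)  # population SD, used here only to standardize
    z = [(v - mean) / sd for v in values]
    skewness = fmean([x ** 3 for x in z])
    excess_kurtosis = fmean([x ** 4 for x in z]) - 3.0
    return skewness, excess_kurtosis

# Hypothetical samples, not taken from the lesson graphics.
symmetric = [3, 4, 5, 5, 6, 6, 6, 7, 7, 8, 9]
lopsided = [1, 1, 1, 2, 2, 3, 9]

sym_skew, sym_kurt = shape_statistics(symmetric)
lop_skew, lop_kurt = shape_statistics(lopsided)
print(f"symmetric sample skewness: {sym_skew:.2f}")
print(f"lopsided sample skewness:  {lop_skew:.2f}")
```

A skewness near zero is consistent with a centered peak, while a clearly positive or negative value points to a peak that sits off-center.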

Question

Which of the following plots shows a normal data distribution? (Choose the best answer.)

Hypothetical probability distributions for Ensemble Statistics question.

Feedback: The correct answer is (c).

Plot C shows a single peak with a nice bell shape. Plot B shows a single peak, but it is off-center in the distribution. Plot A shows more than one peak.

When we run an ensemble, we get 20 or more forecasts. As the forecast lead time increases, those forecasts will diverge. For temperature, we will see warmer and cooler temperatures for different ensemble members. If a histogram of the forecast temperatures resembles a bell-shaped curve, we have an approximately normal distribution, and the statistics we compute will give us more reliable probabilities.
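As an illustration of how a set of member forecasts can be turned into a probability, the sketch below compares two simple approaches: counting members above a threshold, and fitting a normal curve to the members. The member values and the threshold are hypothetical, and operational systems may derive probabilities differently.

```python
from statistics import NormalDist, fmean, stdev

# Hypothetical 21-member ensemble of forecast temperatures (degrees C).
members = [2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9, 10, 11]
threshold = 8  # hypothetical threshold of interest (degrees C)

# Approach 1: fraction of members forecasting above the threshold.
prob_by_count = sum(1 for t in members if t > threshold) / len(members)

# Approach 2: fit a normal curve to the members and take its upper tail.
fitted = NormalDist(mu=fmean(members), sigma=stdev(members))
prob_by_fit = 1.0 - fitted.cdf(threshold)

print(f"P(T > {threshold} C) by counting:   {prob_by_count:.2f}")
print(f"P(T > {threshold} C) by normal fit: {prob_by_fit:.2f}")
```

The second approach is only as good as the assumption that the members are close to normally distributed, which is why checking the shape of the histogram comes first.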

Mean

Histogram of New York Central Park 21 December maximum temperatures from 1878 through 2011 in 10 degree Fahrenheit increments, with sample mean value annotated.

The mean is probably the most commonly examined attribute in all statistics. The arithmetic mean or average of a data sample is simply the sum of the values in a series divided by the total number of values in that series. In this graphic, we show the previous NYC temperature distribution with the average shown with a vertical line. As the distribution is close to normal, the mean is close to the center of the range.
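The definition above translates directly into code. This is a minimal sketch using a hypothetical set of 21 ensemble member temperatures rather than the Central Park observations, which are not reproduced in this lesson.

```python
# Hypothetical 21-member ensemble of forecast temperatures (degrees C).
members = [2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9, 10, 11]

# Arithmetic mean: sum of the values divided by the number of values.
mean_temp = sum(members) / len(members)

print(f"Ensemble mean: {mean_temp:.2f} C")
```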

Standard Deviation

Histogram of New York Central Park 21 December maximum temperatures from 1878 through 2011 in 10 degree Fahrenheit increments, with the standard deviation annotated.

The standard deviation (SD) measures how widely the values are spread around the mean, as shown in this graphic. In a normal distribution, the interval within ±1 SD of the mean encompasses about 2/3 (68%) of the values, and the interval within ±2 SDs encompasses about 95% of the values. In a mean-and-spread plot, the spread is one SD.
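The 68% and 95% figures can be checked against the standard normal curve, and then compared with what a small sample actually gives. The sketch below assumes a hypothetical 21-member ensemble, and the helper name fraction_within is illustrative only. With so few members, the sample fractions will only roughly match the theoretical ones.

```python
from statistics import NormalDist, fmean, stdev

# Theoretical coverage for a normal distribution (mean 0, SD 1).
standard_normal = NormalDist()
within_1sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2sd = standard_normal.cdf(2) - standard_normal.cdf(-2)
print(f"Normal curve: {within_1sd:.3f} within 1 SD, {within_2sd:.3f} within 2 SDs")

# Hypothetical 21-member ensemble of forecast temperatures (degrees C).
members = [2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9, 10, 11]
mean_temp = fmean(members)
sd_temp = stdev(members)  # sample standard deviation

def fraction_within(values, center, half_width):
    """Fraction of values no farther than half_width from center."""
    return sum(1 for v in values if abs(v - center) <= half_width) / len(values)

frac_1sd = fraction_within(members, mean_temp, sd_temp)
frac_2sd = fraction_within(members, mean_temp, 2 * sd_temp)
print(f"Sample: {frac_1sd:.2f} within 1 SD, {frac_2sd:.2f} within 2 SDs")
```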

Rank Statistics - Median

Hypothetical, approximately normal histogram from an ensemble 5-day forecast for 2-meter temperature for a grid box.

We can sort a data set from lowest to highest and develop rank statistics. In the following examples we will consider the histogram of a hypothetical forecast from a 21-member ensemble.

On the histogram, we can locate the value where one-half of the data are lower and one-half are higher. This defines the median of the data sample. When the sample contains extremely high and/or low outliers, the median can represent the middle of the distribution better than the mean, because a few extreme values have little effect on it.
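A small example shows how differently the mean and the median respond to an outlier. The five values below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from statistics import fmean, median

# Hypothetical small sample, then the same sample with one extreme value.
without_outlier = [4, 5, 6, 7, 8]
with_outlier = [4, 5, 6, 7, 30]

mean_before = fmean(without_outlier)   # 6.0
mean_after = fmean(with_outlier)       # 10.4
median_before = median(without_outlier)  # 6
median_after = median(with_outlier)      # 6

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

The single extreme value pulls the mean up by more than four degrees, while the median does not move.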

Question

Hypothetical, approximately normal histogram from an ensemble 5-day forecast for 2-meter temperature for a grid box.

Suppose this graphic shows a 21-member ensemble forecast of temperature for Baltimore, MD. Temperatures are in degrees Celsius. Of the following, which temperature is the median? (Choose the best answer.)

Hypothetical, approximately normal histogram from an ensemble 5-day forecast for 2-meter temperature for a grid box.  Answer indicated graphically for lesson question.

The correct answer is (c) 7°C. When ranked from low to high, the median is the middle value. For an ensemble of 21 members, counting from the left in the graphic, the 11th value is 7°C.
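The counting procedure in the feedback can be sketched in a few lines. The member values below are hypothetical: they are not read from the graphic, only chosen so that 21 members have 7°C as their 11th ranked value.

```python
from statistics import median

# Hypothetical 21-member ensemble (degrees C); not read from the graphic.
members = [7, 2, 9, 5, 6, 8, 4, 7, 10, 5, 6, 3, 7, 8, 11, 4, 5, 9, 6, 7, 8]

ranked = sorted(members)          # rank from low to high
middle_index = len(ranked) // 2   # index 10 is the 11th of 21 values
median_by_rank = ranked[middle_index]

print(f"11th ranked value: {median_by_rank} C")
print(f"statistics.median: {median(members)} C")
```

For an even number of members there is no single middle value, and statistics.median averages the two values nearest the middle.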

Rank Statistics - Percentiles

When we split the ranked forecasts at the midpoint to determine the median, we also determined the 50th percentile. That is, 50% of the temperature readings fell at or below 7°C. We can extend this idea and determine other percentile rankings. Commonly used percentiles include the 10th, 25th, 50th (or median), 75th, 90th, and 99th. In terms of weather, the 99th percentile represents an extreme event.
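Percentiles can be read off the ranked members in the same way as the median. The sketch below uses the nearest-rank convention, which is only one of several definitions in common use; other conventions interpolate between members and can give slightly different values. The function name and the member values are hypothetical.

```python
import math

def nearest_rank_percentile(values, p):
    """Smallest ranked value with at least p percent of the data at or below it."""
    ranked = sorted(values)
    rank = max(1, math.ceil(p * len(ranked) / 100))  # 1-based rank
    return ranked[rank - 1]

# Hypothetical 21-member ensemble of forecast temperatures (degrees C).
members = [2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9, 10, 11]

percentiles = {p: nearest_rank_percentile(members, p) for p in (10, 25, 50, 75, 90, 99)}
for p, value in percentiles.items():
    print(f"{p}th percentile: {value} C")
```

With only 21 members, the 99th percentile is simply the warmest member, so extreme percentiles from a small ensemble should be read with caution.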

Mode

The most frequently observed value or interval is called the mode of the data sample. In the case of the mode, no values other than those in the most frequently observed category affect the statistic. In the case of a tie, where more than one value has the same high frequency, each value is a mode.

Questions

Hypothetical, approximately normal histogram from an ensemble 5-day forecast for 2-meter temperature for a grid box.

In the 21-member ensemble temperature forecast for Baltimore, MD, shown again below, which temperature is the mode? (Choose the best answer.)

Hypothetical, approximately normal histogram from an ensemble 5-day forecast for 2-meter temperature for a grid box.  Answer for mode question indicated graphically for lesson question.

The correct answer is (c) 7°C. The mode is the category with the highest count, which here is 7°C with four members.
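Finding the mode amounts to counting how often each value occurs. The sketch below uses hypothetical member values, chosen so that four members fall at 7°C, and a second hypothetical sample to show the tie case in which each of the most frequent values is a mode.

```python
from collections import Counter
from statistics import multimode

# Hypothetical 21-member ensemble (degrees C); not read from the graphic.
members = [2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9, 10, 11]

counts = Counter(members)     # how many members forecast each temperature
modes = multimode(members)    # every value that shares the highest count
mode_count = counts[modes[0]]
print(f"mode(s): {modes}, each forecast by {mode_count} members")

# Tie case: two values share the highest frequency, so both are modes.
tied_modes = multimode([5, 5, 6, 7, 7])
print(f"tied sample mode(s): {tied_modes}")
```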