# Data Analysis: Baseball ERA/Team Paper/Res 341 With Instructor Notes

## Data Analysis: Baseball ERA

The problem to be examined is whether teams with above-average ERAs (above the mean) are able to win as many games as teams with below-average ERAs (below the mean). The hypothesis chosen is two-tailed, and it states:

Two-tailed hypothesis

Null Hypothesis:

(Verbal form) H0: No statistically significant difference exists in the mean wins between teams that have ERAs below the mean and teams that have ERAs above the mean.

(Numeric form) H0: μ (ERA below mean) = μ (ERA above mean)

Alternate Hypothesis:

(Verbal form) H1: A statistically significant difference exists in the mean wins between teams that have ERAs below the mean and teams that have ERAs above the mean.

(Numeric form) H1: μ (ERA below mean) ≠ μ (ERA above mean)

The outcome of the two-tailed test will be that Team A either rejects the null hypothesis or fails to reject it.

Four peer-reviewed research articles that are relevant to the research topic have been chosen and summarized as follows:

Article 1: All Time Career Pitching Leaders provides information on the mean ERA of the all-time leaders in both ERA and wins.

Article 2: Inside Dish provides news briefs relating to baseball in 2008.

Article 3: Pitching POWER discusses the importance of pitchers and ERA in general.

Article 4: The Influence of Salary Arbitration on Player Performance provides a study that used ERA and strikeout-to-walk ratios to determine a pitcher's effectiveness.

A sample of 30 Major League Baseball teams from the Major League Baseball Data Set was used to test the hypothesis against each team's ERA and wins (Lind et al., 2008). The data used for the research came from the sub data set X10: ERA (earned run average) within the Major League Baseball Data Set. This sub data set fits the sampling design because the elements of each team, including its individual players, are located within the ERA population samples.
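As an illustration of how the two-tailed test stated above could be carried out, the following Python sketch computes Welch's t statistic for the two groups. The source does not say which test statistic Team A used, so Welch's t-test, the 0.05 significance level, and the names `hi_era_wins`, `lo_era_wins`, and `welch_t` are assumptions made for this example. The win totals are transcribed from Table 1.1.

```python
from math import sqrt
from statistics import mean, variance

# Win totals transcribed from Table 1.1 (14 high-ERA teams, 16 low-ERA teams).
hi_era_wins = [95, 95, 74, 67, 79, 71, 69, 56, 77, 73, 67, 71, 75, 67]
lo_era_wins = [88, 95, 93, 99, 80, 83, 90, 89, 83, 82, 81, 100, 83, 88, 81, 79]


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard error of each group mean (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t_stat, df = welch_t(hi_era_wins, lo_era_wins)

# Assumption: alpha = 0.05, two-tailed. 2.080 is the standard t-table critical
# value for 21 degrees of freedom (df rounded down, the conservative choice);
# it is hard-coded and would need updating for different data.
T_CRITICAL = 2.080
reject_null = abs(t_stat) > T_CRITICAL

print(f"mean wins: high ERA = {mean(hi_era_wins)}, low ERA = {mean(lo_era_wins)}")
print(f"t = {t_stat:.3f}, df = {df:.2f}, reject H0: {reject_null}")
```

Welch's form is used in this sketch because the two groups differ in both size (14 and 16 teams) and variance; a pooled-variance t-test would be the alternative if equal variances were assumed.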
The population frame consists of the statistics of the elements within the population samples. The sample statistics have been used as estimates of the population parameters. For the data collection, the descriptive statistics are displayed in tabular format in Table 1.1 and in graphical format in Graph 1.1.

Descriptive Statistics Data

| HiERA Wins | LoERA Wins |
| --- | --- |
| 95 | 88 |
| 95 | 95 |
| 74 | 93 |
| 67 | 99 |
| 79 | 80 |
| 71 | 83 |
| 69 | 90 |
| 56 | 89 |
| 77 | 83 |
| 73 | 82 |
| 67 | 81 |
| 71 | 100 |
| 75 | 83 |
| 67 | 88 |
|  | 81 |
|  | 79 |

| Statistic | HiERA | LoERA |
| --- | --- | --- |
| Mean Wins | 74 | 87.125 |
| Median | 72 | 85.5 |
| Standard Dev. | 10.49542025 | 6.761410109 |
| Variance | 110.1538462 | 45.71666667 |
| Min. | 56 | 79 |
| Max. | 95 | 100 |
| Range | 39 | 21 |

Table 1.1: Descriptive Statistics Data Tabular Format

Graph 1.1: Descriptive Statistics Data Graph Format

Some of the primary data collection methods that could be used include but are not...

