Daytime Statistics

(Satisfies the Quantitative Reasoning requirement.) An elementary course in descriptive and inferential statistics emphasizing procedures commonly used in measurement, evaluation, and research in the social and behavioral sciences, as well as in business, education, and theology. Included are the basic concepts of sampling distributions, probability, statistical inference, t-tests, ANOVA, Chi-square, correlation, and regression. Students must have access to Microsoft Excel and the Internet. Prerequisite: SAT score of 480/ACT score of 20, or two years of high school algebra.

Since the Statistics class is held in the Health and Science Center’s Computer Lab on computers also used by other students, statistics students must store their work on personal, portable USB memory drives, available through the NU Bookstore for a nominal price. Students bring these USB drives to every class to record their lessons and work.

This Introduction to Statistics class uses Microsoft Excel and various online statistical calculators for data analysis because they (or equivalent programs) are readily accessible in nearly all business offices, are provided to all students by NU’s Microsoft Software Agreement, and allow the class to concentrate on the principles involved rather than on manual calculations. However, the professor acknowledges that some of Excel’s data analysis routines make built-in assumptions that users cannot change and that may introduce small errors into the results. Businesses that demand high accuracy in the data they analyze, as well as intermediate and advanced statistics courses, use computer programs designed specifically for the appropriate data analysis. NU students have access to SPSS through the computer labs.

Section 1 (Weeks 1 & 2) Introduction & Data Distributions

 

The common thread which runs through all branches of science is the scientific method which includes deduction and induction. The technology of deduction is mathematics, while that of induction is statistics. In order to make use of technology of deduction or induction, that is, mathematics or statistics, one has to translate his information into numbers. Scientific instruments are the devices which quantify the observed phenomenon.

The age of a branch of science can almost be determined by the amount of mathematics and statistics it uses. An old science which has already developed a number of basic principles can use mathematics to deduce more information from these principles; while a young science depends upon statistics to develop the basic principles. For example, physical sciences use more mathematics and less statistics than biological sciences. Genetics furnishes a good illustration of this point. In the early days of genetics, statistics alone was used. Now, after some basic laws of genetics have been established, mathematics is being used extensively. Some population geneticists do not do any experimental work. Their primary job is to deduce new information from previously established principles.

Li, Jerome C.R. (1964). Statistical Inference I. (p. 1). Ann Arbor, MI: Edwards Brothers, Inc.

HOMEWORK:  Homework problems must be submitted before the section quiz.  During class sessions students may ask any questions they have, and problems can be worked out in class.  Selected students will be asked to lead the class through particular problems during the class session before the quiz, so students must come to that class prepared to discuss all homework questions.

  • Introduction – Excel: Data Analysis
  • Central Tendency
  • Power Point Lecture Notes: see the Class Discovery Site
  • Sampling Excel File: Sampling Excel File Notes (ensure that your computer opens this file with Excel and not Internet Explorer)
  • Homework Problems: see the Class Discovery Site
  • Quiz 01-Preps: see the Class Discovery Site

=======================================================================

Section 2 (Weeks 3 & 4) Normal Distributions & Confidence Intervals Tests

[Normal Distribution] When we look at a distribution of data, we should consider three characteristics of the distribution: its shape, its center, and its spread….The center and spread are numerical summaries of the data. The center of the data set is commonly called the average. There are many ways to describe the average value of a distribution. In addition, there are many ways to measure the spread of a distribution. The most appropriate measure of center and spread depends on the shape of the distribution. Once these three characteristics of the distribution are known, we can analyze the data for interesting features, including unusual data values, called outliers.

Sullivan, Michael. (2007). Statistics: Informed Decisions Using Data. (p. 120). Upper Saddle River, NJ: Prentice Hall.
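The three characteristics described above can be checked numerically. The following is a minimal sketch in Python (the class itself uses Excel); the `describe` helper, the sample scores, and the use of the common 1.5 × IQR rule to flag outliers are illustrative assumptions, not part of the course materials.

```python
import statistics

def describe(values):
    """Summarize center, spread, and possible outliers for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1                                  # interquartile range, a measure of spread
    low_fence = q1 - 1.5 * iqr                     # assumed rule: 1.5 x IQR fences
    high_fence = q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),           # center, sensitive to outliers
        "median": statistics.median(values),       # center, resistant to outliers
        "stdev": statistics.stdev(values),         # sample standard deviation
        "iqr": iqr,
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

# Hypothetical data set with one unusually large value
scores = [12, 14, 15, 15, 16, 17, 18, 19, 45]
summary = describe(scores)
print(summary)
```

In this made-up data set the single large value pulls the mean above the median, which is one reason the most appropriate measure of center depends on the shape of the distribution.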

[Confidence Intervals] A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.  If independent samples are taken repeatedly from the same population, and a confidence interval calculated for each sample, then a certain percentage (confidence level) of the intervals will include the unknown population parameter. Confidence intervals are usually calculated so that this percentage is 95%, but we can produce 90%, 99%, 99.9% (or whatever) confidence intervals for the unknown parameter.  The width of the confidence interval gives us some idea about how uncertain we are about the unknown parameter (see precision). A very wide interval may indicate that more data should be collected before anything very definite can be said about the parameter. Confidence intervals are more informative than the simple results of hypothesis tests (where we decide “reject H0” or “don’t reject H0”) since they provide a range of plausible values for the unknown parameter.

Statistics Glossary; Easton, McColl
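To illustrate how the confidence level affects the width of an interval, here is a minimal Python sketch for proportion data. It assumes the large-sample normal approximation; the function name `proportion_ci` and the survey counts (224 "yes" answers out of 400) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n                              # sample proportion
    z = NormalDist().inv_cdf(0.5 + confidence / 2)     # two-sided critical value
    margin = z * sqrt(p_hat * (1 - p_hat) / n)         # margin of error
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 224 of 400 respondents answered "yes"
for level in (0.90, 0.95, 0.99):
    low, high = proportion_ci(224, 400, level)
    print(f"{level:.0%} CI: {low:.3f} to {high:.3f}")
```

Running the loop shows the interval widening as the confidence level rises from 90% to 99%, which matches the trade-off between confidence and precision described in the glossary entry.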

  • Normal Distribution – Excel: Data Analysis
  • Hypothesis Testing
  • Confidence Intervals – Quantitative Data
  • Confidence Intervals – Qualitative Data (called Proportion Data)
  • Power Point Lecture Notes:  Confidence Intervals (this is a rather large pdf file)
  • Homework Problems:  Homework Assignment 5
  • Confidence Interval Practice Problems
  • Power Point Lecture Notes: see the Class Discovery Site
  • Homework Problems: see the Class Discovery Site

z-score class exercise (Excel file:  open correctly)

=======================================================================

Section 3 (Weeks 5 & 6) t-Tests & Analysis of Variance (ANOVA)

[t-Test] The t-test is called a parametric test because your data must come from populations that are normally distributed and use interval measurement. The t-test is used to answer this question: Is there any difference between the means of the two populations of which our data is a random sample? The t-test is also called a test of inference because we are trying to discover if populations are different by studying samples from the populations, i.e., what we find to be true about our samples we will assume to be true about the population.

[Georgetown University, Dept of Psychology]
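As a rough illustration of the two-sample case, the sketch below computes the pooled (equal-variance) t statistic in Python. The function name `pooled_t`, the two small samples, and the use of a table critical value are assumptions made for this example; Excel's Data Analysis tool reports the same statistic along with a p-value.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample_a, sample_b):
    """t statistic and degrees of freedom for two unrelated samples, equal variances assumed."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / std_error
    return t, df

# Hypothetical scores for two independent groups
group_a = [20, 22, 19, 24, 25]
group_b = [28, 27, 30, 26, 29]
t_stat, df = pooled_t(group_a, group_b)

# Two-tailed critical value from a t table for df = 8, alpha = 0.05
T_CRITICAL = 2.306
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {abs(t_stat) > T_CRITICAL}")
```

Comparing the absolute value of t with the table value is the manual counterpart of checking whether the p-value reported by Excel is below 0.05.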

[ANOVA] A statistical technique which helps in making inference whether three or more samples might come from populations having the same mean; specifically, whether the differences among the samples might be caused by chance variation.

[Statistics.com]
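The idea behind one-way ANOVA, comparing variation between group means with variation within groups, can be sketched as follows. This is a minimal Python illustration; the function name `one_way_anova_f` and the three small samples are hypothetical, and the critical value is taken from a standard F table.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """F statistic with (between, within) degrees of freedom for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical measurements from three groups
g1 = [5, 6, 7, 6, 6]
g2 = [8, 9, 10, 9, 9]
g3 = [5, 7, 6, 6, 6]
f_stat, df_b, df_w = one_way_anova_f(g1, g2, g3)

# Critical value from an F table for (2, 12) degrees of freedom, alpha = 0.05
F_CRITICAL = 3.89
print(f"F = {f_stat:.2f}, df = ({df_b}, {df_w}), reject H0: {f_stat > F_CRITICAL}")
```

A large F means the group means differ by more than chance variation within the groups would explain; it does not say which groups differ.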

  • t-Test – One Sample
  • t-Test – Two Samples
  • t-Test – Two Samples – Excel/Data Analysis – t-test for two unrelated samples, assuming equal variances
  • Power Point Lecture Notes:  t-Test (this is a rather large pdf file)
  • Homework Problems:  Homework Assignment 3

t-Test class exercise (Excel file:  open correctly)

=======================================================================

Section 4 (Weeks 7 & 8) Chi Square Tests (χ²)

[χ²] The chi-square distribution is one of the most widely used theoretical probability distributions in inferential statistics, e.g., in statistical significance tests. [Wikipedia]  The chi-square (chi, the Greek letter pronounced “kye”) statistic is a nonparametric statistical technique used to determine if a distribution of observed frequencies differs from the theoretical expected frequencies. Chi-square statistics use nominal (categorical) or ordinal level data, thus instead of using means and variances, this test uses frequencies.

[cnx.org]
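A goodness-of-fit test compares observed frequencies with the frequencies expected under a hypothesis. The Python sketch below shows the calculation under assumed data: 60 hypothetical rolls of a die that is expected to be fair. The function name `chi_square_statistic` is illustrative, and the critical value comes from a standard chi-square table.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for faces 1 through 6 after 60 rolls
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6                 # a fair die: 60 rolls / 6 faces

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1              # categories minus one

# Critical value from a chi-square table for df = 5, alpha = 0.05
CHI2_CRITICAL = 11.070
print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {chi2 > CHI2_CRITICAL}")
```

Here the statistic falls well below the table value, so these made-up counts give no reason to doubt that the die is fair.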

  • ANOVA – Excel: Data Analysis
  • Chi-Square – Goodness of Fit
  • Chi-Square – Test of Independence

Class Practice Problems

=======================================================================

Section 5 (Weeks 10 & 12) Correlation & Regression Tests

Correlational statistics are important because they permit us to determine the strength and direction of the relationship between different sets of data or to predict scores on one distribution based on our knowledge of scores on another. If the correlation between two sets of data were a perfect 1.00, we could predict one score from another with complete accuracy. But because correlations are almost always less than perfect, we predict one score from another only with a particular probability of being correct–the higher the correlation, the higher the probability.

[McGraw Hill Higher Education, Statistics Primer for Sociology]
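Both uses mentioned above, measuring the strength and direction of a relationship and predicting one score from another, can be sketched in a few lines of Python. The function name `correlation_and_line` and the paired data are hypothetical; Excel's Data Analysis tools report the same quantities.

```python
from math import sqrt
from statistics import mean

def correlation_and_line(x, y):
    """Pearson r plus the slope and intercept of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    r = s_xy / sqrt(s_xx * s_yy)        # strength and direction, between -1 and 1
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return r, slope, intercept

# Hypothetical paired scores
hours = [1, 2, 3, 4, 5]
grades = [2, 4, 5, 4, 5]
r, slope, intercept = correlation_and_line(hours, grades)

predicted = intercept + slope * 6       # predicted y for x = 6
print(f"r = {r:.3f}, line: y = {intercept:.1f} + {slope:.1f}x, prediction at x=6: {predicted:.1f}")
```

Because r is less than 1 in this example, the prediction is an estimate with some error rather than an exact value.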

  • Correlation – Excel: Data Analysis
  • Regression – Excel: Data Analysis
  • Multiple Regression – Excel Data Analysis
  • Power Point Lecture Notes:  Correlation & Regression (this is a rather large pdf file)
  • Homework Problems:  Homework Assignment 6

Regression Examples

=======================================================================

Section 6 (Weeks 13 – 15) REVIEW & FINAL EXAM PREPS

Research Project

The Research Project is intended to assist students who are better at “doing” than “describing” statistics.  The grade earned on the Project can be used to replace one of the five quizzes contributing towards the course grade.  Although primarily intended for individual students, two students may work together, but the burden is on the students to demonstrate and prove that they both fully understand the concepts and processes involved in the project.  The Project Proposal must describe equal participation in all aspects of the work.  Possible research topics will be discussed in class.  Here is the Research Project overview and guidelines:

Statistics Research Project

The Causation Conundrum

One of the most important cautions in statistics is NOT to assume causation when one finds correlation.  The following notes are intended to accompany the class video.

  • Causation:  Causation (this is a rather large pdf file)

Review

These are critical sessions since the Final Exam covers ALL STATISTICAL tests for the complete course.  Review outside of class and rely on the class sessions to clarify issues – there is not enough time in class to completely review the whole course.  Use the Preps files below for review.