Descriptive Statistics Standard Error
... to perform basic data analysis by looking at some descriptive statistics using both programs.

Excel

To open Excel in Windows, go to Start – Programs – Microsoft Office – Excel. When it opens you will see a blank worksheet, which consists of alphabetically titled columns and numbered rows. Each cell is referenced by its column and row coordinates: for example, A1 is the cell located in column A and row 1; B7 is the cell in column B and row 7. You can reference a range of cells: for example, C1:C5 refers to the cells in column C, rows 1 to 5. You can also reference a rectangular block (a matrix): A10:C15 refers to the cells in columns A, B and C, rows 10 to 15. In Excel 2003 and earlier a worksheet has 256 columns and 65,536 rows; Excel 2007 and later allow 16,384 columns and 1,048,576 rows.

There are some shortcuts to move within the current sheet:
· “Home” moves to the first column in the current row
· “End – Right Arrow” moves to the last filled cell in the current row
· “End – Down Arrow” moves to the last filled cell in the current column
· “Ctrl-Home” moves to cell A1
· “Ctrl-End” moves to the last used cell in your data (not the last cell of the worksheet grid)
· “Ctrl-Shift-End” selects everything between the active cell and the last used cell

To select cells:
· Click on a cell (e.g., A10), hold the Shift key, and click on another cell (C15) to select the cells between A10 and C15.
· You can also click on a cell and drag the mouse over the desired range.
· To select non-adjacent cells, click on a cell, hold Ctrl, and select another cell or range of cells.

Excel stores your work in a workbook. Each workbook has one or more worksheets (and/or charts), which you can view by clicking on the sheet tabs (lower left corner of the active sheet).

Entering data

You can type anything in a cell. In general you can enter text (or labels), numbers, formulas (starting with the “=” sign), and logical values (as in ‘true’ or ‘false’). Click on a cell and start typing; once you finish, press “Enter” (to move to the next cell below) or “Tab” (to move to the next cell to the right). You can type long text in a single cell, but you may see only part of it, depending on the column width (and on whether the adjacent cell is filled). To adjust the width of a column go to Format – Column – Width or select “AutoFit Selection”. Numbers are assumed to be positive; if you need to enter a negative value, use
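The cell-reference notation described earlier (A1, C1:C5, A10:C15) can be made concrete with a small sketch. The Python below is not part of Excel; the helper names `parse_cell`, `column_letters` and `expand_range` are hypothetical, chosen only to illustrate how a range such as A10:C15 expands into individual cells.

```python
import re


def column_index(letters: str) -> int:
    """Convert column letters ('A', 'Z', 'AA') to a 1-based column number."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def column_letters(index: int) -> str:
    """Convert a 1-based column number back to column letters."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def parse_cell(ref: str) -> tuple:
    """Split a reference such as 'B7' into (column number, row number)."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    return column_index(match.group(1)), int(match.group(2))


def expand_range(ref: str) -> list:
    """List every cell in a range such as 'A10:C15', row by row."""
    start, end = ref.split(":")
    col1, row1 = parse_cell(start)
    col2, row2 = parse_cell(end)
    return [
        f"{column_letters(col)}{row}"
        for row in range(min(row1, row2), max(row1, row2) + 1)
        for col in range(min(col1, col2), max(col1, col2) + 1)
    ]


print(expand_range("C1:C5"))         # the five cells in column C, rows 1 to 5
print(len(expand_range("A10:C15")))  # 3 columns x 6 rows
print(column_letters(256))           # letters of the 256th column
```

The range is treated as a rectangle defined by two corner cells, which is why a single column (C1:C5) and a block (A10:C15) can be handled by the same function.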
Though no single one of these measurements is likely to be more precise than any other, the group of values, it is hoped, will cluster about the true value you are trying to measure. This distribution of data values is often represented by showing a single data point, representing the mean value of the data, and error bars to represent the overall distribution of the data.

Let's take, for example, the impact energy absorbed by a metal at various temperatures. In this case, the temperature of the metal is the independent variable being manipulated by the researcher, and the amount of energy absorbed is the dependent variable being recorded. Because there is not perfect precision in recording this absorbed energy, five different metal bars are tested at each temperature level. The resulting data (and graph) might look like this:

[Data table and scatter plot not reproduced: impact energy in joules for five bars at each temperature.]

For clarity, the data for each level of the independent variable (temperature) have been plotted on the scatter plot in a different color and symbol. Notice the range of energy values recorded at each of the temperatures. At -195 degrees, the energy values (shown as blue diamonds) all hover around 0 joules. On the other hand, at both 0 and 20 degrees, the values range quite a bit. In fact, there are a number of measurements at 0 degrees (shown as purple squares) that are very close to measurements taken at 20 degrees (shown as light blue triangles). These ranges in values represent the uncertainty in our measurement.

Can we say there is any difference in energy level at 0 and 20 degrees? One way to answer this is to use a descriptive statistic, the mean. The mean, or average, of a group of values describes a middle point, or central tendency, about which the data points vary. Without going into detail, the mean is a way of summarizing a group of data and stating a best guess at what the true value of the dependent variable is for that
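To make the idea of "a mean with error bars" concrete, here is a minimal Python sketch. The energy values are hypothetical stand-ins (the original data table is not available), and the choice of the standard error of the mean as the error-bar length is an assumption; the standard deviation is another common choice.

```python
import math
import statistics

# Hypothetical impact energies in joules, five bars per temperature.
# These numbers are invented for illustration; they are NOT the original data.
energy_by_temperature = {
    -195: [1.0, 0.8, 1.3, 0.9, 1.1],
    0: [2.3, 4.1, 3.0, 5.2, 3.6],
    20: [3.9, 5.8, 4.4, 6.5, 5.0],
}


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for a list of measurements."""
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 in the denominator) divided by sqrt(n).
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, standard_error


for temperature, values in energy_by_temperature.items():
    mean, standard_error = mean_and_standard_error(values)
    print(f"{temperature:>5} degrees: mean = {mean:.2f} J, "
          f"error bar = +/- {standard_error:.2f} J")
```

If the error bars of two temperature levels overlap heavily, the means alone give little reason to claim a difference; a formal test would be needed to say more.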
... all the possible scores along the bottom (x axis), and the number of times you came across that score recorded vertically (y axis) in the form of a bar. But such a graph is just plain hard to do statistical analyses with, so we have other, more numerical ways of summarizing the data.

Here is a small set of data: the grades for 15 students. For our purposes, they range from 0 (failing) to 4 (an A), and go up in steps of 0.2.

John -- 3.0
Mary -- 2.8
George -- 2.8
Beth -- 2.4
Sam -- 3.2
Judy -- 2.8
Fritz -- 1.8
Kate -- 3.8
Dave -- 2.6
Jenny -- 3.4
Mike -- 2.4
Sue -- 4.0
Don -- 3.4
Ellen -- 3.2
Orville -- 2.2

Here is the information in bar graph form:

[Bar graph not reproduced.]

Central tendency

Central tendency refers to the idea that there is one number that best summarizes the entire set of measurements, a number that is in some way "central" to the set.

The mode. The mode is the measurement that has the greatest frequency, the one you found the most of. Although it isn't used that much, it is useful when differences are rare or when the differences are non-numerical. The prototypical example of something is usually the mode. The mode for our example is 2.8. It is the grade with the most people (3: Mary, George and Judy).

The median. The median is the number at which half your measurements are more than that number and half are less than that number. The median is actually a better measure of centrality than the mean if your data are skewed, meaning lopsided. If, for example, you have a dozen ordinary folks and one millionaire, the distribution of their wealth would be lopsided towards the ordinary people, and the millionaire would be an outlier, or highly deviant member of the group. The millionaire would influence the mean a great deal, making it seem like all the members of the group are doing quite well.
The median would actually be closer to the mean of all the people other than the millionaire. The median for our example is 2.8: when the 15 grades are sorted from lowest to highest, the eighth (middle) grade is 2.8.

The mean. The mean is just the average. It is the sum of all your measurements, divided by the number of measurements. This is the most used measure of central tendency, because of its mathematical qualities. It works best if the data are distributed very evenly across the range, or are distributed in the form of a normal or bell-shaped curve (see below). One interesting thing about the mean is that it represents the expected value if the distribution of measurements were random! Here is what the formula looks like: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $x_1, \dots, x_n$ are the measurements and $n$ is the number of measurements. So 3.0 + 2.8 + 2.8 + 2.4 + 3
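The three measures of central tendency described above can be checked against the 15 grades with Python's standard `statistics` module. This is a small sketch for verification only; the dictionary name `grades` is an illustrative choice.

```python
import statistics

# The 15 grades listed in the example above.
grades = {
    "John": 3.0, "Mary": 2.8, "George": 2.8, "Beth": 2.4, "Sam": 3.2,
    "Judy": 2.8, "Fritz": 1.8, "Kate": 3.8, "Dave": 2.6, "Jenny": 3.4,
    "Mike": 2.4, "Sue": 4.0, "Don": 3.4, "Ellen": 3.2, "Orville": 2.2,
}
values = list(grades.values())

mode = statistics.mode(values)      # most frequent grade
median = statistics.median(values)  # middle grade once the values are sorted
mean = statistics.mean(values)      # sum of the grades divided by their count

print(f"n = {len(values)}")
print(f"mode = {mode}, median = {median}, mean = {mean:.2f}")
```

Because the mean (about 2.92) sits slightly above the median and mode (both 2.8), the few high grades pull the average up a little, which is the same effect as the millionaire example on a much smaller scale.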
Descriptive Statistics Using Microsoft Excel

These tutorials briefly explain the use and interpretation of standard statistical analysis techniques. The examples include how-to instructions for Excel. Although there are different versions of Excel in use, these should work about the same for most recent versions. They also assume that you have installed the Excel Analysis ToolPak, which is free and comes with Excel. (Go to Tools, Add-Ins... if it is not already installed in your version of Excel.) See www.stattutorials.com/EXCELDATA for files mentioned in this tutorial, © TexaSoft, 2008

Descriptive Statistics in Excel

Usually when you create or acquire a set of numbers you will want to examine the data to learn more about its distribution, to discover information such as the minimum and maximum values, and to determine whether there are outliers. This is an important step in any analysis since it helps you understand whether your data meet assumptions required by other analyses such as t-tests and regression.

For this example, we’ll look at the data set called EXAMPLE.XLS. The first few records are shown here:

GROUP  AGE  TIME1  TIME2  TIME3  TIME4  STATUS
A      12   22.3   25.3   28.2   30.6   5
A      11   22.8   27.5   33.3   35.8   5
B      12   22.8   30.0   32.8   31.0   4
A      12   18.5   26.0   29.0   27.9   5
B      9    19.5   25.0   25.3   26.6   5
B      11   23.5   28.8   34.2   35.6   5
C      8    22.6   26.7   28.0   33.4   3
B      8    21.0   26.7   27.5   29.5   5

As an initial example, we’ll examine the variable AGE.

1. In Excel, select Tools/Data Analysis/Descriptive Statistics. (If the Data Analysis option is not on your Tools menu, you must first install it using Tools/Add-Ins...)
2. Select the input range for the AGE variable. In this case it is $B$2:$B$51.
3. Be sure to select the check boxes Summary Statistics and Confidence level for mean (95% is okay).
The output created is shown here:

Column1
Mean                        10.46
Standard Error              0.343107052
Median                      11
Mode                        12
Standard Deviation          2.42613323
Sample Variance             5.886122449
Kurtosis                    -0.261061479
Skewness                    -0.511921947
Range                       11
Minimum                     4
Maximum                     15
Sum                         523
Count                       50
Confidence Level(95.0%)     0.689499422

Information you should notice includes:

1. Search for outliers: Look at the Minimum and Maximum values to see if these values fall within your expected range for these data. If a value is unexpectedly small or large, you should examine your original data to see if it was miscoded. If there are corrections that need to be made, make them before continuing. If you
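Several rows of the Descriptive Statistics output are related to one another, and those relationships can be checked directly from the reported numbers. The sketch below uses only the values printed in the output above (the full 50-row AGE column is not reproduced here) and assumes, as is standard for this Excel tool, that "Confidence Level(95.0%)" is the half-width of the confidence interval for the mean.

```python
import math

# Values copied from the Descriptive Statistics output for AGE.
mean = 10.46
reported_standard_error = 0.343107052
standard_deviation = 2.42613323
reported_variance = 5.886122449
minimum, maximum, reported_range = 4, 15, 11
total, count = 523, 50
confidence_half_width = 0.689499422

# Standard error of the mean = standard deviation / sqrt(n).
standard_error = standard_deviation / math.sqrt(count)

# Sample variance is the square of the sample standard deviation.
variance = standard_deviation ** 2

# 95% confidence interval for the mean: mean +/- reported half-width.
ci_low = mean - confidence_half_width
ci_high = mean + confidence_half_width

# The half-width is a critical value times the standard error;
# dividing recovers the multiplier that was used.
implied_multiplier = confidence_half_width / reported_standard_error

print(f"standard error: {standard_error:.6f} (reported {reported_standard_error})")
print(f"variance:       {variance:.6f} (reported {reported_variance})")
print(f"mean from sum:  {total / count} (reported {mean})")
print(f"range:          {maximum - minimum} (reported {reported_range})")
print(f"95% CI:         {ci_low:.4f} to {ci_high:.4f}")
print(f"multiplier:     {implied_multiplier:.4f}")
```

The implied multiplier of roughly 2.01 is consistent with a t critical value for 49 degrees of freedom, which is why the interval is slightly wider than the 1.96 standard errors a normal approximation would give.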