# Which statistical test to use?

Steps:

1. Determine the types of data
2. Determine the number of samples
3. If two samples: are they independent groups or related (matched) groups?
4. Choose the test

### Types of data

Mnemonic: NOIR

Qualitative or Categorical data

a. Nominal (relating to name): Groups e.g. gender (male/female), color (black/white), blood groups (A/B/AB/O), religions (hindu/muslim/christian)

b. Ordinal (relating to order): Rank-ordered data but without meaningful difference; e.g. socio-economic status (low, middle and high), rank (1st, 2nd and 3rd)
– without meaningful difference: difference between 1st and 2nd may be 20 units but that between 2nd and 3rd may be only 3 units

Quantitative or Numerical data (Scale)

a. Interval (means gap): values can be ordered and have a meaningful difference but doubling is not meaningful because there is no “true zero” point
– with meaningful difference: the difference between 100 °C and 90 °C is the same as that between 50 °C and 40 °C, i.e. 10 °C
– without meaningful doubling: 100 °C is not twice as hot as 50 °C because 0 °C doesn’t indicate complete absence of heat

b. Ratio: Similar to interval data but with meaningful doubling because it has “true zero” point (0 means absence of something)
– with meaningful doubling: weight (100 kg is twice as heavy as 50 kg), height (100 cm is twice as tall as 50 cm), kelvin scale (300K is twice as hot as 150K), blood pressure (120 mmHg is twice as high as 60 mmHg), pulse rate (120 beats/min is twice as high as 60 beats/min)

Quantitative data can also be:

1. Discrete: Only integers (no fraction or decimals); e.g. number of people (178 or 179; 178.5 is not possible)

2. Continuous: Fractions or decimals possible; e.g. temperature (100.4 F), weight (65.7 kg)

Summary: NOIR

1. Nominal: groups
2. Ordinal: ordered ranks
3. Interval: meaningful ordered difference but no meaningful doubling (no true 0 point)
4. Ratio: meaningful ordered difference with meaningful doubling (true 0 point)
5. Discrete: integers only
6. Continuous: fractions or decimals possible
7. Independent variable: Investigator manipulated variable (input)
8. Dependent variable: Measured variable (output)

Plotting a histogram or QQ plot of the variable of interest gives an indication of the shape of its distribution. For normally distributed data, the histogram peaks in the middle and is approximately symmetrical about the mean, and the points in the QQ plot lie close to the line.
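As a rough illustration of what a QQ plot compares, the sketch below pairs each sorted observation with the quantile expected from a normal distribution with the same mean and SD. The function name `qq_points`, the made-up weights and the plotting positions `(i - 0.5) / n` are assumptions chosen for the example; other conventions exist.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal QQ plot."""
    n = len(sample)
    observed = sorted(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    # Plotting positions (i - 0.5) / n are one common convention (assumed here).
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

# Made-up weights in kg, for illustration only.
weights = [58.1, 61.4, 63.0, 64.2, 65.7, 66.3, 67.9, 69.5, 71.2, 74.0]
for expected, actual in qq_points(weights):
    print(f"{expected:6.1f}  {actual:6.1f}")
```

If the two printed columns track each other closely, the points would fall near the line on the plot.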

### Tests of statistical significance and association

| | Parametric tests | Non-parametric tests |
|---|---|---|
| Based on | Normal distribution | Non-normal distribution |
| Types of data | Quantitative | Qualitative |
| Compares | Means (± SD) | Percentages, proportions and fractions |
| Examples | Student's paired t-test; Student's unpaired t-test; ANOVA (F test) | Sign test; Chi-square test; Wilcoxon test; Mann-Whitney test |

| Comparing | Dependent variable | Independent variable | Parametric test (dependent variable is normally distributed) | Non-parametric test |
|---|---|---|---|---|
| Means of 2 independent groups | Continuous/Scale | Categorical/nominal | Unpaired t-test; z test (if sample >30) | Mann-Whitney U test or Wilcoxon (rank sum) test, if data are at least ordinal |
| Means of 2 paired (matched) groups | Continuous/Scale | Time variable (Time 1 = before, Time 2 = after) | Paired t-test | Sign test or Wilcoxon signed rank test |
| Means of 3+ independent groups | Continuous/Scale | Categorical/nominal | ANOVA | Kruskal-Wallis test |
| 3+ measurements on the same subjects | Continuous/Scale | Time variable | Repeated measures ANOVA | Friedman test |
| Relationship between 2 continuous variables | Continuous/Scale | Continuous/Scale | Pearson’s correlation coefficient (r) | Spearman’s correlation coefficient (rho) – also used for ordinal data |
| Predicting the change in the dependent variable with the change in the independent variable | Continuous/Scale | Any | Simple linear regression (1 independent variable); Multiple linear regression (2 or more independent variables) | |
| | Qualitative | Any | Logistic regression | |
| Relation between 2 categorical variables | Categorical/nominal | Categorical/nominal | | Chi-square test; Fisher’s test (if sample size <30) |
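The parametric/non-parametric pairs listed above can be encoded as a small lookup, which may help when memorising them. This is only a sketch: the dictionary keys and the function name `choose_test` are illustrative labels, not part of any library.

```python
# Each entry maps a comparison to (parametric test, non-parametric test).
# Keys are hypothetical labels chosen for this example.
TESTS = {
    "2 independent groups": ("Unpaired t-test", "Mann-Whitney U test"),
    "2 paired groups": ("Paired t-test", "Wilcoxon signed rank test"),
    "3+ independent groups": ("ANOVA", "Kruskal-Wallis test"),
    "3+ repeated measurements": ("Repeated measures ANOVA", "Friedman test"),
    "2 continuous variables": ("Pearson's correlation", "Spearman's correlation"),
}

def choose_test(comparison, normally_distributed):
    """Pick the parametric test for normal data, else its non-parametric partner."""
    parametric, non_parametric = TESTS[comparison]
    return parametric if normally_distributed else non_parametric

print(choose_test("2 independent groups", normally_distributed=False))
```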

Examples:

a. Is diet A better than diet B for weight loss?

1. 1 nominal: Diet A or B
2. 1 scale: Weight loss (dependent variable)
Choice of test: Normal distribution – unpaired t-test; Non-normal distribution – Mann-Whitney U test or Wilcoxon (rank sum) test
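For the normally distributed case, the unpaired t statistic can be computed by hand as sketched below. The weight-loss figures are made up, the function name `unpaired_t` is hypothetical, and the version shown assumes equal variances in the two groups (pooled variance).

```python
from math import sqrt
from statistics import mean, variance

def unpaired_t(a, b):
    """Student's unpaired t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Made-up weight loss in kg for each diet.
diet_a = [3.1, 4.0, 2.8, 5.2, 3.9, 4.4]
diet_b = [2.0, 2.9, 3.1, 1.8, 2.5, 3.3]
t, df = unpaired_t(diet_a, diet_b)
print(f"t = {t:.2f}, df = {df}")
```

The calculated t would then be compared with the critical value for the given degrees of freedom.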

b. Are height and weight related?

1. 1 scale: Height
2. 1 scale: Weight
Choice of test: Normal distribution – Pearson’s correlation coefficient; Non-normal distribution – Spearman’s correlation coefficient
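A minimal sketch of both coefficients follows, using made-up heights and weights. Spearman's rho is computed here as Pearson's r on the ranks; the simple `ranks` helper assumes there are no tied values, which real data may violate.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks from 1 to n; this simple version assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_rho(x, y):
    """Spearman's rho as Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Made-up data: height in cm, weight in kg.
height = [150, 155, 160, 165, 170, 175]
weight = [52, 55, 61, 60, 68, 72]
print(f"r = {pearson_r(height, weight):.3f}, rho = {spearman_rho(height, weight):.3f}")
```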

c. Can height predict weight?

1. 1 scale: Height
2. 1 scale: Weight (dependent variable)
Choice of test: Simple linear regression
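As an illustration, the least-squares line for one predictor can be fitted with a few lines of arithmetic. The data and the name `fit_line` are assumptions made for this sketch.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return slope, intercept

# Made-up data: height in cm (independent), weight in kg (dependent).
heights_cm = [150, 155, 160, 165, 170, 175]
weights_kg = [52, 55, 61, 60, 68, 72]
slope, intercept = fit_line(heights_cm, weights_kg)
print(f"predicted weight at 168 cm: {intercept + slope * 168:.1f} kg")
```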

d. Are patients taking treatment A more likely to recover than those taking treatment B?

1. 1 nominal: treatment A or B
2. 1 nominal: early recovery or late recovery
Choice of test: Chi-square test
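The chi-square statistic for such a table can be sketched as follows. The counts are invented for illustration and `chi_square` is a hypothetical helper name; expected frequencies are taken from the row and column totals.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Rows: treatment A, treatment B; columns: early recovery, late recovery (made-up counts).
counts = [[30, 20], [18, 32]]
stat, df = chi_square(counts)
print(f"chi-square = {stat:.2f}, df = {df}")
```

With 1 degree of freedom the critical value at the 5% level is 3.841, so a calculated value above that would be taken as significant.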

e. 30% of students in a class are anemic; after 6 months of IFA therapy, 20% are anemic. How do you test the significance?

1. 1 scale: percentage of anemic
2. 1 time interval: before 6 months and after 6 months (matched group)
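Choice of test: Sign test or Wilcoxon (signed rank) test (percentage = non-parametric); when each student is simply classed as anemic or not anemic at both times, McNemar’s test is the usual choice for such paired proportions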
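A sketch of the sign test on paired before/after outcomes is given below, assuming each student is coded 1 = anemic and 0 = not anemic (a coding chosen for this example, as are the made-up data and the name `sign_test`). Pairs with no change are dropped, S is the smaller of the two sign counts, and the p-value is the exact two-sided binomial probability.

```python
from math import comb

def sign_test(before, after):
    """Return (S, n, p): smaller sign count, non-tied pairs, two-sided exact p."""
    plus = sum(1 for b, a in zip(before, after) if a > b)
    minus = sum(1 for b, a in zip(before, after) if a < b)
    n = plus + minus  # pairs with no change are ignored
    s = min(plus, minus)
    # Probability of a split at least this uneven under Binomial(n, 0.5).
    p = min(1.0, 2 * sum(comb(n, k) for k in range(s + 1)) / 2 ** n)
    return s, n, p

# Made-up class of 20: 6 anemic before (30%), 4 anemic after (20%).
before = [1] * 6 + [0] * 14
after = [0, 0, 0, 1, 1, 1] + [1] + [0] * 13
s, n, p = sign_test(before, after)
print(f"S = {s}, n = {n}, p = {p:.3f}")
```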

f. Mean serum albumin level of dengue patients before treatment was 3.6 gm/dl and after treatment was 3.2 gm/dl.

1. 1 scale: albumin level
2. 1 time interval: before treatment and after treatment (matched group)
Choice of test: paired t-test (mean = parametric)
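The paired t statistic works on the per-patient differences, as in this sketch. The albumin values are made up and `paired_t` is a hypothetical name; differences are taken as after minus before.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic on (after - before) differences; returns (t, df)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Made-up serum albumin (gm/dl) for the same five patients.
before = [3.6, 3.8, 3.4, 3.7, 3.5]
after = [3.2, 3.5, 3.1, 3.4, 3.0]
t, df = paired_t(before, after)
print(f"t = {t:.2f}, df = {df}")
```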

g. Mean Hb level of anemia patients was 9.6 gm/dl and that of hookworm patients was 7.2 gm/dl.

1. 1 scale: Hb level
2. 1 nominal: Anemia patients or hookworm patients
Choice of test: unpaired t-test (if sample <30) or z test (if sample >30)

h. Mean weight of students in class A is 50 kg, class B is 44 kg and class C is 52 kg.

1. 1 nominal: 3 groups (Class A, B and C)
2. 1 scale: weight
Choice of test: 1 way ANOVA
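One-way ANOVA compares the variation between group means with the variation within groups. The sketch below computes the F statistic directly; the class weights are invented and `one_way_anova` is a hypothetical helper.

```python
from statistics import mean

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean([v for g in groups for v in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Made-up weights (kg) for three classes.
class_a = [48, 51, 50, 52, 49]
class_b = [43, 45, 44, 42, 46]
class_c = [53, 51, 52, 54, 50]
f, df1, df2 = one_way_anova(class_a, class_b, class_c)
print(f"F = {f:.2f}, df = ({df1}, {df2})")
```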

i. A doctor believes that drawing blood is faster with a vacutainer for someone once that person is trained, but faster with a standard syringe for someone with no training.

1. 1 scale: time (faster vs slower)
2. 1 nominal: vacutainer or syringe
3. 1 nominal: trained or non-trained
Choice of test: 2 way ANOVA

j. Do body weight, calorie intake, fat intake, and age have an influence on the probability of having a heart attack?

1. Dependent variable: nominal (heart attack – yes or no)
2. Independent variables: Body weight (scale), Calorie intake (scale), Fat intake (scale), Age (scale)
Choice of test: Logistic regression

### Interpretation of the test

| Inferential statistics test | Rule for significance (null hypothesis rejected) |
|---|---|
| Spearman’s rank | Calculated rho ≥ critical value |
| Chi-square | Calculated χ² ≥ critical value |
| t-test | Calculated t ≥ critical value |
| Sign test | Calculated S ≤ critical value |
| Wilcoxon | Calculated T ≤ critical value |
| Mann-Whitney | Calculated U ≤ critical value |
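The direction of each comparison above is easy to mix up, so a small helper can make it explicit. This is a sketch only: the test labels and the function name `reject_null` are made up for the example, and for a two-sided t-test the absolute value of t is assumed to be passed in.

```python
# Tests where a LARGER calculated statistic leads to rejecting the null hypothesis.
REJECT_IF_AT_LEAST = {"spearman", "chi-square", "t-test"}
# Tests where a SMALLER calculated statistic leads to rejecting the null hypothesis.
REJECT_IF_AT_MOST = {"sign", "wilcoxon", "mann-whitney"}

def reject_null(test, calculated, critical):
    """Apply the comparison rule for the named test (labels are hypothetical)."""
    if test in REJECT_IF_AT_LEAST:
        return calculated >= critical
    if test in REJECT_IF_AT_MOST:
        return calculated <= critical
    raise ValueError(f"unknown test: {test}")

print(reject_null("chi-square", 5.77, 3.841))
print(reject_null("mann-whitney", 12, 8))
```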

a. Applications of chi-square test:

1. Test of proportions
2. Test of association
3. Test of goodness of fit (for single data)

b. Essential requirements for calculation of chi-square test:

1. Random sample
2. Qualitative data
3. Lowest expected frequency not <5

c. Degrees of freedom: It is the number of observations in a dataset that can freely vary once the parameters have been estimated. It is used in chi-square test and t-test. It is calculated as:

1. Single sample (paired t-test): n-1 (where n is the no. of units in the sample)
2. Two sample (unpaired t-test): (N1 + N2) – 2, where N1 and N2 are the no. of units in the two samples
3. Chi-square test, contingency table: (c-1)(r-1); where c is no. of columns and r is no. of rows
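The three formulas above translate directly into code; the function names below are illustrative only.

```python
def df_paired(n):
    """Single sample or paired t-test: n - 1."""
    return n - 1

def df_unpaired(n1, n2):
    """Two-sample (unpaired) t-test: (n1 + n2) - 2."""
    return n1 + n2 - 2

def df_chi_square(columns, rows):
    """Contingency table: (c - 1)(r - 1)."""
    return (columns - 1) * (rows - 1)

print(df_paired(10), df_unpaired(12, 15), df_chi_square(3, 2))
```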

d. Correlation is represented by a scatter diagram.

e. Correlation coefficient (r) lies between -1 and +1.

• Negative r = Negative correlation (as one variable increases, the other decreases)
• Positive r = Positive correlation (as one variable increases, the other also increases)
• 0 < r < 0.3 = Weak positive correlation
• 0.3 < r < 0.7 = Moderate positive correlation
• r > 0.7 = Strong positive correlation

f. Coefficient of determination = r² (0 to 1); the proportion (often quoted as a percentage) of variation in one variable that can be explained by variation in the other variable
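The bands for r and the coefficient of determination can be combined in one small helper. Two assumptions are made in this sketch: the same cut-offs are applied to negative r by using its absolute value, and values exactly at 0.3 or 0.7 are placed in the higher band, since the text does not say where they belong.

```python
def describe_r(r):
    """Return a verbal description of r and the coefficient of determination r²."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no linear correlation", 0.0
    direction = "positive" if r > 0 else "negative"
    size = abs(r)  # assumption: same bands for negative r
    strength = "weak" if size < 0.3 else "moderate" if size < 0.7 else "strong"
    return f"{strength} {direction} correlation", r ** 2

label, r_squared = describe_r(0.8)
print(f"{label}; about {r_squared:.0%} of the variation explained")
```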

g. If we have a regression equation Y = 0.3X1 + 4X2, then the regression coefficient of X1 is 0.3 and the regression coefficient of X2 is 4. This means that when X1 increases by 1 unit (with X2 held constant), Y increases by 0.3 units. Likewise, when X2 increases by 1 unit (with X1 held constant), Y increases by 4 units.
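This interpretation can be checked numerically with the equation itself; the starting values X1 = 10 and X2 = 2 are arbitrary choices for the sketch.

```python
def predict(x1, x2):
    """The example regression equation Y = 0.3*X1 + 4*X2."""
    return 0.3 * x1 + 4 * x2

base = predict(10, 2)
# Raise one predictor by 1 unit while holding the other constant.
change_x1 = predict(11, 2) - base
change_x2 = predict(10, 3) - base
print(round(change_x1, 2), round(change_x2, 2))
```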
