How to do t-test for more than 2 groups

  • Hi all, I want to compare the mean of a variable across more than two groups. Take the following example:

    use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear

    I can get a two-sample t test with equal variances as follows:

    ttest write, by(female)

    and I get the following output:

        Two-sample t test with equal variances
        ------------------------------------------------------------------------------
           Group |     Obs        Mean    Std. Err.   Std. Dev.    [95% Conf. Interval]
        ---------+--------------------------------------------------------------------
            male |      91    50.12088    1.080274    10.30516    47.97473    52.26703
          female |     109    54.99083    .7790686    8.133715    53.44658    56.53507
        ---------+--------------------------------------------------------------------
        combined |     200      52.775    .6702372    9.478586    51.45332    54.09668
        ---------+--------------------------------------------------------------------
            diff |           -4.869947    1.304191               -7.441835   -2.298059
        ------------------------------------------------------------------------------
            diff = mean(male) - mean(female)                              t =  -3.7341
        Ho: diff = 0                                     degrees of freedom =      198

            Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
         Pr(T < t) = 0.0001         Pr(|T| > |t|) = 0.0002          Pr(T > t) = 0.9999

    But the same dataset also has a grouping variable ses (socioeconomic status). Of course, if I write the following command

    ttest write, by(ses)

    it won't work, because ses has more than two categories (low, middle, high), and Stata reports:

        . ttest write, by(ses)
        more than 2 groups found, only 2 allowed

    What I want to know is the name of the test or command that produces output like the above for more than two groups, i.e. a t test for ses in the given example.

    Regards and Stay Blessed
    Muhammad Mubeen

  • Analysis of variance (which you can also approach via regression)
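
    A minimal sketch of the regression route, assuming the hsb2 dataset from the question. The commands are standard Stata, but this is only one way to set it up and not necessarily what the reply had in mind. The joint test of the ses indicators is the same F test that one-way ANOVA reports, and the coefficients are differences in mean writing score from the base category.

    ```stata
    * Sketch: one-way ANOVA and the equivalent regression (hsb2 data from the question)
    use "http://www.ats.ucla.edu/stat/stata/notes/hsb2", clear

    * One-way ANOVA of writing score by socioeconomic status
    oneway write ses

    * Same model as a regression with ses entered as a factor variable;
    * each coefficient is a difference in means from the base category
    regress write i.ses

    * Joint test that all ses indicators are zero (the ANOVA F test)
    testparm i.ses
    ```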

  • I tried ANOVA with the following command:

        . oneway write ses

                                Analysis of Variance
            Source              SS         df      MS            F     Prob > F
        ------------------------------------------------------------------------
        Between groups      858.715441      2    429.35772      4.97     0.0078
         Within groups      17020.1596    197    86.396749
        ------------------------------------------------------------------------
            Total           17878.875     199    89.843593

    but the output is not what I need. I want output in the same format as the two-group ttest comparison of means, as in this example:

        . ttest write, by(female)

        Two-sample t test with equal variances
        ------------------------------------------------------------------------------
           Group |     Obs        Mean    Std. Err.   Std. Dev.    [95% Conf. Interval]
        ---------+--------------------------------------------------------------------
            male |      91    50.12088    1.080274    10.30516    47.97473    52.26703
          female |     109    54.99083    .7790686    8.133715    53.44658    56.53507
        ---------+--------------------------------------------------------------------
        combined |     200      52.775    .6702372    9.478586    51.45332    54.09668
        ---------+--------------------------------------------------------------------
            diff |           -4.869947    1.304191               -7.441835   -2.298059
        ------------------------------------------------------------------------------
            diff = mean(male) - mean(female)                              t =  -3.7341
        Ho: diff = 0                                     degrees of freedom =      198

            Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
         Pr(T < t) = 0.0001         Pr(|T| > |t|) = 0.0002          Pr(T > t) = 0.9999

    But when I apply the same command to ses, I get:

        . ttest write, by(ses)
        more than 2 groups found, only 2 allowed
        r(420);

    I have tried several ANOVA commands, but their output is completely different from this.

  • As Nick suggests, you can perform an ANOVA test in this case. You can use the oneway command with the bonferroni option, which will give you a comparison matrix of each category on the grouping variable:

    use "http://www.ats.ucla.edu/stat/stata/notes/hsb2", clear
    oneway write ses, bonferroni

    The output looks like this:

        . oneway write ses, bonferroni

                                Analysis of Variance
            Source              SS         df      MS            F     Prob > F
        ------------------------------------------------------------------------
        Between groups      858.715441      2    429.35772      4.97     0.0078
         Within groups      17020.1596    197    86.396749
        ------------------------------------------------------------------------
            Total           17878.875     199    89.843593

        Bartlett's test for equal variances:  chi2(2) =   0.1462  Prob>chi2 = 0.930

                     Comparison of writing score by ses
                                (Bonferroni)
        Row Mean-|
        Col Mean |        low     middle
        ---------+----------------------
          middle |    1.30929
                 |      1.000
                 |
            high |    5.29677    3.98748
                 |      0.012      0.032

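    The p-value of 0.0078 for the F test tells you that at least one of the three group means of the writing score differs from the others, and that this result is statistically significant at the 1% level (0.0078 < 0.01). The Bonferroni-adjusted p-values in the comparison table show which pairs differ: high vs. low (0.012) and high vs. middle (0.032) are significant at the 5% level, whereas middle vs. low (1.000) is not.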
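
    If the goal is output closer to what ttest shows (group means, differences, standard errors, confidence intervals), the following sketch may help. It assumes a Stata version that includes pwmean. Note that these comparisons use the pooled within-group variance from all three groups rather than a separate two-sample test per pair, so they are not expected to reproduce ttest output exactly.

    ```stata
    * Sketch: ttest-like summaries for three groups (hsb2 data from the question)
    use "http://www.ats.ucla.edu/stat/stata/notes/hsb2", clear

    * Table of group means, standard deviations and frequencies,
    * followed by the ANOVA table and Bonferroni comparisons
    oneway write ses, tabulate bonferroni

    * Pairwise differences in means with standard errors, t statistics,
    * Bonferroni-adjusted p-values and confidence intervals
    pwmean write, over(ses) mcompare(bonferroni) effects
    ```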

  • Muhammad: a more time-consuming way to get what you are after is to run a series of -ttest- commands, dividing the conventional 0.05 threshold by the number of comparisons you are going to make (3 in your case) beforehand. Hence, to reject the null with the overall probability of a Type I error held at 0.05, each p-value should be less than 0.05/3 = 0.01666667:

        . use "http://www.ats.ucla.edu/stat/stata/notes/hsb2", clear

        . ttest write if ses!=1, by(ses) unequal

        Two-sample t test with unequal variances
        ------------------------------------------------------------------------------
           Group |     Obs        Mean    Std. Err.   Std. Dev.    [95% Conf. Interval]
        ---------+--------------------------------------------------------------------
          middle |      95    51.92632    .9342604    9.106044    50.07132    53.78131
            high |      58    55.91379     1.23991    9.442874    53.43092    58.39667
        ---------+--------------------------------------------------------------------
        combined |     153    53.43791    .7604806    9.406626    51.93543    54.94039
        ---------+--------------------------------------------------------------------
            diff |           -3.987477    1.552488               -7.062047   -.9129079
        ------------------------------------------------------------------------------
            diff = mean(middle) - mean(high)                              t =  -2.5684
        Ho: diff = 0                     Satterthwaite's degrees of freedom =   117.19

            Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
         Pr(T < t) = 0.0057         Pr(|T| > |t|) = 0.0115          Pr(T > t) = 0.9943

        . ttest write if ses!=2, by(ses) unequal

        Two-sample t test with unequal variances
        ------------------------------------------------------------------------------
           Group |     Obs        Mean    Std. Err.   Std. Dev.    [95% Conf. Interval]
        ---------+--------------------------------------------------------------------
             low |      47    50.61702    1.384316    9.490391    47.83054     53.4035
            high |      58    55.91379     1.23991    9.442874    53.43092    58.39667
        ---------+--------------------------------------------------------------------
        combined |     105    53.54286     .954748    9.783255    51.64956    55.43616
        ---------+--------------------------------------------------------------------
            diff |           -5.296772    1.858415               -8.984579   -1.608965
        ------------------------------------------------------------------------------
            diff = mean(low) - mean(high)                                 t =  -2.8502
        Ho: diff = 0                     Satterthwaite's degrees of freedom =  98.3367

            Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
         Pr(T < t) = 0.0027         Pr(|T| > |t|) = 0.0053          Pr(T > t) = 0.9973

        . ttest write if ses!=3, by(ses) unequal

        Two-sample t test with unequal variances
        ------------------------------------------------------------------------------
           Group |     Obs        Mean    Std. Err.   Std. Dev.    [95% Conf. Interval]
        ---------+--------------------------------------------------------------------
             low |      47    50.61702    1.384316    9.490391    47.83054     53.4035
          middle |      95    51.92632    .9342604    9.106044    50.07132    53.78131
        ---------+--------------------------------------------------------------------
        combined |     142    51.49296    .7738965    9.222041    49.96302     53.0229
        ---------+--------------------------------------------------------------------
            diff |           -1.309295    1.670082               -4.627988    2.009399
        ------------------------------------------------------------------------------
            diff = mean(low) - mean(middle)                               t =  -0.7840
        Ho: diff = 0                     Satterthwaite's degrees of freedom =  88.4657

            Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
         Pr(T < t) = 0.2176         Pr(|T| > |t|) = 0.4352          Pr(T > t) = 0.7824

        . oneway write ses, bonf

                                Analysis of Variance
            Source              SS         df      MS            F     Prob > F
        ------------------------------------------------------------------------
        Between groups      858.715441      2    429.35772      4.97     0.0078
         Within groups      17020.1596    197    86.396749
        ------------------------------------------------------------------------
            Total           17878.875     199    89.843593

        Bartlett's test for equal variances:  chi2(2) =   0.1462  Prob>chi2 = 0.930

                     Comparison of writing score by ses
                                (Bonferroni)
        Row Mean-|
        Col Mean |        low     middle
        ---------+----------------------
          middle |    1.30929
                 |      1.000
                 |
            high |    5.29677    3.98748
                 |      0.012      0.032

    That said, I would strongly support previous comments in favour of -oneway- and, even more, in favour of -regression-.
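
    The three pairwise tests above can also be scripted. The loop below is a sketch, assuming ses is coded 1 (low), 2 (middle), 3 (high), as the -if- conditions above imply. It compares each two-sided p-value with the Bonferroni threshold 0.05/3, roughly 0.0167. Against that threshold, the two-sided p-values reported above for middle vs. high (0.0115) and low vs. high (0.0053) lead to rejection, while low vs. middle (0.4352) does not.

    ```stata
    * Sketch: all pairwise Welch t tests with a Bonferroni threshold
    * Assumes ses is coded 1, 2, 3 as in the hsb2 data
    use "http://www.ats.ucla.edu/stat/stata/notes/hsb2", clear

    * Per-comparison threshold for an overall 0.05 level across 3 tests
    local alpha = 0.05/3
    display "Bonferroni threshold: " `alpha'

    foreach pair in "1 2" "1 3" "2 3" {
        local a : word 1 of `pair'
        local b : word 2 of `pair'
        * Keep only the two groups being compared
        quietly ttest write if inlist(ses, `a', `b'), by(ses) unequal
        * r(t) and r(p) are the t statistic and two-sided p-value left by ttest
        display "ses `a' vs `b': t = " %7.4f r(t) "  p = " %6.4f r(p) ///
            "  below threshold (1 = yes): " (r(p) < `alpha')
    }
    ```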

    Last edited by Carlo Lazzaro; 07 Nov 2016, 07:08.

    Kind regards, Carlo

    (Stata 17.0 SE)

  • Note that in this instance, and many others, anova can't tell the whole story as it takes no account of the fact that ses is ordinal. For those curious, a graph tells more than some of the inferential stuff:

    use "http://www.ats.ucla.edu/stat/stata/notes/hsb2", clear
    capture ssc install stripplot
    stripplot write, over(ses) cumul cumprob box centre vertical refline yla(, ang(h)) xla(, noticks) xsc(titlegap(*5))

    The boxes show medians and quartiles as customary. The added lines are the means.
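
    One way to bring the ordering of ses into the inference, offered only as a sketch and not as something proposed in this post, is a polynomial (trend) contrast after regression. It assumes the three ses levels can be treated as equally spaced, which is a strong assumption for an ordinal variable.

    ```stata
    * Sketch: linear and quadratic trend in mean writing score across ses
    * Assumption: ses levels 1, 2, 3 are treated as equally spaced
    use "http://www.ats.ucla.edu/stat/stata/notes/hsb2", clear

    regress write i.ses

    * p. requests orthogonal polynomial contrasts in the level values
    contrast p.ses
    ```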

  • Thanks Nick: very enlightening.

    Kind regards, Carlo


  • Thank you all; Carlo's approach in particular gives me the required output. However, I have a few more small queries.

    oneway write ses, bonferroni

    The output of this command states that it assumes equal variances among the groups. What test should we use if the variances are unequal across groups? Moreover, this is just an example. In my actual research, where I have to apply the same technique, the groups are ordinal categories that I created myself from a variable that is actually continuous, ranging from 0 to 1000. To do the same in the given example, the following code gives the idea:

    generate groups = recode(id, 25, 60, 100, 150, 200)

    Now my hypothesis is that mean(group 25) < mean(group 60) < mean(group 100) < mean(group 150) < mean(group 200). I am ready to write the following time-consuming code:

    ttest write if groups!=100 | groups!=150 | groups!=200, by(groups) unequal // for diff of mean between 25 and 60 group
    ttest write if groups!=25 | groups!=150 | groups!=200, by(groups) unequal // for diff of mean between 60 and 100 group
    ttest write if groups!=25 | groups!=60 | groups!=200, by(groups) unequal // for diff of mean between 100 and 150 group
    ttest write if groups!=25 | groups!=60 | groups!=100, by(groups) unequal // for diff of mean between 150 and 200 group

    The above code is again not working and reports more than two groups (I know I am making some mistake in the if condition):

        . ttest write if groups!=100 | groups!=150 | groups!=200, by(groups) unequal
        more than 2 groups found, only 2 allowed

    Still, I feel that I may be missing something: a test for my hypothesis above, i.e. mean(group1) < mean(group2) < mean(group3) < mean(group4) < mean(group5), when variances are unequal. Is there not a specific test for this type of hypothesis that gives output similar to the two-group ttest above? In my actual research:

    generate group = recode(varcont, 0, 10, 25, 100, 500, 1000)

    * varcont is my continuous variable, which I am converting to an ordinal one to develop my hypothesis.

    Regards and Stay Blessed

    Muhammad Mubeen

  • Consider a Jonckheere-Terpstra test. It's not for means, but it may help.
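
    A hedged sketch of how this might be run on the example data. It assumes Stata 17 or later, where, as far as I know, nptrend accepts a group() option and a jterpstra option. Earlier releases used a different syntax and did not offer this test, so please check help nptrend for your version before relying on it.

    ```stata
    * Sketch: Jonckheere-Terpstra test for an ordered alternative
    * Assumption: Stata 17+ syntax for nptrend; verify with -help nptrend-
    use "http://www.ats.ucla.edu/stat/stata/notes/hsb2", clear

    * Ordered groups built from id, as in the question
    generate groups = recode(id, 25, 60, 100, 150, 200)

    nptrend write, group(groups) jterpstra
    ```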

  • Muhammad:
    - anova is, in general, quite robust to departures from the equal-variance assumption;
    - I find your query a bit too sparse. Anyway, if your depvar is continuous and you want to stick with -ttest-, you should -label- the -Groups- variable first, and then try, for each of the planned comparisons, what follows (changing the Group_# when necessary):

    ttest write if ses==Group_25 | ses==Group_60, by(ses) unequal

    However, as Nick pointed out, if Group has an ordinal flavour, something in your result will remain unsaid.
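
    On the unequal-variance question, one possible alternative to a series of -ttest- commands is a regression with heteroskedasticity-robust standard errors followed by adjacent-level contrasts. This is a sketch on the example data, using the groups variable from the question; it is one option among several and not a test of the full ordered hypothesis.

    ```stata
    * Sketch: adjacent-group differences in means without assuming equal variances
    use "http://www.ats.ucla.edu/stat/stata/notes/hsb2", clear
    generate groups = recode(id, 25, 60, 100, 150, 200)

    * Robust (sandwich) standard errors allow group variances to differ
    regress write i.groups, vce(robust)

    * ar. compares each level with the previous one (60 vs 25, 100 vs 60, ...);
    * p-values and confidence intervals are Bonferroni-adjusted
    contrast ar.groups, mcompare(bonferroni) effects
    ```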

    PS: crossed in the cyber-space with Nick's reply, who tackled the issue from a very different point.

    Kind regards, Carlo


  • Thank you again. After comparing your code with what I had written, I found my mistake, and the following code worked:

    ttest write if groups==25 | groups==60, by(groups) unequal // Difference between group of 25 and group of 60
    ttest write if groups==60 | groups==100, by(groups) unequal // Difference between group of 60 and group of 100

    and so on. The Jonckheere-Terpstra test was also helpful, at least for consideration in the analysis.

    Regards and Stay Blessed
    Mubeen
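
    For the record, the earlier condition groups!=100 | groups!=150 | groups!=200 is true for every observation, because any value differs from at least two of those three numbers, so all five groups were kept. The working version selects the two wanted groups with ==. The remaining adjacent comparisons can be looped, as in this sketch, which assumes the cut points 25, 60, 100, 150, 200 from the example:

    ```stata
    * Sketch: Welch t tests for each pair of adjacent groups
    use "http://www.ats.ucla.edu/stat/stata/notes/hsb2", clear
    generate groups = recode(id, 25, 60, 100, 150, 200)

    local cuts 25 60 100 150 200
    local n : word count `cuts'

    forvalues i = 1/`=`n'-1' {
        local lo : word `i' of `cuts'
        local hi : word `=`i'+1' of `cuts'
        display _newline "Groups `lo' vs `hi'"
        * inlist() keeps exactly the two groups being compared
        ttest write if inlist(groups, `lo', `hi'), by(groups) unequal
    }
    ```

    For the directional hypothesis mean(lower group) < mean(higher group), the relevant p-value in each output is the one under Ha: diff < 0.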