What does it mean for standard errors to be clustered?


You may want to read this article first: What is the Standard Error of a Sample?

Clustered standard errors (CSEs) are needed when some observations in a data set are related to each other. This correlation occurs when an individual trait, such as ability or socioeconomic background, is identical or similar for groups of observations within clusters. Panel data (multi-dimensional data collected over time) is the type of data most often associated with CSEs.

For example, let’s say you wanted to know if class size affects SAT scores. Specifically, you think that smaller class size leads to better SAT scores. You collect panel data for dozens of classes in dozens of schools. As this is panel data, you almost certainly have clustering. Teachers might be more efficient in some classes than other classes, students may be clustered by ability (e.g. special education classes), or some schools might have better access to computers than others. According to Cameron and Miller, this clustering will lead to:

  • Standard errors that are smaller than the regular OLS standard errors.
  • Confidence intervals that are too narrow.
  • t-statistics that are too large.
  • Misleadingly small p-values.

Incorrect standard errors reflect a violation of the independence assumption required by many estimation methods and statistical tests, and can lead to Type I and Type II errors.

Adjusting for Clustered Standard Errors

Accurate standard errors are a fundamental component of statistical inference. Therefore, if your data are clustered (which in turn produces inaccurate conventional SEs), you should adjust for the clustering before running any further analysis on the data.

Hand calculations for clustered standard errors are somewhat complicated (compared to your average statistical formula). For example, this snippet from The American Economic Review gives the variance formula for the calculation of the clustered standard errors:

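For reference, the estimator being described is the standard cluster-robust ("sandwich") variance. A sketch in common notation, with G clusters, where X_g is the regressor submatrix and û_g the OLS residual vector for cluster g:

```latex
\widehat{V}_{\text{cluster}}[\hat{\beta}]
  = (X'X)^{-1} \left( \sum_{g=1}^{G} X_g' \hat{u}_g \hat{u}_g' X_g \right) (X'X)^{-1}
```

Compared with the usual OLS variance, the middle term allows the residuals to be arbitrarily correlated within each cluster (but not across clusters).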

It’s usually not necessary to perform these adjustments by hand as most statistical software packages like Stata and SPSS have options for clustering. When you specify clustering, the software will automatically adjust for CSEs.


One way to control for clustered standard errors is to specify a model. For example, you could specify a random coefficient model or a hierarchical model. However, the accuracy of any calculated SEs relies entirely on your specifying the correct model for within-cluster error correlation. A second option is cluster-robust inference, which does not require you to specify a model. It does, however, assume that the number of clusters approaches infinity (Ibragimov & Muller).

References
Cameron, A. C., & Miller, D. L. A Practitioner’s Guide to Cluster-Robust Inference.
Ibragimov, R., & Muller, U. Inference with Few Heterogeneous Clusters.
Primo, D. The Practical Researcher: Estimating the Impact of State Policies and Institutions with Mixed-Level Data.

---------------------------------------------------------------------------



In ancient Greek times, important decisions were never made without consulting the high priestess at the Oracle of Delphi. She would deliver wisdom from the gods, although this advice was sometimes vague or confusing, and was often misinterpreted by mortals. Today I bring word that the high priestess and priests (Athey, Abadie, Imbens and Wooldridge) have delivered new wisdom from the god of econometrics on the important decision of when you should cluster standard errors. This is definitely one of life’s most important questions, as any keen player of seminar bingo can surely attest. In case their paper is all Greek to you (half of it literally is), I will attempt to summarize their recommendations, so that your standard errors may be heavenly.

The authors argue that there are two reasons for clustering standard errors: a sampling design reason, which arises because you have sampled data from a population using clustered sampling, and want to say something about the broader population; and an experimental design reason, where the assignment mechanism for some causal treatment of interest is clustered. Let me go through each in turn, by way of examples, and end with some of their takeaways.

The Sampling Design reason for clustering
Consider running a simple Mincer earnings regression of the form:
Log(wages) = a + b*years of schooling + c*experience + d*experience^2 + e

You present this model, and are deciding whether to cluster the standard errors. Referee 1 tells you “the wage residual is likely to be correlated within local labor markets, so you should cluster your standard errors by state or village.” But referee 2 argues, “The wage residual is likely to be correlated for people working in the same industry, so you should cluster your standard errors by industry,” and referee 3 argues that “the wage residual is likely to be correlated by age cohort, so you should cluster your standard errors by cohort.” What should you do?

You could try estimating your model with these three different clustering approaches, and see what difference this makes.

Their advice: whether or not clustering makes a difference to the standard errors should not be the basis for deciding whether or not to cluster. They note there is a misconception that if clustering matters, one should cluster.

Instead, under the sampling perspective, what matters for clustering is how the sample was selected and whether there are clusters in the population of interest that are not represented in the sample. So, we can imagine different scenarios here:

  1. You want to say something about the association between schooling and wages in a particular population, and are using a random sample of workers from this population. Then there is no need to adjust the standard errors for clustering at all, even if clustering would change the standard errors.
  2. The sample was selected by randomly sampling 100 towns and villages from within the country, and then randomly sampling people in each; and your goal is to say something about the return to education in the overall population. Here you should cluster standard errors by village, since there are villages in the population of interest beyond those seen in the sample.
  3. This same logic makes it clear why you generally wouldn’t cluster by age cohort (it seems unlikely that we would randomly sample some age cohorts and not others, and then try and say something about all ages); and that we would only want to cluster by industry if the sample was drawn by randomly selecting a sample of industries, and then sampling individuals from within each.

Even in the second case, Abadie et al. note that the usual robust (Eicker–Huber–White, or EHW) standard errors and the clustered standard errors (which they call Liang–Zeger, or LZ, standard errors) can both be correct; they are just correct for different estimands. That is, if you are content with saying something about the particular sample of individuals you have, without trying to generalize to the population, the EHW standard errors are all you need; but if you want to say something about the broader population, the LZ standard errors are necessary.

Special case: even when the sampling is clustered, the EHW and LZ standard errors will be the same if there is no heterogeneity in the treatment effects.

Sidenote 1: this reminds me also of the nearest-neighbor matching command nnmatch of Abadie (with a different et al.), where you can get the narrower SATE standard errors for the sample, or the wider PATE standard errors for the population.

Sidenote 2: This reason is hardly ever a rationale for clustering in an impact evaluation. But Rosenzweig and Udry’s paper on external validity does make the point that we only observe treatment effects for specific points in time, and that if we want to say something more general about how our treatment behaves in other points in time, we need wider standard errors than we use for just saying something about our specific sample – which is very related to the point here about being very clear what your estimand is.

The Experimental Design Reason for Clustering
The second reason for clustering is the one we are probably more familiar with, which is when clusters of units, rather than individual units, are assigned to a treatment. Let’s take the same equation as above, but assume that we have a binary treatment that assigns more schooling to people. So now we have:
Log(wages) = a + b*Treatment + e

Then if the treatment is assigned at the individual level, there is no need to cluster (*). There has been much confusion about this, as Chris Blattman explored in two earlier posts about this issue (the fabulously titled clusterjerk and clusterjerk the sequel), and I still occasionally get referees suggesting I try clustering by industry or something similar in an individually-randomized experiment. This Abadie et al. paper is now finally a good reference to explain why this is not necessary.
(*) unless you are using multiple time periods, and then you will want to cluster by individual, since the unit of randomization is individual, and not individual-time period.

What if your treatment is assigned at the village level? Then cluster by village. This is also why you want to cluster difference-in-differences estimates at the state level when your source of variation comes from differences across states, and why a “treatment” like being on one side of a border vs. the other is problematic (because you have only 2 clusters).

Adding fixed effects
What if we sample at the level of cities, but then add city fixed effects to our Mincer regression? Or we randomize at the city level, but add city fixed effects? Do we still need to cluster at the city level?
The authors note that there is a lot of confusion about using clustering with fixed effects. The general rule is that you still need to cluster if either the sampling or the assignment to treatment was clustered. However, the authors show that with fixed effects included, the cluster adjustment only changes the standard errors if there is heterogeneity in treatment effects.

How to cluster?
This is largely a paper about when to cluster, not how to cluster. There is of course a whole other debate about when you can rely on asymptotics, vs. bootstrapping, vs. randomization inference approaches. They show with asymptotic approximations that the standard Liang–Zeger cluster adjustment is generally conservative, and offer an alternative cluster-adjusted variance estimator that can be used if there is variation in treatment assignment within clusters and you know the fraction of clusters sampled. But since, at the sample sizes used in many experiments, the concern is now that asymptotic standard errors may not be conservative enough, you should be careful about using such an adjustment in practice.

Why do we cluster standard errors in panel data?

Conclusion. Clustering standard errors is a useful tool that allows us to deal with correlation in a data set. It can be implemented in both dimensions of panel data, and it improves inference when the entities are correlated cross-sectionally or across time.

Does clustering increase or decrease standard errors?

Robust clustered standard errors can change your standard errors in both directions. That is, clustered standard errors can be larger or smaller than conventional standard errors.

What is robust and clustered standard errors?

Robust standard errors are generally larger than non-robust standard errors, but are sometimes smaller. Clustered standard errors are a special kind of robust standard errors that account for correlation within “clusters” of observations (such as states, schools, or individuals), in addition to heteroskedasticity.

At what level should you cluster standard errors?

To conduct inference, applied articles typically cluster standard errors at the unit level. Following the logic above, the right level is the one at which the sampling or the treatment assignment was clustered.