One issue in data analysis that feels like it should be obvious, but often isn't, is setting up your data. Which data should go in each row of the data matrix?

Answering practical questions like this one is a skill that comes with experience, especially in complicated data sets. Even so, it's extremely important. If the data isn't set up right, the software won't be able to run any of your analyses. And in many data situations, you will need to set up the data in different ways for different parts of the analysis.

This article will outline one of the issues in data setup: using the long vs. the wide data format.

In the wide format, a subject's repeated responses are in a single row, and each response is in a separate column.

For example, in this data set, each county was measured at four time points, once every 10 years starting in 1970. The outcome variable, Jobs, is the number of jobs in each county. There are three predictor variables: Land Area, Natural Amenity (3 = Yes, 4 = No), and College, the proportion of the county population in that year that had graduated from college.

Since land area and the presence of a natural amenity don't change from decade to decade, each of those predictors has only one variable per county. But both our outcome, Jobs, and one predictor, College, have a different value in each year, so each requires a separate variable (column) for each year.

In the long format, each row is one time point for one subject. So each subject (county) will have data in multiple rows. Any variable that doesn't change across time will have the same value in all of that subject's rows.

You can see the same five counties' data below in the long format. Each county has four rows of data, one for each year.

All the same information is there; we've just set up the data differently. We no longer need four columns for either Jobs or College. Instead, all four values of Jobs for each county are stacked in a single Jobs column. The same is true for the four values of College.

But to keep track of which observation occurred in which year, we need to add a variable, Year.

You'll notice that the variables that didn't change from year to year (Land Area and Natural Amenity) have the same value in each of the four rows for each county. It looks strange, but it's okay to have it this way: as long as you analyze the data using the correct procedures, they will take into account that these values are redundant.

One reason for setting up the data in one format or the other is simply that different analyses require different setups.

For example, in all software that I know of, the wide format is required for MANOVA and repeated measures procedures. Likewise, mixed models and many survival analysis procedures require data to be in the long format. Many data manipulations are also much, much easier when data are in the wide format.

Beyond software requirements, each approach has analytical implications. In the wide format, the unit of analysis is the subject: the county. In the long format, by contrast, the unit of analysis is each measurement occasion for each county.

The practical difference is that when the occasion is the unit of analysis, you can use each decade's college education rate as a covariate for the same decade's Jobs value.
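To make the reshaping concrete, here is a minimal sketch in Python using only the standard library. It assumes a hypothetical wide layout with one column per year for each time-varying variable (named Jobs1970, College1970, and so on); the county names and all values are invented for illustration and are not the data set described above. The helper names `wide_to_long` and `long_to_wide` are likewise hypothetical. Going to long format stacks the yearly columns, adds a Year variable, and repeats Land Area and Natural Amenity in every row for a county.

```python
from typing import Any

# Hypothetical wide layout: one row per county, one column per year for each
# time-varying variable. Column names and all values are invented for illustration.
YEARS = (1970, 1980, 1990, 2000)

wide_rows: list[dict[str, Any]] = [
    {"County": "County A", "LandArea": 500.0, "NaturalAmenity": 3,
     "Jobs1970": 1000, "Jobs1980": 1100, "Jobs1990": 1250, "Jobs2000": 1300,
     "College1970": 0.08, "College1980": 0.11, "College1990": 0.14, "College2000": 0.17},
    {"County": "County B", "LandArea": 320.0, "NaturalAmenity": 4,
     "Jobs1970": 400, "Jobs1980": 380, "Jobs1990": 410, "Jobs2000": 450,
     "College1970": 0.05, "College1980": 0.06, "College1990": 0.09, "College2000": 0.10},
]

TIME_INVARIANT = ("LandArea", "NaturalAmenity")  # one value per county
TIME_VARYING = ("Jobs", "College")               # one value per county per year


def wide_to_long(rows: list[dict[str, Any]], years: tuple[int, ...]) -> list[dict[str, Any]]:
    """Return one row per county per year, with a Year variable added."""
    long_rows = []
    for row in rows:
        for year in years:
            out: dict[str, Any] = {"County": row["County"], "Year": year}
            # Time-invariant predictors are repeated in every row for the county.
            for name in TIME_INVARIANT:
                out[name] = row[name]
            # Time-varying variables are stacked: Jobs1980 becomes Jobs in the 1980 row.
            for name in TIME_VARYING:
                out[name] = row[f"{name}{year}"]
            long_rows.append(out)
    return long_rows


def long_to_wide(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Collapse long rows back into one row per county (inverse of wide_to_long)."""
    by_county: dict[str, dict[str, Any]] = {}
    for row in rows:
        out = by_county.setdefault(
            row["County"],
            {"County": row["County"], **{n: row[n] for n in TIME_INVARIANT}},
        )
        # Each year's value goes back into its own column, e.g. Jobs -> Jobs1980.
        for name in TIME_VARYING:
            out[f"{name}{row['Year']}"] = row[name]
    return list(by_county.values())


if __name__ == "__main__":
    for r in wide_to_long(wide_rows, YEARS):
        print(r)
```

In the long result, each row pairs one decade's College value with the same decade's Jobs value, which is what allows College to be treated as a time-varying covariate. In practice, a data library such as pandas provides reshaping functions (for example `melt` and `pivot`) that do the same job.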