Generally, you want your data to be in a form where each row is a case and each column is a variable (either explanatory or response). Sometimes your data don’t start that way. This section describes how to move your data around to get it into that form. The tidyverse provides a simple method for doing this (pivot_longer() and pivot_wider()), which you should read about in R for Data Science. There are also “old school” ways of doing this, via the base R function reshape(); this way is more powerful and useful in some circumstances. See the final section for more on this old-style approach.
But for now, the pivot methods will do pretty much everything you want. Both pivot_longer() and pivot_wider(), from the tidyr package in the tidyverse, are great functions to understand. First, we load tidyverse and make some fake data.
library(tidyverse)
dat <- data.frame( ID = 1:3, X = c( 10, 20, 30 ),
                   Y1 = 1:3, Y2 = 10 + 1:3, Y3 = 20 + 1:3 )
dat
This data is in wide format, where we have multiple measurements (Y1, Y2, and Y3) for each individual (each row of data).
10.1 Converting wide data to long data
We use pivot_longer to take our Y values and nest them within each ID for longitudinal MLM analysis. (NB you can use SEM to fit longitudinal models with wide data; we do not explore that application here.)
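A minimal sketch of this step, assuming tidyr 1.0 or later (tidyr is loaded as part of the tidyverse). The output column names time and Y are our own choices, not anything the function requires, and names_prefix = "Y" strips the "Y" from Y1, Y2, Y3 so the time index can be stored as a number. The last lines use pivot_wider() to go back to the wide layout, as a check.

```r
library(tidyr)

# Same fake wide data as above: one row per ID, three repeated measures
dat <- data.frame( ID = 1:3, X = c( 10, 20, 30 ),
                   Y1 = 1:3, Y2 = 10 + 1:3, Y3 = 20 + 1:3 )

# Stack Y1, Y2, Y3 into one column; ID and X are repeated for each time point.
# The names "time" and "Y" are arbitrary choices made for this sketch.
datL <- pivot_longer( dat, cols = c( Y1, Y2, Y3 ),
                      names_to = "time", names_prefix = "Y",
                      values_to = "Y" )
datL$time <- as.integer( datL$time )  # "1", "2", "3" -> 1, 2, 3
datL

# pivot_wider() undoes the operation, rebuilding columns Y1, Y2, Y3
datW <- pivot_wider( datL, names_from = time, values_from = Y,
                     names_prefix = "Y" )
datW
```

Each row of datL is one ID at one time point, with the ID-level variable X repeated on each of that ID's rows, which is the layout typically used for longitudinal multilevel models.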
# A tibble: 3 × 2
ID X
<int> <dbl>
1 1 10
2 2 20
3 3 30
students <- merge( students, newdat, by = "ID" )
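A minimal, self-contained sketch of what this merge() call does. The contents of students and newdat below are hypothetical toy data invented for illustration, not the data used elsewhere in this chapter: students has several rows per ID, and newdat has one row of ID-level information per ID.

```r
# Hypothetical long data: several rows per ID
students <- data.frame( ID = c( 1, 1, 2, 2, 3 ),
                        score = c( 5, 6, 7, 8, 9 ) )
# Hypothetical ID-level data: one row per ID
newdat <- data.frame( ID = c( 1, 2, 3 ), X = c( 10, 20, 30 ) )

# merge() matches rows on the shared "by" column and copies X
# onto every row of students with a matching ID
students <- merge( students, newdat, by = "ID" )
students
```

By default merge() keeps only IDs that appear in both data frames; add all.x = TRUE to keep rows of students that have no match in newdat (their X becomes NA).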
10.3 Optional: wrangling data with reshape
The reshape() function is the old-school way of doing things. It is harder to use but can be more powerful in some ways (though there is also a large body of material on doing fancy things with the pivot functions). This section is entirely optional and possibly no longer useful.
Anyway, say you have data where each row has values of one variable at several different points in time. We want to turn it into a data.frame where each row (case) is the value of that variable at a single point in time. There is also an ID variable, Country, recording which country each GDP value came from.
Country X1997 X1998 X1999 X2000 X2001 X2002 X2003 X2004
1 China 0.5 1 2 3.4 4 5.3 6.0 7
2 Morocco 31.9 32 33 34.0 NA 36.0 37.0 NA
3 England 51.3 52 53 54.3 55 56.0 57.3 58
Here we have three rows, but actually 24 cases if we consider each country-year a case (3 countries by 8 years). To try it on your own, get the sample csv file fake_country_block.csv from the website.
Reshaping our original data to make a case for each time point gives the following (first six rows shown):
Country Year X
China.1997 China 1997 0.5
Morocco.1997 Morocco 1997 31.9
England.1997 England 1997 51.3
China.1998 China 1998 1.0
Morocco.1998 Morocco 1998 32.0
England.1998 England 1998 52.0
Things to notice: each case has a “row name” made up of the country and the year. The “2:9” indicates the range of columns (X1997 through X2004) that all hold the same variable, measured at different times.
R picked up that, for each of these columns, “X” is the name of the variable and the number is the time, and separated them. You can set the name of your time variable to whatever you want with the timevar argument (here, Year).
The reshaped result above is called “long format,” and the original layout is called “wide format.”
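A sketch of a reshape() call that produces this long layout, assuming the argument values implied by the text (varying = 2:9, sep = "", timevar = "Year"). Rather than reading fake_country_block.csv, it rebuilds the same table from the values printed above, so it runs on its own.

```r
# Wide data: the values printed above (normally read from fake_country_block.csv)
dat <- data.frame(
  Country = c( "China", "Morocco", "England" ),
  X1997 = c( 0.5, 31.9, 51.3 ), X1998 = c( 1, 32, 52 ),
  X1999 = c( 2, 33, 53 ),       X2000 = c( 3.4, 34.0, 54.3 ),
  X2001 = c( 4, NA, 55 ),       X2002 = c( 5.3, 36.0, 56.0 ),
  X2003 = c( 6.0, 37.0, 57.3 ), X2004 = c( 7, NA, 58 )
)

# Columns 2:9 all hold the same variable; sep = "" tells reshape() to split
# names like "X1997" into the variable name "X" and the time 1997
dat.long <- reshape( dat, idvar = "Country", varying = 2:9, sep = "",
                     timevar = "Year", direction = "long" )
head( dat.long )
```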
You can go in either direction: reshape() with direction = "wide" turns long data back into wide data.
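A minimal sketch of going from long back to wide, starting from a small long-format table built from a few of the values above (China and England in 1997 and 1998 only, to keep it short). The object names long and wide are just illustrative.

```r
# Long data: one row per Country-Year (a subset of the values above)
long <- data.frame(
  Country = c( "China", "England", "China", "England" ),
  Year    = c( 1997, 1997, 1998, 1998 ),
  X       = c( 0.5, 51.3, 1.0, 52.0 )
)

# idvar identifies the unit and timevar the time; the remaining column (X)
# is spread into one column per time, named X.1997 and X.1998 (default sep = ".")
wide <- reshape( long, idvar = "Country", timevar = "Year",
                 direction = "wide" )
wide
```

If the long data were themselves produced by reshape(), calling reshape() on them with no other arguments reverses the earlier reshape, because R stores the needed information as an attribute of the result.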