Table transformation #
If you’ve ever dealt with a spreadsheet, you’re probably used to storing data in a wide table with multiple columns. However, many other tools, e.g. for data visualisation, require the data in a long form (also called stacked).
In these exercises you’ll learn how to switch between wide and long tables and what makes data sets tidy.
◕ Know what makes data sets tidy #
- Each variable in a column
- Each observation in a row
- Each cell is one value
- (Each type of observational unit forms its own table)
Watch the video explaining these principles.
UNTIDY DATA
student | maths | physics | pe |
---|---|---|---|
Anna | 4 | 5 | 5 |
Robert | 2 | 4 | 3 |
Stefan | 5 | 5 | 3 |
TIDY DATA
student | subject | mark |
---|---|---|
Anna | maths | 4 |
Anna | physics | 5 |
Anna | pe | 5 |
Robert | maths | 2 |
Robert | physics | 4 |
Robert | pe | 3 |
Stefan | maths | 5 |
Stefan | physics | 5 |
Stefan | pe | 3 |
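To illustrate why the tidy form is convenient, here is a minimal Python sketch that stores the tidy table above as a list of (student, subject, mark) rows and averages the marks by either column. The helper name mean_mark_by is hypothetical, chosen for this example only; the point is that one generic function can serve both groupings because every row is a single observation.

```python
from collections import defaultdict

# The TIDY DATA table from above: one observation (a single mark) per row.
tidy = [
    ("Anna", "maths", 4), ("Anna", "physics", 5), ("Anna", "pe", 5),
    ("Robert", "maths", 2), ("Robert", "physics", 4), ("Robert", "pe", 3),
    ("Stefan", "maths", 5), ("Stefan", "physics", 5), ("Stefan", "pe", 3),
]

def mean_mark_by(rows, key_index):
    """Average the mark (column 2) grouped by the column at key_index.

    Hypothetical helper: key_index 0 groups by student, 1 by subject.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[2])
    return {key: sum(marks) / len(marks) for key, marks in groups.items()}

by_student = mean_mark_by(tidy, 0)
by_subject = mean_mark_by(tidy, 1)
print(by_student)
print(by_subject)
```

With the wide (untidy) table, grouping by subject would mean reading down columns while grouping by student would mean reading across rows, so the two cases would need different code.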
◕ Transpose table #
Table transposition is the simplest form of data transformation that you should have in your toolbelt. So even though we don’t need it in our project, we’ll take a minute to test it out – it will definitely come in handy at some point in the future.
- Use the =TRANSPOSE() function to swap rows with columns
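Outside a spreadsheet, the same operation can be sketched in a few lines of Python. This is only an illustration of what transposition does, not how the spreadsheet function is implemented; it assumes the table is a list of rows of equal length and reuses the marks table from the previous section.

```python
# The wide marks table, stored as a list of rows (header row first).
table = [
    ["student", "maths", "physics", "pe"],
    ["Anna", 4, 5, 5],
    ["Robert", 2, 4, 3],
    ["Stefan", 5, 5, 3],
]

def transpose(rows):
    """Swap rows with columns; assumes every row has the same length."""
    # zip(*rows) pairs up the first items of all rows, then the second, etc.
    return [list(column) for column in zip(*rows)]

transposed = transpose(table)
for row in transposed:
    print(row)
```

Transposing twice returns the original table, which is a quick way to check the result.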
◕ Perform wide-to-long transformation #
Suppose you want to present your data in one of the great online data visualization tools, like RAWGraphs. However, you realize that the data you received from a fellow analyst is in a wide table. Fortunately, a single formula can quickly turn it into the desired long format.
- Check out this tutorial to unpivot tables in Google Sheets.
=ARRAYFORMULA(
  split(
    flatten(
      row_range&"|"&col_range&"|"&value_range
    ),"|"
  )
)
To use the above formula, replace the placeholders with the relevant ranges from your data set:
- row_range (the row labels, e.g. student names)
- col_range (the column headers, e.g. subjects)
- value_range (the cell values, e.g. marks)
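To make the logic of the formula easier to follow, here is a rough Python sketch of the same wide-to-long idea, using the marks table from the tidy data section. The function name unpivot and its parameter names are assumptions that mirror the three placeholders above; unlike the formula, the sketch builds each long row directly instead of joining the values with "|" and splitting them again.

```python
def unpivot(row_labels, col_labels, values):
    """Turn a wide table into long (row label, column label, value) rows.

    Hypothetical helper: row_labels, col_labels and values play the roles
    of row_range, col_range and value_range in the spreadsheet formula.
    """
    long_rows = []
    # Walk the value grid row by row, pairing each cell with its labels.
    for row_label, value_row in zip(row_labels, values):
        for col_label, value in zip(col_labels, value_row):
            long_rows.append((row_label, col_label, value))
    return long_rows

students = ["Anna", "Robert", "Stefan"]   # row labels
subjects = ["maths", "physics", "pe"]     # column headers
marks = [                                 # cell values, one list per student
    [4, 5, 5],
    [2, 4, 3],
    [5, 5, 3],
]

long_table = unpivot(students, subjects, marks)
for row in long_table:
    print(row)
```

Each of the 3 × 3 cells of the wide table becomes one row of the long table, so the result has nine rows with the columns student, subject and mark.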