Data joins

Joining data across tables #

So far we have focused on a single data table, but in reality more complex information structures are stored in databases consisting of many tables.

See below an example from the most recent Medialab project about Kościuszki street in Katowice.

We have separate tables dedicated to buildings and people. Let’s assume that we want to combine them in order to complement the information about the buildings with the names of their architects.

This is where table joins come in handy.

◕ See example #

buildings

id address year architect
KAM001 Kościuszki 2 1898 OSO001
KAM002 Kościuszki 14 1894 OSO003

people

id first_name last_name
OSO001 Edmund Trossin
OSO002 Jacob Preuss
OSO003 Oswald Winkler

If we want to attach an architect name to each of the building, we need to join these tables using a common key – in this case, the person’s id. The result would look like this.

buildings and people join

id address year first_name last_name
KAM001 Kościuszki 2 1898 Edmund Trossin
KAM002 Kościuszki 14 1894 Oswald Winkler

◕ Join data #

Usually such joins are made using SQL language or other data processing tools. However, simple operations can also be successfully performed in a plain spreadsheet. Let’s practice.

  • Use =VLOOKUP function to recode values in selected columns (e.g. yes/no to 1/0)
  • Create a dictionary in a separate sheet
  • Use this function: =VLOOKUP(search_key, dictionary!A1:B2, 2, false)
  • Then check out: =ARRAYFORMULA(IFERROR(VLOOKUP(N2:N,dictionary!A1:B2, 2, false), ))