The BigQuery team has released an awesome new feature: the ability to treat a Google Spreadsheet like a table in a database. That means you can join a billion-row BigQuery table with a 100-row mapping table you type into a spreadsheet in a single query.
This feature allows for all kinds of cool things, like configuring data models, pumping some ad hoc data into a data pipeline to see what comes out the other end, or comparing real numbers to the latest goals and projections.
Example
To show how it works, we will walk through a simple example.
Using Google Forms, we collect data into this spreadsheet. Every submission adds a row in the sheet. Currently, the contents of the spreadsheet look like this:
Once you have generated a model from your database, follow these steps:
- Share the spreadsheet with the service account you use to connect to Looker or make the spreadsheet available to all via link access (view only).
- Make sure the Drive API and Sheets API are enabled for your project (https://console.developers.google.com/apis/library).
- Create a table from the BigQuery interface that gets its content from this spreadsheet.
- In your Looker project, craft a simple data model joining the spreadsheet table with some data we might already know about the names, such as rank, population, and perhaps gender. We can also use SQL to normalize the names coming from the database. We will cache the data in BigQuery so that we don't hit the spreadsheet too hard.
Here is an example.
explore: names_sheet { persist_for: "60 seconds" join: names_facts { sql_on: ${names_sheet.normalized_name} = ${names_facts.normalized_name} ;; sql_where: ${names_facts.is_dominant_gender} ;; relationship: one_to_one } view_name: names_sheet { derived_table: { persist_for: "2 minutes" sql: SELECT row_number() OVER() as id, name, UPPER(CASE WHEN REGEXP_MATCH(name, r'\,') THEN REGEXP_EXTRACT(name, r', (\w+)') ELSE REGEXP_EXTRACT(name, r'^(\w+)') END ) as normalized_name FROM namesheet.names ;; } dimension: id { type: number } dimension: name { order_by_field: id # keep the rows in the original order } dimension: normalized_name { } measure: count { type: count drill_fields: [id, name, names_facts.gender] } measure: count_male { type: count filters: [names_facts.gender: "M"] } measure: percentage_male { type: number sql: ${count_male}/${count} ;; value_format_name: percent_2 } measure: average_year_median { type: average sql: ${names_facts.year_median} ;; value_format: "0000" } } }
From that model, you can build out a simple dashboard that shows some statistics about all the names entered into the spreadsheet as well as facts about each individual name. We have data happily flowing from an editable spreadsheet, right into a Looker query. Pretty cool, right?