Google BigQuery

Google BigQuery is a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data. It is a Platform as a Service (PaaS) that supports querying using ANSI SQL. It was designed for analyzing data on the order of billions of rows using a SQL-like syntax. It runs on Google Cloud infrastructure and can be accessed through a REST-oriented application programming interface (API).

To import tables from BigQuery, you first need to configure Google Cloud. Here’s how to do this:

Enable BigQuery API

  1. Log in to Google Cloud Platform and select the project to use in Mammoth;

  2. Go to APIs & Services from the sidebar;

  3. Click on + Enable APIs and Services and look for BigQuery API;

  4. Open BigQuery API and click on Enable;

    Fig. 61 Enabling BigQuery API

Create a Service Account

  1. From the sidebar, select IAM & Admin and go to Service accounts;

  2. Click on Create service account;

  3. In Service Account Details, enter the service account name and a description. Google Cloud will then generate an email address for the service account automatically. Click on Create;

    Fig. 62 Creating a service account

  4. In Grant this service account access to project, add two roles: BigQuery Data Editor and BigQuery Job User. These give the service account permission to read and write data and run query jobs in your project. Click Continue;

  5. In Grant users access to this service account, you can optionally add the email address of the end user; by default, you will be the sole owner of the service account. Click Done;

    Fig. 63 Adding roles to your service account.

  6. For the newly created service account, go to the action menu and select Manage keys;

  7. Select Add key and click on Create new key;

  8. Select the key type as JSON (Mammoth accepts JSON keys). Click on Create and a JSON key will be downloaded to your computer. This key will be required when connecting Mammoth to BigQuery.

    Fig. 64 Creating a JSON key
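Before uploading the key to Mammoth, it can be useful to sanity-check the downloaded file. A minimal Python sketch, assuming the standard fields of a Google service-account JSON key (the inline `key` dict is a stand-in for a real downloaded file):

```python
import json

# Fields present in every Google service-account JSON key
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def validate_key(key_dict):
    """Return a list of problems found in a parsed service-account key."""
    problems = [f for f in REQUIRED_FIELDS if f not in key_dict]
    if key_dict.get("type") != "service_account":
        problems.append("type is not 'service_account'")
    return problems

# In practice you would load the real downloaded file, e.g.:
#   with open("my-project-key.json") as fh:
#       key = json.load(fh)
key = {  # stand-in for a real downloaded key
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "-----BEGIN PRIVATE KEY-----...",
    "client_email": "mammoth-sa@my-project.iam.gserviceaccount.com",
}
print(validate_key(key))  # an empty list means the key looks structurally valid
```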

Note

A service account can be used for multiple projects.

  1. Switch to the project which needs this service account;

  2. From the sidebar, go to IAM & Admin and click on +ADD;

  3. Enter the email address of the service account, assign the required roles and click Save.

    Fig. 65 Adding a service account to a project

Connecting to Mammoth

  1. Select API & Databases from the add menu and click on Google BigQuery.

    Google BigQuery selection
  2. Click on New Connection and upload the JSON key downloaded in the previous step. Click on Connect.

    Google BigQuery login
  3. Select the desired ProjectID - Dataset and click on Next.

    Google BigQuery profile

Once your Google account is connected to Mammoth, you will be presented with a list of tables and views in that dataset.

  • Select the desired table to get a preview.

  • Write your own SQL query or run a test query and preview the result.

  • Click on Next.

    Google BigQuery profile
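If you prefer to write your own SQL, a simple preview query is a good test. A minimal sketch that builds a BigQuery Standard SQL preview statement (the project, dataset and table names here are placeholders, not values from your account):

```python
def preview_query(project, dataset, table, limit=100):
    """Build a BigQuery Standard SQL query that previews the first rows.

    BigQuery fully qualifies a table as `project.dataset.table`,
    wrapped in backticks.
    """
    return f"SELECT * FROM `{project}.{dataset}.{table}` LIMIT {limit}"

print(preview_query("my-project", "sales", "orders"))
# SELECT * FROM `my-project.sales.orders` LIMIT 100
```

Pasting a query like this into the query box and previewing the result is a quick way to confirm the connection works before pulling a full table.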

After you have selected the table you want to work on, you get the following options to configure it:

  • Rename it in the data pull scheduling window.

  • Save it to a desired location in the Data Library using the Adding file to option.

Scheduling your Data Pulls

You can retrieve the data immediately or at a specific time of your choosing. You can also schedule the data pull to fetch the latest data from your database at a regular interval: daily, weekly or monthly.

On every data pull from your database, you also have the option to either replace the older data or combine the new data with it.

BigQuery data pull

On choosing the Combine with older data option, you will be asked to choose a unique sequence column. On each refresh, Mammoth will pick up only the rows whose value in this column is greater than the maximum value seen in the previous data pull.
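The sequence-column behavior can be sketched in a few lines of Python. This is an illustration of the filtering idea, not Mammoth's actual implementation; the `id` column and the sample rows are made up:

```python
def incremental_rows(pulled_rows, last_max, seq_column):
    """Keep only rows whose sequence value exceeds the previous pull's max."""
    return [row for row in pulled_rows if row[seq_column] > last_max]

# Data already in Mammoth from the previous pull
older = [{"id": 1}, {"id": 2}, {"id": 3}]
# Rows returned by the current pull (some overlap with the older data)
pulled = [{"id": 2}, {"id": 3}, {"id": 4}, {"id": 5}]

last_max = max(row["id"] for row in older)  # highest sequence value seen so far: 3
combined = older + incremental_rows(pulled, last_max, "id")
print([row["id"] for row in combined])  # [1, 2, 3, 4, 5]
```

Because only rows above the previous maximum are appended, overlapping rows are not duplicated; this is why the chosen column must be unique and monotonically increasing.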