Databricks

Databricks combines the capabilities of a data warehouse with those of a data lake in its Data Lakehouse platform, providing scalable storage and processing for modern organizations.

Connecting to a Databricks Account

Mammoth lets you connect to Databricks and import your data into Mammoth.

Select API & Databases from the ‘Add Data’ menu and click on Databricks.

Click on New Connection and log in to your Databricks account.

Fig. 80 Sign into your Databricks account

Next, select the desired account.

Fig. 81 Choose the account

Once your Databricks account is connected to Mammoth, you will see a list of the tables and views available to that account.

Select a table to preview its data.

You can also write your own SQL query or run a test query and preview the result.

Click Next.

After you select the table you want to work with, you can schedule the data import, as described below.

Scheduling your Data Pulls

You can start retrieving the data immediately or at a specific time of your choice. You can also schedule data pulls to fetch the latest data from your database at a chosen frequency: once, daily, weekly, or monthly.
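This page does not document the exact rules Mammoth uses to compute scheduled run times. Purely as an illustration, the Python sketch below shows one way the four frequencies could map to concrete run times. The function `nth_run` and the frequency strings are hypothetical, and clamping monthly runs to the last day of shorter months is an assumption, not documented Mammoth behavior.

```python
import calendar
from datetime import datetime, timedelta
from typing import Optional


def nth_run(start: datetime, frequency: str, n: int) -> Optional[datetime]:
    """Hypothetical: time of the n-th pull (n = 0 is the first pull)."""
    if n == 0:
        return start
    if frequency == "once":
        return None  # a one-off pull never repeats
    if frequency == "daily":
        return start + timedelta(days=n)
    if frequency == "weekly":
        return start + timedelta(weeks=n)
    if frequency == "monthly":
        # Count months from the start date so the schedule never drifts.
        months = start.month - 1 + n
        year, month = start.year + months // 12, months % 12 + 1
        # Assumption: clamp to the month's last day (a Jan 31 start runs on Feb 28/29).
        day = min(start.day, calendar.monthrange(year, month)[1])
        return start.replace(year=year, month=month, day=day)
    raise ValueError(f"unknown frequency: {frequency!r}")


start = datetime(2024, 1, 31, 9, 0)
print([nth_run(start, "monthly", n) for n in range(3)])
```

Computing each run from the original start time, rather than from the previous run, keeps a monthly schedule that starts on the 31st from sliding to the 29th permanently after February.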

For each data pull from your database, you can choose one of three options: Replace all data, Add new data since last pull, or Replace with new data since last pull.

If you choose Add new data since last pull or Replace with new data since last pull, you are asked to select a unique sequence column. On each refresh, Mammoth uses this column to pick up only the rows with a greater value in it than those retrieved in the previous data pull.
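To make the three options concrete, here is a minimal Python sketch of how a pull might apply a sequence column. It assumes the sequence values are comparable (for example integers or timestamps) and that the largest value from the previous pull is stored as a "watermark". The `pull` function, the mode names, and the `watermark` parameter are hypothetical illustrations, not part of Mammoth.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def pull(
    source: List[Row],
    existing: List[Row],
    mode: str,
    seq_col: Optional[str] = None,
    watermark: Any = None,
) -> Tuple[List[Row], Any]:
    """Hypothetical model of one data pull.

    Returns the dataset after the pull and the watermark to store for the next one.
    """
    if mode == "replace_all":
        # Discard existing rows and keep everything the query returns now;
        # this sketch tracks no watermark in this mode.
        return list(source), None
    if seq_col is None:
        raise ValueError("incremental modes need a unique sequence column")
    # A row is new if its sequence value exceeds the previous pull's watermark
    # (on the very first pull there is no watermark, so every row is new).
    new_rows = [r for r in source if watermark is None or r[seq_col] > watermark]
    # Advance the watermark to the largest new value, or keep it if nothing is new.
    new_mark = max((r[seq_col] for r in new_rows), default=watermark)
    if mode == "add_new":
        return existing + new_rows, new_mark  # append new rows to the dataset
    if mode == "replace_with_new":
        return new_rows, new_mark  # keep only the new rows
    raise ValueError(f"unknown mode: {mode!r}")


# Example: "id" plays the role of the unique sequence column.
source = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
first, mark = pull(source, [], "add_new", seq_col="id")  # 2 rows, mark == 2

source.append({"id": 3, "v": "c"})  # a new row arrives in the source table
appended, _ = pull(source, first, "add_new", seq_col="id", watermark=mark)
replaced, _ = pull(source, first, "replace_with_new", seq_col="id", watermark=mark)
# appended holds ids 1, 2, 3; replaced holds only id 3
```

Storing the watermark separately, rather than recomputing it from the current dataset, means a pull that finds no new rows does not reset the incremental position.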