Dataset

A Dataset contains tabular data organized as columns and rows. You can create a Dataset by sourcing data from files, databases, APIs or Webhooks. Read more about the sources of data here.

You can add or replace data in a Dataset. Each addition appears as a Batch in the Dataset. The mechanism for adding data depends on the source of the data.

You see the data of a Dataset in a View. A fresh View with no rules shows you the original Dataset. You can create many Views on a Dataset. Each View transforms the Dataset but never changes the Dataset itself.

Datasets are listed in the Data Library. Clicking on a Dataset opens the Preview Panel for that Dataset, where you can see properties of the Dataset, its Batches, and its Views.

Note

Every column in a Dataset has a well-defined data type: Numeric, Date, or Text. Mammoth respects the data types when they are available from the source (databases, APIs, etc.). For CSV and Excel files, Mammoth infers the types after analysing the uploaded file. This inference may occasionally be incorrect, in which case Mammoth prompts you for input so that the types can be corrected.
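To make the idea of type inference concrete, here is a minimal sketch in Python. It is not Mammoth's actual algorithm, which this page does not describe. The function name `infer_column_type` and the list of date formats are assumptions made purely for illustration.

```python
from datetime import datetime

# Hypothetical date formats to try; the formats Mammoth recognises are not documented here.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y")


def _is_numeric(text):
    """Return True if the cell parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def _is_date(text):
    """Return True if the cell matches any of the assumed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False


def infer_column_type(values):
    """Return 'Numeric', 'Date', or 'Text' for a list of raw string cells."""
    # Ignore empty cells so that missing values do not force a Text column.
    cells = [v.strip() for v in values if v is not None and v.strip()]
    if not cells:
        return "Text"
    if all(_is_numeric(c) for c in cells):
        return "Numeric"
    if all(_is_date(c) for c in cells):
        return "Date"
    return "Text"


print(infer_column_type(["1", "2.5", "-3"]))   # Numeric
print(infer_column_type(["2021-01-31", ""]))   # Date
print(infer_column_type(["abc", "1"]))         # Text
```

A value such as 03/04/2021 matches both the day-first and the month-first format in this sketch, which is one reason an inferred type or format can need your confirmation.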

Let us visit each of the above ideas in some detail.

Adding Datasets

You can create Datasets by clicking on the ‘+’ icon in the Data Library.

Adding Dataset

Fig. 1 Click on ‘+’ to add a Dataset

You can add data from the following sources:

Desktop

You can drag and drop files from the Desktop into the Data Library. Alternatively, click on the ‘+’ icon and open the “My Desktop” window.

Mammoth supports the following file types: .csv, .tab, .tsv, .txt, .xls, .xlsx, .zip.
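If you prepare files with a script before uploading them, a quick check against the list above can catch unsupported files early. The following is a small sketch; the helper name `is_supported_file` is hypothetical and the check looks only at the file extension, not at the file contents.

```python
from pathlib import Path

# Extensions taken from the list of supported file types above.
SUPPORTED_EXTENSIONS = {".csv", ".tab", ".tsv", ".txt", ".xls", ".xlsx", ".zip"}


def is_supported_file(filename):
    """Return True if the file's extension is in the supported list (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_file("sales_2021.CSV"))  # True
print(is_supported_file("report.pdf"))      # False
```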

APIs and databases

You can fetch data from various APIs and Databases. Click here to learn more about supported APIs and Databases.

Webhooks

You can set up an incoming webhook from the Webhooks section, which appears after clicking on ‘+’ in the Data Library. Webhooks can receive data as GET, POST, or JSON POST parameters. They can be used with popular services such as GitHub or JIRA.

For more information on Webhooks, click here.
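As an illustration of sending one row to an incoming webhook as a JSON POST, the sketch below builds the request with Python's standard library. The URL and the field names in the row are placeholders, not real Mammoth values; use the URL shown when you set up your own webhook. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_webhook_request(webhook_url, row):
    """Build (but do not send) a JSON POST request carrying a single row."""
    body = json.dumps(row).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL: replace it with the URL generated for your webhook.
WEBHOOK_URL = "https://example.invalid/webhooks/your-webhook-id"

request = build_webhook_request(WEBHOOK_URL, {"issue": "DEMO-1", "status": "open"})
print(request.get_method(), request.full_url)

# To actually send the row, you would call:
# urllib.request.urlopen(request)
```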

Public files from URLs

Enter the URL of a file on the web under the “Fetch from URL” option in the ‘+’ menu.

Note that some files on the web cannot be accessed by external software because their owners do not want automated tools to access the data. In such cases, you can download the file and upload it to the Data Library.

Viewing Dataset

You can see your Dataset by clicking on Open in the Preview Panel. Mammoth creates a first View of the Dataset by default.

View Dataset

Fig. 2 Click on Open to view the Dataset

If rules in the Data Pipeline have modified your data, you can still see the original Dataset in two ways: create a new View, which starts with no rules, or use the Original Dataset option at the top of the Data Pipeline in the current View.

Original Dataset

Fig. 3 Click on rows to preview original Dataset

Adding or replacing data

You can add data to your Dataset or replace the data in it. How this is configured depends on the source from which the Dataset was created:

File upload and public URLs

For these Datasets, you can add or replace data from the Preview Panel. When you add more data, Mammoth infers the column types in the new data and attempts an exact match with the types in the existing Dataset. If the types match, the merge succeeds. Otherwise, the new data appears as a separate Dataset in the Data Library. You can then use the Combine Dataset process to merge the two Datasets, which lets you map the types of the original Dataset to those of the newly uploaded one.

add more data

Fig. 4 Add data to Dataset
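The exact type match used when adding data to a file-based Dataset can be pictured with a short sketch. It assumes that columns are compared by name and that each column carries one of the three types (Numeric, Date, Text). The function names are hypothetical and this is not Mammoth's implementation.

```python
def type_mismatches(existing_schema, incoming_schema):
    """Return a sorted list of column names whose type or presence differs.

    Each schema is a dict mapping a column name to 'Numeric', 'Date', or 'Text'.
    """
    columns = set(existing_schema) | set(incoming_schema)
    return sorted(
        name
        for name in columns
        if existing_schema.get(name) != incoming_schema.get(name)
    )


def can_merge(existing_schema, incoming_schema):
    """An exact match means no column differs in type or presence."""
    return not type_mismatches(existing_schema, incoming_schema)


existing = {"order_id": "Numeric", "ordered_on": "Date", "customer": "Text"}
same = {"order_id": "Numeric", "ordered_on": "Date", "customer": "Text"}
different = {"order_id": "Text", "ordered_on": "Date", "customer": "Text"}

print(can_merge(existing, same))             # True
print(type_mismatches(existing, different))  # ['order_id']
```

In the second case the new data would land in a separate Dataset, and the mismatched column is the one you would map in the Combine Dataset process.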

Third-party connections

When you create a new connection, you can choose whether new data is combined with older data or replaces it.

Connection settings

Fig. 5 Choose the option to replace or combine.

Mammoth currently does not allow changing the combine or replace option once the Dataset is created.

Branch out to Dataset

When you work in a View, you may want to save the data of that View into a Dataset. This is done through Branch Out to Dataset. The options to combine new data with existing data, or to replace existing data, are configured within the Branch Out to Dataset interface. Read more about it here.

Webhooks

Webhooks typically provide one row of data at a time. Mammoth, however, deals with data in Batches, so it accumulates the Webhook data into a Batch. A Batch is made automatically every hour if any data arrived in that hour. Whenever a Webhook receives data, a Refresh button appears on the Preview Panel. You can also create a Batch manually by clicking on Refresh.

Refresh button

Fig. 6 Click on Refresh to add a new Batch manually
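The way single Webhook rows turn into hourly Batches can be sketched as follows. The sketch assumes that Batches are aligned to clock hours, which this page does not state, and the function name `group_rows_into_hourly_batches` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def group_rows_into_hourly_batches(events):
    """Group (received_at, row) pairs into batches keyed by the start of the hour.

    Hours in which nothing arrived produce no batch at all.
    """
    batches = defaultdict(list)
    for received_at, row in events:
        hour_start = received_at.replace(minute=0, second=0, microsecond=0)
        batches[hour_start].append(row)
    # Return a plain dict ordered by hour.
    return dict(sorted(batches.items()))


events = [
    (datetime(2024, 1, 1, 10, 5), {"id": 1}),
    (datetime(2024, 1, 1, 10, 59), {"id": 2}),
    (datetime(2024, 1, 1, 12, 30), {"id": 3}),
]

batches = group_rows_into_hourly_batches(events)
for hour_start, rows in batches.items():
    print(hour_start, len(rows))
```

With these three rows the sketch yields two Batches, one for the 10:00 hour with two rows and one for the 12:00 hour with one row. No Batch is made for 11:00 because no data arrived in that hour.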

From the Webhook menu in the Preview Panel, you can choose whether new data is added to the Dataset or replaces the existing data.

Webhook Dataset mode

Fig. 7 Click on edit on Dataset mode to replace or combine data

Combine with another Dataset

Combine with another Dataset lets you combine two Datasets that are not similar to each other. The result can be saved as a new Dataset or into one of the existing Datasets.

add more data

Fig. 8 Combine with another Dataset

Understanding Batches

A Dataset is made of Batches of data. A Batch is created when new data is added to a Dataset. The following actions can be performed on any Batch:

Previewing a Batch

You can preview the data in a Batch. Select the Batch in the Batch Table and click on Preview batches; the data of that Batch appears in the Preview section.

Preview Batch

Fig. 9 Preview Batch

Adding columns of the Batch Table into Dataset

If you want to analyze your data in the context of the information in the Batch Table, you can add columns of the Batch Table to the Dataset using the “Add Batch info to Dataset” option. To remove the Batch columns, uncheck them under “Add Batch info to Dataset” and click Apply.

Batch Table

Fig. 10 Click on “Add Batch info to Dataset” and check the desired columns.

Viewing source

The source column in the Batch Table shows the origin of each Batch.

Batch source

Fig. 11 Source column in the Batch Table

Deleting Batch

You can delete one or more Batches of data. Select the Batches you want to delete in the Batch Table and click on the Delete batches option.

delete batch

Fig. 12 Deleting a Batch

Suspending/unsuspending a Batch

You can suspend one or more Batches to stop their data from being merged into the Dataset and its Views.

To suspend a Batch, select it and click on the “Suspend/Unsuspend” option at the top. Once the Batch is suspended, its row in the Batch Table is greyed out and its State changes to “Suspended”.

suspended batch

Fig. 13 A greyed-out Suspended batch. At the top are buttons to Suspend/Unsuspend, Delete and Preview selected batches

To include data from a suspended Batch again, select the suspended Batch (or Batches) and click on the “Suspend/Unsuspend” option once more. This lifts the suspension, and the data is again reflected in the Dataset and its Views.

Note

Make sure Auto Sync is on so that the changes are reflected in the Views automatically.

syncing suspended batch

Fig. 14 Manually syncing data when Auto Sync is off

When a batch is suspended or unsuspended, the relevant dataset properties update accordingly. For instance, you’ll notice an additional “Suspended rows” property appear when a batch is suspended. It shows the number of rows suspended in the dataset.

suspended batch properties

Fig. 15 Dataset properties showing number of suspended rows
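The effect of suspension on the Dataset properties can be modelled with a small sketch. The `Batch` class and the property names in the returned dictionary are hypothetical. The only behaviour taken from this page is that rows of suspended Batches are counted separately and do not contribute to the Dataset.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    """A hypothetical, minimal stand-in for a row of the Batch Table."""
    batch_id: int
    row_count: int
    suspended: bool = False


def dataset_row_summary(batches):
    """Count rows that are part of the Dataset and rows held back by suspension."""
    active_rows = sum(b.row_count for b in batches if not b.suspended)
    suspended_rows = sum(b.row_count for b in batches if b.suspended)
    return {"rows": active_rows, "suspended_rows": suspended_rows}


batches = [
    Batch(batch_id=1, row_count=100),
    Batch(batch_id=2, row_count=40, suspended=True),
    Batch(batch_id=3, row_count=60),
]

print(dataset_row_summary(batches))  # {'rows': 160, 'suspended_rows': 40}

# Lifting the suspension moves the rows back into the Dataset.
batches[1].suspended = False
print(dataset_row_summary(batches))  # {'rows': 200, 'suspended_rows': 0}
```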

Similarly, you’ll see information about pending syncs: the number of rows that are yet to be synced to the Views. The number becomes zero once the data is synced.

sync pending property

Fig. 16 Image showing zero pending syncs

Synchronizing Views with data

When a Batch is added to or replaced in a Dataset, the Pipelines in its Views are run so that the Views are synchronized with the new data. While a View is being synchronized, you cannot perform any activity in it. Data from the Dataset can be synced to its Views in the following ways:

  • Automatic sync mode - sync all Views automatically when new data is added.

  • Manual sync mode - sync all Views manually when new data is added.

    Sync Settings

    Fig. 17 Sync Settings in Preview Panel

Automatic sync is the default option for data sync. This can be changed from the Preview Panel.

When new data is added to a Dataset, there is a 30-second window during which you can turn off the automatic sync of new data into the Views. Pausing the sync through this option changes the Dataset to work in the manual sync mode.
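The interplay between the two sync modes, the 30-second window, and the pending-sync count can be summarised in a small state model. This is a sketch under assumptions: the class and method names are hypothetical, and the only behaviour taken from this page is that automatic sync is the default, that pausing within 30 seconds switches the Dataset to manual mode, and that pending rows drop to zero after a sync.

```python
PAUSE_WINDOW_SECONDS = 30  # Window stated on this page.


class DatasetSyncState:
    """Hypothetical model of a Dataset's sync mode and pending rows."""

    def __init__(self):
        self.mode = "automatic"  # Automatic sync is the default.
        self.pending_rows = 0

    def add_batch(self, row_count):
        """New data arrives and waits to be synced to the Views."""
        self.pending_rows += row_count

    def pause_auto_sync(self, seconds_since_batch):
        """Pause within the window; this switches the Dataset to manual mode."""
        in_window = 0 <= seconds_since_batch <= PAUSE_WINDOW_SECONDS
        if self.mode == "automatic" and in_window:
            self.mode = "manual"
            return True
        return False

    def window_elapsed(self):
        """Once the window has passed, sync happens only in automatic mode."""
        if self.mode == "automatic":
            self.sync_now()

    def sync_now(self):
        """Sync all Views; nothing is left pending afterwards."""
        self.pending_rows = 0


state = DatasetSyncState()
state.add_batch(100)
state.pause_auto_sync(seconds_since_batch=10)  # Paused in time.
state.window_elapsed()
print(state.mode, state.pending_rows)  # manual 100

state.sync_now()  # Manual sync from the Preview Panel.
print(state.mode, state.pending_rows)  # manual 0
```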