Dataset¶

A Dataset contains tabular data organized as columns and rows. You can create a Dataset by sourcing data from files, databases, APIs or Webhooks. Read more about the sources of data here.

You can add or replace data in a Dataset. Each addition appears as a Batch in the Dataset. How you add data depends on the source of the Dataset.

You see the data of a Dataset in a View. A fresh View with no rules shows you the original Dataset. You can create many Views on a Dataset. Each View transforms the Dataset but never changes the Dataset itself.
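The relationship between a Dataset and its Views can be pictured with a small sketch. This is only an illustration, not Mammoth's implementation: `apply_view`, the rule functions and the sample rows are all hypothetical names invented for this example.

```python
# Illustrative only: a Dataset modelled as an immutable tuple of rows.
dataset = (
    {"city": "Pune", "sales": 120},
    {"city": "Oslo", "sales": 80},
)


def apply_view(rows, rules):
    """Apply each rule in order to a copy of the rows; the input is never modified."""
    result = [dict(row) for row in rows]  # work on copies, not the original rows
    for rule in rules:
        result = rule(result)
    return result


def only_large_sales(rows):
    """A hypothetical rule: keep rows with sales above 100."""
    return [row for row in rows if row["sales"] > 100]


fresh_view = apply_view(dataset, [])                     # no rules: same data as the Dataset
filtered_view = apply_view(dataset, [only_large_sales])  # one rule applied
```

Here `fresh_view` holds the same rows as `dataset`, while `filtered_view` holds one row. The `dataset` tuple itself is unchanged in both cases.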

Datasets are listed in the Data Library under their respective project. Clicking on a Dataset opens the Preview Panel for that Dataset, where you can see properties of the Dataset, its Batches and its Views.

Note

All columns in a Dataset have a well-defined data type: Numeric, Date or Text. Mammoth respects the data types when they are available from the source (Databases, APIs etc.). For CSV and Excel files, Mammoth infers the types after analysing the uploaded file. This inference may not always be correct, and Mammoth may need your feedback to make corrections. Mammoth prompts you for input whenever the need arises.
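To see why inference from file contents can go wrong, consider the following minimal sketch. It is not Mammoth's algorithm; the function names and the list of date formats are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical date formats to try; a real importer may recognise many more.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y")


def is_numeric(value: str) -> bool:
    """Return True if the string parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def is_date(value: str) -> bool:
    """Return True if the string matches any of the assumed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def infer_column_type(values: list[str]) -> str:
    """Guess Numeric, Date or Text from the non-empty cells of one column."""
    cells = [v.strip() for v in values if v.strip()]
    if not cells:
        return "Text"  # nothing to go on, so fall back to Text
    if all(is_numeric(c) for c in cells):
        return "Numeric"
    if all(is_date(c) for c in cells):
        return "Date"
    return "Text"
```

A value such as `01/02/2021` is accepted as a Date here, but the sketch cannot tell whether it means 1 February or 2 January. Ambiguities of this kind are one reason an importer may need to ask for your feedback.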

Let us visit each of the above ideas in some detail.

Adding Datasets¶

You can create datasets in any project with the “Add data” button.

Mammoth supports data imports from the following sources:

Desktop¶

You can drag and drop files from the Desktop into the Data Library. Alternatively, click on the ‘Add data’ icon and open the “My Desktop” window.

Mammoth supports the following file types: .csv, .tab, .tsv, .txt, .xls, .xlsx, .zip. Mammoth also supports password-protected .xlsx files.

APIs and databases¶

You can fetch data from various APIs and Databases. Click here to learn more about supported APIs and Databases.

Webhooks¶

You can set up an incoming webhook from the Webhooks section, which appears after you click on ‘Add data’ in the Data Library. Webhooks can receive data as GET parameters, POST parameters, or a JSON POST body. They can be used with popular services such as GitHub or JIRA.

For more information on Webhooks, click here.
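As a rough sketch of what posting JSON to an incoming webhook looks like from the sender's side, the example below builds a POST request with Python's standard library. The URL is a placeholder and the row fields are invented; substitute the URL shown for your own webhook. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Placeholder only: replace with the URL displayed for your webhook.
WEBHOOK_URL = "https://example.com/your-webhook-url"


def build_webhook_request(url: str, row: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one row as JSON."""
    body = json.dumps(row).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical row, e.g. an issue update forwarded from a tracker.
request = build_webhook_request(WEBHOOK_URL, {"issue": "BUG-101", "status": "open"})

# To actually send it, pass the request to urllib.request.urlopen(request).
```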

Public files from URLs¶

Enter the URL of a file on the web under the “Fetch from URL” option in the ‘Add data’ menu.

Note that some files on the web are not accessible to external software because their owners do not want automated software to access the data. In such cases, you can download the file and upload it to the Data Library.

Viewing Dataset¶

You can see your Dataset by clicking on Open in the Preview Panel. Mammoth creates a first View of the Dataset by default.

Fig. 3 Click on open to view Dataset¶

If the View has rules, you can see the original data in the Dataset from the Original Dataset option present at the top of the Data Pipeline or by simply creating a new View.

Fig. 4 Click on rows to preview original Dataset¶

Adding or replacing data¶

You can add data to your Dataset or replace its data. How this is configured depends on the type of Dataset. This is how it works for Datasets created from different sources:

File upload and public URLs¶

For these Datasets, you can add or replace data from the Preview Panel. When you add more data, Mammoth infers the column types in the new data and attempts an exact match with the types in the existing Dataset. If the types match, the data is merged. Otherwise, the new data appears as a new Dataset in the Data Library. You can then use the Combine Dataset process to merge the two Datasets, which lets you map the types of the original and the newly uploaded Datasets.

Fig. 5 Add data to Dataset¶
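The exact-match check described above can be pictured with a short sketch. It is an assumption-laden illustration rather than Mammoth's logic: the schema representation (a mapping of column name to type) and the function name `type_mismatches` are hypothetical, and the sketch assumes that column names must match as well as types.

```python
def type_mismatches(existing: dict[str, str], incoming: dict[str, str]) -> dict:
    """Return the columns whose presence or type differs between two schemas.

    Each value is a pair (type in existing Dataset, type in new data);
    None means the column is missing on that side.
    """
    columns = sorted(set(existing) | set(incoming))
    return {
        column: (existing.get(column), incoming.get(column))
        for column in columns
        if existing.get(column) != incoming.get(column)
    }


existing_schema = {"order_id": "Numeric", "order_date": "Date"}
matching_upload = {"order_id": "Numeric", "order_date": "Date"}
conflicting_upload = {"order_id": "Text", "order_date": "Date"}

ok = type_mismatches(existing_schema, matching_upload)           # empty: merge can proceed
conflicts = type_mismatches(existing_schema, conflicting_upload)  # order_id differs
```

An empty result corresponds to the case where the merge succeeds. A non-empty result corresponds to the case where the upload lands as a separate Dataset and the types are mapped during Combine.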

Third-party connections¶

You can combine with or replace older data while creating a new connection.

Fig. 6 Choose the option to replace or combine.¶

Mammoth currently does not allow changing the combine or replace option once the Dataset is created.

Branch out to Dataset¶

When you work in a View, you may want to save the data of the View in a Dataset. Use Branch Out to Dataset to do this. The options to combine new data or replace existing data from the View are configured within the Branch Out to Dataset interface. Read more about it here.

Webhooks¶

Webhooks typically provide one row of data at a time. Mammoth, however, deals with data in terms of Batches. Mammoth accumulates the Webhook data and makes a Batch from it. A Batch is made automatically every hour if any data arrived in that hour. Whenever the Webhook receives data, a Refresh button appears on the Preview Panel. You can also create a Batch manually by clicking on Refresh.

Fig. 7 Click on Refresh to add a new Batch manually¶
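The hourly accumulation of Webhook rows into Batches can be sketched as a simple grouping by arrival hour. This is a conceptual illustration only; the function name, the event structure and the sample timestamps are invented, and Mammoth's actual scheduling may differ.

```python
from collections import defaultdict
from datetime import datetime


def group_into_hourly_batches(events):
    """Group (received_at, row) pairs by the hour in which they arrived."""
    batches = defaultdict(list)
    for received_at, row in events:
        hour = received_at.replace(minute=0, second=0, microsecond=0)
        batches[hour].append(row)
    # Hours with no data never appear, so no empty Batch is produced.
    return dict(sorted(batches.items()))


# Hypothetical rows received one at a time from a webhook.
events = [
    (datetime(2024, 1, 1, 9, 5), {"id": 1}),
    (datetime(2024, 1, 1, 9, 40), {"id": 2}),
    (datetime(2024, 1, 1, 11, 15), {"id": 3}),
]
batches = group_into_hourly_batches(events)
```

With these sample events, two Batches result: one for the 09:00 hour with two rows and one for the 11:00 hour with one row. Nothing is created for the 10:00 hour because no data arrived then.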

You can add or replace data from the Webhook menu in the Preview Panel.

Fig. 8 Click on edit on Dataset mode to replace or combine data¶

Combine with another Dataset¶

Combine with another Dataset lets you combine two Datasets that are not similar to each other. The result can be saved as a new Dataset or into one of the existing Datasets.

Fig. 9 Combine with another Dataset¶

Understanding Batches¶

A Dataset is made of Batches of data. A Batch is created when new data is added to a Dataset. The following actions can be performed on any Batch:

Previewing a Batch¶

You can preview the data in a Batch. Select the Batch you want to see from the Batch Table and click on Preview batches to see the data of that Batch under the Preview section.

Fig. 10 Preview Batch¶

Adding columns of the Batch Table into Dataset¶

If you want to analyze your data in the context of the information present in the Batch Table, you can add columns of the Batch Table to the Dataset by using the “Add Batch info to Dataset” option. To remove the Batch columns, uncheck them under the “Add Batch info to Dataset” option and click Apply.

Fig. 11 Click on “Add Batch info to Dataset” and check the desired columns.¶

Viewing source¶

You can see the origin of a Batch in the source column of the Batch Table.

Fig. 12 Source column in the Batch Table¶

Deleting Batch¶

You can delete one or more Batches of data. Select the Batches you want to delete from the Batch Table and click on the Delete batches option.

Fig. 13 Deleting a Batch¶

Suspending/unsuspending a Batch¶

You can suspend one or multiple batches to halt data merge from these batches into the respective dataset and its corresponding Views.

To suspend a batch, select the batch and click on the “Suspend/Unsuspend” option at the top. On successful batch suspension, the batch information greys out and the State changes to “Suspended”.

Fig. 14 Figure showing the greyed-out Suspended batch. At the top are buttons to Suspend/Unsuspend, Delete and Preview selected batches¶

If you wish to include data from a suspended batch, select the suspended batch (or batches) and click on the “Suspend/Unsuspend” option again. This lifts the suspension, and the data is again reflected in the dataset and its corresponding Views.

Note

Make sure Auto Sync is on for the changes to automatically reflect in the Views.

When a batch is suspended or unsuspended, the relevant dataset properties update accordingly. For instance, you’ll notice an additional “Suspended rows” property appear when a batch is suspended. It shows the number of rows suspended in the dataset.

Fig. 15 Dataset properties showing number of suspended rows¶
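The effect of suspension on the rows of a dataset and on the “Suspended rows” property can be modelled with a small sketch. The `Batch` class and both helper functions are hypothetical and exist only to illustrate the idea; they are not part of Mammoth.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    """A hypothetical stand-in for one Batch of a dataset."""
    rows: list
    suspended: bool = False


def active_rows(batches):
    """Rows that flow into the dataset: everything from non-suspended batches."""
    return [row for batch in batches if not batch.suspended for row in batch.rows]


def suspended_row_count(batches):
    """Number of rows held back because their batch is suspended."""
    return sum(len(batch.rows) for batch in batches if batch.suspended)


batches = [Batch(rows=[1, 2]), Batch(rows=[3, 4, 5], suspended=True)]
visible_before = active_rows(batches)            # rows from the first batch only
held_back_before = suspended_row_count(batches)  # rows in the suspended batch

batches[1].suspended = False                     # lift the suspension
visible_after = active_rows(batches)             # rows from both batches
held_back_after = suspended_row_count(batches)
```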

Similarly, you’ll see information regarding pending syncs. This shows the number of rows that are yet to be synced. The number becomes zero once the data is synced.

Fig. 16 Image showing zero pending syncs¶

Synchronizing Views with data¶

You can choose to synchronize only selected Views with data updates from the source. Mammoth provides the Data Sync feature to control the dataflow into individual Views of a Dataset.

You can read more about the Dataflow Control here.