Branch out to Dataset

A view can be saved as a dataset using the Branch out to Dataset task. Optionally, you can append the view's data to another dataset that was created using the same mechanism. This is useful for combining data from different views.

Quick Start

Let us say you have two views with the following sample data:

Dataset 1 → View 1:

Student  Science  Language
Alice    140      138
Bob      135      145

Dataset 2 → View 1:

Student  Science  Language
Frank    122      102
Judy     185      135

Let us try to combine these two views into one target dataset. Complete the following steps:

Open Dataset 1 → View 1.

  1. Open the Data Preparation menu and click Merge & Branch Out.
  2. Select Branch out to Dataset.
  3. Enter a name for the new dataset. In this example, use Final Scores.
  4. Click APPLY.

The new dataset is created in the same folder as the parent dataset.

This new dataset contains exactly the same data as Dataset 1 → View 1.

Open Dataset 2 → View 1. Complete the following steps:

  1. Open the Data Preparation menu and click Merge & Branch Out.
  2. Select Branch out to Dataset.
  3. Select the Combine with an existing Dataset option at the top of the panel.
  4. Find and select the Final Scores dataset.
  5. Verify the column mappings.
  6. Click APPLY.

After the steps are completed, the data in Final Scores appears as shown below:

Student  Science  Language
Alice    140      138
Bob      135      145
Frank    122      102
Judy     185      135
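
Conceptually, the result above is a row-wise append of the second view to the first. The following is a minimal Python sketch of that idea using plain lists of dictionaries. The function name append_rows is hypothetical and chosen only for illustration; this is not how Mammoth implements the task.

```python
# Illustrative only: the two views are modelled as lists of row dictionaries.
view_1 = [
    {"Student": "Alice", "Science": 140, "Language": 138},
    {"Student": "Bob", "Science": 135, "Language": 145},
]
view_2 = [
    {"Student": "Frank", "Science": 122, "Language": 102},
    {"Student": "Judy", "Science": 185, "Language": 135},
]


def append_rows(target, source):
    """Return a new list with the source rows added after the target rows."""
    # Guard against combining views whose columns differ.
    if target and source and target[0].keys() != source[0].keys():
        raise ValueError("views must have the same columns")
    return list(target) + list(source)


final_scores = append_rows(view_1, view_2)
for row in final_scores:
    print(row["Student"], row["Science"], row["Language"])
```

The original views are left unchanged; only the combined list grows, which mirrors how the target dataset receives the rows of each view.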

Supported Options

This task supports two destination options:

Branch out to Dataset

This option is used to create a new dataset. The following option is supported in this mode:

  • Dataset Name: The name of the new dataset created by this operation. The name is only a suggestion; the system modifies it if a dataset with the same name already exists.

Note

  • The system places the new dataset in the same folder as the parent dataset.
  • Hidden columns are ignored.
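
The two behaviours described above (adjusting a name that is already taken, and skipping hidden columns) can be pictured with a short Python sketch. Both helper names are hypothetical, and the numbered-suffix scheme is purely an assumption: the actual way the system modifies a duplicate name is not described here.

```python
def unique_name(suggested, existing_names):
    """Return the suggested name, or a numbered variant if it is taken.

    The " (n)" suffix is an assumed scheme used only for illustration.
    """
    if suggested not in existing_names:
        return suggested
    n = 1
    while f"{suggested} ({n})" in existing_names:
        n += 1
    return f"{suggested} ({n})"


def visible_columns(columns, hidden):
    """Keep only the columns that are not hidden, preserving their order."""
    return [c for c in columns if c not in hidden]


name = unique_name("Final Scores", {"Final Scores"})
columns = visible_columns(["Student", "Science", "Language"], {"Language"})
print(name, columns)
```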

Combine with an existing Dataset

This option is used to append the view's data to an existing dataset. Only Mammoth-generated datasets can be selected as targets. The following options are supported in this mode:

  • Select a Dataset: Selects the target dataset that receives the data. The list shows all datasets in the system that support appending data.
  • Match Columns: Matches the columns from the view to the columns in the target. The system determines the mapping automatically, but you can override it. You can only match columns that have the same type, and the matching is one-to-one.
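
The two matching rules (same type, one-to-one) can be expressed as a small validation routine. This is a sketch under stated assumptions: the function validate_mapping and the type names "TEXT" and "NUMERIC" are hypothetical and are not part of any Mammoth API.

```python
def validate_mapping(mapping, source_types, target_types):
    """Check a source-to-target column mapping and return a list of problems.

    mapping: dict of source column name -> target column name
    source_types, target_types: dict of column name -> type name
    """
    errors = []
    used_targets = {}  # target column -> source column already matched to it
    for src, tgt in mapping.items():
        if src not in source_types:
            errors.append(f"unknown source column: {src}")
            continue
        if tgt not in target_types:
            errors.append(f"unknown target column: {tgt}")
            continue
        # One-to-one rule: a target column can be matched only once.
        if tgt in used_targets:
            errors.append(f"{tgt} is matched to both {used_targets[tgt]} and {src}")
            continue
        used_targets[tgt] = src
        # Same-type rule: the two columns must have identical types.
        if source_types[src] != target_types[tgt]:
            errors.append(f"type mismatch: {src} -> {tgt}")
    return errors


types = {"Student": "TEXT", "Science": "NUMERIC", "Language": "NUMERIC"}
ok = validate_mapping(
    {"Student": "Student", "Science": "Science", "Language": "Language"},
    types,
    types,
)
bad_type = validate_mapping({"Student": "Science"}, types, types)
duplicate = validate_mapping({"Science": "Science", "Language": "Science"}, types, types)
print(ok, bad_type, duplicate)
```

An empty list means the mapping satisfies both rules; each entry otherwise names one violated rule.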

Note

Apart from these options, you can also control how this task behaves when the data is refreshed. See the section What happens when the pipeline reruns.

What happens when the pipeline reruns

You can choose between two modes for a pipeline rerun:

  1. Replace mode: Only the batch linked to this task in the target dataset is replaced.
  2. Combine mode: The data is added to the target dataset as a new batch. This can produce duplicate rows in the target dataset if the rerun includes rows that were already appended.
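
The difference between the two modes can be sketched in Python by modelling the target dataset as a list of batches. Everything here is an illustrative assumption: the rerun function, the batch structure, the task identifiers, and the extra row "Mallory" are hypothetical, and the sketch assumes one batch per task in Replace mode.

```python
def rerun(target_batches, task_id, new_rows, mode):
    """Return the target's batches after a rerun of the given task.

    Each batch is a dict of the form {"task": task_id, "rows": [...]}.
    """
    batch = {"task": task_id, "rows": list(new_rows)}
    if mode == "combine":
        # Combine mode: always add the data as a new batch.
        return list(target_batches) + [batch]
    if mode == "replace":
        # Replace mode: swap out the batch linked to this task only.
        result, replaced = [], False
        for existing in target_batches:
            if existing["task"] == task_id and not replaced:
                result.append(batch)
                replaced = True
            else:
                result.append(existing)
        if not replaced:
            result.append(batch)
        return result
    raise ValueError(f"unknown mode: {mode}")


def all_rows(batches):
    """Flatten the batches into the rows a reader of the dataset would see."""
    return [row for b in batches for row in b["rows"]]


target = [
    {"task": "view-1-task", "rows": ["Alice", "Bob"]},
    {"task": "view-2-task", "rows": ["Frank", "Judy"]},
]
refreshed = ["Frank", "Judy", "Mallory"]  # hypothetical refreshed data

replaced = rerun(target, "view-2-task", refreshed, "replace")
combined = rerun(target, "view-2-task", refreshed, "combine")
print(all_rows(replaced))
print(all_rows(combined))
```

In this sketch, Replace mode leaves the batch from the other task untouched, while Combine mode keeps the old batch and adds the refreshed one, so Frank and Judy appear twice.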

Note

  • The resultant dataset is modified only when the original dataset changes.
  • The resultant dataset does not reflect changes made in the views.