Overview

Tasks are the data transformations offered by Mammoth. A Task adds a layer of change on top of the original Dataset. Each new Task then adds a new layer on the previous state, and so on.

A record of all Tasks is stored as a list in the Pipeline. Each Task in the Pipeline runs sequentially and can be edited, re-ordered, suspended, or deleted.

pipeline example

Fig. 99 A Pipeline
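The layering described above can be pictured as an ordered list of functions, each working on the output of the previous one. The following is a minimal Python sketch of that idea only. The names `Task` and `run_pipeline`, the `suspended` flag, and the list-of-dicts data are illustrative assumptions, not Mammoth’s actual implementation or API.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict
Rows = list[Row]


@dataclass
class Task:
    """Hypothetical stand-in for a Task: a named transformation of rows."""
    name: str
    apply: Callable[[Rows], Rows]
    suspended: bool = False  # suspended Tasks stay in the list but are skipped


def run_pipeline(dataset: Rows, pipeline: list[Task]) -> Rows:
    """Apply each active Task in order, each on the previous Task's output."""
    state = [dict(row) for row in dataset]  # copy: the original Dataset is untouched
    for task in pipeline:
        if not task.suspended:
            state = task.apply(state)
    return state


dataset = [{"item": "pen", "price": 2, "qty": 10}, {"item": "ink", "price": 5, "qty": 1}]
pipeline = [
    Task("add total", lambda rows: [{**r, "total": r["price"] * r["qty"]} for r in rows]),
    Task("keep total > 5", lambda rows: [r for r in rows if r["total"] > 5]),
]
result = run_pipeline(dataset, pipeline)
print(result)  # [{'item': 'pen', 'price': 2, 'qty': 10, 'total': 20}]
```

In this sketch, the second Task depends on the `total` column created by the first, which is why the order of the list matters.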

At the bottom of the Pipeline, you’ll find the Dataflow settings. Expand them to see the Auto-run pipeline toggle. When this toggle is enabled, each Task in the Pipeline is applied instantly; when it is disabled, you can draft changes and apply them together. This is especially helpful when you are working on large Datasets where each Task takes several minutes to run. In such cases, it is recommended that you make all your changes and then submit them at once. The system also lets you review all modifications before applying them.

dataflow settings in views

Fig. 100 Control the Pipeline behaviour with Auto-run

Note

1. When Auto-run is off, the grid and explore cards show the previous run’s data. The schema, however, follows the changes you make in the Pipeline. For example, if you create a new column, it won’t appear in the current View, but it will show in the options when you add the next Task.

2. You cannot switch the Auto-run toggle while the Pipeline is still running.
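One way to picture the behaviour described in this Note: with Auto-run off, the data on screen comes from the last applied run, while the schema already includes drafted Tasks. The sketch below models that under simplifying assumptions (every Task just adds one column). `DraftPipeline`, `add_task`, `schema`, and `apply_all` are hypothetical names used for illustration, not part of Mammoth.

```python
class DraftPipeline:
    """Hypothetical model of a Pipeline with an Auto-run switch."""

    def __init__(self, rows, auto_run=True):
        self.auto_run = auto_run
        self.applied = []       # Tasks that have already run
        self.pending = []       # drafted Tasks waiting to be applied
        self.source = rows
        self.data = list(rows)  # what the grid shows: result of the last run

    def add_task(self, new_column, func):
        """A Task here simply adds one column computed from each row."""
        self.pending.append((new_column, func))
        if self.auto_run:
            self.apply_all()

    def schema(self):
        """Columns offered when adding the next Task, drafted ones included."""
        columns = list(self.source[0].keys())
        columns += [name for name, _ in self.applied + self.pending]
        return columns

    def apply_all(self):
        """Run every drafted Task in one go and refresh the data."""
        self.applied += self.pending
        self.pending = []
        rows = [dict(r) for r in self.source]
        for name, func in self.applied:
            for row in rows:
                row[name] = func(row)
        self.data = rows


p = DraftPipeline([{"price": 2, "qty": 10}], auto_run=False)
p.add_task("total", lambda r: r["price"] * r["qty"])
print(p.data)      # [{'price': 2, 'qty': 10}]  (previous run's data)
print(p.schema())  # ['price', 'qty', 'total']  (schema follows the draft)
p.apply_all()
print(p.data)      # [{'price': 2, 'qty': 10, 'total': 20}]
```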

Adding Tasks

A Task can be added using the Transform menu. Clicking on a Task opens the Task Panel, where you can provide the inputs for the Task, be that a condition, calculation, join, etc. (See Tasks for more information about Tasks). Once a Task is added, all the data is refreshed to reflect the changes and a new entry for the Task is added to the Pipeline.

Editing Tasks

Any Task can be edited in the Pipeline with the Edit option in the menu at the top right corner of the Task.

Note

You will see a warning message when an edit affects other Tasks in the Pipeline. If you choose to proceed, the Pipeline may go into error and will need corrective action. The APPLY button is only enabled when the Task has valid inputs.

Deleting Tasks

Remove a Task with the Delete option in the menu at the top right corner of the Task. You will be prompted to confirm, because a deleted Task cannot be recovered.

Note

The system will warn you if deleting a Task may affect other Tasks in the Pipeline. If you proceed, the Pipeline will go into error and you will need to take corrective action.

When Auto-run is disabled, Task removal is recorded as one of the pending changes in the Pipeline. You’ll see it in the review panel before applying all the changes. In this mode, you can always Restore a deleted Task.

deleting a Task in draft mode

Fig. 101 Deleting a Task when auto-run is disabled

Suspending Tasks

A Task can also be suspended with the Suspend option in the menu at the top right corner of the Task. Suspended Tasks remain idle in the Pipeline and can be restored with the Restore option. When you Restore a suspended Task, it becomes a part of the Pipeline again.

Note

The system will warn you if suspending a Task may affect other Tasks in the Pipeline. If you proceed, the Pipeline will go into error and you will need to take corrective action.

Duplicating Tasks

Sometimes, you may want the same Task with some slight changes. In such cases, you can duplicate an existing Task with the Duplicate option in the menu at the top right corner of the Task.

Copying Tasks

A Task can be copied and pasted in the same or another Pipeline.

Reordering Tasks

You can also change the sequence of your Tasks in the Data Pipeline. Hover over the Task, hold the six-dot icon, and drag the Task to the desired position in the Pipeline.

reordering a task

Fig. 102 Reordering a task

Reordering a Task that has further dependencies in the Pipeline puts the Pipeline into error. You’ll receive a prompt to fix the behaviour. You will need to fix the error in order to Apply the changes.

reordering error in draft mode

Fig. 103 Pipeline goes into error when a Task with dependencies is moved

If you try to reorder a Task with dependencies when Auto-run is enabled, the system will show an error message: “Can’t move. Task {no.} is dependent on this Task”.
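The dependency rule behind these reordering errors can be sketched as a simple check: a Task may only read columns that exist in the source or were created by an earlier Task. The example below is an illustrative Python sketch. The `reads`/`creates` representation and the function names are assumptions made for this example, not how Mammoth stores or validates Tasks.

```python
def first_broken_dependency(tasks, source_columns):
    """Return (task_index, missing_column) for the first Task that reads a
    column not yet available, or None if the order is valid.

    Each task is a dict with 'reads' and 'creates' column lists (an assumed
    representation for this sketch).
    """
    available = set(source_columns)
    for index, task in enumerate(tasks):
        for column in task["reads"]:
            if column not in available:
                return index, column
        available.update(task["creates"])
    return None


def move_task(tasks, old_index, new_index):
    """Return a reordered copy of the Task list."""
    reordered = list(tasks)
    reordered.insert(new_index, reordered.pop(old_index))
    return reordered


tasks = [
    {"name": "add total", "reads": ["price", "qty"], "creates": ["total"]},
    {"name": "add tax", "reads": ["total"], "creates": ["tax"]},
    {"name": "label rows", "reads": ["item"], "creates": ["label"]},
]
source = ["item", "price", "qty"]

ok = move_task(tasks, 2, 0)   # "label rows" does not depend on other Tasks
bad = move_task(tasks, 0, 2)  # "add tax" would now run before "add total"
print(first_broken_dependency(ok, source))   # None
print(first_broken_dependency(bad, source))  # (0, 'total')
```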

Inserting Tasks

A new Task can be inserted anywhere in the Pipeline. Select the Task you want to add from the Transform menu, then hold and drag it to the desired position in the Pipeline.

Note

The system will not allow insertion at a position where the Task does not make sense in the Pipeline.

Previewing data at any step

You can view and export data at any step in the Data Pipeline.

To preview data at any step,

  • Click on the number of rows in that step and it’ll open the preview.

previewing data

Fig. 104 Previewing data at any step in the Pipeline

To download a CSV of that step,

  • Open preview at that step

  • Go to Export and Share

  • Select the Download CSV option. A CSV file of the data at that particular step will be downloaded.
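Conceptually, the data at a given step is the result of running only the Tasks up to that step. The Python sketch below illustrates that idea and writes the intermediate state as CSV text using the standard `csv` module. The helper names `state_at_step` and `to_csv`, and the representation of Tasks as plain functions, are assumptions for illustration and do not reflect Mammoth’s internals.

```python
import csv
import io


def state_at_step(rows, tasks, step):
    """Return the data as it stands after the first `step` Tasks have run."""
    state = [dict(r) for r in rows]
    for task in tasks[:step]:
        state = task(state)
    return state


def to_csv(rows):
    """Serialise a list of dict rows to CSV text (header from the first row)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [{"item": "pen", "price": 2, "qty": 10}, {"item": "ink", "price": 5, "qty": 1}]
tasks = [
    lambda rs: [{**r, "total": r["price"] * r["qty"]} for r in rs],  # step 1
    lambda rs: [r for r in rs if r["total"] > 5],                    # step 2
]

# State after step 1: the new column exists, but no rows are filtered yet.
print(to_csv(state_at_step(rows, tasks, 1)))
```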

Errors in Pipeline

The Pipeline in a View can have Tasks that create new columns. These columns may be used in subsequent Tasks in the Pipeline. If an edit changes the type of such a column or removes it altogether, subsequent Tasks that use it will go into error and the Pipeline will not work.

The Dataset itself may also change, for example when a column is removed or its type is changed. Any Tasks using such columns in any View of the Dataset will then be in error.
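The two causes of error described here, a removed column and a changed column type, can be illustrated with a small schema check. This is a hedged Python sketch. The `needs`/`adds` structure, the function name `tasks_in_error`, and the type labels `"NUMERIC"` and `"TEXT"` are placeholders chosen for the example, not Mammoth’s data model.

```python
def tasks_in_error(schema, tasks):
    """Return the names of Tasks whose required columns are missing or have
    a different type than the Task expects.

    schema: dict mapping column name to a type label, e.g. {"price": "NUMERIC"}
    tasks:  list of dicts with 'name', 'needs' (column -> expected type) and
            'adds' (column -> type of the column the Task creates)
    """
    current = dict(schema)
    broken = []
    for task in tasks:
        valid = all(current.get(col) == typ for col, typ in task["needs"].items())
        if valid:
            current.update(task["adds"])  # later Tasks may use the new column
        else:
            broken.append(task["name"])   # its columns are never created
    return broken


tasks = [
    {"name": "add total", "needs": {"price": "NUMERIC", "qty": "NUMERIC"},
     "adds": {"total": "NUMERIC"}},
    {"name": "add tax", "needs": {"total": "NUMERIC"}, "adds": {"tax": "NUMERIC"}},
]

print(tasks_in_error({"price": "NUMERIC", "qty": "NUMERIC"}, tasks))  # []
print(tasks_in_error({"price": "TEXT", "qty": "NUMERIC"}, tasks))     # ['add total', 'add tax']
print(tasks_in_error({"price": "NUMERIC"}, tasks))                    # ['add total', 'add tax']
```

Note how one broken Task cascades in this sketch: because "add total" cannot run, the `total` column never exists, so "add tax" is in error as well.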

Changes due to Errors in Pipeline

pipeline error

Fig. 105 When a Pipeline in a View is in error, the Task is highlighted with red borders. Also, the View tab and Pipeline icon will show a red dot and the data grid and options like Explore columns, Alerts, and Column browser will be greyed out.

Data Library when Pipeline is in error

Fig. 106 In the Data Library, the View in error will display a red exclamation icon and the Open button in the Preview Panel for the broken View will turn red with a message saying: “This view has errors.”

Restrictions due to Errors in Pipeline

As shown in Fig. 105, errors disable various features for the affected View, such as:

  • Column browser, Metric Explore Panel, Explore Cards, Alerts, Export, CSV links.

  • Renaming, duplicating, or saving a View as a template.

  • All export Tasks, Crosstab, and Branch out to Dataset.

Fixing Errors in Pipeline

When an error occurs, you have two options, Fix or Delete:

  • Delete simply deletes the Task.

  • Fix opens the edit Task window for you to rectify the problems in the Task.

Fixing the error in Pipeline

Fig. 107 Fixing error in the Pipeline

Sometimes fixing errors in one Task may create a cascade of new errors elsewhere. If this becomes too messy, it may be wise to make a fresh start by using the “Discard Changes” option at the bottom of the Pipeline to return it to its previous state.