Crosstab

A Crosstab is a table of statistics that summarizes the data of a more extensive table. The summary may include sums, averages or other statistics, which the Crosstab groups together in a meaningful way.

When to use

  • A Crosstab can help in summarizing the data and to highlight the information.
  • To create columns for each category in a row.

Quick Start

Let us start with the following data:

Student Subject Category Marks
Alice Physics Science 85
Alice Chemistry Science 55
Alice English Language 60
Alice French Language 78
Bob Physics Science 75
Bob Chemistry Science 60
Bob English Language 65
Bob French Language 80

It is important to identify the required end result. In this example, let us ask “How many marks did each student score per category?”

To answer this question, complete the following steps in Mammoth application:

  1. Open Data Preparation menu and click on Reshape, Group & Aggregate.
  2. Select Crosstab
  3. Drag and drop the Student column into the Rows bucket.
  4. Drag and drop the Category column into the Columns bucket.
  5. Change the aggregation from Count to Sum of Marks.
  6. Click APPLY.

This produces a new dataset. The dataset is created in the same folder.

The new dataset appears as shown below:

Student Science Language
Alice 140 138
Bob 135 145

Note: We could use the Group data task to achieve the same results. It will be presented differently.

Supported Options

The following options are supported within the Crosstab task:

  • Rows: The columns that produce the rows in the Crosstab. When multiple columns are chosen, the result contains possible combination of the values in selected columns.
  • Columns: The columns that produce the columns in the Crosstab. When you choose multiple columns, the result will have one column for each possible combination of the values in the selected columns.
  • Aggregation: The aggregation that needs to produced in the summary. See functions.
  • Dataset Name: The name of the new dataset that is created with this operation. This name is a suggestion and gets modified by the system if you have datasets with the same name already.

Apart from the options listed above, an option to tweak the working of task when data is refreshed is also provided. See what happens when pipeline reruns.

What happens when pipeline reruns

See What happens when the pipeline reruns in the feature guide for Branch out to Dataset.

Note

  1. There is a limit of 400 columns in the result dataset.
  2. Editing this task can result in some unexpected behavior when you change the configuration too much. It is best to delete a crosstab task and create a new task if the configuration is going very different from the previous one.
  3. The date columns can be summarized further within the in Rows/Columns section. You can extract specific components from date columns before summarizing.
  4. System places the new dataset in the same folder as the parent dataset.