Crosstab

A Crosstab is a table of statistics that summarizes the data of a more extensive table. The summary may include sums, averages or other statistics, which the Crosstab groups together in a meaningful way.

When to use

  • A Crosstab can help in summarizing the data and to highlight the information.

  • To create columns for each category in a row.

Quick Start

Let us start with the following data:

Student

Subject

Category

Marks

Alice

Physics

Science

85

Alice

Chemistry

Science

55

Alice

English

Language

60

Alice

French

Language

78

Bob

Physics

Science

75

Bob

Chemistry

Science

60

Bob

English

Language

65

Bob

French

Language

80

It is important to identify the required end result. In this example, let us ask “How many marks did each student score per category?”

To answer this question, complete the following steps in Mammoth application:

  1. Go to Transform > Reshape, Group & Aggregate.

  2. Select the Crosstab option.

  3. Drag and drop the Student column into the Rows bucket.

  4. Drag and drop the Category column into the Columns bucket.

  5. Change the aggregation from Count to Sum of Marks.

  6. Click APPLY.

This produces a new dataset. The dataset is created in the same folder.

The new dataset appears as shown below:

Student

Science

Language

Alice

140

138

Bob

135

145

Note: We could use the Group data task to achieve the same results. It will be presented differently.

Supported Options

The following options are supported within the Crosstab task:

  • Rows: The columns that produce the rows in the Crosstab. When multiple columns are chosen, the result contains possible combination of the values in selected columns.

  • Columns: The columns that produce the columns in the Crosstab. When you choose multiple columns, the result will have one column for each possible combination of the values in the selected columns.

  • Aggregation: The aggregation that needs to produced in the summary. See functions.

  • Dataset Name: The name of the new dataset that is created with this operation. This name is a suggestion and gets modified by the system if you have datasets with the same name already.

Apart from the options listed above, an option to tweak the working of task when data is refreshed is also provided. See what happens when pipeline reruns.

What happens when pipeline reruns

See On updates in the feature guide for Branch out to Dataset.

Note

  1. There is a limit of 400 columns in the result dataset.

  2. Editing this task can result in some unexpected behavior when you change the configuration too much. It is best to delete a crosstab task and create a new task if the configuration is going very different from the previous one.

  3. The date columns can be summarized further within the in Rows/Columns section. You can extract specific components from date columns before summarizing.

  4. System places the new dataset in the same folder as the parent dataset.