A Crosstab is a table of statistics that summarizes the data of a more extensive table. The summary may include sums, averages or other statistics, which the Crosstab groups together in a meaningful way.
- A Crosstab can help in summarizing the data and to highlight the information.
- To create columns for each category in a row.
Let us start with the following data:
It is important to identify the required end result. In this example, let us ask “How many marks did each student score per category?”
To answer this question, complete the following steps in Mammoth application:
- Open Data Preparation menu and click on Reshape, Group & Aggregate.
- Select Crosstab
- Drag and drop the Student column into the Rows bucket.
- Drag and drop the Category column into the Columns bucket.
- Change the aggregation from Count to Sum of Marks.
- Click APPLY.
This produces a new dataset. The dataset is created in the same folder.
The new dataset appears as shown below:
Note: We could use the Group data task to achieve the same results. It will be presented differently.
The following options are supported within the Crosstab task:
- Rows: The columns that produce the rows in the Crosstab. When multiple columns are chosen, the result contains possible combination of the values in selected columns.
- Columns: The columns that produce the columns in the Crosstab. When you choose multiple columns, the result will have one column for each possible combination of the values in the selected columns.
- Aggregation: The aggregation that needs to produced in the summary. See functions.
- Dataset Name: The name of the new dataset that is created with this operation. This name is a suggestion and gets modified by the system if you have datasets with the same name already.
Apart from the options listed above, an option to tweak the working of task when data is refreshed is also provided. See what happens when pipeline reruns.
See What happens when the pipeline reruns in the feature guide for Branch out to Dataset.
- There is a limit of 400 columns in the result dataset.
- Editing this task can result in some unexpected behavior when you change the configuration too much. It is best to delete a crosstab task and create a new task if the configuration is going very different from the previous one.
- The date columns can be summarized further within the in Rows/Columns section. You can extract specific components from date columns before summarizing.
- System places the new dataset in the same folder as the parent dataset.