Group and Aggregate

The Group and Aggregate task lets you summarize your data by doing one or both of the following:

  • Grouping rows with the same set of values.
  • Calculate aggregate functions.

Table of Contents

Quick Start

Let us start with the following sample data:

Product Price Category
HP Elite 900.00 Laptop
Lenovo Thinkpad 1100.00 Laptop
Sony VAIO 600.00 Laptop
Microsoft Lumia 300.00 Smartphone
HTC One 400.00 Smartphone
iPad 700.00 Tablet
Kindle Fire 300.00 Tablet

Let us calculate the average price in each Category. Complete the following steps:

  1. Open Data Preparation menu and click on Reshape, Group & Aggregate.
  2. Select Group & Aggregate.
  3. In the Group the following section, find the column Category and add it by clicking on the + button.
  4. In the And Aggregate By section, find the column Price. Change the dropdown before it to AVG. Click on the + button. Change the format to have two decimal places.
  5. Click APPLY.

The following table shows the final result:

Category | Avg. of Price
Laptop 866.67
Smartphone 350.00
Tablet 500.00

We can calculate multiple aggregations. Let us edit the task we just created. Let us add these items to the result from the Aggregate the following section.

  1. COUNT
  2. MAX of Price
  3. MIN of Price

The following table shows the final result:

Category Avg. of Price Count Max of Price Min of Price
Laptop 866.67 3 1,100 600
Smartphone 350.00 2 400 300
Tablet 500.00 2 700 300

You can add multiple items from group the following section too. For example, say you add two different columns. Then the data will be grouped by each unique combination values in those two columns from the data.

Supported Options

The following inputs are expected by this task:

  • Group the following: Select any column you would like to group by. If multiple columns are selected, the data will be grouped by each unique combination values in those columns from the data.
  • Aggregate the following: Select any aggregations you would like to include in your summary. The aggregation will be calculated for each group. In case you have not selected any column to group by, aggregation will be calculated over the entire data.

Note

  1. You can rename or change the format of the selected aggregations.
  2. You drag-and-drop to reorder the results.