Fill Missing Values with Grouping

Filling missing values with grouping is a unique function which segregates the data and fills the null or empty values in a Dataset.

Quick Start

Let’s take a sample data set “Hospital” which shows the information about certain patients. You can download this sample data here .

Table 20 Hospital → View 1

Row Number

Patient entry ID

Date

Name of Patient

Blood Group

Any prior history?

Address

1

0001

06/01/2018

Luke

A+

Minneapolis

2

0002

22/03/2018

Nguyen

B+

Diabetic

Ohio

3

0003

22/03/2018

Nguyen

4

0004

14/09/2017

John

AB+

High blood pressure

San Diego

5

0005

06/01/2018

Luke

Minneapolis

6

0006

26/11/2017

Anne

O-

Obesity

Kansas city

7

0007

06/01/2018

Luke

A+

Minneapolis

8

0008

06/01/2018

Luke

Alzheimer

In this Dataset, there are three columns with missing values - ‘Blood Group’, ‘ Prior medical history’ and ‘Address’. It is evident that the blood group of a person never changes, therefore we fill the missing values in the column ‘Blood Group’.

Let’s take the column ‘Blood Group’ and fill the missing values within the group (Name of patient) from above. Follow the steps below:

  1. Go to Transform > Column Functions.

  2. Select the Fill Missing Values function.

  3. Select Blood Group column from Fill empty cells in drop-down.

  4. Select Patient Entry ID column and ascending/descending order in order by drop-down.

  5. Select above/below from the drop-down ‘with values from’ to fill the missing value in the column with the value above or below it.

  6. Click on Add Group and select Name of the patient column for the grouping of the data.

  7. Click on APPLY .

How it Works

Grouping by column divides the table into groups on the basis of data in the selected column. Filling from above or below is done on the groups thus formed.

Let’s take the sample Dataset of a hospital and break it down step by step.

  1. Add a grouping rule and select the column which you want to use to segregate the data. For example, let’s use the Name of the patient column. This rule divides the table based on the similar values in the same column. In this table, four groups are formed. We can visualize this as:

Grouping example

Fig. 116 Division of Dataset into Groups

Values from Above / Below

When you select ‘above’ or ‘below’ from the ‘with values from’ drop-down, it fills the empty cell of the column with value in the cell just above/below it. If we fill a group from above (using default row, as ‘order by’) in the table “Hospital”, it will fill the data from above in the four groups formed separately. Similarly, on selecting ‘below’, it will fill the data from cells below and then show it together.

We can visualize fill missing values from Above as follows:

Fill from above Example

Fig. 117 Fill missing values from Above

The resultant table will be:

Result on filling from above

Fig. 118 Final outcome after filling values from Above

Similarly, fill missing values from below can be visualized as:

Fill from below Example

Fig. 119 Fill missing values from Below

The resultant table will be:

Result on filling from below

Fig. 120 Final outcome after filling values from Below

Ordering

To fill from above or below, you would need to order the data first. ‘Order by’ option allows you to sort your data according to the specified column in ascending or descending order. By default, ordering of data is done by the row number. For example, select Patient Entry ID column and ‘ascending’ in the drop-down menu labelled ‘order by’ to sort the data in ascending order.

Alternatively, you can order them on the home screen itself. At the bottom of the screen, there is a ‘Sort by’ option. If you sort the table on the home screen, it becomes the default option in the fill missing values menu.

Sorting

Fig. 121 Sort by option at the home screen

In the above example, if you select the ‘Date’ column for ordering the data, it leads to an ambiguity as the data in that column is not unique, there is duplicate data in several cells. Since row number is a unique, duplicate value in the chosen order column get further resolved by row number.

Note

The ‘Add Group’ option however is optional. When grouping is not applied, whole data is treated as a single group. Thus the missing values is filled by the value above or below based on the ordering over complete data.

See also

Window Function

The Window Function documentation