You can use this Task when you want to filter in (Keep) or filter out (Remove) certain rows of data from the view based on some condition(s) in the data. These conditions are also applicable in other Tasks.
To use Apply Filter effectively, it helps to understand a few fundamentals.
A condition is a statement that is checked against every row of data. With Keep, only the rows that satisfy the condition appear in the results of the Task; with Remove, those rows are excluded.
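As a rough analogy outside Mammoth, the row-by-row check can be sketched in plain Python. This is not Mammoth's API: the function `apply_filter`, its `mode` parameter, and the sample rows are hypothetical and exist only to illustrate the Keep / Remove behaviour described above.

```python
# Hypothetical sample rows; names and values are invented for illustration.
rows = [
    {"Name": "Asha", "Gender": "Female"},
    {"Name": "Ben", "Gender": "Male"},
    {"Name": "Chloe", "Gender": "Female"},
]


def apply_filter(rows, condition, mode="keep"):
    """Check the condition against every row, then keep or remove the matches."""
    if mode not in ("keep", "remove"):
        raise ValueError("mode must be 'keep' or 'remove'")
    keep_matches = mode == "keep"
    # A new list is returned; the input rows are left untouched.
    return [row for row in rows if condition(row) == keep_matches]


def is_female(row):
    """The condition: Gender is "Female"."""
    return row["Gender"] == "Female"


kept = apply_filter(rows, is_female, mode="keep")
removed = apply_filter(rows, is_female, mode="remove")
```

Here `kept` holds the rows that satisfy the condition and `removed` holds the rest, while `rows` itself is unchanged.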
Value or Column Value
A condition in Mammoth has two types of operands: Value and Column Value.
Choose Value as the operand type if you want to build a condition against a value that you choose. For example, in the Dataset in Fig. 15, we want to filter the data where the Gender is “Female”, so we can build a condition for the column Gender where its value is “Female”.
Choose Column Value as the operand type if you want to build a condition against the values in another column. For example, in the Dataset in Fig. 15, we can use the Column Value operand to compare the marks in English over the two semesters for all Students.
If we want to select the Students who have better marks in English in Sem-1, we can build a condition comparing the English (Sem-1) column with the English (Sem-2) column using the greater than operator. The result contains only the Students whose English (Sem-1) marks are higher than their English (Sem-2) marks.
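For readers who think in code, the difference between the two operand types can be sketched in plain Python. This is an illustration only, not how Mammoth is implemented: the helper names `value_condition` and `column_value_condition` and the sample marks are hypothetical, and only the column names come from the text above.

```python
import operator

# Hypothetical sample data; the marks are invented for illustration.
rows = [
    {"Name": "Asha", "Gender": "Female", "English (Sem-1)": 82, "English (Sem-2)": 75},
    {"Name": "Ben", "Gender": "Male", "English (Sem-1)": 68, "English (Sem-2)": 71},
    {"Name": "Chloe", "Gender": "Female", "English (Sem-1)": 70, "English (Sem-2)": 70},
    {"Name": "Dev", "Gender": "Male", "English (Sem-1)": 91, "English (Sem-2)": 88},
]


def value_condition(column, op, value):
    """Value operand: compare a column against a fixed value you choose."""
    return lambda row: op(row[column], value)


def column_value_condition(column, op, other_column):
    """Column Value operand: compare a column against another column, row by row."""
    return lambda row: op(row[column], row[other_column])


is_female = value_condition("Gender", operator.eq, "Female")
better_in_sem1 = column_value_condition("English (Sem-1)", operator.gt, "English (Sem-2)")

females = [row["Name"] for row in rows if is_female(row)]
improved_students = [row["Name"] for row in rows if better_in_sem1(row)]
```

With a Value operand the right-hand side is the same for every row; with a Column Value operand it changes per row, because it is read from the other column.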
Let’s use the Dataset in Fig. 15 and work towards the final output using Apply Filter.
The final result shows all the females in the Dataset. You can achieve that by doing the following:
- Open the Data Preparation menu and click on Apply Filter.
- Select Keep from the Keep / Remove option to filter in the values which satisfy the condition you have defined.
- Select the column Gender from the column drop-down.
- Select the operator is from the operator list.
- Select the operand type Value from the operand list.
- Select the operand value Female.
- Click Apply.
- Filtering does not delete any data. It only filters the data for use in the next step of the pipeline. Filters can be modified or deleted.
- You can make the text values case sensitive by clicking on the case-sensitive checkbox present below the last condition.
- Filter condition suggestions are based on the current data in the view. They do not account for any future data that the view may get, which could affect the filter results. For example, suppose you need to remove all rows where a particular column value has - (hyphen) or _ (underscore). If the current data in the view only has values with -, the suggestions will only contain -. You need to add _ explicitly to account for any future data containing it. Otherwise, values with _ that arrive in the view later will not be filtered out.
- Apply Filter can also be reached through shortcut menus on explore cards. Explore cards filter out the values but don’t make the filter part of the pipeline. From the explore card, you can choose to add it to the pipeline. When you do, the Apply Filter task opens pre-filled with the filter conditions matching the explore card selection.
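The note about suggestions can be illustrated with a small Python sketch. It is an analogy under stated assumptions, not Mammoth's behaviour verbatim: the column name `Code`, the sample values, and the helper `remove_containing` are hypothetical.

```python
# Hypothetical column and values, invented for illustration.
current_rows = [{"Code": "A-1"}, {"Code": "B2"}, {"Code": "C-3"}]
# Later the view receives a value containing an underscore.
future_rows = current_rows + [{"Code": "D_4"}]


def remove_containing(rows, column, markers):
    """Remove every row whose column value contains any of the markers."""
    return [row for row in rows if not any(m in row[column] for m in markers)]


# Built from suggestions only: "-" is the only marker seen in the current data.
suggested_only = ["-"]
# Extended by hand to cover data that has not arrived yet.
explicit = ["-", "_"]

today = remove_containing(current_rows, "Code", suggested_only)
later_with_suggested = remove_containing(future_rows, "Code", suggested_only)
later_with_explicit = remove_containing(future_rows, "Code", explicit)
```

The filter built only from suggestions works on today's data but lets the underscore value through once it arrives; the explicitly extended filter does not. In both cases the source rows are not modified, which mirrors the note that filtering does not delete any data.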
Building Complex Conditions
Let’s take a look at the Dataset in Fig. 15. Suppose you want to filter the data where the Gender is “Female” and the marks in English in Sem-1 are greater than 70. We need to build two conditions to get the desired result. Mammoth allows you to have multiple conditions for a Task. A combination of conditions can be created using the AND or OR operation.
To build multiple conditions, click on the ‘+’ icon. To solve the problem above, we create two conditions: Gender is “Female”, and English (Sem-1) is greater than 70.
Because both conditions must be true, combining them with “AND” gives the desired result.
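The effect of choosing AND or OR between two conditions can be sketched in plain Python. This is an illustration only, not Mammoth's implementation; the sample marks are hypothetical, and only the column names and the two conditions come from the text above.

```python
# Hypothetical sample data; the marks are invented for illustration.
rows = [
    {"Name": "Asha", "Gender": "Female", "English (Sem-1)": 82},
    {"Name": "Ben", "Gender": "Male", "English (Sem-1)": 85},
    {"Name": "Chloe", "Gender": "Female", "English (Sem-1)": 70},
    {"Name": "Dev", "Gender": "Male", "English (Sem-1)": 64},
]

# The two conditions from the example above.
conditions = [
    lambda row: row["Gender"] == "Female",
    lambda row: row["English (Sem-1)"] > 70,
]

# AND: a row is kept only if every condition is true for it.
kept_with_and = [row["Name"] for row in rows if all(c(row) for c in conditions)]

# OR: a row is kept if at least one condition is true for it.
kept_with_or = [row["Name"] for row in rows if any(c(row) for c in conditions)]
```

AND narrows the result to rows that pass both checks, while OR widens it to rows that pass either one, so the same two conditions can give quite different results.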
Consider another scenario: you want to select students for a debate competition if they have average marks (65 < marks < 80) in English in both semesters, or have scored above 90 in Sem-1. These selection criteria contain two conditions. The first one, for average marks, must itself check the marks for both semesters; that is, it is a condition within a condition, also called a nested condition.
Mammoth also lets you build a nested condition for a Task. To create a nested condition, click on the ‘→’ icon.
To solve the problem defined above, we build multiple conditions, one of them nested, as follows:
- English (Sem-1) is greater than 90
- A nested condition (use the ‘→’ icon to create it):
  - English (Sem-1) in between 65-80
  - English (Sem-2) in between 65-80
“OR” is the operator between the first condition and the nested condition, and “AND” is the operator between the two conditions inside the nested condition.
- You can place a condition outside the parentheses by clicking on the ‘←’ icon.
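One way to picture a nested condition is as a small tree, where AND / OR groups contain either simple conditions or further groups. The Python sketch below is an analogy, not Mammoth's internal format: the tuple layout, the `evaluate` function, and the sample marks are hypothetical, and the range is treated as exclusive (65 < marks < 80) as an assumption taken from the scenario text.

```python
# Hypothetical sample data; the marks are invented for illustration.
rows = [
    {"Name": "Asha", "English (Sem-1)": 82, "English (Sem-2)": 75},
    {"Name": "Ben", "English (Sem-1)": 68, "English (Sem-2)": 71},
    {"Name": "Chloe", "English (Sem-1)": 70, "English (Sem-2)": 70},
    {"Name": "Dev", "English (Sem-1)": 91, "English (Sem-2)": 88},
    {"Name": "Esha", "English (Sem-1)": 72, "English (Sem-2)": 85},
]

# Sem-1 > 90 OR (Sem-1 between 65 and 80 AND Sem-2 between 65 and 80)
condition = ("OR", [
    ("gt", "English (Sem-1)", 90),
    ("AND", [  # the nested condition
        ("between", "English (Sem-1)", 65, 80),
        ("between", "English (Sem-2)", 65, 80),
    ]),
])


def evaluate(node, row):
    """Recursively evaluate a condition tree against one row."""
    kind = node[0]
    if kind == "AND":
        return all(evaluate(child, row) for child in node[1])
    if kind == "OR":
        return any(evaluate(child, row) for child in node[1])
    if kind == "gt":
        _, column, value = node
        return row[column] > value
    if kind == "between":
        _, column, low, high = node
        # Exclusive bounds: an assumption based on "65<marks<80".
        return low < row[column] < high
    raise ValueError(f"unknown condition type: {kind}")


selected = [row["Name"] for row in rows if evaluate(condition, row)]
```

Moving a condition out of the nested group changes where it sits in this tree, and therefore which operator applies to it.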