Remove Duplicates

This task removes duplicate rows from your data.

basic functionality

Fig. 113 Remove Duplicate rows

Two or more rows are duplicates if they have the same value in every corresponding cell.
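The definition above can be sketched in a few lines of Python. This is only an illustration of the comparison rule, not Mammoth's implementation; the function name `remove_duplicates` and the sample columns are hypothetical, and keeping the first occurrence is an assumption made for the sketch.

```python
def remove_duplicates(rows):
    """Return rows with fully identical rows collapsed into one."""
    seen = set()
    unique = []
    for row in rows:
        # Every column takes part in the comparison key.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)  # assumption: keep the first occurrence
    return unique


rows = [
    {"name": "Ann", "city": "Pune"},
    {"name": "Ann", "city": "Pune"},   # same value in every cell: duplicate
    {"name": "Ann", "city": "Delhi"},  # differs in one cell: not a duplicate
]
result = remove_duplicates(rows)
```

Only the second row is dropped, because it is the only one that matches another row in every cell.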

Mammoth lets you exclude selected columns from the duplicate comparison (see Fig. 114). To ignore a column, click the ‘+’ button next to it in the left-side selection box. The box on the right side shows the ignored columns. To compare entire rows when checking for duplicates, make sure the right-side box is empty.

menu

Fig. 114 Remove Duplicates rows task window.

Because Remove Duplicates does not consider cells from ignored columns when comparing rows, the value kept in each ignored column of the resulting row is picked at random from the duplicated rows. See Fig. 115.

Random value example

Fig. 115 Randomly picks the values from the column cells for the resultant row.
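The effect of ignoring columns can be sketched in Python as follows. This is a hedged illustration, not Mammoth's implementation: the function name `remove_duplicates_ignoring` and the sample columns are hypothetical. For reproducibility the sketch keeps the first row of each duplicate group, whereas Mammoth picks the ignored-column values at random, so do not rely on a particular value surviving.

```python
def remove_duplicates_ignoring(rows, ignored):
    """Collapse rows that match on every column not listed in `ignored`."""
    seen = set()
    unique = []
    for row in rows:
        # Build the comparison key from the non-ignored columns only.
        key = tuple(
            (col, val) for col, val in sorted(row.items()) if col not in ignored
        )
        if key not in seen:
            seen.add(key)
            # Assumption: keep the first row seen. Mammoth instead picks
            # the ignored-column values at random from the duplicates.
            unique.append(row)
    return unique


rows = [
    {"customer": "Ann", "product": "Pen", "order_id": 101},
    {"customer": "Ann", "product": "Pen", "order_id": 102},
    {"customer": "Bob", "product": "Pen", "order_id": 103},
]
result = remove_duplicates_ignoring(rows, ignored={"order_id"})
```

With `order_id` ignored, the two rows for Ann count as duplicates and collapse into one; the surviving `order_id` could be either 101 or 102 in Mammoth.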

Note

  1. If Remove Duplicates does not remove rows that look identical, check whether the numbers or dates are formatted to look the same while being different internally. Check whether any formats are set on the columns: the data shown on screen may differ from what is stored internally.
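The following Python sketch shows how a display format can hide an internal difference. The values, the formats and the helper name `looks_same_but_differs` are hypothetical and only illustrate the idea; they do not reproduce Mammoth's formatting options.

```python
from datetime import datetime


def looks_same_but_differs(x, y, render):
    """True when two values render identically but are not equal."""
    return render(x) == render(y) and x != y


# Numbers shown with two decimals: both render as "2.50".
numbers_hidden = looks_same_but_differs(2.50, 2.504, lambda v: f"{v:.2f}")

# Timestamps shown as a date only: both render as "2023-01-05".
morning = datetime(2023, 1, 5, 9, 0)
evening = datetime(2023, 1, 5, 17, 30)
dates_hidden = looks_same_but_differs(
    morning, evening, lambda d: d.strftime("%Y-%m-%d")
)
```

In both cases the rows would look like duplicates on screen, but a comparison on the stored values treats them as different.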