Remove Duplicates

The Remove Duplicates task removes duplicate rows from your data.

basic functionality

Fig. 21 Remove Duplicate rows

Two or more rows are duplicates if they have exactly the same value in every corresponding cell.
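The definition above can be sketched in plain Python. This is only an illustration of the comparison rule, not Mammoth's implementation. The function name `remove_duplicates` and the sample data are hypothetical, and keeping the first occurrence of each row is an assumption made here for the sketch.

```python
def remove_duplicates(rows):
    """Return rows with exact duplicates dropped (hypothetical helper).

    Each row is a tuple of cell values. Two rows count as duplicates
    only if every corresponding cell is equal.
    """
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)  # assumption: keep the first occurrence
    return unique


rows = [
    ("Alice", "Paris", 30),
    ("Bob", "Oslo", 25),
    ("Alice", "Paris", 30),  # same as the first row in every cell
    ("Alice", "Paris", 31),  # differs in one cell, so not a duplicate
]
deduped = remove_duplicates(rows)
```

Only the third row is dropped. The fourth row stays because a single differing cell is enough to make a row distinct.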

Mammoth lets you exclude selected columns from the duplicate comparison (see Fig. 22). Choose the columns to ignore from the selection box on the left by clicking the + button. The box on the right shows the ignored columns. To compare entire rows when removing duplicates, make sure the right box is empty.

menu

Fig. 22 Remove Duplicates task window.

Because Remove Duplicates does not compare cells in ignored columns, rows that count as duplicates may still hold different values in those columns. For each ignored column, the resulting row gets a value picked at random from the duplicated rows. See Fig. 23.

Random value example

Fig. 23 Values for ignored columns are picked at random for the resulting row.
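The effect of ignoring columns can be sketched as follows. This is a rough illustration rather than Mammoth's actual logic. The helper `remove_duplicates_ignoring`, the column names, and the data are all hypothetical. The sketch keeps the first row it sees in each group so that it is reproducible, whereas Mammoth picks the ignored-column value at random, so do not rely on which value survives.

```python
def remove_duplicates_ignoring(rows, ignored):
    """Drop duplicates while skipping the `ignored` columns (hypothetical helper).

    Each row is a dict mapping column name to value. Rows are grouped by
    the values of every column that is NOT ignored.
    """
    kept = {}
    for row in rows:
        # Build the comparison key from the non-ignored columns only.
        key = tuple(sorted(
            (col, val) for col, val in row.items() if col not in ignored
        ))
        # Assumption: keep the first row per group. Mammoth documents the
        # choice as random, so any row in the group could supply the value.
        kept.setdefault(key, row)
    return list(kept.values())


rows = [
    {"id": 1, "name": "Alice", "updated": "2021-01-01"},
    {"id": 1, "name": "Alice", "updated": "2021-06-15"},
    {"id": 2, "name": "Bob", "updated": "2021-03-10"},
]
result = remove_duplicates_ignoring(rows, ignored={"updated"})
```

The two Alice rows collapse into one because they match on `id` and `name`. The `updated` value in the surviving row comes from one of the two originals.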

Note

  1. If Remove Duplicates does not work as expected, you may have numbers or dates that are formatted to look the same but are stored as different values. Remove Duplicates compares the internally stored values, not the formatted values. Check whether any formats are set on the columns, because the data shown on screen may differ from what is stored internally.
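The gap between displayed and stored values can be reproduced outside Mammoth. The sketch below uses ordinary Python formatting as a stand-in for column formats. The values and format strings are made up for illustration and say nothing about how Mammoth stores data.

```python
from datetime import datetime

# Two numbers that differ internally but look identical at two decimals.
a, b = 1.2345, 1.2349
numbers_look_same = f"{a:.2f}" == f"{b:.2f}"  # both display as "1.23"
numbers_stored_same = a == b                  # the stored values differ

# Two timestamps on the same day that differ in their time of day.
t1 = datetime(2021, 1, 1, 9, 0)
t2 = datetime(2021, 1, 1, 17, 30)
date_format = "%Y-%m-%d"  # a date-only display format
dates_look_same = t1.strftime(date_format) == t2.strftime(date_format)
dates_stored_same = t1 == t2
```

In both cases the formatted values match while the stored values do not, so a comparison on stored values treats the rows as distinct.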