{"id":3711,"date":"2023-07-05T10:56:29","date_gmt":"2023-07-05T09:56:29","guid":{"rendered":"https:\/\/mammoth.enspire.in\/?p=3711"},"modified":"2026-03-02T17:59:31","modified_gmt":"2026-03-02T17:59:31","slug":"how-to-clean-a-dataset","status":"publish","type":"post","link":"https:\/\/mammoth.io\/mammoth_v2\/how-to-clean-a-dataset\/","title":{"rendered":"How To Clean A Dataset in Minutes (Quick 2-Step Guide)"},"content":{"rendered":"<p>Unsure how to clean a dataset the right way?<\/p>\n<p>If you struggle with <strong>data cleansing<\/strong>,\u00a0<strong>normalization<\/strong>,\u00a0<strong>standardization<\/strong>\u00a0or\u00a0<strong>consolidation<\/strong>, this article is for you.<\/p>\n<p>We\u2019ll walk through a simple scenario from the retail world, but the concepts apply to many other situations.<\/p>\n<p>Let us take the following tables.<\/p>\n<p>They hold transactional data for the same vendor, but they come from different sources and use different schemas:<\/p>\n<figure class=\"w-richtext-figure-type-image w-richtext-align-center\"><img fetchpriority=\"high\" decoding=\"async\" class=\"aligncenter wp-image-3714 size-full\" src=\"http:\/\/mammoth.enspire.in\/wp-content\/uploads\/2023\/07\/blog1.png\" alt=\"\" width=\"1024\" height=\"841\" srcset=\"https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/blog1.png 1024w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/blog1-300x246.png 300w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/blog1-768x631.png 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<h2>Our Objective<\/h2>\n<p><strong>Clean, transform, and merge the data<\/strong>\u00a0so that it looks like the following:<\/p>\n<figure class=\"w-richtext-figure-type-image w-richtext-align-center\"><img decoding=\"async\" class=\"alignnone wp-image-3715 size-full\" src=\"http:\/\/mammoth.enspire.in\/wp-content\/uploads\/2023\/07\/blog2.png\" alt=\"\" width=\"1024\" 
height=\"502\" srcset=\"https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/blog2.png 1024w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/blog2-300x147.png 300w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/blog2-768x377.png 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<h2>The Challenges<\/h2>\n<p>If we only had these nine rows to deal with, it would not be an issue: we could copy and paste them into MS Excel or Google Sheets and clean them up manually.<\/p>\n<p>But in the real world, the problems come in various forms:<\/p>\n<ul role=\"list\">\n<li><strong>Size of datasets<\/strong>: Whether it is a couple of thousand rows or millions, a regular spreadsheet isn\u2019t designed to handle the transformations required to reach the end state.<\/li>\n<li><strong>Constant inflow of data &amp; the need for automation<\/strong>: Data today is rarely static. It is continually growing, and repeating every modification by hand becomes a nightmare.<\/li>\n<li><strong>Unavoidable data messiness<\/strong>: Additional column names, inconsistent content, and different schemas are real-world problems that are almost impossible to fix at the source. 
They need to be handled during data consolidation.<\/li>\n<\/ul>\n<h2>Mammoth\u2019s code-free, time-saving, automated solution<\/h2>\n<p>Let us show you how you can resolve this in a couple of minutes, without writing any code.<\/p>\n<p>For those who don\u2019t know about\u00a0<a href=\"https:\/\/www.mammoth.io\/\" target=\"_blank\" rel=\"noopener\">Mammoth Analytics<\/a>, it is a lightweight, code-free data management platform.<\/p>\n<p>It provides powerful tools for the entire data journey, including data retrieval, consolidation, storage, cleanup, reshaping, analysis, insights, alerts and more.<\/p>\n<h2>Step 1 \u2014 Transform and normalize the three datasets<\/h2>\n<p>First, bring your data into the Mammoth Data Library.<\/p>\n<p>For this example, we uploaded simple CSV files directly into Mammoth, but\u00a0<a href=\"https:\/\/mammoth.io\/mammoth_v2\/features\/#bring-your-data-together\" target=\"_blank\" rel=\"noopener\">the platform supports many other ways to ingest your data.<\/a><\/p>\n<figure class=\"w-richtext-figure-type-image w-richtext-align-center\"><img decoding=\"async\" class=\"alignnone wp-image-3716 size-full\" src=\"http:\/\/mammoth.enspire.in\/wp-content\/uploads\/2023\/07\/blog4.png\" alt=\"\" width=\"1024\" height=\"597\" srcset=\"https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/blog4.png 1024w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/blog4-300x175.png 300w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/blog4-768x448.png 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<p>With Mammoth\u2019s\u00a0<a href=\"https:\/\/mammoth.io\/mammoth_v2\/features\/#transform-and-prepare\" target=\"_blank\" rel=\"noopener\">extensive data transformation functions<\/a>, we can shape the data in a variety of ways to get it into the format we need.<\/p>\n<p>We\u2019ll perform a couple of transformations here to get the data into the right shape:<\/p>\n<figure 
class=\"w-richtext-figure-type-image w-richtext-align-center\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-3717 size-full\" src=\"http:\/\/mammoth.enspire.in\/wp-content\/uploads\/2023\/07\/blog-step-1.png\" alt=\"\" width=\"1024\" height=\"345\" srcset=\"https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/blog-step-1.png 1024w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/blog-step-1-300x101.png 300w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/blog-step-1-768x259.png 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<figure class=\"w-richtext-figure-type-image w-richtext-align-center\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-3718 size-full\" src=\"http:\/\/mammoth.enspire.in\/wp-content\/uploads\/2023\/07\/blog-step-1-2.png\" alt=\"\" width=\"1024\" height=\"286\" srcset=\"https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/blog-step-1-2.png 1024w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/blog-step-1-2-300x84.png 300w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/blog-step-1-2-768x215.png 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<figure class=\"w-richtext-figure-type-image w-richtext-align-center\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-3719 size-full\" src=\"http:\/\/mammoth.enspire.in\/wp-content\/uploads\/2023\/07\/blog-step-1-3.png\" alt=\"\" width=\"1024\" height=\"302\" srcset=\"https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/blog-step-1-3.png 1024w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/blog-step-1-3-300x88.png 300w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/blog-step-1-3-768x227.png 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<h2>Step 2 \u2014 Save the Datasets into a Master Dataset<\/h2>\n<figure class=\"w-richtext-figure-type-image 
w-richtext-align-center\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-3726 size-full\" src=\"http:\/\/mammoth.enspire.in\/wp-content\/uploads\/2023\/07\/step-2-main.png\" alt=\"\" width=\"1024\" height=\"921\" srcset=\"https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/step-2-main.png 1024w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/step-2-main-300x270.png 300w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/step-2-main-768x691.png 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<p>Now that we have transformed the data, let\u2019s save it into a Master Dataset.<\/p>\n<p>For this action, we will use a powerful function called \u201cSave to Dataset\u201d. This function allows multiple, potentially inconsistent and incompatible datasets to be merged into a single master dataset.<\/p>\n<p>From Dataset 1, we will create a Master Dataset.<\/p>\n<figure class=\"w-richtext-figure-type-image w-richtext-align-center\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-3722 size-full\" src=\"http:\/\/mammoth.enspire.in\/wp-content\/uploads\/2023\/07\/step-2-1.png\" alt=\"\" width=\"1024\" height=\"634\" srcset=\"https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/step-2-1.png 1024w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/step-2-1-300x186.png 300w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/step-2-1-768x476.png 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<figure class=\"w-richtext-figure-type-image w-richtext-align-center\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-3723 size-full\" src=\"http:\/\/mammoth.enspire.in\/wp-content\/uploads\/2023\/07\/step-2-2.png\" alt=\"\" width=\"1017\" height=\"1024\" srcset=\"https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/step-2-2.png 1017w, 
https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/step-2-2-298x300.png 298w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/step-2-2-150x150.png 150w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/step-2-2-768x773.png 768w\" sizes=\"(max-width: 1017px) 100vw, 1017px\" \/><\/figure>\n<p>Now, with Datasets 2 and 3, we\u2019ll add the data to the Master Dataset.<\/p>\n<figure class=\"w-richtext-figure-type-image w-richtext-align-center\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-3724 size-full\" src=\"http:\/\/mammoth.enspire.in\/wp-content\/uploads\/2023\/07\/step-2-3.png\" alt=\"\" width=\"1024\" height=\"866\" srcset=\"https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/step-2-3.png 1024w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/step-2-3-300x254.png 300w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/step-2-3-768x650.png 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<figure class=\"w-richtext-figure-type-image w-richtext-align-center\">\n<div><img decoding=\"async\" src=\"https:\/\/uploads-ssl.webflow.com\/6256cf335240122837e95aff\/631b1958135f70d199f5dbc9_step-2-4.png\" alt=\"\" \/><\/div>\n<\/figure>\n<h2>And we\u2019re done<\/h2>\n<p>We can now see the \u201cMaster Dataset\u201d in the Data Library. 
If we open that up, we\u2019ll see our cleaned-up and consolidated data.<\/p>\n<figure class=\"w-richtext-figure-type-image w-richtext-align-center\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-3720 size-full\" src=\"http:\/\/mammoth.enspire.in\/wp-content\/uploads\/2023\/07\/colclusion-1.png\" alt=\"\" width=\"1024\" height=\"519\" srcset=\"https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/colclusion-1.png 1024w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/colclusion-1-300x152.png 300w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/colclusion-1-768x389.png 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<figure class=\"w-richtext-figure-type-image w-richtext-align-center\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-3721 size-full\" src=\"http:\/\/mammoth.enspire.in\/wp-content\/uploads\/2023\/07\/conclusion-2.png\" alt=\"\" width=\"1024\" height=\"522\" srcset=\"https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/conclusion-2.png 1024w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/conclusion-2-300x153.png 300w, https:\/\/mammoth.io\/mammoth_v2\/wp-content\/uploads\/2023\/07\/conclusion-2-768x392.png 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<p>We have achieved a code-free solution for combining multiple, incompatible datasets in a couple of minutes.<\/p>\n<p>This is a small example of some of the benefits of using the\u00a0<a href=\"https:\/\/www.mammoth.io\/\" target=\"_blank\" rel=\"noopener\">Mammoth Analytics<\/a>\u00a0platform.<\/p>\n<p>To learn more, check out some of the\u00a0<a href=\"https:\/\/mammoth.io\/mammoth_v2\/features\" target=\"_blank\" rel=\"noopener\">features<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you struggle with data cleansing, normalization, standardization or consolidation, this article is for you. 
Mammoth\u2019s powerful new feature will save you time, money and all the headaches.<\/p>\n","protected":false},"author":7,"featured_media":3712,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[15],"tags":[73],"class_list":["post-3711","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog","tag-data-integration"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/mammoth.io\/mammoth_v2\/wp-json\/wp\/v2\/posts\/3711","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mammoth.io\/mammoth_v2\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mammoth.io\/mammoth_v2\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mammoth.io\/mammoth_v2\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/mammoth.io\/mammoth_v2\/wp-json\/wp\/v2\/comments?post=3711"}],"version-history":[{"count":4,"href":"https:\/\/mammoth.io\/mammoth_v2\/wp-json\/wp\/v2\/posts\/3711\/revisions"}],"predecessor-version":[{"id":10063,"href":"https:\/\/mammoth.io\/mammoth_v2\/wp-json\/wp\/v2\/posts\/3711\/revisions\/10063"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/mammoth.io\/mammoth_v2\/wp-json\/wp\/v2\/media\/3712"}],"wp:attachment":[{"href":"https:\/\/mammoth.io\/mammoth_v2\/wp-json\/wp\/v2\/media?parent=3711"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mammoth.io\/mammoth_v2\/wp-json\/wp\/v2\/categories?post=3711"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mammoth.io\/mammoth_v2\/wp-json\/wp\/v2\/tags?post=3711"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}