How to use Concatenate Values to solve common data problems?¶
In Data Analytics, you are often working to find insights like - the top 10 revenue-generators, top performers in an organization, most popular electives in a college department, varieties available for a product, and so on.
Even though it’s a typical data problem, the solution may require many steps before we get to the result. But, it does not have to be this tough.
Today we are looking at an alternative way to solve these data problems using Mammoth. I’ll show you how you can achieve the results faster with a combination of efficient functions in Mammoth.
Let’s look at a dataset to better understand what’s going on. Suppose you have the
following dataset and you’re tasked with finding the 3 bestselling models within each brand:
That is, you need results like this:
There are three sub-tasks we are dealing with here:
To know the total sales each of these brands, and by extension, each model is making
To know the best-selling models among each lot
To have the final top 3 models within each brand as a comma-separated list
The first part is pretty straightforward. You can find the total sales value for the entire data using a simple mathematical function.
Solving the second will require the use of grouping combined with mathematical functions.
The third bit is the trickiest. If you’ve ever tried following this in Excel , you’ll see how tricky it can be. We are essentially trying to achieve this:
Our goal will be to use Mammoth to solve the above data problem, and to take you through the detailed steps involved in each of the above-mentioned tasks.
First, let’s breakdown the requirement into simple to-dos:
To calculate the total sales values for each row
To rank the models in each brand as per their sales value
To generate a list of top-selling models in each brand
Step 1: Calculating total sales value¶
First, we’ll calculate the total sales value (say ‘Revenue’) using one of the math functions in Mammoth.
Navigate to the dataset
Go to Data Preparation > Numeric Function > Perform a math function
Write a mathematical function to calculate the total sales value. Here we are multiplying the columns - price each and quantity ordered, and getting the results into a new column ‘Revenue’. Click Apply:
Step 2: Ranking models according to their sales value¶
Next, we’ll try to rank the data as per the highest revenue-generating models in each brand. Now, this wouldn’t be a hassle at all if the clause ‘within each brand’ wasn’t there. We could have simply sorted the data by Revenue (descending). But since we have an extra condition here, we can’t do that. Instead, we’ll use another mathematical function here - the window function. This will help us apply grouping before computing rank.
To do this,
Go to Data Preparation > Numeric Function > Window Function > Rank
Use the following configuration:
This will rank the rows within the grouping, here within each brand, and limit them to the top 3 results.
Step 3: Creating final list of top-performing models¶
Finally, we need this list as a comma-separated list like:
To get this:
Go to Data Preparation > Text Function > Concatenate Values
Use the following configuration:
And you’re done.
Mammoth provides a range of data preparation options that helps you handle tricky data problems just like that. Many companies have opted for Mammoth to cut time and make their analysis efficient.
Giants like Starbucks, Nielsen, etc have discovered the power of Mammoth to solve numerous data problems. So have companies like Rethink, ILG, and Everest Detection. Check out these case studies to understand how companies are using Mammoth to solve their unique data problems.
If you want to take the tool for a spin, get in touch.