Journey of Data

Mammoth implements a data flow Pipeline in which your data traverses successive transformations to produce output. The process is feedback-driven: data discovery happens along the way. The figure below shows a sample of a simple data transformation process in Mammoth.

Fig. 2 A typical data flow in Mammoth.

Add New Data

Mammoth allows you to fetch or add new data from different kinds of sources.

  • Local Files: Upload files from your system.
  • Databases: Connect to one of the supported databases.
  • API: Pull data from a cloud API (such as Salesforce or Google Analytics).
  • Webhooks: Push data to Mammoth using webhooks (a minimal push sketch follows this list).
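Pushing via webhooks is an HTTP call. Below is a minimal sketch using the Python requests library; the endpoint URL and the payload shape are hypothetical, and the real ones come from your Mammoth webhook configuration:

```python
import requests

# Hypothetical webhook endpoint; copy the real URL from your
# Mammoth webhook configuration.
WEBHOOK_URL = "https://example.mammoth.io/webhooks/abc123"

# Assumed payload shape: each record becomes a row in the Dataset.
rows = [
    {"order_id": 1001, "amount": 250.0, "region": "EU"},
    {"order_id": 1002, "amount": 90.5, "region": "US"},
]

response = requests.post(WEBHOOK_URL, json=rows, timeout=30)
response.raise_for_status()  # fail loudly if the push was rejected
```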

You can also pull any existing public Dataset from the web via a hyperlink to a comma-separated values (CSV) file or a zipped CSV.
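In Mammoth you simply paste such a link into the UI; for a sense of what the link resolves to, here is a rough equivalent in pandas (the URL is hypothetical):

```python
import pandas as pd

# Hypothetical public dataset link; either a plain CSV or a zipped
# CSV works. pandas infers zip compression from the .zip extension
# (the archive must contain a single CSV file).
url = "https://example.com/public/sales_2023.csv.zip"

df = pd.read_csv(url)
print(df.head())
```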

Discover

When you open a Dataset, Mammoth automatically creates the first View to help you start exploring your data. You can create an Explore card for quick exploration, looking for insights, anomalies, or patterns. Explore cards are a very powerful tool; see the documentation on Explore cards for more information.
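Explore cards run inside the Mammoth UI, but as a rough analogy, the kind of quick profiling they give you looks like this in code (a pandas sketch; the file and column names are hypothetical):

```python
import pandas as pd

df = pd.read_csv("orders.csv")  # hypothetical input Dataset

# An Explore card on a text column is akin to a value-frequency
# summary: it surfaces dominant categories and odd values.
print(df["region"].value_counts())

# On a numeric column, it is akin to a distribution summary,
# which makes outliers and anomalies easy to spot.
print(df["amount"].describe())
```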

Creating Multiple Views

You build perspectives on the data through Views. Views give you the flexibility to transform your data through simple or complex Pipelines for analysis. Following are some points to note about Views:

  • You can create multiple Views on the same Dataset.
  • A View can blend data from one or more Datasets and Views.
  • Multiple Views can append their end data to the same Dataset.
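Conceptually, a View is a non-destructive Pipeline over a Dataset: the underlying data stays put while each View applies its own transformations. A rough sketch of the idea in pandas (file and column names are hypothetical):

```python
import pandas as pd

dataset = pd.read_csv("sales.csv")  # hypothetical source Dataset

# View 1: a regional-summary Pipeline over the Dataset.
view_by_region = dataset.groupby("region", as_index=False)["amount"].sum()

# View 2: a monthly-trend Pipeline over the same, untouched Dataset.
# assign() derives a month column without modifying the source.
view_by_month = (
    dataset.assign(month=pd.to_datetime(dataset["date"]).dt.to_period("M"))
           .groupby("month", as_index=False)["amount"].sum()
)
```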

Following are a few examples that illustrate the power of having multiple Views of a Dataset:

  • Assume that you have raw data containing a large number of columns, each giving a different type of statistic about the subject of analysis. Dealing with so many columns can be difficult. To simplify, you can create multiple Views on the same Dataset and, in each View, hide the columns that are not relevant. Each View can then have a Pipeline suited to its analysis.
  • In another case, you may have a large number of columns in your data, but only a few of them should be exposed for downstream analysis. You can hide such columns and save the View as a new Dataset.
  • In another example, your data lives in two different Datasets and you want to join them on a common key present in both. You can join data from another View with data in your current View to achieve this (see the sketch after this list).
  • Your data could come from multiple sources, each structured differently, while you want a standardized structure for the final data. You can create a Pipeline on each Dataset through its Views to produce standardized output, and then append each output into a target Dataset (also sketched below).
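The last two patterns, joining on a common key and standardizing then appending, look roughly like this in code (a pandas sketch; all file and column names are hypothetical):

```python
import pandas as pd

# Hypothetical Datasets sharing a key column.
orders = pd.read_csv("orders.csv")        # order_id, customer_id, amount
customers = pd.read_csv("customers.csv")  # customer_id, name, region

# Join across Datasets on the common key, as one View pulling in another.
enriched = orders.merge(customers, on="customer_id", how="left")

# Standardize differently-structured sources to one schema, then
# append both into a single target Dataset.
source_a = pd.read_csv("shop_a.csv").rename(columns={"total": "amount"})
source_b = pd.read_csv("shop_b.csv").rename(columns={"order_value": "amount"})
target = pd.concat(
    [source_a[["order_id", "amount"]], source_b[["order_id", "amount"]]],
    ignore_index=True,
)
```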

Export/Save

After a Task completes in a Task Pipeline, you can save your data to another Dataset, or publish it for further analysis to an external system such as a database or a visualization platform like Power BI, Elasticsearch/Kibana, or Google Data Studio. Many external visualization tools can connect to one of the supported databases.
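If you publish to a database yourself rather than through Mammoth's connectors, the export step is a plain table write. A sketch using pandas and SQLAlchemy (the connection string, file name, and table name are hypothetical):

```python
import pandas as pd
from sqlalchemy import create_engine

result = pd.read_csv("pipeline_output.csv")  # hypothetical Pipeline output

# Hypothetical PostgreSQL connection; visualization tools such as
# Power BI or Google Data Studio can then read this table directly.
engine = create_engine("postgresql://user:password@localhost:5432/analytics")
result.to_sql("sales_summary", engine, if_exists="replace", index=False)
```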