Extract JSON

The Extract JSON task allows for safe conversion of data from JSON format into a table format. This task comes with a auto detection which will pre-populate the user interface with suggestions.

Using this task, you can easily extract data from arbitrary JSON without programming knowledge.

When to use

JSON flattening might be necessary in the following situations.

  1. If you are using webhooks to bring data into Mammoth.
  2. If you have JSON files
  3. If you have an API response from a web service.
  4. One or more of the columns in your database has JSON data.

Quick Start

Let us start with the following sampple data:

Items
{"Student": "Alice", "Science": 140, "Language": 138}
{"Student": "Frank", "Science": 122, "Arts": 102}

There are two rows in this data. Each row contains a JSON. If you want to convert this JSON into a rows vs column format, we can do the following:

  1. Open Data Preparation menu and click on Extract JSON.
  2. Select the column Items

Mammoth will automatically make the following suggestions:

  1. The JSON is an dictionary (object).
  2. Each Key in the dictionary will be extracted as a column.
  3. Keys to be extracted are Arts, Language, Science, Student.
  4. Click APPLY. The result will appear as displayed below:
Arts Language Science Student
  138 140 Alice
102   122 Frank

Supported Options

  1. Source Column: The column that contains the JSON values to be extracted.
  2. JSON type: The type of JSON object.
  3. Extract as: Choice on whether the data should be extracted as new columns or new rows.
  4. Extract the following keys: This panel allows for the list of keys to be extracted from the JSON dictionary. Each key will become a new column.
  5. Column type: The type of the new column. If the data cannot be extracted as the option chosen here, empty values will be produced. Numeric and Text data are supported. For date type see Date handling in JSON.
  6. Number of columns: The number of new column to create. This option appears when the JSON type is list and it is being extracted as new columns.
  7. Keep source column: If the source column is to be kept after extraction. This option is turned off by default. See Keeping source column.

JSON can be either list or a dictionary. JSON can also be extracted as new columns or rows. The following section describes how the combination of these options work.

Extracting dictionary As Columns

This combination of options allows for the extraction of dictionaries as new columns.

  1. One column is get created per key.
  2. All the values go into a cell below the column corresponding to that key.
  3. If the data in a particular row does not contain a certain key, an empty value is placed in the cell.

This is the default combination of options when the JSON is detected by Mammoth as a dictionary.

To see an example of this, see Quick Start.

Extracting dictionary as rows

This combination of options allows for the extraction of dictionaries as new rows.

  1. Each key value pair is extracted as a new row.
  2. Two columns will get automatically created, one for the keys and one for the values.
  3. If the data does not contain any key-value pairs, corresponding cells are left empty.

For example, if the data appears like this:

Names
{"first": "Alice", "second": "Bob", "third": "Claire"}

After applying this combination, it will appear as shown below:

Key Value
first Alice
second Bob
third Clair

Extracting list as columns

This combination of options allows for the extraction of lists as new columns.

  1. The number of columns to be extracted can be configured.
  2. The first value in the list goes into the first column, second value goes into the second column and so on.
  3. If the list does not contain as many columns as you have chosen, the corresponding cells are left empty.

For example, if the data appears like this:

Names
["Alice", "Bob", "Chuck"]

After applying this combination, it will appear as shown below:

Item 1 Item 2 Item 3
Alice Bob Chuck

Extracting list as rows

This combination of options allows for the extraction of lists as new rows.

  1. Mammoth creates two columns automatically when this option is selected. One for index of the item in the array and one for the item itself.
  2. If the array is empty, the corresponding cell will be empty.

This is the default combination of options when the JSON is detected by Mammoth as a list.

For example, if the data appears like this:

Names
["Alice", "Bob", "Chuck"]

After applying this combination, it will appear as shown below:

Index Item
1 Alice
2 Bob
3 Chuck

Notes

Safe extraction

JSON data is often non-uniform. The keys are not mandatory and it is possible that data gets mixed.

The present ways of flattening JSON involve programming. The programmer has to ensure that the code for flattening data does not throw error if the data does not match with what they had envisoned.

Mammoth simplifies this complexity and provides safety extracting JSON. Mammoth’s algorithms safely handle the error cases and produce empty values instead of producing error conditions.

The concept of safe extraction also applies for various data types. If a value cannot be extracted safely as the given type, it produces an empty value. For example, if you want to extract a key-value pair into a numeric column and the value itself is of type text, then the output will be empty.

Auto Detection and Suggestions

When Auto Detection and Suggestions is invoked, Mammoth looks at some sample values in the JSON column and automatically makes suggestions. Since the suggestions are made only on the sample data, you should always verify these options before applying the rule.

  1. If the JSON is detected as a dictionary, the default suggestion is extraction as new columns.
  2. If the JSON is detected as a list, the default suggestion is extraction as new rows.

Corrective measures can be applied to the suggestions if the suggestion does not appear to be what you have envisoned. Some examples of such a scenario are:

  1. If the keys are not very consistent in data, some of the keys in data might not be captured by the suggestion algorithm. You can manually add missing keys.
  2. Some keys may be less important than others and you may want to delete such keys from the Key Extraction panel when carrying out this task to reduce complexity of the resulting data.
  3. The choice of extraction as rows or columns is dependent on the data and the desired result. See the following section for more information.

Extract as rows vs columns

The choice of extraction as rows or columns is dependent on the data and the desired results.

For example, see the following data:

Items
{"Student": "Alice", "Science": 140, "Language": 138}
{"Student": "Frank", "Science": 122, "Arts": 102}

This can be extracted as new columns to appear as shown below:

Arts Language Science Student
  138 140 Alice
102   122 Frank

The same data when extracted as new rows will appear as shown below:

Key Value
Language 138
Science 140
Student Alice
Arts 102
Science 122
Student Bob

Both of these extraction mechanisms have their uses.

  • If you want to know how each of the students performed, the first method is useful.
  • If you want to know the average of marks irrespective of students, second method is useful. You can filter the Student values from Key column and build summary subject-wise.

Date handling in JSON

JSON does not have a date type but JSON data often contains date values. However, it is up to the programmers as to how they want to represent dates in JSON. Since JSON is very flexible, there is no consistent date format in JSON data.

Mammoth does not allow for date handling in Extract JSON task. You would have to perform date conversion separately using the Convert column type task.

Keeping source column

The costs of storage may increase exponentially when this option is turned on. This is especially true when the JSON extraction is creating new rows. However, it may be useful to use this option if the data contains contains both lists and dictionaries and you want to apply two different extraction techniques on the same column.

See also

Convert column type

How to convert one or more columns into a different type.

Extract Text

How to extract sub-strings from a text column into another column.

Remove Column

How to delete one more columns.