Extract Text

The Extract Text task extracts part of each value in a text column. In other words, it pulls sub-strings out of text values.

Quick Start

Let us start with the following sample data:

URL
website1.com/Alice
website2.com/Bob
website1.com/Chuck

Suppose you want to extract the name from the URL column, that is, everything following the / character. Complete the following steps to achieve this:

  1. Open the Data Preparation menu and click Text Function.
  2. Select Extract Text.
  3. Choose the column URL as the column to extract text from.
  4. Select Characters around a letter/word for the What to Extract option.
  5. Select All characters right of letter/word.
  6. Enter ‘/’ in the input box for letter/word.
  7. Apply result into a new column called Name.
  8. Click APPLY.

The resulting data appears as shown below.

URL                 Name
website1.com/Alice  Alice
website2.com/Bob    Bob
website1.com/Chuck  Chuck
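
For readers who think in code, the Quick Start result can be approximated in plain Python. This is only a sketch of the equivalent string operation, not Mammoth's implementation. The function name extract_right_of is hypothetical, and two behaviors are assumptions: matching on the first occurrence of the letter/word, and returning an empty string when it is not found.

```python
def extract_right_of(text: str, delimiter: str) -> str:
    """Return everything after the first occurrence of `delimiter`.

    Assumption: an empty string is returned when `delimiter` is absent.
    """
    # str.partition splits on the first occurrence and returns
    # (before, delimiter, after); `after` is "" if nothing matched.
    return text.partition(delimiter)[2]


urls = ["website1.com/Alice", "website2.com/Bob", "website1.com/Chuck"]
names = [extract_right_of(url, "/") for url in urls]
print(names)  # ['Alice', 'Bob', 'Chuck']
```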

Supported Options

The following options are supported by this task:

  • Extract Text From: The column to extract the sub-string from.
  • What to Extract: Choose one of the following three extraction types:
  1. Characters at a certain position
  2. Characters around a letter/word
  3. Characters from beginning or end
  • Parameters: Additional inputs required by the selected What to Extract option. For more information, see Extraction Types.
  • Condition: See Conditions in Tasks
  • Apply result into: The destination column for the results of this task. See Result Column.

Extraction Types

Mammoth supports the following sub-string extraction techniques. Each technique covers several objectives, which you select using the parameters that appear after the What to Extract option.

Extracting characters from a certain position

By using this option, you can achieve one of the following objectives:

  1. Extract all characters to the left of a specific position.
  2. Extract all characters to the right of a specific position.
  3. Extract a specific number of characters to the left of a specific position.
  4. Extract a specific number of characters to the right of a specific position.
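
The four position-based objectives above map onto string slicing. The sketch below is illustrative only: the function extract_at_position and its parameters are hypothetical, and it assumes a 0-based position where the character at that position belongs to the right-hand side. Mammoth's own indexing convention may differ.

```python
from typing import Optional


def extract_at_position(text: str, position: int, side: str,
                        count: Optional[int] = None) -> str:
    """Extract characters to the left or right of `position`.

    Assumptions: `position` is 0-based, and the character at
    `position` is treated as part of the right-hand side.
    With count=None, all characters on that side are returned.
    """
    if side == "left":
        chunk = text[:position]
        if count is not None:
            # Keep the `count` characters closest to the position.
            chunk = chunk[max(len(chunk) - count, 0):]
        return chunk
    if side == "right":
        chunk = text[position:]
        if count is not None:
            chunk = chunk[:count]
        return chunk
    raise ValueError("side must be 'left' or 'right'")


value = "website1.com/Alice"
print(extract_at_position(value, 8, "left"))      # website1
print(extract_at_position(value, 8, "right"))     # .com/Alice
print(extract_at_position(value, 8, "left", 1))   # 1
print(extract_at_position(value, 8, "right", 4))  # .com
```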

Extracting characters around a letter/word

By using this option, you can achieve one of the following objectives:

  1. Extract all characters to the left of a letter/word.
  2. Extract all characters to the right of a letter/word.
  3. Extract a specific number of characters to the left of a letter/word.
  4. Extract a specific number of characters to the right of a letter/word.

To include the letter/word itself in the extracted string, use the "left of and including" or "right of and including" option.
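
As a rough illustration of these objectives, including the option to keep the letter/word, here is a Python sketch. The function extract_around and its parameters are hypothetical. It assumes the first occurrence of the letter/word is used, that the character count excludes the letter/word itself, and that a missing letter/word yields an empty string; none of these details are confirmed by this page.

```python
from typing import Optional


def extract_around(text: str, word: str, side: str,
                   count: Optional[int] = None,
                   include_word: bool = False) -> str:
    """Extract characters to the left or right of `word` in `text`.

    Assumptions: the first occurrence of `word` is used, `count`
    does not include `word`, and "" is returned if `word` is absent.
    """
    index = text.find(word)
    if index == -1:
        return ""
    if side == "left":
        chunk = text[:index]
        if count is not None:
            # Keep the `count` characters nearest to the word.
            chunk = chunk[max(len(chunk) - count, 0):]
        return chunk + word if include_word else chunk
    if side == "right":
        chunk = text[index + len(word):]
        if count is not None:
            chunk = chunk[:count]
        return word + chunk if include_word else chunk
    raise ValueError("side must be 'left' or 'right'")


value = "website1.com/Alice"
print(extract_around(value, "/", "left"))                      # website1.com
print(extract_around(value, "/", "right"))                     # Alice
print(extract_around(value, "/", "left", count=3))             # com
print(extract_around(value, "/", "right", count=2))            # Al
print(extract_around(value, "/", "right", include_word=True))  # /Alice
```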

Extracting characters from beginning or end

By using this option, you can achieve one of the following objectives:

  1. Extract a specific number of characters from the beginning of the string.
  2. Extract a specific number of characters from the end of the string.
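
In Python terms, these two objectives correspond to taking a prefix or a suffix of the string. The helper names extract_from_start and extract_from_end below are hypothetical, and returning the whole value when the count exceeds its length is an assumption rather than documented Mammoth behavior.

```python
def extract_from_start(text: str, count: int) -> str:
    """Return the first `count` characters of `text`."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return text[:count]


def extract_from_end(text: str, count: int) -> str:
    """Return the last `count` characters of `text`.

    Assumption: if `count` exceeds the length, the whole text is returned.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    # Compute the start index explicitly so count == 0 yields "".
    return text[max(len(text) - count, 0):]


value = "website1.com/Alice"
print(extract_from_start(value, 8))  # website1
print(extract_from_end(value, 5))    # Alice
```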

See also

  • Result Column: The result column documentation
  • Conditions in Tasks