Extract Text
The Extract Text task extracts part of each value in a text column. In other words, it pulls sub-strings out of text values.
Quick Start
Let us start with the following sample data:
URL |
---|
website1.com/Alice |
website2.com/Bob |
website1.com/Chuck |
Let us assume that you want to extract the name from the column URL, that is, everything following the / character. Complete the following steps to achieve this.
- Open the Data Preparation menu and click Text Function.
- Select Extract Text
- Choose the column URL as the column to extract text from.
- Select Characters around a letter/word for the What to Extract option.
- Select All characters right of letter/word.
- Enter ‘/’ in the input box for letter/word.
- Apply result into a new column called Name.
- Click APPLY.
The resulting data appears as shown below.
URL | Name |
---|---|
website1.com/Alice | Alice |
website2.com/Bob | Bob |
website1.com/Chuck | Chuck |
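For readers who think in code, the Quick Start is roughly equivalent to the Python sketch below. It is an illustration only, not Mammoth's implementation. The function name `right_of` is hypothetical, and the sketch assumes that the first occurrence of the letter/word is used and that a value without a match produces an empty result.

```python
def right_of(text, marker):
    """Return everything after the first occurrence of marker.

    Assumption: a value that does not contain marker yields "".
    """
    _, found, tail = text.partition(marker)
    return tail if found else ""


urls = ["website1.com/Alice", "website2.com/Bob", "website1.com/Chuck"]
names = [right_of(url, "/") for url in urls]
print(names)  # ['Alice', 'Bob', 'Chuck']
```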
Supported Options
The following options are supported by this task:
- Extract Text From: The column to extract the sub-string from.
- What to Extract: Choose one of the following three options:
  - Characters at a certain position
  - Characters around a letter/word
  - Characters from beginning or end
- Parameters: The additional inputs required by the chosen What to Extract option. For more information, see Extraction Types.
- Condition: See Conditions in Tasks
- Apply result into: Configures the destination of the results produced by this task. See result documentation.
Extraction Types
Mammoth supports the following sub-string extraction techniques. Each technique covers several objectives, which you select using the parameters that follow the What to Extract option.
Extracting characters from a certain position
By using this option, you can achieve one of the following objectives:
- Extract all characters to the left of a specific position.
- Extract all characters to the right of a specific position.
- Extract a specific number of characters to the left of a specific position.
- Extract a specific number of characters to the right of a specific position.
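The four position-based objectives can be sketched with Python slicing. This is a minimal illustration under stated assumptions, not Mammoth's documented behavior: the function `extract_at_position` and its parameters are hypothetical, and `position` is treated as the number of characters that precede the split point, so a position of 3 splits "abcdef" into "abc" and "def". How Mammoth itself counts positions is not specified here.

```python
def extract_at_position(text, position, side, count=None):
    """Slice text to the left or right of a split point.

    Assumptions: position is the number of characters before the
    split point; count=None means "all characters" on that side.
    """
    if side == "left":
        start = 0 if count is None else max(0, position - count)
        return text[start:position]
    if side == "right":
        end = None if count is None else position + count
        return text[position:end]
    raise ValueError("side must be 'left' or 'right'")


value = "website1.com/Alice"
print(extract_at_position(value, 12, "left"))      # website1.com
print(extract_at_position(value, 13, "right"))     # Alice
print(extract_at_position(value, 12, "left", 3))   # com
print(extract_at_position(value, 13, "right", 2))  # Al
```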
Extracting characters around a letter/word
By using this option, you can achieve one of the following objectives:
- Extract all characters to the left of a letter/word.
- Extract all characters to the right of a letter/word.
- Extract a specific number of characters to the left of a letter/word.
- Extract a specific number of characters to the right of a letter/word.
To include the letter/word itself in the extracted string, use the left of and including or right of and including option.
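The letter/word options, including the "and including" variants, might look like this in Python. Treat it as a sketch: the function `extract_around` and its parameters are hypothetical, and it assumes the first occurrence of the letter/word is used, that a value without a match yields an empty string, and that a character count applies to the characters beside the letter/word rather than to the letter/word itself.

```python
def extract_around(text, marker, side, count=None, include_marker=False):
    """Extract characters to the left or right of marker.

    Assumptions: first occurrence of marker; "" when marker is absent;
    count limits only the characters next to marker, not marker itself.
    """
    index = text.find(marker)
    if index == -1:
        return ""
    if side == "left":
        part = text[:index]
        if count is not None:
            part = part[-count:] if count > 0 else ""
        return part + marker if include_marker else part
    if side == "right":
        part = text[index + len(marker):]
        if count is not None:
            part = part[:count]
        return marker + part if include_marker else part
    raise ValueError("side must be 'left' or 'right'")


value = "website1.com/Alice"
print(extract_around(value, "/", "right"))                       # Alice
print(extract_around(value, "/", "left", count=3))               # com
print(extract_around(value, ".com", "left"))                     # website1
print(extract_around(value, "/", "right", include_marker=True))  # /Alice
```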
Extracting characters from beginning or end
By using this option, you can achieve one of the following objectives:
- Extract a specific number of characters from the beginning of the string.
- Extract a specific number of characters from the end of the string.
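Both objectives correspond to simple prefix and suffix slices in Python. The sketch below is illustrative only; the function name `extract_from_edge` is hypothetical, and it assumes that a count larger than the value's length returns the whole value.

```python
def extract_from_edge(text, count, edge):
    """Return count characters from the beginning or end of text.

    Assumption: a count larger than len(text) returns the whole text.
    """
    if count <= 0:
        return ""
    if edge == "beginning":
        return text[:count]
    if edge == "end":
        return text[-count:]
    raise ValueError("edge must be 'beginning' or 'end'")


value = "website1.com/Alice"
print(extract_from_edge(value, 8, "beginning"))  # website1
print(extract_from_edge(value, 5, "end"))        # Alice
```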