String Manipulator Tool
Processes and cleans up string fields in work items by applying regex patterns, length limitations, and text transformations. Essential for data cleanup and standardization during migration.
Overview
The String Manipulator Tool provides powerful text processing capabilities for work item migration. It applies configurable string manipulations to all text fields in work items, enabling data cleanup, standardization, and format corrections during the migration process.
The tool processes string fields through a series of regex-based manipulators that can remove invalid characters, standardize formats, replace text patterns, and enforce field length limits. Each manipulation is applied in sequence and can be individually enabled or disabled.
How It Works
The String Manipulator Tool operates on all string fields within work items during migration:
- Field Processing: The tool identifies all string-type fields in each work item
- Sequential Application: Each configured manipulator is applied in the order defined in the configuration
- Regex Transformations: Pattern-based replacements using regular expressions
- Length Enforcement: Truncates fields that exceed the maximum allowed length
- Conditional Execution: Each manipulator can be individually enabled or disabled
The tool is automatically invoked by migration processors and applies transformations before work items are saved to the target system.
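The processing order described above can be modelled in a few lines. This is a minimal sketch in Python, not the tool's own implementation: the `Manipulator` class and `process_field` function are hypothetical names introduced for illustration, and Python's `re` module stands in for the regex engine the tool actually uses. Truncation is placed last because the schema describes `MaxStringLength` as "applied last".

```python
import re
from dataclasses import dataclass


@dataclass
class Manipulator:
    """Hypothetical stand-in for one RegexStringManipulator entry."""
    pattern: str
    replacement: str
    enabled: bool = True


def process_field(value: str, manipulators: list[Manipulator], max_length: int) -> str:
    # Apply each enabled manipulator in configuration order.
    for m in manipulators:
        if m.enabled:
            value = re.sub(m.pattern, m.replacement, value)
    # Length enforcement runs after all replacements.
    return value[:max_length]


manipulators = [
    Manipulator(r"<[^>]*>", ""),                # strip tags
    Manipulator(r"\s+", " "),                   # collapse whitespace runs
    Manipulator(r"foo", "bar", enabled=False),  # disabled, so skipped
]
result = process_field("<p>foo   and\tmore</p>", manipulators, max_length=10)
print(result)  # foo and mo
```

Because each manipulator receives the output of the previous one, reordering the list can change the final value.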
Use Cases
Common scenarios where the String Manipulator Tool is essential:
- Data Cleanup: Removing invalid Unicode characters, control characters, or formatting artifacts
- Format Standardization: Converting text patterns to consistent formats
- Length Compliance: Ensuring field values don’t exceed target system limits
- Character Encoding: Fixing encoding issues from legacy systems
- Pattern Replacement: Updating URLs, paths, or references to match target environment
Configuration Structure
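Options
- Enabled: If true, the tool runs; if false, it does not. Default: true.
- Manipulators: List of regex-based string manipulations applied, in order, to all string fields. Each can be enabled or disabled. Default: null (no manipulators).
- MaxStringLength: Maximum number of characters in a string. Applied last. Default: 1000000.
Sample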
{
  "MigrationTools": {
    "Version": "16.0",
    "CommonTools": {
      "StringManipulatorTool": {
        "Enabled": "True",
        "Manipulators": [
          {
            "$type": "RegexStringManipulator",
            "Description": "Remove invalid characters from the end of the string",
            "Enabled": "True",
            "Pattern": "[^( -~)\n\r\t]+",
            "Replacement": ""
          }
        ],
        "MaxStringLength": "1000000"
      }
    }
  }
}
Defaults
{
  "MigrationTools": {
    "Version": "16.0",
    "CommonTools": {
      "StringManipulatorTool": {
        "Enabled": "True",
        "Manipulators": null,
        "MaxStringLength": "1000000"
      }
    }
  }
}
Basic Examples
The String Manipulator Tool is configured with an array of manipulators, each defining a specific text transformation:
{
  "StringManipulatorTool": {
    "Enabled": true,
    "MaxStringLength": 1000000,
    "Manipulators": [
      {
        "$type": "RegexStringManipulator",
        "Enabled": true,
        "Description": "Remove invalid characters",
        "Pattern": "[^\\x20-\\x7E\\r\\n\\t]",
        "Replacement": ""
      }
    ]
  }
}
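The doubled backslashes in the Pattern value are JSON escaping: the regex engine sees single backslashes once the file is parsed. The sketch below illustrates this with Python's `json` and `re` modules, which are assumptions made for demonstration only; the tool itself does not run Python, and regex dialects differ in places.

```python
import json
import re

# Raw string, so the text matches what would be typed into the JSON file.
config_text = r'''
{
  "$type": "RegexStringManipulator",
  "Enabled": true,
  "Description": "Remove invalid characters",
  "Pattern": "[^\\x20-\\x7E\\r\\n\\t]",
  "Replacement": ""
}
'''

manipulator = json.loads(config_text)
pattern = manipulator["Pattern"]
print(pattern)  # [^\x20-\x7E\r\n\t]

# BEL (U+0007) and zero-width space (U+200B) fall outside the allowed set.
cleaned = re.sub(pattern, manipulator["Replacement"], "Ready\u0007 for review\u200b\r\n")
print(repr(cleaned))  # 'Ready for review\r\n'
```

A pattern written with single backslashes, such as `"\x20"`, is not valid JSON and would fail to parse before any regex is evaluated.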
Complex Examples
Manipulator Types
Currently, the tool supports the following manipulator types:
- RegexStringManipulator: Applies regular expression pattern matching and replacement
Manipulator Properties
Each manipulator supports these properties:
- $type: Specifies the manipulator type (e.g., “RegexStringManipulator”)
- Enabled: Boolean flag to enable/disable this specific manipulator
- Description: Human-readable description of what the manipulator does
- Pattern: Regular expression pattern to match text
- Replacement: Text to replace matched patterns (can be empty string for removal)
Common Scenarios
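Removing Invalid Characters
Remove characters that may cause issues in the target system. This pattern keeps only printable ASCII (space through tilde), newlines, carriage returns, and tabs, and it matches anywhere in the string, so accented letters and other non-ASCII text are removed as well:
{
  "$type": "RegexStringManipulator",
  "Description": "Remove characters outside printable ASCII, newline, carriage return, and tab",
  "Enabled": true,
  "Pattern": "[^( -~)\n\r\t]+",
  "Replacement": ""
}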
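To see what this character class keeps and drops, the pattern can be tried against a sample string. The sketch below uses Python's `re` module as a stand-in for the tool's regex engine, which is an assumption for illustration; the sample text is made up.

```python
import re

# The pattern as the regex engine receives it after JSON decoding:
# \n, \r and \t are already real control characters at this point.
pattern = "[^( -~)\n\r\t]+"

# Contains e-acute, an en dash, i-diaeresis, a tab, and a NUL character.
sample = "Caf\u00e9 \u2013 na\u00efve\tok\u0000"
cleaned = re.sub(pattern, "", sample)
print(repr(cleaned))  # 'Caf  nave\tok'
```

The tab survives, but the accented letters and the dash are deleted along with the NUL, so this manipulator is best suited to fields where non-ASCII text is not expected.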
Standardizing Line Endings
Convert all line endings to a consistent format:
{
  "$type": "RegexStringManipulator",
  "Description": "Standardize line endings to CRLF",
  "Enabled": true,
  "Pattern": "\r\n|\n|\r",
  "Replacement": "\r\n"
}
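The order of alternatives in this pattern matters: `\r\n` must come first so that an existing CRLF is matched as one unit. A short Python sketch, using `re` as an assumed stand-in for the tool's regex engine, shows the difference:

```python
import re

pattern = "\r\n|\n|\r"  # CRLF listed first
mixed = "one\ntwo\rthree\r\nfour"

normalized = re.sub(pattern, "\r\n", mixed)
print(repr(normalized))  # 'one\r\ntwo\r\nthree\r\nfour'

# Without the CRLF alternative, each half of a CRLF is replaced separately.
doubled = re.sub("\r|\n", "\r\n", "a\r\nb")
print(repr(doubled))  # 'a\r\n\r\nb'
```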
Cleaning HTML Content
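Remove HTML tags from text fields. Because manipulators apply to every string field, this also strips formatting from fields that intentionally contain HTML:
{
  "$type": "RegexStringManipulator",
  "Description": "Remove HTML tags",
  "Enabled": true,
  "Pattern": "<[^>]*>",
  "Replacement": ""
}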
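The tag-stripping pattern is simple, which also gives it limits worth checking before use. The sketch below, written in Python with `re` as an assumed stand-in for the tool's regex engine, shows one expected result and two side effects on made-up sample text:

```python
import re

pattern = r"<[^>]*>"

html = "<div>Fix <b>login</b> &amp; logout<br/></div>"
stripped = re.sub(pattern, "", html)
print(stripped)  # Fix login &amp; logout

# Plain text containing both < and > is treated as one tag.
comparison = re.sub(pattern, "", "if a < b and c > d")
print(repr(comparison))  # 'if a  d'
```

Tags are removed, but entities such as `&amp;` are left untouched, and text between a literal `<` and a later `>` is deleted.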
Fixing Encoding Issues
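Replace common encoding artifacts, such as the sequences left behind when UTF-8 curly quotes are decoded as Windows-1252. This example maps all three artifacts to a plain apostrophe; use separate manipulators if single and double quotes should be restored differently:
{
  "$type": "RegexStringManipulator",
  "Description": "Fix common encoding issues",
  "Enabled": true,
  "Pattern": "â€™|â€œ|â€\u009d",
  "Replacement": "'"
}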
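The artifacts targeted here typically arise when UTF-8 bytes are decoded as Windows-1252. The Python sketch below reproduces that for two curly quotes and then applies a pattern that targets the resulting three-character sequences. It assumes the pattern is meant to match the garbled sequences rather than the curly quotes themselves, and it uses Python's `re` as a stand-in for the tool's regex engine.

```python
import re

# Right single quote (U+2019) and left double quote (U+201C) in UTF-8 ...
original = "It\u2019s \u201cdone"
# ... decoded with the wrong code page, producing the familiar artifacts.
garbled = original.encode("utf-8").decode("cp1252")

# Alternatives: a-circumflex + euro sign + trademark sign, oe ligature, or U+009D.
pattern = "\u00e2\u20ac\u2122|\u00e2\u20ac\u0153|\u00e2\u20ac\u009d"
fixed = re.sub(pattern, "'", garbled)
print(fixed)  # It's 'done
```

Both artifacts become a plain apostrophe, which matches the single Replacement value in the configuration above.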
Good Practices
Pattern Testing
- Test regex patterns thoroughly before applying to production data
- Use regex testing tools to validate patterns against sample data
- Consider edge cases and unintended matches in your patterns
Performance Considerations
- Order manipulators efficiently: Place simpler patterns before complex ones
- Use specific patterns: Avoid overly broad regex that may match unintended content
- Consider field length: Set an appropriate MaxStringLength to prevent excessive processing
Data Safety
- Backup source data: Always maintain backups before applying string manipulations
- Test with sample data: Validate manipulations on a subset before full migration
- Review results: Check processed fields to ensure transformations are correct
Configuration Management
- Document patterns: Include clear descriptions for each manipulator
- Version control: Maintain configuration files in version control
- Incremental changes: Test one manipulator at a time when developing complex transformations
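One way to follow the incremental approach is to record the value after each manipulator, so the step that introduces an unwanted change is easy to spot. The sketch below is a hypothetical helper in Python (the `trace` function is not part of the tool), with `re` standing in for the tool's regex engine:

```python
import re


def trace(value, manipulators):
    """Return (description, result) after each manipulator, applied in order."""
    steps = []
    for description, pattern, replacement in manipulators:
        value = re.sub(pattern, replacement, value)
        steps.append((description, value))
    return steps


manipulators = [
    ("Remove HTML tags", r"<[^>]*>", ""),
    ("Standardize line endings to CRLF", "\r\n|\n|\r", "\r\n"),
]
steps = trace("<p>line one</p>\n<p>line two</p>", manipulators)
for description, value in steps:
    print(f"{description}: {value!r}")
# Remove HTML tags: 'line one\nline two'
# Standardize line endings to CRLF: 'line one\r\nline two'
```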
Troubleshooting
Common Issues
Manipulations Not Applied:
- Verify the tool is enabled ("Enabled": true)
- Check that individual manipulators are enabled
- Review regex patterns for syntax errors
- Ensure the tool is configured in the processor’s tool list
Unexpected Results:
- Test regex patterns in isolation with sample data
- Check the order of manipulators (they execute sequentially)
- Verify escape sequences in JSON configuration
- Review field content before and after processing
Performance Issues:
- Consider reducing MaxStringLength if processing very large fields
- Optimize regex patterns to avoid catastrophic backtracking
- Disable unnecessary manipulators
- Process smaller batches of work items
Regex Pattern Errors:
- Validate regex syntax using online tools or testing utilities
- Escape special characters properly in JSON configuration
- Consider case sensitivity requirements
- Test patterns against various input scenarios
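A quick pre-flight check can catch both JSON escaping mistakes and regex syntax errors before a migration run. The sketch below is a hypothetical helper in Python, not part of the tool. It compiles each Pattern with Python's `re` module, whose syntax differs in places from the tool's regex engine, so a clean result here is a useful signal rather than a guarantee.

```python
import json
import re


def find_pattern_errors(config_text):
    """Return (description, error message) for each pattern that fails to compile."""
    tool = json.loads(config_text)["StringManipulatorTool"]
    errors = []
    # "Manipulators" may be null in the defaults, so fall back to an empty list.
    for m in tool.get("Manipulators") or []:
        try:
            re.compile(m["Pattern"])
        except re.error as exc:
            errors.append((m.get("Description", "<no description>"), str(exc)))
    return errors


config_text = r'''
{
  "StringManipulatorTool": {
    "Enabled": true,
    "Manipulators": [
      {"$type": "RegexStringManipulator", "Enabled": true,
       "Description": "Remove HTML tags", "Pattern": "<[^>]*>", "Replacement": ""},
      {"$type": "RegexStringManipulator", "Enabled": true,
       "Description": "Unclosed group", "Pattern": "(abc", "Replacement": ""}
    ]
  }
}
'''
errors = find_pattern_errors(config_text)
for description, message in errors:
    print(f"{description}: {message}")
```

Only the second manipulator is reported, because its pattern opens a group that is never closed.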
Schema
This is the JSON schema that defines the structure and validation rules for this configuration.
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://devopsmigration.io/schema/schema.tools.stringmanipulatortool.json",
  "title": "StringManipulatorTool",
  "description": "Used to process the String fields of a work item. This is useful for cleaning up data. It will limit fields to a max length and apply regex replacements based on what is configured. Each regex replacement is applied in order and can be enabled or disabled.",
  "type": "object",
  "properties": {
    "Enabled": {
      "description": "If set to `true` then the tool will run. Set to `false` and the processor will not run.",
      "type": "boolean",
      "default": "true"
    },
    "Manipulators": {
      "description": "List of regex based string manipulations to apply to all string fields. Each regex replacement is applied in order and can be enabled or disabled.",
      "type": "array",
      "default": "{}"
    },
    "MaxStringLength": {
      "description": "Max number of chars in a string. Applied last, and set to 1000000 by default.",
      "type": "integer",
      "default": "1000000"
    }
  }
}
Maintainer
Created and maintained by Martin Hinshelwood of nkdagility.com