String Functions

Learn to manipulate text with UPPER, LOWER, CONCAT, SUBSTRING, and more.

20 min read
Beginner

Introduction to String Functions

Text data is everywhere in databases - names, addresses, descriptions, emails, and more. But raw text often needs manipulation:

  • Convert to uppercase or lowercase for consistency
  • Extract parts of strings (first 3 characters, domain from email)
  • Combine multiple columns (first name + last name)
  • Clean up whitespace and formatting
  • Search for patterns within text

String functions in SQL let you transform and manipulate text data directly in your queries!

Built-in SQL functions that operate on text (string) values. They can change case, extract substrings, concatenate strings, find patterns, measure length, and more - all without writing complex application code.

Why use string functions?

  • Format output for reports (proper case names, clean addresses)
  • Data cleaning (trim whitespace, standardize formats)
  • Search and filter (find emails ending in @company.com)
  • Extract information (get first initial, domain name, area code)
  • Build dynamic content (full names, formatted addresses)

Let's work with a sample customers table:

customers
id
first_name
last_name
email
phone
city
1alicesmithalice@gmail.com555-123-4567new york
2BOBJOHNSONbob.j@company.com555-987-6543los angeles
3CarolWhitecarol.white@yahoo.com555-246-8135chicago

UPPER() and LOWER() - Changing Case

Convert strings to uppercase or lowercase for consistency:

sql
-- Convert to uppercase
SELECT
    id,
    UPPER(first_name) AS first_name_upper,
    UPPER(last_name) AS last_name_upper,
    UPPER(city) AS city_upper
FROM customers;
sql
-- Convert to lowercase
SELECT
    id,
    LOWER(first_name) AS first_name_lower,
    LOWER(email) AS email_lower
FROM customers;

Use cases:

  • Case-insensitive searches: WHERE LOWER(email) = 'alice@gmail.com'
  • Standardize data display
  • Compare strings regardless of case

CONCAT() - Combining Strings

Join multiple strings together into one:

sql
-- Create full names
SELECT
    id,
    first_name,
    last_name,
    CONCAT(first_name, ' ', last_name) AS full_name
FROM customers;
sql
-- Create formatted display with proper case
SELECT
    id,
    CONCAT(UPPER(SUBSTRING(first_name, 1, 1)), LOWER(SUBSTRING(first_name, 2)), ' ',
           UPPER(SUBSTRING(last_name, 1, 1)), LOWER(SUBSTRING(last_name, 2))) AS proper_name,
    CONCAT(email, ' | ', phone) AS contact_info
FROM customers;

Alternative syntax (MySQL/PostgreSQL):

-- Using || operator
SELECT first_name || ' ' || last_name AS full_name
FROM customers;

Use cases:

  • Build full names from first/last names
  • Create formatted addresses
  • Combine multiple fields for display

LENGTH() - String Length

Get the number of characters in a string:

sql
-- Find length of various fields
SELECT
    first_name,
    LENGTH(first_name) AS name_length,
    email,
    LENGTH(email) AS email_length
FROM customers;
sql
-- Find customers with short names (3 letters or less)
SELECT
    id,
    first_name,
    LENGTH(first_name) AS name_length
FROM customers
WHERE LENGTH(first_name) <= 3;

Use cases:

  • Validate input lengths (password must be 8+ characters)
  • Find unusually short/long values
  • Calculate string sizes for storage analysis

SUBSTRING() - Extracting Parts of Strings

Extract a portion of a string starting at a position:

Syntax:

SUBSTRING(string, start_position, length)
-- or
SUBSTRING(string FROM start_position FOR length)

Note: Positions start at 1 (not 0)!

sql
-- Extract first 3 characters of names
SELECT
    first_name,
    SUBSTRING(first_name, 1, 3) AS first_3_chars,
    email,
    SUBSTRING(email, 1, 5) AS email_prefix
FROM customers;
sql
-- Extract area code from phone numbers (characters 1-3)
SELECT
    phone,
    SUBSTRING(phone, 1, 3) AS area_code
FROM customers;

Extract from the end: Use negative positions (in some databases) or calculate from length:

-- Last 4 characters
SUBSTRING(phone, LENGTH(phone) - 3, 4)
-- or in some databases:
SUBSTRING(phone, -4)

LEFT() and RIGHT() - Extract from Ends

Simpler alternatives for extracting from the beginning or end:

sql
-- Get first and last 3 characters
SELECT
    first_name,
    LEFT(first_name, 3) AS first_3,
    RIGHT(first_name, 3) AS last_3,
    email,
    RIGHT(email, 8) AS email_ending
FROM customers;
sql
-- Extract last 4 digits of phone
SELECT
    phone,
    RIGHT(phone, 4) AS last_4_digits
FROM customers;

Use cases:

  • Get initials: LEFT(first_name, 1)
  • Extract file extensions: RIGHT(filename, 4).pdf
  • Get last 4 of credit card: RIGHT(card_number, 4)

TRIM(), LTRIM(), RTRIM() - Removing Whitespace

Remove unwanted spaces from strings:

products
id
product_name
category
1Laptop ProElectronics
2Desk ChairFurniture
3MouseElectronics
sql
-- Remove leading and trailing spaces
SELECT
    id,
    product_name,
    CONCAT('[', product_name, ']') AS original,
    CONCAT('[', TRIM(product_name), ']') AS trimmed,
    LENGTH(product_name) AS original_length,
    LENGTH(TRIM(product_name)) AS trimmed_length
FROM products;
sql
-- LTRIM removes leading spaces only
-- RTRIM removes trailing spaces only
SELECT
    id,
    CONCAT('[', LTRIM(product_name), ']') AS left_trimmed,
    CONCAT('[', RTRIM(product_name), ']') AS right_trimmed,
    CONCAT('[', TRIM(product_name), ']') AS both_trimmed
FROM products
WHERE id = 3;

Use cases:

  • Clean user input (remove accidental spaces)
  • Prepare data for comparison
  • Fix data import issues

REPLACE() - Find and Replace

Replace all occurrences of a substring with another:

Syntax: REPLACE(string, find, replace_with)

sql
-- Replace characters in phone numbers
SELECT
    phone,
    REPLACE(phone, '-', '.') AS dots_format,
    REPLACE(phone, '-', '') AS no_dashes,
    REPLACE(phone, '555', '999') AS new_area_code
FROM customers;
sql
-- Clean up text
SELECT
    city,
    REPLACE(city, 'new york', 'New York') AS fixed_case,
    REPLACE(REPLACE(city, 'los', 'Los'), 'angeles', 'Angeles') AS city_proper
FROM customers;

Use cases:

  • Format phone numbers
  • Clean up data (remove special characters)
  • Standardize values
  • Mask sensitive data

LOCATE() / POSITION() - Finding Substrings

Find the position of a substring within a string:

MySQL: LOCATE(substring, string) PostgreSQL/SQL Standard: POSITION(substring IN string)

Returns the position (starting at 1) or 0 if not found.

sql
-- Find position of @ in emails
SELECT
    email,
    LOCATE('@', email) AS at_position,
    SUBSTRING(email, 1, LOCATE('@', email) - 1) AS username,
    SUBSTRING(email, LOCATE('@', email) + 1) AS domain
FROM customers;
sql
-- Find emails with specific domains
SELECT
    id,
    email
FROM customers
WHERE LOCATE('@gmail.com', email) > 0;

Use cases:

  • Parse email addresses
  • Extract usernames
  • Find patterns in text
  • Conditional logic based on content

REVERSE() - Reverse Strings

Reverse the order of characters:

sql
SELECT
    first_name,
    REVERSE(first_name) AS reversed,
    city,
    REVERSE(city) AS reversed_city
FROM customers;

Use case:

  • Palindrome detection
  • Reverse sorting
  • Data validation

REPEAT() - Repeat Strings

Repeat a string multiple times:

sql
SELECT
    first_name,
    REPEAT('*', LENGTH(first_name)) AS masked,
    REPEAT('-', 10) AS separator
FROM customers;

Use cases:

  • Create visual separators
  • Mask sensitive data
  • Generate repeated patterns

LIKE with Wildcards - Pattern Matching

While not a function, LIKE is essential for string searching:

Wildcards:

  • % - matches any number of characters (including zero)
  • _ - matches exactly one character
sql
-- Find emails ending with .com
SELECT id, email
FROM customers
WHERE email LIKE '%.com';
sql
-- Find customers with 5-letter first names
SELECT id, first_name, LENGTH(first_name) AS length
FROM customers
WHERE first_name LIKE '_____';
sql
-- Find names starting with 'C' or 'c'
SELECT id, first_name
FROM customers
WHERE LOWER(first_name) LIKE 'c%';

Practical Example: Email Analysis

Let's combine multiple string functions for a real-world task:

sql
-- Analyze and format email information
SELECT
    id,
    email,
    SUBSTRING(email, 1, LOCATE('@', email) - 1) AS username,
    SUBSTRING(email, LOCATE('@', email) + 1) AS domain,
    UPPER(LEFT(SUBSTRING(email, 1, LOCATE('@', email) - 1), 1)) AS first_initial,
    CASE
        WHEN LOCATE('@gmail.com', email) > 0 THEN 'Gmail'
        WHEN LOCATE('@yahoo.com', email) > 0 THEN 'Yahoo'
        WHEN LOCATE('@company.com', email) > 0 THEN 'Corporate'
        ELSE 'Other'
    END AS email_provider,
    LENGTH(email) AS email_length
FROM customers;

Practical Example: Data Cleaning

Clean and standardize messy data:

raw_customers
id
name
phone
1alice SMITH5551234567
2Bob Johnson(555) 987-6543
3CAROL white555.246.8135
sql
-- Clean and standardize the data
SELECT
    id,
    name AS original_name,
    CONCAT(
        UPPER(SUBSTRING(TRIM(SUBSTRING_INDEX(name, ' ', 1)), 1, 1)),
        LOWER(SUBSTRING(TRIM(SUBSTRING_INDEX(name, ' ', 1)), 2)),
        ' ',
        UPPER(SUBSTRING(TRIM(SUBSTRING_INDEX(name, ' ', -1)), 1, 1)),
        LOWER(SUBSTRING(TRIM(SUBSTRING_INDEX(name, ' ', -1)), 2))
    ) AS clean_name,
    phone AS original_phone,
    REPLACE(REPLACE(REPLACE(phone, '(', ''), ')', ''), ' ', '') AS clean_phone,
    CONCAT(
        SUBSTRING(REPLACE(REPLACE(REPLACE(phone, '(', ''), ')', ''), ' ', ''), 1, 3),
        '-',
        SUBSTRING(REPLACE(REPLACE(REPLACE(phone, '(', ''), ')', ''), ' ', ''), 4, 3),
        '-',
        SUBSTRING(REPLACE(REPLACE(REPLACE(phone, '(', ''), ')', ''), ' ', ''), 7, 4)
    ) AS formatted_phone
FROM raw_customers;

Common Mistakes to Avoid

Don't make these mistakes:

  1. Forgetting positions start at 1: ❌

    • SQL string positions start at 1, not 0
    • SUBSTRING(text, 0, 3) won't work as expected
    • Use SUBSTRING(text, 1, 3)
  2. Not handling NULL values: ❌

    • String functions on NULL return NULL
    • Use COALESCE(column, '') or check with IS NOT NULL
  3. Case sensitivity confusion: ❌

    • LIKE 'alice' won't match 'Alice' in case-sensitive systems
    • Use LOWER() or UPPER() for case-insensitive matching
  4. Wrong CONCAT behavior with NULLs: ❌

    • In some databases, CONCAT('Hello', NULL) = NULL
    • Use CONCAT_WS() or COALESCE()
  5. Inefficient string operations in WHERE: ⚠️

    • WHERE SUBSTRING(email, 1, 5) = 'alice' can't use indexes
    • Better: WHERE email LIKE 'alice%'
  6. Not considering multi-byte characters: ⚠️

    • LENGTH() counts bytes, not characters for UTF-8
    • Use CHAR_LENGTH() for character count
  7. Assuming TRIM removes all whitespace: ❌

    • TRIM only removes spaces from ends
    • Use REPLACE to remove internal spaces

Database-Specific Variations

String Function Differences
Function
MySQL
PostgreSQL
SQL Server
ConcatenateCONCAT()\\or CONCAT()+ or CONCAT()
Find positionLOCATE()POSITION()CHARINDEX()
SubstringSUBSTRING()SUBSTRING()SUBSTRING()
LengthLENGTH()LENGTH()LEN()
ReplaceREPLACE()REPLACE()REPLACE()
Character lengthCHAR_LENGTH()LENGTH()LEN()

Always check your database documentation for specific function names and syntax!

Key Takeaways

What you learned:

UPPER() and LOWER() - change case ✅ CONCAT() - join strings together ✅ LENGTH() - count characters ✅ SUBSTRING(), LEFT(), RIGHT() - extract portions ✅ TRIM(), LTRIM(), RTRIM() - remove whitespace ✅ REPLACE() - find and replace text ✅ LOCATE() / POSITION() - find substring positions ✅ REVERSE() - reverse string order ✅ REPEAT() - repeat strings ✅ LIKE with % and _ - pattern matching

Real-world applications:

  • Data cleaning and standardization
  • Formatting output for reports
  • Parsing structured strings (emails, phone numbers)
  • Building dynamic content
  • Searching and filtering text data

Practice Exercise: Using the customers table:

  1. Create a display name: "SMITH, Alice" (last name uppercase, comma, first name proper case)
  2. Extract the domain name from emails (everything after @)
  3. Format phone numbers as (555) 123-4567
  4. Find all emails from Gmail or Yahoo
  5. Create a masked email: show first 2 characters of username, then ****@domain.com
  6. Find customers whose names contain exactly 5 letters
  7. Replace all spaces in city names with underscores