String Functions
Learn to manipulate text with UPPER, LOWER, CONCAT, SUBSTRING, and more.
Introduction to String Functions
Text data is everywhere in databases - names, addresses, descriptions, emails, and more. But raw text often needs manipulation:
- Convert to uppercase or lowercase for consistency
- Extract parts of strings (first 3 characters, domain from email)
- Combine multiple columns (first name + last name)
- Clean up whitespace and formatting
- Search for patterns within text
String functions in SQL let you transform and manipulate text data directly in your queries!
Built-in SQL functions that operate on text (string) values. They can change case, extract substrings, concatenate strings, find patterns, measure length, and more - all without writing complex application code.
Why use string functions?
- Format output for reports (proper case names, clean addresses)
- Data cleaning (trim whitespace, standardize formats)
- Search and filter (find emails ending in @company.com)
- Extract information (get first initial, domain name, area code)
- Build dynamic content (full names, formatted addresses)
Let's work with a sample customers table:
id | first_name | last_name | email | phone | city |
|---|---|---|---|---|---|
| 1 | alice | smith | alice@gmail.com | 555-123-4567 | new york |
| 2 | BOB | JOHNSON | bob.j@company.com | 555-987-6543 | los angeles |
| 3 | Carol | White | carol.white@yahoo.com | 555-246-8135 | chicago |
UPPER() and LOWER() - Changing Case
Convert strings to uppercase or lowercase for consistency:
-- Convert to uppercase
SELECT
id,
UPPER(first_name) AS first_name_upper,
UPPER(last_name) AS last_name_upper,
UPPER(city) AS city_upper
FROM customers;-- Convert to lowercase
SELECT
id,
LOWER(first_name) AS first_name_lower,
LOWER(email) AS email_lower
FROM customers;Use cases:
- Case-insensitive searches:
WHERE LOWER(email) = 'alice@gmail.com' - Standardize data display
- Compare strings regardless of case
CONCAT() - Combining Strings
Join multiple strings together into one:
-- Create full names
SELECT
id,
first_name,
last_name,
CONCAT(first_name, ' ', last_name) AS full_name
FROM customers;-- Create formatted display with proper case
SELECT
id,
CONCAT(UPPER(SUBSTRING(first_name, 1, 1)), LOWER(SUBSTRING(first_name, 2)), ' ',
UPPER(SUBSTRING(last_name, 1, 1)), LOWER(SUBSTRING(last_name, 2))) AS proper_name,
CONCAT(email, ' | ', phone) AS contact_info
FROM customers;Alternative syntax (MySQL/PostgreSQL):
-- Using || operator
SELECT first_name || ' ' || last_name AS full_name
FROM customers;
Use cases:
- Build full names from first/last names
- Create formatted addresses
- Combine multiple fields for display
LENGTH() - String Length
Get the number of characters in a string:
-- Find length of various fields
SELECT
first_name,
LENGTH(first_name) AS name_length,
email,
LENGTH(email) AS email_length
FROM customers;-- Find customers with short names (3 letters or less)
SELECT
id,
first_name,
LENGTH(first_name) AS name_length
FROM customers
WHERE LENGTH(first_name) <= 3;Use cases:
- Validate input lengths (password must be 8+ characters)
- Find unusually short/long values
- Calculate string sizes for storage analysis
SUBSTRING() - Extracting Parts of Strings
Extract a portion of a string starting at a position:
Syntax:
SUBSTRING(string, start_position, length)
-- or
SUBSTRING(string FROM start_position FOR length)
Note: Positions start at 1 (not 0)!
-- Extract first 3 characters of names
SELECT
first_name,
SUBSTRING(first_name, 1, 3) AS first_3_chars,
email,
SUBSTRING(email, 1, 5) AS email_prefix
FROM customers;-- Extract area code from phone numbers (characters 1-3)
SELECT
phone,
SUBSTRING(phone, 1, 3) AS area_code
FROM customers;Extract from the end: Use negative positions (in some databases) or calculate from length:
-- Last 4 characters
SUBSTRING(phone, LENGTH(phone) - 3, 4)
-- or in some databases:
SUBSTRING(phone, -4)
LEFT() and RIGHT() - Extract from Ends
Simpler alternatives for extracting from the beginning or end:
-- Get first and last 3 characters
SELECT
first_name,
LEFT(first_name, 3) AS first_3,
RIGHT(first_name, 3) AS last_3,
email,
RIGHT(email, 8) AS email_ending
FROM customers;-- Extract last 4 digits of phone
SELECT
phone,
RIGHT(phone, 4) AS last_4_digits
FROM customers;Use cases:
- Get initials:
LEFT(first_name, 1) - Extract file extensions:
RIGHT(filename, 4)→.pdf - Get last 4 of credit card:
RIGHT(card_number, 4)
TRIM(), LTRIM(), RTRIM() - Removing Whitespace
Remove unwanted spaces from strings:
id | product_name | category |
|---|---|---|
| 1 | Laptop Pro | Electronics |
| 2 | Desk Chair | Furniture |
| 3 | Mouse | Electronics |
-- Remove leading and trailing spaces
SELECT
id,
product_name,
CONCAT('[', product_name, ']') AS original,
CONCAT('[', TRIM(product_name), ']') AS trimmed,
LENGTH(product_name) AS original_length,
LENGTH(TRIM(product_name)) AS trimmed_length
FROM products;-- LTRIM removes leading spaces only
-- RTRIM removes trailing spaces only
SELECT
id,
CONCAT('[', LTRIM(product_name), ']') AS left_trimmed,
CONCAT('[', RTRIM(product_name), ']') AS right_trimmed,
CONCAT('[', TRIM(product_name), ']') AS both_trimmed
FROM products
WHERE id = 3;Use cases:
- Clean user input (remove accidental spaces)
- Prepare data for comparison
- Fix data import issues
REPLACE() - Find and Replace
Replace all occurrences of a substring with another:
Syntax: REPLACE(string, find, replace_with)
-- Replace characters in phone numbers
SELECT
phone,
REPLACE(phone, '-', '.') AS dots_format,
REPLACE(phone, '-', '') AS no_dashes,
REPLACE(phone, '555', '999') AS new_area_code
FROM customers;-- Clean up text
SELECT
city,
REPLACE(city, 'new york', 'New York') AS fixed_case,
REPLACE(REPLACE(city, 'los', 'Los'), 'angeles', 'Angeles') AS city_proper
FROM customers;Use cases:
- Format phone numbers
- Clean up data (remove special characters)
- Standardize values
- Mask sensitive data
LOCATE() / POSITION() - Finding Substrings
Find the position of a substring within a string:
MySQL: LOCATE(substring, string)
PostgreSQL/SQL Standard: POSITION(substring IN string)
Returns the position (starting at 1) or 0 if not found.
-- Find position of @ in emails
SELECT
email,
LOCATE('@', email) AS at_position,
SUBSTRING(email, 1, LOCATE('@', email) - 1) AS username,
SUBSTRING(email, LOCATE('@', email) + 1) AS domain
FROM customers;-- Find emails with specific domains
SELECT
id,
email
FROM customers
WHERE LOCATE('@gmail.com', email) > 0;Use cases:
- Parse email addresses
- Extract usernames
- Find patterns in text
- Conditional logic based on content
REVERSE() - Reverse Strings
Reverse the order of characters:
SELECT
first_name,
REVERSE(first_name) AS reversed,
city,
REVERSE(city) AS reversed_city
FROM customers;Use case:
- Palindrome detection
- Reverse sorting
- Data validation
REPEAT() - Repeat Strings
Repeat a string multiple times:
SELECT
first_name,
REPEAT('*', LENGTH(first_name)) AS masked,
REPEAT('-', 10) AS separator
FROM customers;Use cases:
- Create visual separators
- Mask sensitive data
- Generate repeated patterns
LIKE with Wildcards - Pattern Matching
While not a function, LIKE is essential for string searching:
Wildcards:
%- matches any number of characters (including zero)_- matches exactly one character
-- Find emails ending with .com
SELECT id, email
FROM customers
WHERE email LIKE '%.com';-- Find customers with 5-letter first names
SELECT id, first_name, LENGTH(first_name) AS length
FROM customers
WHERE first_name LIKE '_____';-- Find names starting with 'C' or 'c'
SELECT id, first_name
FROM customers
WHERE LOWER(first_name) LIKE 'c%';Practical Example: Email Analysis
Let's combine multiple string functions for a real-world task:
-- Analyze and format email information
SELECT
id,
email,
SUBSTRING(email, 1, LOCATE('@', email) - 1) AS username,
SUBSTRING(email, LOCATE('@', email) + 1) AS domain,
UPPER(LEFT(SUBSTRING(email, 1, LOCATE('@', email) - 1), 1)) AS first_initial,
CASE
WHEN LOCATE('@gmail.com', email) > 0 THEN 'Gmail'
WHEN LOCATE('@yahoo.com', email) > 0 THEN 'Yahoo'
WHEN LOCATE('@company.com', email) > 0 THEN 'Corporate'
ELSE 'Other'
END AS email_provider,
LENGTH(email) AS email_length
FROM customers;Practical Example: Data Cleaning
Clean and standardize messy data:
id | name | phone |
|---|---|---|
| 1 | alice SMITH | 5551234567 |
| 2 | Bob Johnson | (555) 987-6543 |
| 3 | CAROL white | 555.246.8135 |
-- Clean and standardize the data
SELECT
id,
name AS original_name,
CONCAT(
UPPER(SUBSTRING(TRIM(SUBSTRING_INDEX(name, ' ', 1)), 1, 1)),
LOWER(SUBSTRING(TRIM(SUBSTRING_INDEX(name, ' ', 1)), 2)),
' ',
UPPER(SUBSTRING(TRIM(SUBSTRING_INDEX(name, ' ', -1)), 1, 1)),
LOWER(SUBSTRING(TRIM(SUBSTRING_INDEX(name, ' ', -1)), 2))
) AS clean_name,
phone AS original_phone,
REPLACE(REPLACE(REPLACE(phone, '(', ''), ')', ''), ' ', '') AS clean_phone,
CONCAT(
SUBSTRING(REPLACE(REPLACE(REPLACE(phone, '(', ''), ')', ''), ' ', ''), 1, 3),
'-',
SUBSTRING(REPLACE(REPLACE(REPLACE(phone, '(', ''), ')', ''), ' ', ''), 4, 3),
'-',
SUBSTRING(REPLACE(REPLACE(REPLACE(phone, '(', ''), ')', ''), ' ', ''), 7, 4)
) AS formatted_phone
FROM raw_customers;Common Mistakes to Avoid
Don't make these mistakes:
-
Forgetting positions start at 1: ❌
- SQL string positions start at 1, not 0
SUBSTRING(text, 0, 3)won't work as expected- Use
SUBSTRING(text, 1, 3)
-
Not handling NULL values: ❌
- String functions on NULL return NULL
- Use
COALESCE(column, '')or check withIS NOT NULL
-
Case sensitivity confusion: ❌
LIKE 'alice'won't match 'Alice' in case-sensitive systems- Use
LOWER()orUPPER()for case-insensitive matching
-
Wrong CONCAT behavior with NULLs: ❌
- In some databases,
CONCAT('Hello', NULL)= NULL - Use
CONCAT_WS()orCOALESCE()
- In some databases,
-
Inefficient string operations in WHERE: ⚠️
WHERE SUBSTRING(email, 1, 5) = 'alice'can't use indexes- Better:
WHERE email LIKE 'alice%'
-
Not considering multi-byte characters: ⚠️
LENGTH()counts bytes, not characters for UTF-8- Use
CHAR_LENGTH()for character count
-
Assuming TRIM removes all whitespace: ❌
- TRIM only removes spaces from ends
- Use REPLACE to remove internal spaces
Database-Specific Variations
Function | MySQL | PostgreSQL | SQL Server | ||
|---|---|---|---|---|---|
| Concatenate | CONCAT() | \ | \ | or CONCAT() | + or CONCAT() |
| Find position | LOCATE() | POSITION() | CHARINDEX() | ||
| Substring | SUBSTRING() | SUBSTRING() | SUBSTRING() | ||
| Length | LENGTH() | LENGTH() | LEN() | ||
| Replace | REPLACE() | REPLACE() | REPLACE() | ||
| Character length | CHAR_LENGTH() | LENGTH() | LEN() |
Always check your database documentation for specific function names and syntax!
Key Takeaways
What you learned:
✅ UPPER() and LOWER() - change case
✅ CONCAT() - join strings together
✅ LENGTH() - count characters
✅ SUBSTRING(), LEFT(), RIGHT() - extract portions
✅ TRIM(), LTRIM(), RTRIM() - remove whitespace
✅ REPLACE() - find and replace text
✅ LOCATE() / POSITION() - find substring positions
✅ REVERSE() - reverse string order
✅ REPEAT() - repeat strings
✅ LIKE with % and _ - pattern matching
Real-world applications:
- Data cleaning and standardization
- Formatting output for reports
- Parsing structured strings (emails, phone numbers)
- Building dynamic content
- Searching and filtering text data
Practice Exercise: Using the customers table:
- Create a display name: "SMITH, Alice" (last name uppercase, comma, first name proper case)
- Extract the domain name from emails (everything after @)
- Format phone numbers as (555) 123-4567
- Find all emails from Gmail or Yahoo
- Create a masked email: show first 2 characters of username, then ****@domain.com
- Find customers whose names contain exactly 5 letters
- Replace all spaces in city names with underscores