Kekkei - Advancing Education For Everyone

Introduction to String Functions

Text data is everywhere in databases - names, addresses, descriptions, emails, and more. But raw text often needs manipulation:

Convert to uppercase or lowercase for consistency
Extract parts of strings (first 3 characters, domain from email)
Combine multiple columns (first name + last name)
Clean up whitespace and formatting
Search for patterns within text

String functions in SQL let you transform and manipulate text data directly in your queries!

Built-in SQL functions that operate on text (string) values. They can change case, extract substrings, concatenate strings, find patterns, measure length, and more - all without writing complex application code.

Why use string functions?

Format output for reports (proper case names, clean addresses)
Data cleaning (trim whitespace, standardize formats)
Search and filter (find emails ending in @company.com)
Extract information (get first initial, domain name, area code)
Build dynamic content (full names, formatted addresses)

Let's work with a sample customers table:

customers

id	first_name	last_name	email	phone	city
1	alice	smith	alice@gmail.com	555-123-4567	new york
2	BOB	JOHNSON	bob.j@company.com	555-987-6543	los angeles
3	Carol	White	carol.white@yahoo.com	555-246-8135	chicago

UPPER() and LOWER() - Changing Case

Convert strings to uppercase or lowercase for consistency:

sql

-- Convert to uppercase
SELECT
    id,
    UPPER(first_name) AS first_name_upper,
    UPPER(last_name) AS last_name_upper,
    UPPER(city) AS city_upper
FROM customers;

sql

-- Convert to lowercase
SELECT
    id,
    LOWER(first_name) AS first_name_lower,
    LOWER(email) AS email_lower
FROM customers;

Use cases:

Case-insensitive searches: WHERE LOWER(email) = 'alice@gmail.com'
Standardize data display
Compare strings regardless of case

CONCAT() - Combining Strings

Join multiple strings together into one:

sql

-- Create full names
SELECT
    id,
    first_name,
    last_name,
    CONCAT(first_name, ' ', last_name) AS full_name
FROM customers;

sql

-- Create formatted display with proper case
SELECT
    id,
    CONCAT(UPPER(SUBSTRING(first_name, 1, 1)), LOWER(SUBSTRING(first_name, 2)), ' ',
           UPPER(SUBSTRING(last_name, 1, 1)), LOWER(SUBSTRING(last_name, 2))) AS proper_name,
    CONCAT(email, ' | ', phone) AS contact_info
FROM customers;

Alternative syntax (MySQL/PostgreSQL):

-- Using || operator
SELECT first_name || ' ' || last_name AS full_name
FROM customers;

Use cases:

Build full names from first/last names
Create formatted addresses
Combine multiple fields for display

LENGTH() - String Length

Get the number of characters in a string:

sql

-- Find length of various fields
SELECT
    first_name,
    LENGTH(first_name) AS name_length,
    email,
    LENGTH(email) AS email_length
FROM customers;

sql

-- Find customers with short names (3 letters or less)
SELECT
    id,
    first_name,
    LENGTH(first_name) AS name_length
FROM customers
WHERE LENGTH(first_name) <= 3;

Use cases:

Validate input lengths (password must be 8+ characters)
Find unusually short/long values
Calculate string sizes for storage analysis

SUBSTRING() - Extracting Parts of Strings

Extract a portion of a string starting at a position:

Syntax:

SUBSTRING(string, start_position, length)
-- or
SUBSTRING(string FROM start_position FOR length)

Note: Positions start at 1 (not 0)!

sql

-- Extract first 3 characters of names
SELECT
    first_name,
    SUBSTRING(first_name, 1, 3) AS first_3_chars,
    email,
    SUBSTRING(email, 1, 5) AS email_prefix
FROM customers;

sql

-- Extract area code from phone numbers (characters 1-3)
SELECT
    phone,
    SUBSTRING(phone, 1, 3) AS area_code
FROM customers;

Extract from the end: Use negative positions (in some databases) or calculate from length:

-- Last 4 characters
SUBSTRING(phone, LENGTH(phone) - 3, 4)
-- or in some databases:
SUBSTRING(phone, -4)

LEFT() and RIGHT() - Extract from Ends

Simpler alternatives for extracting from the beginning or end:

sql

-- Get first and last 3 characters
SELECT
    first_name,
    LEFT(first_name, 3) AS first_3,
    RIGHT(first_name, 3) AS last_3,
    email,
    RIGHT(email, 8) AS email_ending
FROM customers;

sql

-- Extract last 4 digits of phone
SELECT
    phone,
    RIGHT(phone, 4) AS last_4_digits
FROM customers;

Use cases:

Get initials: LEFT(first_name, 1)
Extract file extensions: RIGHT(filename, 4) → .pdf
Get last 4 of credit card: RIGHT(card_number, 4)

TRIM(), LTRIM(), RTRIM() - Removing Whitespace

Remove unwanted spaces from strings:

products

id	product_name	category
1	Laptop Pro	Electronics
2	Desk Chair	Furniture
3	Mouse	Electronics

sql

-- Remove leading and trailing spaces
SELECT
    id,
    product_name,
    CONCAT('[', product_name, ']') AS original,
    CONCAT('[', TRIM(product_name), ']') AS trimmed,
    LENGTH(product_name) AS original_length,
    LENGTH(TRIM(product_name)) AS trimmed_length
FROM products;

sql

-- LTRIM removes leading spaces only
-- RTRIM removes trailing spaces only
SELECT
    id,
    CONCAT('[', LTRIM(product_name), ']') AS left_trimmed,
    CONCAT('[', RTRIM(product_name), ']') AS right_trimmed,
    CONCAT('[', TRIM(product_name), ']') AS both_trimmed
FROM products
WHERE id = 3;

Use cases:

Clean user input (remove accidental spaces)
Prepare data for comparison
Fix data import issues

REPLACE() - Find and Replace

Replace all occurrences of a substring with another:

Syntax: REPLACE(string, find, replace_with)

sql

-- Replace characters in phone numbers
SELECT
    phone,
    REPLACE(phone, '-', '.') AS dots_format,
    REPLACE(phone, '-', '') AS no_dashes,
    REPLACE(phone, '555', '999') AS new_area_code
FROM customers;

sql

-- Clean up text
SELECT
    city,
    REPLACE(city, 'new york', 'New York') AS fixed_case,
    REPLACE(REPLACE(city, 'los', 'Los'), 'angeles', 'Angeles') AS city_proper
FROM customers;

Use cases:

Format phone numbers
Clean up data (remove special characters)
Standardize values
Mask sensitive data

LOCATE() / POSITION() - Finding Substrings

Find the position of a substring within a string:

MySQL: LOCATE(substring, string) PostgreSQL/SQL Standard: POSITION(substring IN string)

Returns the position (starting at 1) or 0 if not found.

sql

-- Find position of @ in emails
SELECT
    email,
    LOCATE('@', email) AS at_position,
    SUBSTRING(email, 1, LOCATE('@', email) - 1) AS username,
    SUBSTRING(email, LOCATE('@', email) + 1) AS domain
FROM customers;

sql

-- Find emails with specific domains
SELECT
    id,
    email
FROM customers
WHERE LOCATE('@gmail.com', email) > 0;

Use cases:

Parse email addresses
Extract usernames
Find patterns in text
Conditional logic based on content

REVERSE() - Reverse Strings

Reverse the order of characters:

sql

SELECT
    first_name,
    REVERSE(first_name) AS reversed,
    city,
    REVERSE(city) AS reversed_city
FROM customers;

Use case:

Palindrome detection
Reverse sorting
Data validation

REPEAT() - Repeat Strings

Repeat a string multiple times:

sql

SELECT
    first_name,
    REPEAT('*', LENGTH(first_name)) AS masked,
    REPEAT('-', 10) AS separator
FROM customers;

Use cases:

Create visual separators
Mask sensitive data
Generate repeated patterns

LIKE with Wildcards - Pattern Matching

While not a function, LIKE is essential for string searching:

Wildcards:

% - matches any number of characters (including zero)
_ - matches exactly one character

sql

-- Find emails ending with .com
SELECT id, email
FROM customers
WHERE email LIKE '%.com';

sql

-- Find customers with 5-letter first names
SELECT id, first_name, LENGTH(first_name) AS length
FROM customers
WHERE first_name LIKE '_____';

sql

-- Find names starting with 'C' or 'c'
SELECT id, first_name
FROM customers
WHERE LOWER(first_name) LIKE 'c%';

Practical Example: Email Analysis

Let's combine multiple string functions for a real-world task:

sql

-- Analyze and format email information
SELECT
    id,
    email,
    SUBSTRING(email, 1, LOCATE('@', email) - 1) AS username,
    SUBSTRING(email, LOCATE('@', email) + 1) AS domain,
    UPPER(LEFT(SUBSTRING(email, 1, LOCATE('@', email) - 1), 1)) AS first_initial,
    CASE
        WHEN LOCATE('@gmail.com', email) > 0 THEN 'Gmail'
        WHEN LOCATE('@yahoo.com', email) > 0 THEN 'Yahoo'
        WHEN LOCATE('@company.com', email) > 0 THEN 'Corporate'
        ELSE 'Other'
    END AS email_provider,
    LENGTH(email) AS email_length
FROM customers;

Practical Example: Data Cleaning

Clean and standardize messy data:

raw_customers

id	name	phone
1	alice SMITH	5551234567
2	Bob Johnson	(555) 987-6543
3	CAROL white	555.246.8135

sql

-- Clean and standardize the data
SELECT
    id,
    name AS original_name,
    CONCAT(
        UPPER(SUBSTRING(TRIM(SUBSTRING_INDEX(name, ' ', 1)), 1, 1)),
        LOWER(SUBSTRING(TRIM(SUBSTRING_INDEX(name, ' ', 1)), 2)),
        ' ',
        UPPER(SUBSTRING(TRIM(SUBSTRING_INDEX(name, ' ', -1)), 1, 1)),
        LOWER(SUBSTRING(TRIM(SUBSTRING_INDEX(name, ' ', -1)), 2))
    ) AS clean_name,
    phone AS original_phone,
    REPLACE(REPLACE(REPLACE(phone, '(', ''), ')', ''), ' ', '') AS clean_phone,
    CONCAT(
        SUBSTRING(REPLACE(REPLACE(REPLACE(phone, '(', ''), ')', ''), ' ', ''), 1, 3),
        '-',
        SUBSTRING(REPLACE(REPLACE(REPLACE(phone, '(', ''), ')', ''), ' ', ''), 4, 3),
        '-',
        SUBSTRING(REPLACE(REPLACE(REPLACE(phone, '(', ''), ')', ''), ' ', ''), 7, 4)
    ) AS formatted_phone
FROM raw_customers;

Common Mistakes to Avoid

Don't make these mistakes:

Forgetting positions start at 1: ❌
- SQL string positions start at 1, not 0
- SUBSTRING(text, 0, 3) won't work as expected
- Use SUBSTRING(text, 1, 3)
Not handling NULL values: ❌
- String functions on NULL return NULL
- Use COALESCE(column, '') or check with IS NOT NULL
Case sensitivity confusion: ❌
- LIKE 'alice' won't match 'Alice' in case-sensitive systems
- Use LOWER() or UPPER() for case-insensitive matching
Wrong CONCAT behavior with NULLs: ❌
- In some databases, CONCAT('Hello', NULL) = NULL
- Use CONCAT_WS() or COALESCE()
Inefficient string operations in WHERE: ⚠️
- WHERE SUBSTRING(email, 1, 5) = 'alice' can't use indexes
- Better: WHERE email LIKE 'alice%'
Not considering multi-byte characters: ⚠️
- LENGTH() counts bytes, not characters for UTF-8
- Use CHAR_LENGTH() for character count
Assuming TRIM removes all whitespace: ❌
- TRIM only removes spaces from ends
- Use REPLACE to remove internal spaces

Database-Specific Variations

String Function Differences

Function	MySQL	PostgreSQL	SQL Server
Concatenate	CONCAT()	\	\	or CONCAT()	+ or CONCAT()
Find position	LOCATE()	POSITION()	CHARINDEX()
Substring	SUBSTRING()	SUBSTRING()	SUBSTRING()
Length	LENGTH()	LENGTH()	LEN()
Replace	REPLACE()	REPLACE()	REPLACE()
Character length	CHAR_LENGTH()	LENGTH()	LEN()

Always check your database documentation for specific function names and syntax!

Key Takeaways

What you learned:

✅ UPPER() and LOWER() - change case ✅ CONCAT() - join strings together ✅ LENGTH() - count characters ✅ SUBSTRING(), LEFT(), RIGHT() - extract portions ✅ TRIM(), LTRIM(), RTRIM() - remove whitespace ✅ REPLACE() - find and replace text ✅ LOCATE() / POSITION() - find substring positions ✅ REVERSE() - reverse string order ✅ REPEAT() - repeat strings ✅ LIKE with % and _ - pattern matching

Real-world applications:

Data cleaning and standardization
Formatting output for reports
Parsing structured strings (emails, phone numbers)
Building dynamic content
Searching and filtering text data

Practice Exercise: Using the customers table:

Create a display name: "SMITH, Alice" (last name uppercase, comma, first name proper case)
Extract the domain name from emails (everything after @)
Format phone numbers as (555) 123-4567
Find all emails from Gmail or Yahoo
Create a masked email: show first 2 characters of username, then ****@domain.com
Find customers whose names contain exactly 5 letters
Replace all spaces in city names with underscores

Introduction to String FunctionsFocusStart Focus Mode

UPPER() and LOWER() - Changing CaseFocusStart Focus Mode

CONCAT() - Combining StringsFocusStart Focus Mode

LENGTH() - String LengthFocusStart Focus Mode

SUBSTRING() - Extracting Parts of StringsFocusStart Focus Mode

LEFT() and RIGHT() - Extract from EndsFocusStart Focus Mode

TRIM(), LTRIM(), RTRIM() - Removing WhitespaceFocusStart Focus Mode

REPLACE() - Find and ReplaceFocusStart Focus Mode

LOCATE() / POSITION() - Finding SubstringsFocusStart Focus Mode

REVERSE() - Reverse StringsFocusStart Focus Mode

REPEAT() - Repeat StringsFocusStart Focus Mode

LIKE with Wildcards - Pattern MatchingFocusStart Focus Mode

Practical Example: Email AnalysisFocusStart Focus Mode

Practical Example: Data CleaningFocusStart Focus Mode

Common Mistakes to AvoidFocusStart Focus Mode

Database-Specific VariationsFocusStart Focus Mode

Key TakeawaysFocusStart Focus Mode

Introduction to String Functions

UPPER() and LOWER() - Changing Case

CONCAT() - Combining Strings

LENGTH() - String Length

SUBSTRING() - Extracting Parts of Strings

LEFT() and RIGHT() - Extract from Ends

TRIM(), LTRIM(), RTRIM() - Removing Whitespace

REPLACE() - Find and Replace

LOCATE() / POSITION() - Finding Substrings

REVERSE() - Reverse Strings

REPEAT() - Repeat Strings

LIKE with Wildcards - Pattern Matching

Practical Example: Email Analysis

Practical Example: Data Cleaning

Common Mistakes to Avoid

Database-Specific Variations

Key Takeaways