SQL in Applications
Best practices for using SQL in application development.
Introduction to SQL in Applications
You've mastered SQL queries, but how do you use them safely and effectively in real applications? Writing SQL in Python, Node.js, Java, or any programming language introduces new challenges: security vulnerabilities, performance issues, and maintainability concerns.
This lesson covers production-ready practices for integrating SQL into your applications.
Critical topics:
- SQL Injection - The #1 web security vulnerability
- Parameterized queries - How to prevent SQL injection
- Connection pooling - Efficient database connections
- ORMs vs Raw SQL - When to use each
- Error handling - Dealing with database failures
- Performance optimization - Making your app fast
Real-world scenario: Your e-commerce app lets users search products by name. A naive implementation might do:
# DANGEROUS! Never do this!
query = "SELECT * FROM products WHERE name LIKE '%" + user_input + "%'"
If a malicious user enters ' OR '1'='1, they can access all products. Worse, they could enter '; DROP TABLE products; -- and destroy your database!
This lesson shows you how to build secure, performant database-driven applications.
A security vulnerability where attackers inject malicious SQL code through user input, potentially accessing, modifying, or deleting data. The #1 threat to web applications.
| Approach | Security | Performance | Ease of Use | When to Use |
|---|---|---|---|---|
| Raw SQL strings | ❌ Dangerous | Fast | Simple | Never! |
| Parameterized queries | ✅ Safe | Fast | Moderate | Complex queries |
| ORMs | ✅ Safe | Variable | Easy | CRUD operations |
| Query builders | ✅ Safe | Fast | Moderate | Dynamic queries |
SQL Injection - The Threat
SQL injection is when an attacker manipulates SQL queries by injecting malicious input.
Example: Vulnerable login system
# Python code (VULNERABLE!)
username = request.form['username']
password = request.form['password']
query = f"SELECT * FROM users WHERE username = '{username}' AND password = '{password}'"
cursor.execute(query)

Normal user enters:
- Username: alice
- Password: mypassword123
- Query: SELECT * FROM users WHERE username = 'alice' AND password = 'mypassword123'
- Result: Returns Alice's account (if password matches)
Attacker enters:
- Username: admin' --
- Password: (anything)
- Query: SELECT * FROM users WHERE username = 'admin' --' AND password = 'anything'
- Result: Returns the admin account! The -- comments out the password check.
Even worse - data destruction:
- Username: '; DROP TABLE users; --
- Query: SELECT * FROM users WHERE username = ''; DROP TABLE users; -- ' AND ...
- Result: Deletes the entire users table!
Critical: NEVER concatenate user input into SQL queries! Always use parameterized queries or prepared statements.
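To see the difference concretely, here is a small, self-contained sketch using Python's built-in sqlite3 module (the table and data are made up for the demo):

```python
import sqlite3

# Throwaway in-memory database with one demo user
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'mypassword123')")

malicious = "' OR '1'='1' --"

# VULNERABLE: the input rewrites the query and matches every row
query = f"SELECT * FROM users WHERE username = '{malicious}' AND password = 'x'"
vuln_rows = conn.execute(query).fetchall()
print(vuln_rows)   # [('alice', 'mypassword123')] -- login bypassed!

# SAFE: the same input is bound as a literal string, so nothing matches
safe_rows = conn.execute(
    "SELECT * FROM users WHERE username = ? AND password = ?",
    (malicious, "x"),
).fetchall()
print(safe_rows)   # []
```

The exact same attack string succeeds against the concatenated query and fails against the parameterized one.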
Other SQL injection vectors:
# Search vulnerability
search = "'; SELECT password FROM users WHERE '1'='1"
query = f"SELECT * FROM products WHERE name LIKE '%{search}%'"
# Exposes password data!

# ORDER BY injection
sort = "name; DROP TABLE products"
query = f"SELECT * FROM products ORDER BY {sort}"
# Destroys table!

# UNION attack
id = "1 UNION SELECT username, password, NULL FROM users"
query = f"SELECT id, name, price FROM products WHERE id = {id}"
# Leaks user credentials!

Parameterized Queries - The Solution
Parameterized queries (also called prepared statements) separate SQL code from data. The database treats user input as data, never as code.
Python (with psycopg2 for PostgreSQL):
# SAFE - Parameterized query
username = request.form['username']
password = request.form['password']
query = "SELECT * FROM users WHERE username = %s AND password = %s"
cursor.execute(query, (username, password))
# Even if attacker enters: admin' --
# Database treats it as literal string "admin' --"
# Query becomes: WHERE username = 'admin'' --' AND password = '...'
# No injection possible!

Node.js (with mysql2):
// SAFE - Parameterized query
const username = req.body.username;
const password = req.body.password;
const query = 'SELECT * FROM users WHERE username = ? AND password = ?';
connection.execute(query, [username, password], (err, results) => {
// Safe from SQL injection
});

Java (JDBC):
// SAFE - Prepared statement
String query = "SELECT * FROM users WHERE username = ? AND password = ?";
PreparedStatement stmt = connection.prepareStatement(query);
stmt.setString(1, username);
stmt.setString(2, password);
ResultSet rs = stmt.executeQuery();

PHP (PDO):
// SAFE - Prepared statement
$query = "SELECT * FROM users WHERE username = :username AND password = :password";
$stmt = $pdo->prepare($query);
$stmt->execute(['username' => $username, 'password' => $password]);

Golden rule: Use ? or :param placeholders for ALL user input. Never concatenate strings!
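One wrinkle: the placeholder syntax is driver-specific, not part of SQL itself. Python DB-API drivers declare theirs via a paramstyle attribute — psycopg2 uses %s, while sqlite3 uses ? and also accepts :name placeholders. A quick sqlite3 sketch showing both styles it supports:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'secret')")

# qmark style: positional ? placeholders, bound from a tuple
row_qmark = conn.execute(
    "SELECT username FROM users WHERE username = ? AND password = ?",
    ("alice", "secret"),
).fetchone()

# named style: :name placeholders, bound from a dict (similar to PDO)
row_named = conn.execute(
    "SELECT username FROM users WHERE username = :username",
    {"username": "alice"},
).fetchone()

print(row_qmark, row_named)  # ('alice',) ('alice',)
```

Whatever the placeholder looks like, the safety property is the same: the value travels to the database as data, never as SQL text.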
When Parameterization Doesn't Work
You cannot parameterize:
- Table names
- Column names
- SQL keywords (ORDER BY, ASC/DESC)
Problem: Dynamic column sorting
# User selects sort column from dropdown
sort_column = request.form['sort']  # Could be: "name", "price", "created_at"

# This doesn't work!
query = "SELECT * FROM products ORDER BY ?"
cursor.execute(query, (sort_column,))
# Database treats ? as a string literal, not a column name

Solution: Whitelist validation
# SAFE - Validate against whitelist
allowed_columns = ['name', 'price', 'created_at']
sort_column = request.form['sort']
if sort_column not in allowed_columns:
    sort_column = 'name'  # Default
query = f"SELECT * FROM products ORDER BY {sort_column}"
cursor.execute(query)
# Safe because we validated the column name

Another approach: Map user input to safe values
# User-facing names mapped to safe SQL
sort_mapping = {
    'product_name': 'name',
    'product_price': 'price',
    'date_created': 'created_at'
}
user_sort = request.form['sort']
safe_column = sort_mapping.get(user_sort, 'name')
query = f"SELECT * FROM products ORDER BY {safe_column}"
cursor.execute(query)

Rule: If you must build dynamic SQL, validate inputs against a strict whitelist. Never trust user input!
Connection Pooling
Problem: Opening a new database connection for every query is slow (100-200ms per connection).
Solution: Connection pooling reuses existing connections.
A cache of database connections that can be reused, avoiding the overhead of establishing new connections for every request. Essential for production applications.
Without pooling (slow):
// Every request opens a new connection
app.get('/products', (req, res) => {
const connection = mysql.createConnection({...}); // Slow!
connection.connect();
connection.query('SELECT * FROM products', (err, results) => {
connection.end(); // Close connection
res.json(results);
});
});
// 100 requests = 100 connection opens = 10-20 seconds wasted!

With pooling (fast):
// Create pool once at startup
const pool = mysql.createPool({
host: 'localhost',
user: 'app_user',
password: 'password',
database: 'mydb',
connectionLimit: 10 // Max 10 concurrent connections
});
// Each request reuses a connection from pool
app.get('/products', (req, res) => {
pool.query('SELECT * FROM products', (err, results) => {
// Connection automatically returned to pool
res.json(results);
});
});
// 100 requests = reuse 10 connections = milliseconds!

Python with SQLAlchemy:
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool
# Create engine with connection pool
engine = create_engine(
    'postgresql://user:pass@localhost/db',
    poolclass=QueuePool,
    pool_size=10,       # Normal connections
    max_overflow=20,    # Extra connections under load
    pool_timeout=30,    # Wait 30s for a connection
    pool_recycle=3600   # Recycle connections after 1 hour
)
# Each query gets a connection from the pool
with engine.connect() as conn:
    result = conn.execute("SELECT * FROM products")
# Connection returned to pool automatically

Best practices:
- Pool size: 5-20 connections for most apps (don't over-provision!)
- Timeouts: Set reasonable timeouts (10-30 seconds)
- Recycling: Close old connections to avoid stale connections
- Monitoring: Track pool usage (active connections, wait times)
Sizing your pool: Start with 10 connections. Monitor under load. Increase if you see "waiting for connection" errors. Too many connections can overload the database!
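To make the mechanism tangible, here is a toy pool built from the standard library — pre-opened connections handed out and returned via a queue. The class name and sizes are made up for illustration; real applications should use a library pool (SQLAlchemy, mysql2, HikariCP) rather than this sketch:

```python
import queue
import sqlite3
from contextlib import contextmanager

class SimplePool:
    """Toy connection pool: pre-opens N connections and hands them out."""
    def __init__(self, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets connections move between threads
            self._pool.put(sqlite3.connect(":memory:", check_same_thread=False))

    @contextmanager
    def connection(self, timeout=30):
        conn = self._pool.get(timeout=timeout)  # blocks if pool is exhausted
        try:
            yield conn
        finally:
            self._pool.put(conn)  # always return the connection to the pool

pool = SimplePool(size=2)
with pool.connection() as conn:
    result = conn.execute("SELECT 1").fetchone()
print(result)  # (1,)
```

The key ideas a production pool adds on top of this are health checks, recycling of stale connections, and overflow connections under load.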
ORMs vs Raw SQL
ORM (Object-Relational Mapping) libraries map database tables to objects in your code.
Popular ORMs:
- Python: SQLAlchemy, Django ORM
- Node.js: Sequelize, TypeORM, Prisma
- Java: Hibernate
- Ruby: ActiveRecord
- C#: Entity Framework
| Feature | ORM | Raw SQL |
|---|---|---|
| Security | Built-in protection | Manual (parameterization required) |
| Ease of use | Simple CRUD | More code |
| Performance | Can be slow | Optimized |
| Flexibility | Limited | Full control |
| Complex queries | Difficult | Easy |
| Database portability | High | Low |
ORM example (SQLAlchemy):
# Define model
class Product(Base):
    __tablename__ = 'products'
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    price = Column(Numeric(10, 2))

# Query using ORM (simple and safe)
products = session.query(Product).filter(
    Product.price > 50
).order_by(Product.name).all()

# Generated SQL:
# SELECT * FROM products WHERE price > 50 ORDER BY name

Raw SQL equivalent:
# More verbose but full control
query = "SELECT * FROM products WHERE price > %s ORDER BY name"
cursor.execute(query, (50,))
products = cursor.fetchall()

When to use ORMs:
- ✅ Simple CRUD operations (Create, Read, Update, Delete)
- ✅ Prototyping and rapid development
- ✅ Database portability (switch MySQL → PostgreSQL easily)
- ✅ Type safety and IDE autocomplete
When to use Raw SQL:
- ✅ Complex joins and subqueries
- ✅ Performance-critical queries
- ✅ Bulk operations (INSERT 10,000 rows)
- ✅ Database-specific features (PostgreSQL arrays, MySQL full-text search)
- ✅ Analytics and reporting
Hybrid approach (best):
# Use ORM for simple operations
user = User.query.filter_by(email=email).first()
user.last_login = datetime.now()
session.commit()
# Use raw SQL for complex analytics
report_query = """
SELECT
DATE_TRUNC('month', order_date) AS month,
COUNT(*) AS orders,
SUM(total) AS revenue
FROM orders
WHERE order_date >= %s
GROUP BY month
ORDER BY month DESC
"""
results = session.execute(report_query, (start_date,))

Best practice: Use ORMs for 80% of operations, raw SQL for complex queries and performance-critical code. Don't fight the ORM!
Error Handling
Always handle database errors gracefully:
# Python with proper error handling
try:
    cursor.execute(query, params)
    connection.commit()
    return {"status": "success"}
except psycopg2.IntegrityError as e:
    connection.rollback()
    if 'duplicate key' in str(e):
        return {"error": "Email already exists"}
    return {"error": "Data integrity error"}
except psycopg2.OperationalError as e:
    connection.rollback()
    logger.error(f"Database connection error: {e}")
    return {"error": "Database unavailable, please try again"}
except Exception as e:
    connection.rollback()
    logger.error(f"Unexpected error: {e}")
    return {"error": "Internal server error"}
finally:
    cursor.close()  # Always close cursor

Common database errors:
- IntegrityError: Violates constraint (duplicate key, foreign key)
- OperationalError: Connection failed, database down
- ProgrammingError: SQL syntax error
- DataError: Invalid data type or value too large
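The most common of these, IntegrityError, is easy to reproduce. A minimal sketch with sqlite3 (the table and email are invented for the demo — psycopg2 raises the same exception name for PostgreSQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY)")
conn.execute("INSERT INTO users VALUES ('a@example.com')")
conn.commit()

try:
    # Second insert violates the PRIMARY KEY constraint
    conn.execute("INSERT INTO users VALUES ('a@example.com')")
    conn.commit()
    outcome = "inserted"
except sqlite3.IntegrityError:
    conn.rollback()  # undo the failed transaction
    outcome = "duplicate email rejected"

print(outcome)  # duplicate email rejected
```

Catching the specific exception lets you turn a constraint violation into a friendly message ("Email already exists") instead of a 500 error.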
Best practices:
- ✅ Always use try-except blocks
- ✅ Rollback on errors
- ✅ Log errors for debugging
- ✅ Return user-friendly messages (don't expose SQL errors!)
- ✅ Close connections in finally block
Security: Never expose raw SQL error messages to users! They reveal database structure and can aid attackers. Log detailed errors server-side, show generic messages to users.
Query Optimization for Applications
1. Select only needed columns:
-- Bad: Fetches all columns (including large text fields)
SELECT * FROM articles

-- Good: Only fetch needed columns
SELECT id, title, author, published_date FROM articles

2. Use LIMIT for pagination:
-- Bad: Fetch all products, paginate in application
SELECT * FROM products ORDER BY created_at DESC

-- Good: Paginate in database
SELECT * FROM products ORDER BY created_at DESC LIMIT 20 OFFSET 0

3. Index frequently queried columns:
-- If you frequently query by email
CREATE INDEX idx_users_email ON users(email);
-- If you frequently filter by status and sort by created_at
CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC);

4. Use bulk inserts for multiple rows:
# Bad: Insert one row at a time
for product in products:
    cursor.execute("INSERT INTO products (name, price) VALUES (%s, %s)", (product.name, product.price))
# 1000 products = 1000 round trips!

# Good: Bulk insert
values = [(p.name, p.price) for p in products]
cursor.executemany("INSERT INTO products (name, price) VALUES (%s, %s)", values)
# 1000 products = 1 round trip!

5. Cache frequently accessed data:
# Use Redis/Memcached for hot data
def get_product(product_id):
    # Check cache first
    cached = redis.get(f"product:{product_id}")
    if cached:
        return json.loads(cached)
    # Cache miss: fetch from database
    product = db.query("SELECT * FROM products WHERE id = %s", (product_id,))
    # Store in cache (expire after 1 hour)
    redis.setex(f"product:{product_id}", 3600, json.dumps(product))
    return product

Transaction Management in Applications
Use transactions for multi-step operations:
# Python with context manager
def transfer_money(from_account, to_account, amount):
    try:
        with engine.begin() as conn:  # Auto-commit on success
            # Deduct from source
            conn.execute(
                "UPDATE accounts SET balance = balance - %s WHERE id = %s",
                (amount, from_account)
            )
            # Add to destination
            conn.execute(
                "UPDATE accounts SET balance = balance + %s WHERE id = %s",
                (amount, to_account)
            )
            # Log transaction
            conn.execute(
                "INSERT INTO transaction_log (from_id, to_id, amount) VALUES (%s, %s, %s)",
                (from_account, to_account, amount)
            )
        # Transaction committed automatically
        return {"status": "success"}
    except Exception as e:
        # Transaction rolled back automatically
        logger.error(f"Transfer failed: {e}")
        return {"status": "error", "message": "Transfer failed"}
// Node.js
async function transferMoney(fromAccount, toAccount, amount) {
    const connection = await pool.getConnection();
    try {
        await connection.beginTransaction();
        await connection.execute(
            'UPDATE accounts SET balance = balance - ? WHERE id = ?',
            [amount, fromAccount]
        );
        await connection.execute(
            'UPDATE accounts SET balance = balance + ? WHERE id = ?',
            [amount, toAccount]
        );
        await connection.commit();
        return { status: 'success' };
    } catch (error) {
        await connection.rollback();
        console.error('Transfer failed:', error);
        return { status: 'error', message: 'Transfer failed' };
    } finally {
        connection.release(); // Return to pool
    }
}

Environment Configuration
Never hardcode database credentials!
# Bad: Hardcoded credentials (committed to git!)
connection = mysql.connect(
    host='localhost',
    user='admin',
    password='supersecret123',  # EXPOSED!
    database='production'
)

# Good: Use environment variables
import os
connection = mysql.connect(
    host=os.getenv('DB_HOST'),
    user=os.getenv('DB_USER'),
    password=os.getenv('DB_PASSWORD'),
    database=os.getenv('DB_NAME')
)

# .env file (not committed to git):
DB_HOST=localhost
DB_USER=app_user
DB_PASSWORD=complex_password_here
DB_NAME=myapp_production

Different configs for dev/staging/prod:
# config.py
class Config:
    SQLALCHEMY_DATABASE_URI = os.getenv('DATABASE_URL')

class DevelopmentConfig(Config):
    DEBUG = True
    SQLALCHEMY_ECHO = True  # Log all SQL queries

class ProductionConfig(Config):
    DEBUG = False
    SQLALCHEMY_ECHO = False  # Don't log queries in production

config = {
    'development': DevelopmentConfig,
    'production': ProductionConfig
}

# Usage:
app.config.from_object(config[os.getenv('APP_ENV', 'development')])

Common Mistakes to Avoid
Don't make these mistakes:

1. String concatenation for queries: ❌
   - ALWAYS use parameterized queries!
   - SQL injection is still the #1 vulnerability

2. Opening connections per query: ❌
   - Use connection pooling
   - Opening connections is slow (100-200ms each)

3. Not closing connections: ❌
   - Leads to connection leaks
   - Database runs out of connections
   - Use finally blocks or context managers

4. Exposing SQL errors to users: ❌
   - Security risk (reveals database structure)
   - Log errors server-side, show generic messages to users

5. No indexes on queried columns: ❌
   - Queries become slow as data grows
   - Index columns used in WHERE, JOIN, ORDER BY

6. Fetching all rows instead of paginating: ❌
   - Use LIMIT and OFFSET
   - Fetch only what you need

7. Not using transactions: ❌
   - Multi-step operations must be atomic
   - Use BEGIN, COMMIT, ROLLBACK

8. Committing credentials to git: ❌
   - Use environment variables
   - Add .env to .gitignore

9. Using SELECT * in production: Sometimes OK, often wasteful
   - Select only needed columns
   - Especially avoid large TEXT/BLOB columns

10. N+1 query problem: ❌
    - Don't query in loops!
    - Use JOINs or eager loading (ORM)
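The N+1 problem from the list above is worth a concrete sketch. Using sqlite3 with invented authors/books tables: the loop version issues one query per author, while a single JOIN fetches everything in one round trip:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO books VALUES (1, 1, 'SQL 101'), (2, 2, 'Indexes');
""")

# N+1: one query for the authors, then one query PER author
authors = conn.execute("SELECT id, name FROM authors").fetchall()
for author_id, _name in authors:
    conn.execute("SELECT title FROM books WHERE author_id = ?", (author_id,))
# 2 authors = 3 queries; 1000 authors = 1001 queries!

# Better: a single JOIN, one round trip
rows = conn.execute("""
    SELECT a.name, b.title
    FROM authors a JOIN books b ON b.author_id = a.id
    ORDER BY a.id
""").fetchall()
print(rows)  # [('Ann', 'SQL 101'), ('Bob', 'Indexes')]
```

ORMs hide this trap: lazily loaded relationships quietly issue the per-row queries, which is why eager loading (e.g. a join-based fetch) matters at scale.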
Security Checklist
Before deploying your application:
✅ SQL Injection:
- [ ] All queries use parameterized statements
- [ ] Dynamic identifiers validated against whitelists
- [ ] ORM used correctly (not building raw strings)
✅ Authentication & Authorization:
- [ ] Least privilege principle (app user has minimal permissions)
- [ ] Separate users for different services
- [ ] No admin/root credentials in application code
✅ Credentials:
- [ ] Database passwords in environment variables
- [ ] .env file in .gitignore
- [ ] Different passwords for dev/staging/prod
- [ ] Passwords rotated regularly
✅ Error Handling:
- [ ] Generic error messages to users
- [ ] Detailed errors logged server-side
- [ ] Stack traces never exposed
✅ Connection Security:
- [ ] SSL/TLS for database connections
- [ ] Firewall rules restrict database access
- [ ] Database not exposed to public internet
✅ Monitoring:
- [ ] Slow query logging enabled
- [ ] Connection pool metrics tracked
- [ ] Failed login attempts monitored
- [ ] Unusual query patterns detected
Key Takeaways
What you learned:
✅ SQL injection is prevented by using parameterized queries
✅ Never concatenate user input into SQL strings
✅ Connection pooling dramatically improves performance
✅ ORMs are great for CRUD, raw SQL for complex queries
✅ Error handling must rollback transactions and log appropriately
✅ Environment variables store credentials, never hardcode
✅ Transactions ensure multi-step operations are atomic
✅ Indexes speed up frequently queried columns
✅ Pagination prevents fetching too much data
✅ Security requires constant vigilance (SQL injection, credentials, errors)
Production-ready checklist:
- Use parameterized queries everywhere
- Set up connection pooling
- Handle errors gracefully
- Use transactions for multi-step operations
- Store credentials in environment variables
- Never expose SQL errors to users
- Add indexes on frequently queried columns
- Monitor slow queries and connection pool
- Use least privilege for database users
- Enable SSL/TLS for database connections
Remember: Security and performance are not optional. Build them in from day one!
Practice Exercise: Build a secure application:
- Create a user registration system with parameterized queries
- Set up connection pooling with proper error handling
- Implement search functionality (prevent SQL injection!)
- Add transaction for order processing (inventory + order + payment)
- Configure environment variables for database credentials
- Add proper logging without exposing sensitive data