Relational Database Concepts

Back in 1970, an IBM researcher named Edgar F. Codd published a paper that changed everything. He proposed organizing data using mathematical set theory instead of the hierarchical and network models everyone was using. People thought he was crazy. Today, his relational model powers most of the world’s critical systems.

Let me show you why this matters for your daily work as a DBA.

What Is a Relational Database?

A relational database stores data in tables (called relations) where:

  • Each table represents one type of entity (customers, orders, products)
  • Rows contain individual records
  • Columns contain attributes
  • Tables connect through relationships using keys

Simple example:

CUSTOMERS Table
customer_id | name        | email
101         | John Smith  | john@email.com
102         | Jane Doe    | jane@email.com

ORDERS Table
order_id | customer_id | amount
1001     | 101         | 500
1002     | 101         | 750
1003     | 102         | 300

The customer_id links these tables—that’s the “relational” part.

Core Concepts You Must Know

1. Tables (Relations)

A table is a collection of related data organized in rows and columns.

Key rules:

  • Each table has a unique name
  • Column names must be unique within the table
  • All values in a column must be of the same data type
  • Order of rows doesn’t matter (it’s a set, not a list)
  • Each row should be uniquely identifiable

2. Primary Keys

A primary key uniquely identifies each row in a table.

sql

CREATE TABLE employees (
    employee_id NUMBER PRIMARY KEY,  -- This is the primary key
    name VARCHAR2(100),
    email VARCHAR2(100)
);

Rules:

  • Must be unique
  • Cannot be NULL
  • Should be immutable (never changes)
  • Can be a single column or multiple columns (composite key)

Common mistake: Using email as a primary key. What if someone changes their email? Use an ID instead.

💡 Interview Insight: “Why not use email as a primary key?” Answer: Emails can change, may not be unique in all contexts, and are longer to index than numeric IDs. Primary keys should be stable and efficient.

3. Foreign Keys

A foreign key creates relationships between tables by referencing another table’s primary key.

CREATE TABLE orders (
    order_id NUMBER PRIMARY KEY,
    customer_id NUMBER,
    amount NUMBER,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

What this enforces:

  • Can’t create an order for a non-existent customer
  • Can’t delete a customer who has orders (unless you configure CASCADE)
  • Maintains referential integrity automatically

This is referential integrity—one of the most powerful features of relational databases.

4. Relationships Between Tables

Three types of relationships exist:

One-to-One (1:1)

  • One employee has one employee badge
  • Rare in practice, often combined into one table

One-to-Many (1:M)

  • One customer has many orders
  • Most common relationship type
  • Implemented with foreign key in the “many” table

Many-to-Many (M:M)

  • One student enrolls in many courses
  • One course has many students
  • Requires a junction table (enrollment table linking students and courses)

sql

-- Junction table for many-to-many
CREATE TABLE enrollments (
    student_id NUMBER,
    course_id NUMBER,
    enrollment_date DATE,
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (course_id) REFERENCES courses(course_id)
);
```

### 5. Normalization

**Normalization** is the process of organizing data to reduce redundancy and improve integrity.

**Example of poor design (unnormalized):**
```
ORDERS Table
order_id | customer_name | customer_email | customer_phone | product | price
1001     | John Smith    | john@email.com | 555-1234       | Laptop  | 1200
1002     | John Smith    | john@email.com | 555-1234       | Mouse   | 25
```

**Problems:**
- Customer info repeated for every order
- If John changes his email, must update multiple rows
- Wasted storage space

**After normalization:**
```
CUSTOMERS Table
customer_id | name       | email          | phone
101         | John Smith | john@email.com | 555-1234

ORDERS Table
order_id | customer_id | product | price
1001     | 101         | Laptop  | 1200
1002     | 101         | Mouse   | 25

Benefits:

  • Customer info stored once
  • Updates happen in one place
  • Less storage, fewer errors

Normal Forms (Quick Overview)

1NF (First Normal Form): Each column contains atomic values (no lists or arrays)

2NF (Second Normal Form): 1NF + no partial dependencies (all non-key columns depend on entire primary key)

3NF (Third Normal Form): 2NF + no transitive dependencies (non-key columns don’t depend on other non-key columns)

BCNF (Boyce-Codd Normal Form): Stricter version of 3NF

Most production databases target 3NF, then denormalize selectively for performance.

💡 Interview Insight: “When would you denormalize?” Answer: For read-heavy workloads, reporting tables, or when join performance becomes a bottleneck. Always denormalize intentionally, not accidentally.

6. Integrity Constraints

Constraints maintain data quality and enforce business rules.

Types of constraints:

CREATE TABLE employees (
    employee_id NUMBER PRIMARY KEY,              -- Entity integrity
    department_id NUMBER NOT NULL,               -- Domain integrity
    email VARCHAR2(100) UNIQUE,                  -- Uniqueness
    salary NUMBER CHECK (salary > 0),            -- Domain integrity
    hire_date DATE DEFAULT SYSDATE,              -- Default value
    FOREIGN KEY (department_id)                  -- Referential integrity
        REFERENCES departments(department_id)
);

Entity Integrity: Primary key must be unique and NOT NULL

Domain Integrity: Values must be valid for the data type and within acceptable ranges

Referential Integrity: Foreign keys must reference existing primary keys

User-Defined Integrity: Custom business rules (CHECK constraints, triggers)

7. ACID Properties in Relational Context

Relational databases guarantee ACID properties for transactions:

Atomicity: Transaction succeeds completely or fails completely

BEGIN TRANSACTION;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;  -- Both updates succeed or both rollback

Consistency: Database moves from one valid state to another (all constraints satisfied)

Isolation: Concurrent transactions don’t interfere with each other

Durability: Committed changes survive system failures

These guarantees make relational databases ideal for financial systems, e-commerce, and any application where data accuracy is critical.

SQL: The Language of Relational Databases

Structured Query Language (SQL) is the standard language for interacting with relational databases.

Key SQL categories:

DDL (Data Definition Language): Define structure

CREATE TABLE, ALTER TABLE, DROP TABLE

DML (Data Manipulation Language): Manipulate data

SELECT, INSERT, UPDATE, DELETE

DCL (Data Control Language): Control access

GRANT, REVOKE

TCL (Transaction Control Language): Manage transactions

COMMIT, ROLLBACK, SAVEPOINT

SQL is declarative—you specify what you want, not how to get it. The DBMS figures out the optimal execution path.

Relational Operations

The relational model is built on set theory operations:

SELECT: Filter rows based on conditions (σ)

SELECT * FROM employees WHERE salary > 50000;

PROJECT: Choose specific columns (π)

SELECT name, email FROM employees;

JOIN: Combine related tables

SELECT e.name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id;

UNION: Combine results from multiple queries (must have same columns)

SELECT name FROM employees_2023
UNION
SELECT name FROM employees_2024;

INTERSECT: Find common rows between queries

MINUS (or EXCEPT): Find rows in first query but not in second

These operations follow mathematical set theory, which is why they’re reliable and predictable.

Indexes: Performance Accelerators

Indexes are separate data structures that speed up data retrieval.

CREATE INDEX idx_emp_dept ON employees(department_id);

How indexes work:

  • Like a book’s index—jump directly to relevant pages instead of reading everything
  • Typically implemented as B-tree structures
  • Trade-off: Faster reads, slower writes (index must be updated)

When to use indexes:

  • Columns used in WHERE clauses
  • Columns used in JOIN conditions
  • Columns used in ORDER BY
  • Foreign key columns

When NOT to use indexes:

  • Small tables (full scan is faster)
  • Columns with low selectivity (many duplicate values)
  • Tables with heavy INSERT/UPDATE activity

💡 Interview Insight: “How do you decide what to index?” Answer: Analyze query patterns, examine execution plans, index columns in WHERE/JOIN clauses, but avoid over-indexing. Monitor index usage and drop unused indexes.

Views: Virtual Tables

Views are saved queries that act like tables.

CREATE VIEW active_employees AS
SELECT employee_id, name, email
FROM employees
WHERE status = 'ACTIVE';

-- Use it like a table
SELECT * FROM active_employees WHERE name LIKE 'J%';

Benefits:

  • Simplify complex queries
  • Provide security (users see only specific columns/rows)
  • Abstract underlying schema changes
  • No data duplication (it’s just a stored query)

Why Relational Databases Dominate

After 50+ years, the relational model remains dominant because:

Mathematical foundation: Based on solid set theory and relational algebra

Data integrity: Constraints and ACID properties prevent data corruption

Flexibility: SQL lets you ask questions not anticipated during design

Standardization: SQL is (mostly) standardized across different RDBMS

Maturity: Decades of optimization and tooling

Universal understanding: Most developers/DBAs know relational concepts

Yes, NoSQL databases have their place, but for structured data requiring consistency and complex queries, relational databases are still the gold standard.

Relational vs Other Models

Relational:

  • Data in tables with relationships
  • Schema defined upfront
  • ACID guarantees
  • SQL for queries

Document (MongoDB):

  • Data in JSON-like documents
  • Flexible schema
  • Eventual consistency
  • Query language varies

Key-Value (Redis):

  • Simple key-value pairs
  • Very fast, very simple
  • No complex queries

Graph (Neo4j):

  • Nodes and relationships
  • Optimized for connected data
  • Specialized query language

Choose relational when:

  • Data has clear structure and relationships
  • Consistency is critical
  • Complex queries are needed
  • Multiple access patterns required

Practical Example: E-commerce Database

Let me show you a real-world relational schema:

-- Core entities
CREATE TABLE customers (
    customer_id NUMBER PRIMARY KEY,
    email VARCHAR2(100) UNIQUE NOT NULL,
    name VARCHAR2(100) NOT NULL,
    created_date DATE DEFAULT SYSDATE
);

CREATE TABLE products (
    product_id NUMBER PRIMARY KEY,
    name VARCHAR2(200) NOT NULL,
    price NUMBER(10,2) CHECK (price > 0),
    stock_quantity NUMBER DEFAULT 0
);

CREATE TABLE orders (
    order_id NUMBER PRIMARY KEY,
    customer_id NUMBER NOT NULL,
    order_date DATE DEFAULT SYSDATE,
    status VARCHAR2(20) CHECK (status IN ('PENDING','SHIPPED','DELIVERED')),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

CREATE TABLE order_items (
    order_id NUMBER,
    product_id NUMBER,
    quantity NUMBER CHECK (quantity > 0),
    price_at_purchase NUMBER(10,2),
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

This design ensures:

  • No customer can be deleted if they have orders
  • No product prices become negative
  • Order status is always valid
  • Quantity is always positive
  • Price at purchase is preserved (even if product price changes later)

That’s the power of relational design with proper constraints.


Coming up next: We’ll explore Types of Databases and understand when to use relational vs non-relational databases, helping you make informed architecture decisions.