Relational Database Concepts
Back in 1970, an IBM researcher named Edgar F. Codd published a paper that changed everything. He proposed organizing data using mathematical set theory instead of the hierarchical and network models everyone was using. People thought he was crazy. Today, his relational model powers most of the world’s critical systems.
Let me show you why this matters for your daily work as a DBA.
What Is a Relational Database?
A relational database stores data in tables (called relations) where:
- Each table represents one type of entity (customers, orders, products)
- Rows contain individual records
- Columns contain attributes
- Tables connect through relationships using keys
Simple example:
CUSTOMERS Table
customer_id | name | email
101 | John Smith | john@email.com
102 | Jane Doe | jane@email.com
ORDERS Table
order_id | customer_id | amount
1001 | 101 | 500
1002 | 101 | 750
1003 | 102 | 300The customer_id links these tables—that’s the “relational” part.
Core Concepts You Must Know
1. Tables (Relations)
A table is a collection of related data organized in rows and columns.
Key rules:
- Each table has a unique name
- Column names must be unique within the table
- All values in a column must be of the same data type
- Order of rows doesn’t matter (it’s a set, not a list)
- Each row should be uniquely identifiable
2. Primary Keys
A primary key uniquely identifies each row in a table.
sql
CREATE TABLE employees (
employee_id NUMBER PRIMARY KEY, -- This is the primary key
name VARCHAR2(100),
email VARCHAR2(100)
);Rules:
- Must be unique
- Cannot be NULL
- Should be immutable (never changes)
- Can be a single column or multiple columns (composite key)
Common mistake: Using email as a primary key. What if someone changes their email? Use an ID instead.
💡 Interview Insight: “Why not use email as a primary key?” Answer: Emails can change, may not be unique in all contexts, and are longer to index than numeric IDs. Primary keys should be stable and efficient.
3. Foreign Keys
A foreign key creates relationships between tables by referencing another table’s primary key.
CREATE TABLE orders (
order_id NUMBER PRIMARY KEY,
customer_id NUMBER,
amount NUMBER,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);What this enforces:
- Can’t create an order for a non-existent customer
- Can’t delete a customer who has orders (unless you configure CASCADE)
- Maintains referential integrity automatically
This is referential integrity—one of the most powerful features of relational databases.
4. Relationships Between Tables
Three types of relationships exist:
One-to-One (1:1)
- One employee has one employee badge
- Rare in practice, often combined into one table
One-to-Many (1:M)
- One customer has many orders
- Most common relationship type
- Implemented with foreign key in the “many” table
Many-to-Many (M:M)
- One student enrolls in many courses
- One course has many students
- Requires a junction table (enrollment table linking students and courses)
sql
-- Junction table for many-to-many
CREATE TABLE enrollments (
student_id NUMBER,
course_id NUMBER,
enrollment_date DATE,
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id)
);
```
### 5. Normalization
**Normalization** is the process of organizing data to reduce redundancy and improve integrity.
**Example of poor design (unnormalized):**
```
ORDERS Table
order_id | customer_name | customer_email | customer_phone | product | price
1001 | John Smith | john@email.com | 555-1234 | Laptop | 1200
1002 | John Smith | john@email.com | 555-1234 | Mouse | 25
```
**Problems:**
- Customer info repeated for every order
- If John changes his email, must update multiple rows
- Wasted storage space
**After normalization:**
```
CUSTOMERS Table
customer_id | name | email | phone
101 | John Smith | john@email.com | 555-1234
ORDERS Table
order_id | customer_id | product | price
1001 | 101 | Laptop | 1200
1002 | 101 | Mouse | 25Benefits:
- Customer info stored once
- Updates happen in one place
- Less storage, fewer errors
Normal Forms (Quick Overview)
1NF (First Normal Form): Each column contains atomic values (no lists or arrays)
2NF (Second Normal Form): 1NF + no partial dependencies (all non-key columns depend on entire primary key)
3NF (Third Normal Form): 2NF + no transitive dependencies (non-key columns don’t depend on other non-key columns)
BCNF (Boyce-Codd Normal Form): Stricter version of 3NF
Most production databases target 3NF, then denormalize selectively for performance.
💡 Interview Insight: “When would you denormalize?” Answer: For read-heavy workloads, reporting tables, or when join performance becomes a bottleneck. Always denormalize intentionally, not accidentally.
6. Integrity Constraints
Constraints maintain data quality and enforce business rules.
Types of constraints:
CREATE TABLE employees (
employee_id NUMBER PRIMARY KEY, -- Entity integrity
department_id NUMBER NOT NULL, -- Domain integrity
email VARCHAR2(100) UNIQUE, -- Uniqueness
salary NUMBER CHECK (salary > 0), -- Domain integrity
hire_date DATE DEFAULT SYSDATE, -- Default value
FOREIGN KEY (department_id) -- Referential integrity
REFERENCES departments(department_id)
);Entity Integrity: Primary key must be unique and NOT NULL
Domain Integrity: Values must be valid for the data type and within acceptable ranges
Referential Integrity: Foreign keys must reference existing primary keys
User-Defined Integrity: Custom business rules (CHECK constraints, triggers)
7. ACID Properties in Relational Context
Relational databases guarantee ACID properties for transactions:
Atomicity: Transaction succeeds completely or fails completely
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT; -- Both updates succeed or both rollbackConsistency: Database moves from one valid state to another (all constraints satisfied)
Isolation: Concurrent transactions don’t interfere with each other
Durability: Committed changes survive system failures
These guarantees make relational databases ideal for financial systems, e-commerce, and any application where data accuracy is critical.
SQL: The Language of Relational Databases
Structured Query Language (SQL) is the standard language for interacting with relational databases.
Key SQL categories:
DDL (Data Definition Language): Define structure
CREATE TABLE, ALTER TABLE, DROP TABLEDML (Data Manipulation Language): Manipulate data
SELECT, INSERT, UPDATE, DELETEDCL (Data Control Language): Control access
GRANT, REVOKETCL (Transaction Control Language): Manage transactions
COMMIT, ROLLBACK, SAVEPOINTSQL is declarative—you specify what you want, not how to get it. The DBMS figures out the optimal execution path.
Relational Operations
The relational model is built on set theory operations:
SELECT: Filter rows based on conditions (σ)
SELECT * FROM employees WHERE salary > 50000;PROJECT: Choose specific columns (π)
SELECT name, email FROM employees;JOIN: Combine related tables
SELECT e.name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id;UNION: Combine results from multiple queries (must have same columns)
SELECT name FROM employees_2023
UNION
SELECT name FROM employees_2024;INTERSECT: Find common rows between queries
MINUS (or EXCEPT): Find rows in first query but not in second
These operations follow mathematical set theory, which is why they’re reliable and predictable.
Indexes: Performance Accelerators
Indexes are separate data structures that speed up data retrieval.
CREATE INDEX idx_emp_dept ON employees(department_id);How indexes work:
- Like a book’s index—jump directly to relevant pages instead of reading everything
- Typically implemented as B-tree structures
- Trade-off: Faster reads, slower writes (index must be updated)
When to use indexes:
- Columns used in WHERE clauses
- Columns used in JOIN conditions
- Columns used in ORDER BY
- Foreign key columns
When NOT to use indexes:
- Small tables (full scan is faster)
- Columns with low selectivity (many duplicate values)
- Tables with heavy INSERT/UPDATE activity
💡 Interview Insight: “How do you decide what to index?” Answer: Analyze query patterns, examine execution plans, index columns in WHERE/JOIN clauses, but avoid over-indexing. Monitor index usage and drop unused indexes.
Views: Virtual Tables
Views are saved queries that act like tables.
CREATE VIEW active_employees AS
SELECT employee_id, name, email
FROM employees
WHERE status = 'ACTIVE';
-- Use it like a table
SELECT * FROM active_employees WHERE name LIKE 'J%';Benefits:
- Simplify complex queries
- Provide security (users see only specific columns/rows)
- Abstract underlying schema changes
- No data duplication (it’s just a stored query)
Why Relational Databases Dominate
After 50+ years, the relational model remains dominant because:
Mathematical foundation: Based on solid set theory and relational algebra
Data integrity: Constraints and ACID properties prevent data corruption
Flexibility: SQL lets you ask questions not anticipated during design
Standardization: SQL is (mostly) standardized across different RDBMS
Maturity: Decades of optimization and tooling
Universal understanding: Most developers/DBAs know relational concepts
Yes, NoSQL databases have their place, but for structured data requiring consistency and complex queries, relational databases are still the gold standard.
Relational vs Other Models
Relational:
- Data in tables with relationships
- Schema defined upfront
- ACID guarantees
- SQL for queries
Document (MongoDB):
- Data in JSON-like documents
- Flexible schema
- Eventual consistency
- Query language varies
Key-Value (Redis):
- Simple key-value pairs
- Very fast, very simple
- No complex queries
Graph (Neo4j):
- Nodes and relationships
- Optimized for connected data
- Specialized query language
Choose relational when:
- Data has clear structure and relationships
- Consistency is critical
- Complex queries are needed
- Multiple access patterns required
Practical Example: E-commerce Database
Let me show you a real-world relational schema:
-- Core entities
CREATE TABLE customers (
customer_id NUMBER PRIMARY KEY,
email VARCHAR2(100) UNIQUE NOT NULL,
name VARCHAR2(100) NOT NULL,
created_date DATE DEFAULT SYSDATE
);
CREATE TABLE products (
product_id NUMBER PRIMARY KEY,
name VARCHAR2(200) NOT NULL,
price NUMBER(10,2) CHECK (price > 0),
stock_quantity NUMBER DEFAULT 0
);
CREATE TABLE orders (
order_id NUMBER PRIMARY KEY,
customer_id NUMBER NOT NULL,
order_date DATE DEFAULT SYSDATE,
status VARCHAR2(20) CHECK (status IN ('PENDING','SHIPPED','DELIVERED')),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
CREATE TABLE order_items (
order_id NUMBER,
product_id NUMBER,
quantity NUMBER CHECK (quantity > 0),
price_at_purchase NUMBER(10,2),
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (order_id) REFERENCES orders(order_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);This design ensures:
- No customer can be deleted if they have orders
- No product prices become negative
- Order status is always valid
- Quantity is always positive
- Price at purchase is preserved (even if product price changes later)
That’s the power of relational design with proper constraints.
Coming up next: We’ll explore Types of Databases and understand when to use relational vs non-relational databases, helping you make informed architecture decisions.