Skip to content

Database Design and Development

Database Design and Development

Higher Database Design

Flat Files vs. Relational Databases

Flat files:

  • Single table, simple structure
  • Suitable for small amounts of data
  • Problems: data redundancy, data inconsistency, data dependency, difficulty of sharing, security issues

Relational databases:

  • Multiple related tables
  • Minimise redundancy through normalisation
  • Support concurrent access, security, integrity
  • Query using SQL (Structured Query Language)
FeatureFlat FileRelational Database
TablesSingleMultiple, related
RedundancyHighMinimised by normalisation
Data integrityNot enforcedEnforced by constraints
Concurrent accessNot supportedSupported with locking
QueryingManual (sequential)SQL (declarative, powerful)
ScalabilityPoorGood
SecurityLimitedUser accounts, permissions

Relational Database Concepts

Entity: A thing of interest (e.g., Student, Course).

Attribute: A property of an entity (e.g., student name, course code).

Relation (Table): A set of tuples (rows) with the same attributes (columns).

Tuple (Record): A single row in a table.

Domain: The set of allowable values for an attribute.

Primary Key: A column (or set of columns) that uniquely identifies each row in a table. Must be Unique and not null.

Foreign Key: A column in one table that references the primary key of another table. Establishes Relationships.

Composite key: A primary key made up of two or more columns.

Candidate key: Any column or set of columns that could serve as a primary key.

Worked Example. In a table of Students, which of the following could be a candidate key: StudentID, Name, Email?

StudentID is a candidate key (unique, not null). Name is not (two students can have the same name). Email could be a candidate key if the system requires unique emails.

Entity-Relationship Modelling

Step 1: Identify entities and attributes

For a school system:

  • Student (StudentID, Name, DateOfBirth, Address)
  • Course (CourseCode, CourseName, Credits)
  • Enrolment (StudentID, CourseCode, Grade)

Step 2: Identify relationships

  • A Student can enrol in many Courses
  • A Course can have many Students
  • This is a many-to-many (M:N) relationship, which requires a junction table (Enrolment)

Step 3: Draw the ER diagram

Student (1) ----< Enrolment >---- (1) Course

Normalisation

Normalisation is the process of organising data to minimise redundancy and dependency.

Unnormalised Form (UNF): Raw data with repeating groups.

First Normal Form (1NF): No repeating groups. Every cell contains a single value. Each row is Unique (has a primary key).

Second Normal Form (2NF): In 1NF and no partial dependencies (every non-key attribute depends on The entire primary key, not just part of it). Only applies to tables with composite keys.

Third Normal Form (3NF): In 2NF and no transitive dependencies (no non-key attribute depends on Another non-key attribute).

Example of normalisation:

UNF:

OrderIDCustomerNameCustomerCityProductCodeProductNameQuantity
001J. SmithEdinburghP01Widget5
001J. SmithEdinburghP02Gadget3
002A. JonesGlasgowP01Widget2

1NF (remove repeating groups): Already in 1NF (each cell has a single value).

2NF (remove partial dependencies):

The composite key is (OrderID, ProductCode). CustomerName and CustomerCity depend only on OrderID (partial dependency). ProductName depends only on ProductCode (partial dependency).

Split into:

  • Order (OrderID, CustomerName, CustomerCity)
  • Product (ProductCode, ProductName)
  • OrderLine (OrderID, ProductCode, Quantity)

3NF (remove transitive dependencies):

In Order, CustomerCity depends on CustomerName (transitive dependency). Split into:

  • Customer (CustomerID, CustomerName, CustomerCity)
  • Order (OrderID, CustomerID)
  • Product (ProductCode, ProductName)
  • OrderLine (OrderID, ProductCode, Quantity)

Why normalisation matters. Without normalisation:

  • Update anomaly: If J. Smith moves city, we must update every row with their name.
  • Insertion anomaly: We cannot add a new customer until they place an order.
  • Deletion anomaly: If order 002 is deleted, we lose the information that Widget is a product.

Referential Integrity

Referential integrity ensures that relationships between tables remain consistent.

  • A foreign key value must either be null or match a value in the referenced primary key
  • Cannot delete a record that is referenced by a foreign key in another table (without cascading)

Worked Example. If we try to delete StudentID = 1 from the Student table, but there are Enrolment records with StudentID = 1, the DBMS should reject the deletion (or cascade it to also Delete the enrolments).

Data Types in SQL

SQL Data TypeDescriptionExample
INT / INTEGERWhole number42
REAL / FLOATDecimal number3.14
DECIMAL(p, s)Fixed-point decimalDECIMAL(10, 2)
CHAR(n)Fixed-length stringCHAR(10)
VARCHAR(n)Variable-length stringVARCHAR(255)
DATEDate (YYYY-MM-DD)‘2026-04-14’
BOOLEANTrue/FalseTRUE
TEXTLong text’Description…’

SQL (Structured Query Language)

Data Definition Language (DDL)

Creating tables:

CREATE TABLE Student (
StudentID INT PRIMARY KEY,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
DateOfBirth DATE,
House VARCHAR(20)
);
CREATE TABLE Course (
CourseCode VARCHAR(10) PRIMARY KEY,
CourseName VARCHAR(100) NOT NULL,
Teacher VARCHAR(50),
Credits INT
);
CREATE TABLE Enrolment (
StudentID INT,
CourseCode VARCHAR(10),
Grade CHAR(2),
PRIMARY KEY (StudentID, CourseCode),
FOREIGN KEY (StudentID) REFERENCES Student(StudentID),
FOREIGN KEY (CourseCode) REFERENCES Course(CourseCode)
);

Modifying tables:

ALTER TABLE Student ADD COLUMN Email VARCHAR(100);
ALTER TABLE Student DROP COLUMN House;

Data Manipulation Language (DML)

INSERT:

INSERT INTO Student (StudentID, FirstName, LastName, DateOfBirth, House)
VALUES (1, 'Alice', 'MacDonald', '2005-03-15', 'Gryffindor');
INSERT INTO Student (StudentID, FirstName, LastName, DateOfBirth)
VALUES (2, 'Bob', 'Campbell', '2004-11-22');

UPDATE:

UPDATE Student
SET House = 'Slytherin'
WHERE StudentID = 2;

DELETE:

DELETE FROM Enrolment
WHERE Grade IS NULL;

Queries (SELECT)

Basic SELECT:

SELECT FirstName, LastName, House
FROM Student
WHERE House = 'Gryffindor';

Wildcards and pattern matching:

SELECT * FROM Student
WHERE LastName LIKE 'Mc%';

ORDER BY:

SELECT FirstName, LastName, DateOfBirth
FROM Student
ORDER BY DateOfBirth DESC;

Aggregate functions:

SELECT COUNT(*) AS TotalStudents
FROM Student;
SELECT House, COUNT(*) AS StudentCount
FROM Student
GROUP BY House
HAVING COUNT(*) > 5;

JOIN:

SELECT Student.FirstName, Student.LastName, Course.CourseName, Enrolment.Grade
FROM Student
INNER JOIN Enrolment ON Student.StudentID = Enrolment.StudentID
INNER JOIN Course ON Enrolment.CourseCode = Course.CourseCode
WHERE Enrolment.Grade IS NOT NULL
ORDER BY Student.LastName;

LEFT JOIN: Returns all records from the left table and matching records from the right table.

SELECT Student.FirstName, Student.LastName, Course.CourseName
FROM Student
LEFT JOIN Enrolment ON Student.StudentID = Enrolment.StudentID
LEFT JOIN Course ON Enrolment.CourseCode = Course.CourseCode
ORDER BY Student.LastName;

Subqueries:

SELECT FirstName, LastName
FROM Student
WHERE StudentID IN (
SELECT StudentID FROM Enrolment
WHERE CourseCode = 'MATH101'
);

Worked Example. Find all students who are enrolled in more than 3 courses.

SELECT FirstName, LastName
FROM Student
WHERE StudentID IN (
SELECT StudentID FROM Enrolment
GROUP BY StudentID
HAVING COUNT(*) > 3
);

Data Validation

NOT NULL: Column cannot be empty.

UNIQUE: All values in the column must be different.

CHECK: Values must satisfy a condition.

CREATE TABLE Product (
ProductID INT PRIMARY KEY,
ProductName VARCHAR(100) NOT NULL,
Price DECIMAL(10, 2) CHECK (Price > 0),
StockQuantity INT CHECK (StockQuantity >= 0)
);

Advanced Higher Database Concepts

Transaction Management (ACID)

Atomicity: A transaction is all-or-nothing. If any part fails, the entire transaction is rolled Back.

Consistency: The database moves from one valid state to another.

Isolation: Concurrent transactions do not interfere with each other.

Durability: Once committed, changes are permanent even if the system crashes.

BEGIN TRANSACTION;
UPDATE Account SET Balance = Balance - 100 WHERE AccountID = 1;
UPDATE Account SET Balance = Balance + 100 WHERE AccountID = 2;
COMMIT;

If any statement fails: ROLLBACK;

Worked Example. Explain why ACID properties are important for a bank transfer.

Atomicity ensures either both the debit and credit happen, or neither does. Consistency ensures Balances remain valid (no negative balances if not allowed). Isolation prevents two simultaneous Transfers from interfering. Durability ensures the transfer is permanent once committed.

Indexing

An index is a data structure that improves the speed of data retrieval.

CREATE INDEX idx_student_name ON Student(LastName, FirstName);

Advantages: Faster SELECT queries.

Disadvantages: Slower INSERT, UPDATE, DELETE; uses additional storage.

When to create an index: On columns frequently used in WHERE clauses, JOIN conditions, and ORDER BY clauses. Do not index columns that are rarely queried or frequently updated.

Views

A view is a virtual table based on a SELECT query.

CREATE VIEW TopStudents AS
SELECT Student.FirstName, Student.LastName, AVG(Enrolment.Grade) AS AverageGrade
FROM Student
JOIN Enrolment ON Student.StudentID = Enrolment.StudentID
GROUP BY Student.StudentID, Student.FirstName, Student.LastName
HAVING AVG(Enrolment.Grade) >= 70;

Benefits of views: Simplify complex queries, restrict data access (security), present data in a Different format.


Advanced SQL Techniques

Window Functions (Higher)

Window functions perform calculations across a set of rows related to the current row.

SELECT FirstName, LastName, Grade,
RANK() OVER (ORDER BY Grade DESC) AS Rank
FROM Students;

Common window functions:

FunctionPurpose
ROW_NUMBERSequential number (no ties)
RANKRank with gaps for ties
DENSE_RANKRank without gaps for ties
LAG / LEADAccess value from previous/next row
SUM / AVGRunning total / running average

Common Table Expressions (CTEs)

A CTE is a named temporary result set that can be referenced within a SELECT, INSERT, UPDATE, or DELETE.

WITH HighScorers AS (
SELECT StudentID, AVG(Mark) AS AverageMark
FROM Results
GROUP BY StudentID
HAVING AVG(Mark) >= 75
)
SELECT s.FirstName, s.LastName, hs.AverageMark
FROM Students s
JOIN HighScorers hs ON s.StudentID = hs.StudentID
ORDER BY hs.AverageMark DESC;

Stored Procedures

A stored procedure is a reusable set of SQL statements stored in the database.

CREATE PROCEDURE EnrolStudent(
IN pStudentID INT,
IN pCourseCode VARCHAR(10)
)
BEGIN
INSERT INTO Enrolment (StudentID, CourseCode)
VALUES (pStudentID, pCourseCode);
END;

Benefits: Improved performance (pre-compiled), reduced network traffic, code reuse, security (controls data access).

SQL Injection Prevention in Detail

Vulnerable code:

query = f"SELECT * FROM Users WHERE username = '{username}' AND password = '{password}'"

Attack: Enter admin' OR '1'='1 as username. The query becomes:

SELECT * FROM Users WHERE username = 'admin' OR '1'='1' AND password = ''

This returns all users, bypassing authentication.

Prevention methods:

  1. Parameterised queries (always use these).
  2. Input validation (whitelist allowed characters).
  3. Stored procedures (with parameters).
  4. Least privilege (application database account has minimal permissions).

Database Normalisation Practice

Worked Example. Normalise the following table to 3NF.

PlayerIDNameTeamNameTeamCitySport
1J. SmithRangersGlasgowFootball
2A. JonesCelticGlasgowFootball
3M. BrownRangersGlasgowFootball

1NF: Already in 1NF (single values per cell).

2NF: No composite key, so already in 2NF.

3NF: TeamCity depends on TeamName, not PlayerID (transitive dependency).

Result:

  • Player (PlayerID, Name, TeamName)
  • Team (TeamName, TeamCity, Sport)

Database Indexing in Detail

An index is a B-tree (or hash) data structure that speeds up data retrieval at the cost of slower Writes and additional storage.

When to create an index:

  • Columns frequently used in WHERE clauses.
  • Columns used in JOIN conditions.
  • Columns used in ORDER BY clauses.

When NOT to create an index:

  • Small tables (full table scan is fast enough).
  • Columns that are frequently updated (index must be rebuilt).
  • Columns with low selectivity (e.g., boolean with mostly the same value).

Practice Questions

  1. Write SQL using a CTE to find the top 5 students by average grade across all their courses.

  2. Explain what a stored procedure is and give two advantages of using them.

  3. A table has the following functional dependencies: A -> B, B -> C, C -> D. Is this table in 3NF? If not, normalise it.

  4. Write SQL to create a stored procedure that transfers a student from one course to another.

  5. Explain how a B-tree index works. Why is it more efficient than a sequential scan for large tables?

  6. Write SQL that uses a window function to rank students by grade within each subject.

  7. Explain three methods of preventing SQL injection. For each, explain how it works.

  8. A company database has tables for Departments, Employees, and Projects. Design the schema with appropriate relationships and write the CREATE TABLE statements.

  9. Explain the difference between ROW_NUMBER, RANK, and DENSE_RANK with an example showing how they differ when there are ties.

  10. Write SQL to find all students who scored above the average in every course they are enrolled in.

  11. Explain the concept of a database deadlock. How can deadlocks be prevented?

  12. Compare the performance of an inner join vs a subquery for finding matching records. When would each be more appropriate?

Database Security

Access Control

SQL provides GRANT and REVOKE statements for managing permissions.

GRANT SELECT, INSERT ON Students TO 'librarian'@'localhost';
GRANT ALL PRIVILEGES ON Students TO 'admin'@'localhost';
REVOKE DELETE ON Students FROM 'librarian'@'localhost';

Principle of least privilege: Users should only have the minimum permissions needed to do Their job. A librarian who only needs to search for books should not have DELETE permissions.

Backup Strategies

StrategyDescriptionRecovery Point Objective
Full backupComplete copy of the entire databaseLast backup
Differential backupAll changes since the last full backupLast full backup
Incremental backupAll changes since the last backup (any type)Last backup
Transaction logRecords all transactions for point-in-time recoveryAny point in time

Worked Example. A database is backed up fully on Sunday. Differential backups run Monday to Saturday. The database crashes on Thursday. What backups are needed to restore?

Restore the full backup from Sunday, then the differential backup from Wednesday (the most recent Differential before the crash).

Database Transactions in Practice

Isolation levels:

LevelDirty ReadNon-repeatable ReadPhantom Read
Read UncommittedYesYesYes
Read CommittedNoYesYes
Repeatable ReadNoNoYes
SerializableNoNoNo

Deadlocks: Occur when two transactions each hold a lock that the other needs.

Example: Transaction A locks row 1 and needs row 2. Transaction B locks row 2 and needs row 1. Neither can proceed.

Prevention: Always acquire locks in the same order, use timeouts, or use deadlock detection.

Data Warehousing (Higher)

A data warehouse is a large, centralised database used for reporting and analysis. It integrates Data from multiple sources.

Data warehouse vs operational database:

FeatureOperational DatabaseData Warehouse
PurposeDay-to-day transactionsReporting and analysis
Data volumeModerateVery large
Data structureNormalised (3NF)Denormalised (star schema)
Query patternMany small reads/writesFew large, complex reads
Historical dataCurrent stateYears of historical data

Star schema: A central fact table surrounded by dimension tables.

Example: Sales fact table surrounded by Date, Product, Store, and Customer dimension tables.

Additional Practice Questions

  1. Explain the difference between a full backup, a differential backup, and an incremental backup. When would you use each?

  2. Write SQL GRANT and REVOKE statements for the following scenario: a sales clerk can only INSERT and SELECT from the Orders table, while the manager has full privileges.

  3. Explain what a deadlock is and describe two strategies for preventing deadlocks.

  4. Explain the difference between the four transaction isolation levels. Which is the safest and which is the most performant?

  5. Describe the star schema used in data warehousing. What is a fact table and what is a dimension table?

  6. Write SQL to create a trigger that logs all INSERT operations on the Students table to an AuditLog table.

  7. Explain the concept of database replication. Why is it used and what are the challenges?

  8. A company needs to store 5 years of transaction data for analysis. Explain why a data warehouse is more suitable than an operational database for this purpose.

Worked Examples

See the examples integrated throughout the sections above.

Common Pitfalls

  1. Forgetting primary keys: Every table must have a primary key to uniquely identify each record.

  2. Data redundancy: Not normalising leads to update anomalies, insertion anomalies, and deletion anomalies.

  3. NULL in aggregates: COUNT(*) counts all rows including NULL; COUNT(column) counts non-NULL values.

  4. INNER JOIN vs. LEFT JOIN: INNER JOIN returns only matching rows; LEFT JOIN returns all rows from the left table.

  5. String quotes: SQL uses single quotes for string literals, not double quotes.

  6. Forgetting WHERE in UPDATE/DELETE. Without WHERE, all rows are affected.

  7. Confusing HAVING and WHERE. WHERE filters rows before grouping; HAVING filters groups after GROUP BY.

  8. Not using parameterised queries — leads to SQL injection vulnerabilities.


Practice Questions

  1. Normalise the following data to 3NF:
InvoiceNoCustomerNameCustomerAddressProductCodeProductNameQuantityPrice
1001J. Brown5 High StP01Pen102.00
1001J. Brown5 High StP02Pencil201.50
1002M. Wilson12 Main RdP01Pen52.00
  1. Write SQL to create the tables for your normalised schema from question 1, including primary and foreign keys.

  2. Write SQL queries to: (a) Find all customers who live on ‘High St’ (b) Calculate the total value of each invoice (c) Find the most expensive product

  3. Explain the difference between 2NF and 3NF with an example.

  4. Write SQL to create a view showing all products that are low in stock (quantity less than 10).

  5. Explain the ACID properties of transactions with an example.

  6. Write SQL to add a new column ‘Category’ to the Product table and update all existing records.

  7. Explain when you would use a LEFT JOIN instead of an INNER JOIN.

  8. Write SQL to find the student with the highest average grade across all their courses.

  9. Explain the difference between DELETE and DROP in SQL. When would you use each?

  10. Write SQL to find all pairs of students who are enrolled in the same course.

  11. A database has tables Author (AuthorID, Name, Nationality) and Book (ISBN, Title, AuthorID, Price). Write SQL to find the total value of books by each nationality.

  12. Explain why an index on a frequently updated column might slow down the database overall.

  13. Write SQL that uses a subquery to find students who are NOT enrolled in any course.

  14. Explain the difference between a candidate key and a primary key. Can a table have more than one candidate key?

  15. A hospital database needs tables for Patient, Doctor, and Appointment. Design the schema with appropriate primary and foreign keys. Write the CREATE TABLE statements.

  16. Explain the three types of anomalies that can occur in an unnormalised database. Give a specific example of each.

  17. Write SQL to create a transaction that transfers a student from one course to another, ensuring the enrolment records are updated atomically.

Summary

This topic covers the essential chemistry of database design and development, including key reactions, underlying theories, and practical applications.

Key concepts include:

  • Brønsted-Lowry theory
  • strong and weak acids/bases
  • pH calculations
  • titration curves and indicators
  • hydrolysis of salts

Mastery of these concepts requires both theoretical understanding and the ability to apply knowledge to unfamiliar contexts, particularly in calculation and practical questions.