Home Relational Algebra in Databases
Post
Cancel

Relational Algebra in Databases

When delving into the world of databases, one cannot overlook the fundamental role of relational algebra. It is the backbone of structured query language (SQL), providing a theoretical framework that forms the basis of database operations.


Contents


What is Relational Algebra?

  • An Algebra is a mathematical system consisting of:
    • Operands: variables or values from which new values can be constructed
    • Operators: symbols denoting procedures that construct new values from the given values
  • A Relational Algebra
    • Operands: relations or variables that represent relations
    • Operators: common things that can be performed on relations

Relational algebra is a set of operations that are used to manipulate and retrieve data stored in relational database structures. It consists of a collection of procedural operations that take one or two relations as input and produce a new relation as output. These operations are mathematical in nature and are similar to those used in traditional set algebra.

  • The input of a relational algebra operator is one or more relations
  • The result of an operation is always a relation
  • Necessary to use relational algebra in a cascaded manner
    • Cascaded manner usually refers to a sequential or hierarchical order in which operations or events occur, one after another, often with each step triggering the next.

The Significance of Relational Algebra

Relational algebra is not just an academic concept but is the foundation upon which SQL and other database query languages are built. It provides a formal understanding of how queries are processed and optimized by database management systems (DBMS).

  • Understanding relational algebra helps database professionals to:
    • Model databases more efficiently.
    • Write more effective and optimized queries.
    • Understand the internal workings of a DBMS.
    • Design and optimize the database schema.

Core Operations and Functions in Relational Algebra

Relational algebra forms the theoretical underpinning of relational databases, providing a systematic way to query and manipulate data using a variety of operations and functions.

Relations as Subset of a Cartesian Product

Set Operations

Set operations in relational algebra correspond closely to those in mathematical set theory, providing a way to manipulate sets of tuples (rows).

  • Union (∪): Combines the tuples from two relations and eliminates duplicate tuples. Both relations must be union-compatible—meaning they must have the same number of attributes and corresponding attributes must have the same data types.

    • Example: If R1 and R2 are two relations: R1 ∪ R2 will contain all tuples that are in R1, R2, or in both.

  • Difference (-): Generates a set of tuples that are in the first relation but not in the second. Again, the relations involved must be union-compatible.

    • Example: R1 - R2 will have tuples that are in R1 but not in R2.

  • Intersection (∩): Yields tuples that are present in both relations. The relations must be union-compatible for intersection as well.

    • Example: R1 ∩ R2 will produce a new set of tuples that are both in R1 and R2. R ∩ S = R - (R - S)

R ∩ S = R - (R - S)

  • Cartesian Product (×): Creates a set of combined tuples from two relations, which are not required to be union-compatible.

    • Example: If R1(a1, a2) and R2(b1, b2) are two relations, R1 × R2 will result in a new relation with attributes (a1, a2, b1, b2), combining each tuple from R1 with every tuple from R2.

Commutativity does not hold for Cartesian Product (named or unnamed).

named

unnamed

  • Set Division (÷): This operation is used when we want to find tuples in one relation (R1) that are related to all tuples in another relation (R2).

    • Example: If R1 represents Students who are enrolled in Courses represented by R2, then Students ÷ Courses will yield the set of students who are enrolled in all the courses listed in R2.

Relational Database Specific Operations

These operations are more specific to the relational model and are essential for querying databases.

  • Selection (σ): This operation is used to select rows from a relation that satisfy a certain predicate.

    • Example: σ_(age>25)(Employees) would result in a relation containing employees older than 25.

  • Algebraic laws can be useful in query optimisation
    • σΘ1(σΘ2(R)) = σΘ1⋀Θ2(R)
    • Commutative: σΘ1(σΘ2(R)) = σΘ2(σΘ1(R))
    • σΘ(R⨉S) = σΘ(R) ⨉ S
      • if Θ mentions only attributes of R
  • Cartesian product is expensive, so reducing the size of R (σΘ(R) ⨉ S) is beneficial
  • Renaming (ρ): This is used to change the name of attributes in a relation, allowing for clarity or consistency in query results or expressions.

    • Example: ρ B/A (S) returns a relation with schema identical to S but the attribute name A has been replaced by B. E.g. let S with schema S(A, B, C, D, E), ρ X/B,Y/D (S) creates a relation with schema S(A, X, C, Y, E).
  • Projection (π): Used to select certain columns from a relation, effectively discarding the other columns.

    • Example: π_(name, salary)(Employees) will create a new relation with just the name and salary columns from the Employees relation.

  • Join (⨝): Combines tuples from different relations based on a common attribute.

    • Example: Employees ⨝ Departments joins the Employees and Departments relations on their common attributes (e.g. departmentID).

Set Functions

Set functions perform calculations on a set of values and return a single value.

  • sum: Calculates the sum of a numeric attribute over a set of tuples.

    • Example: sum(π_(salary)(Employees)) would return the total salary for all employees.
  • avg: Computes the average value of a numeric attribute.

    • Example: avg(π_(salary)(Employees)) calculates the average salary of employees.
  • count: Counts the number of tuples.

    • Example: count(σ_(department='Sales')(Employees)) counts the number of employees in the Sales department.
  • max: Finds the maximum value of a numeric attribute.

    • Example: max(π_(age)(Employees)) returns the age of the oldest employee.
  • min: Identifies the minimum value of a numeric attribute.

    • Example: min(π_(age)(Employees)) gives the age of the youngest employee.

Multisets (Bags)

Sets vs. Multisets (Bags)

  • A Multiset (Bag) is like a Set but elements may appear more than once
    • Example: R = {1,3,3,6,6,6,7}, S = {1,1,6,6,7} are both multisets
  • Union of two multisets:
    • {1,3,3,6,6,6,7} ∪ {1,1,6,6,7} = {1,1,1,3,3,6,6,6,6,6,7,7}
  • Difference between two multisets:
    • {1,3,3,6,6,6,7} - {1,1,6,6,7} = {3, 3, 6}
  • Cartesian product:
    • {1, 3, 3} ⨉ {1, 1, 6} = {<1,1>, <1,1>, <1,6>, <3,1>, <3,1>, <3,6>, <3,1>, <3,1>, <3,6>}

Operations on Multisets (Bags)

Given μ(x, B), defined as the number of occurrences of x in multiset B.

  • Union R ∪ S
    • μ(t, R ∪ S) = μ(t, R) + μ(t, S) for all t in R and S
  • Difference R – S
    • μ(t, R – S) = max{μ(t, R) – μ(t, S), 0} for all t in R and S
  • Intersection R ∩ S
    • μ(t, R ∩ S) = min{μ(t, R), μ(t, S)} for all t in R and S
  • Cartesian Product:
    • μ( tt’, R ⨉ S) = μ(t, R) * μ(t’, S) for all t in R and t’ in S

  • Projection:
    • If R is a multiset, then πX(R) is also a multiset
    • If R is a set, then πX(R) may be a multiset
  • Selection:
    • If R is a multiset, then the selection σΘ(R) may be a set
    • If R is a set, then the selection σΘ(R) is a set

Multisets in SQL

SQL is multiset based.

  • Efficiency:
    • Duplicate elimination may take quadratic time
      • For example, after a projection
  • Necessity:
    • Eliminating duplicates might result in information loss/errors (e.g., in computing averages)

Binary Operations: Join (⨝)

  • Binary Operators between two relations
  • Used to combine information from two relations into a new relation
  • Core to Relational Database

Θ-Join (⨝ F)

  • Theta Join combines two relations using a predicate F: R ⨝F S
    • It is equivalent to the cartesian product of the two relations followed by a selection using the predicate:σF (R⨉S)
    • It is called a “theta join” because in the original notation, Θ was used in place of F and was limited to: =, <, >, <=, >=, !=
    • A theta join that only uses the operator = is called an Equijoin

Θ-Join Example

Natural Join (⨝)

  • An natural join is a Θ-join in which no predicate is specified: R ⨝ S
    • It is defined an equijoin over all the common attributes of the two relations
    • The result contains the common attributes followed by the remaining non-common attributes in R and S
      • like an equijoin but the common attributes only appear once
  • Natural Join can be formalised as the Cartesian Product of R and S, followed by the selection on equality amongst the common attributes (A1,.. Ak). Followed by a projection. R ⋈ S = π<list>(σR.A1=S.A1 ⋀ … ⋀ R.Ak = S.Ak (R ⨉ S))
    • where contains
      • All the attributes unique to R
      • All the common attributes
      • All the attributes unique to S
    • Given relations: REGISTERED(student, course, term), TEACHES(lecturer, course, term)
      • REGISTERED ⋈ TEACHES = TAUGHT(student, course, term, lecturer)

Outer Join (⟕, ⟖, ⟗)

  • The Left outer join of two relations R and S is a natural join which also includes tuples from R which do not have corresponding tuples in S; missing values are set to null
    • R ⟕ S = R ⋈ S ∪ ((R - πr1, r2,…,rn(R ⋈ S)) ⨉ {<null,…, null>})

  • other Outer Join:
    • Right Outer Join: R ⟖ S = ((S – πs1, s2,…,sn(R ⋈ S)) ⨉ {}) ∪ R ⋈ S
    • Full Outer Join: R ⟗ S = (((R - πr1, r2,…,rn(R ⋈ S)) ⨉ {}) ∪ ((S – πs1, s2,…,sn(R ⋈ S)) ⨉ {}) ∪ R ⋈ S)

Right Outer Join (R ⟖ S):

1
2
3
4
5
A    | B | C
-------------
a    | 1 | x
a    | 1 | y
NULL | 3 | z

Full Outer Join (R ⟗ S):

1
2
3
4
5
6
A    | B | C
-------------
a    | 1 | x
a    | 1 | y
b    | 2 | NULL
NULL | 3 | z
  • JOIN operations in addition to these outer joins are inner joins.

Semijoin (⋉)

  • Semijion is like a natural join but the resulting attributes are only taken from R

Antijoin (▷)

  • The antijoin is like semijoin but the result only contains tuples from R that have no match in S

Relational Transformations

  • Relational expressions can be transformed with transformation rules
  • Used during SQL query optimisation to rewrite user queries
  • Database engine aims to improve CPU, memory or disk usage

Relational Transformations - General

  • When an expression consists of a series of nested projections, only the last in a sequence of projections is required:
    • πLπM…πN(R) = πL(R)
    • But not if they are extended projections that rely on prior expressions
  • If a selection contains a predicate with conjunctive terms (ie ANDs), the terms can cascade into individual selections
    • σ p∧q∧r (R) = σp(σq(σr(R)))
  • σp(R⨉S) = R ⨝p S

Relational Transformations - Commutative

  • Selection and theta-join are commutative operations
    • σp(σq(R)) = σq(σp(R))
    • R ⨝p S = S ⨝p R
      • But for theta-join, only when using the named perspective
  • Selection and projection are commutative
    • When an expression consists of a selection followed by a projection, the projection can be done first, if the selection predicate only involves attributes in the projection list: - πA1,…Am(σp(R)) = σp(πA1,…Am (R))

Relational Transformations - Associativity

  • Joins exhibits associativity:
    • (R ⨝ S) ⨝ T = R ⨝ (S ⨝ T)

Relational Transformations - Distributivity

  • Where an expression consists of a theta-join followed by a projection, the selection can be performed on both relations prior to the theta-join, if the predicate only involves attributes being joined
    • σp(R ⨝r S) = σp(R) ⨝r σp(S)
    • In this case, selection distributes over theta-join
  • Selection also distributes over set operations
    • σp(R ∪ S) = σp(R) ∪ σp(S)
    • σp(R ∩ S) = σp(R) ∩ σp(S)
    • σp(R - S) = σp(R) - σp(S)
  • Projection distributes over set union
    • πL(R ∪ S)= πL(R) ∪ πL(S)
  • Projection distributes over theta join
    • π L1 ∪ L2 (R ⨝r S) = πL1(R) ⨝r πL2(S)

Relational Algebra and SQL

SQL Relational Algebra
SELECT Projection π
FROM Cartesian Product
WHERE Selection σ

Example With Selection

DB Schema: FACULTY(name, dpt, salary), CHAIR(dpt, name)

Query: Find the salaries of department chairs


  • Relational Algebra:
    • C-SALARY(dpt,salary) = πF.dpt, F.salary(σF.name = C.name ⋀ F.dpt = C.dpt (FACULTY ⨉ CHAIR))
    • or C-SALARY(dpt,salary) = πdpt, salary(FACULTY ⋈ CHAIR)
  • SQL:
    • SELECT FACULTY.dpt, FACULTY.salary FROM FACULTY, CHAIR WHERE FACULTY.name = CHAIR.name AND FACULTY.dpt = CHAIR.dpt

Example No Selection

Goal: Compute the Cartesian product of relations of S and T


  • Relational algebra:
    • S ⨉ T
  • SQL:
    • SELECT * FROM S, T

Example Self-Joins

  • Goal: Compute expressions that rely on Self-Joins
    • However, relation names in the FROM list must be distinct
    • This stops us from computing self-joins, ie FROM R, R
    • Many interesting queries involve self-joins
  • DB Schema: FATHER(father-name, child-name)
  • Compute: GRANDFATHER(grandfather-name, grandchild-name)

Relational Algebra:

  • First, take the Cartesian Product of Father and Father

FATHER:

father-name child-name
David Nick
Nick Joe
Joe Mick

FATHER x FATHER:

father-name child-name father-name child-name
David Nick David Nick
Nick Joe David Nick
Joe Mick David Nick
David Nick Nick Joe
Nick Joe Nick Joe
Joe Mick Nick Joe
David Nick Joe Mick
Nick Joe Joe Mick
Joe Mick Joe Mick
  • Second, select where the child-name is the same as the father-name: σ $2=$3 ( FATHER x FATHER)
father-name child-name father-name child-name
David Nick Nick Joe
Nick Joe Joe Mick
  • Then, π$1,$4 (σ $2=$3 ( FATHER x FATHER))
father-name child-name
David Joe
Nick Mick
  • Final, ρ grandfather-name/father-name, grandchild-name/child-name (π$1,$4 (σ$2=$3 (FATHER ⨉ FATHER)))
    • get: GRANDFATHER(grandfather-name, grandchild-name)
grandfather-name grandchild-name
David Joe
Nick Mick

SQL:

  • SQL does not support referencing columns by position number
  • Instead, SQL supports an aliasing mechanism
    • SELECT R.father-name AS grandfather-name, T.child-name AS grandchild-name FROM FATHER AS R, FATHER AS T WHERE R.child-name = T.father-name

Relational Completeness of SQL

  • SQL can express all relational algebra queries
    • i.e., it is a relationally complete database query language
  • As we saw:
    • SELECT DISTINCT …
    • FROM …
    • WHERE …
  • can express Cartesian product, projection, and selection

  • SQL has explicit constructs for union and difference:
    • Union R ∪ S: (SELECT * FROM R) UNION (SELECT * FROM S)
    • INTERSECT R ∩ S: (SELECT * FROM R) INTERSECT (SELECT * FROM S)
    • Difference R – S: (SELECT * FROM R) EXCEPT (SELECT * FROM S)
  • SQL supports these operations: UNION, EXCEPT, INTERSECT, UNION ALL, INTERSECT ALL and **EXCEPT ALL **. (There may be some differences in specific SQL implementations (e.g. MySQL, PostgreSQL, etc.) or some database systems may not support specific operators.)
    • UNION, INTERSECT and EXCEPT eliminates duplicates! (Set semantics)
    • UNION ALL, INTERSECT ALL and EXCEPT ALL does not (Multiset semantics)
This post is licensed under CC BY 4.0 by the author.
ip