Query Processing
Relational Operators Notes
Logical vs Physical Operators
Logical Operators: What they do
- Examples: Union, Selection, Join, Grouping
Physical Operators How they do it (How they perfrom a Selection, which would be using a B-tree)
- Example: Nest Loop join, sort-merge, hash join, index join
Relational Operators that we will Review
- Select
- Project
- Join
- Aggregation
- Set Operators
Relational Model
- Quick Review
Artist(name, year, country)
name | year | country |
---|---|---|
Wu Tang Clan | 1992 | USA |
Notorious B.I.G. | 1992 | USA |
Tupac | 1994 | USA |
A relation is an unordered set that contain the relationship of attributes that represent entites
A tuple is a set of attribute values (also know as its domain) in the relation
- Values are normally atomic/scalar
- The special value NULL is a member of every domain
Relational Algebra is procedural query language,which takes Relation as input and generate relation as output
Select and Project
What is Select? -> Access path = a way to retrieve tuples from a table
The SELECT operation is used for selecting a subset of the tuples according to a given syntax of selection
Sigma(σ)Symbol denotes it in relational algebra
- File scan
- Scan the “Entire” File
- I/O cost: O(n), where n = # of pages
- EXPENSIVE $$$
- Index Scan:
- Uses an index available, think hash or B+ Tree
- I/O cost: varies on the index we pick
- Hash Index: O(1)
- B+ Tree Index: O(logN) + X
How do we choose the right index?
- Selectivity of an access path = fraction of data pages that need to be retrieved
Joins
-
Review Relational Algebra
-
A Regular Join returns the cross product
https://www.youtube.com/watch?v=pJWCwfv983Q&t=38s
Nested Loop Join
Block Nested Loop Join
Index Nested Loop Join
Joins and Memory
Sort-Merge Join (SMJ)
Hash Join (HJ)
SMJ Vs. HJ
Sort
WHy do we need to sort?
- Becuase Tuples in a table of relational mode have no specific order
Sorting Algorithms
-
If Data fits in memory, then we can use standard sorting algorithm like quick sort
-
If Data does not fit in memory, we can use external sorting