DBMS Notes

Transaction

Execution of a sequence of operations, basic unit of change in DBMS; No partial transaction is allowed.

Strawman System

Sequencial sacrifice parallel <—> Parallel cause possible danger:

Temporary Inconsistency: Unavoidable, not an issue
Permanent Inconsistency: Unacceptable, cause problem with correctness and integrity

Definition

Boundaries of transaction defined by the client: BEGIN: transaction start COMMIT: all transaction modifications are saved/DBMS overrides this and abort ABORT: all transaction modifications are undone, looks like the transaction never happens ACID:

Atomicity: Either all changes happen, or all not happen
Consistency: If transaction and database at the beginning is consistent, also consistent when transaction finish
Isolation: When a transaction is happening, it has illusion of being isolated from other transactions.
Durability: If transaction commit, its effect to DBMS is persistent.

Atomicity

Logging: Logs all actions for undo when abort. Maitain undo records on memory and disk
Shadow Paging: DBMS make copies of transaction and change the copy when modification. Copied pages be visible only when commit.

Consistency

Database Consistency: Real world entities, like age not be negative. Transaction in the future can see effects on past within the databse
Transaction Consistency: If database is consistent before transaction, it should also be consistent after transaction

Isolation

Concurrency Control:

Pessimistic: DBMS assumes conflict exist, so avoid conflict in the first place.
Optimistic: DBMS assumes conflict rare, so deal with conflict after commit

Execution Schedule:

Serial Schedule: No interleave action
Equivalent Schedule: 1st schedule effect = 2nd schedule effect
Serializable Schedule: == any serial execution of transaction. Different execution –> Different result, but all are correct.

Conflict:

Read-Write Conflict(Unrepeatable Reads): A transaction not able to get the same value when reading the same object multiple times
Write-Read Conflict(Dirty Reads): A transaction sees the write effect of another transaction before that other txn is commited.
Write-Write Conflict(Lost Update): A transaction overwrite the uncommited change of a concurrent txn

Conflict Serializability :

Conflict Equivalent: Involve same operation of same transaction and every pair of conflict operations are in same order. If Schedule A is Conflict Equivalent to Schedule B, A can be Conflict Serialized.
Dependency Graph: There exists an edge from Ti –> Tj iff an operation Oi conflict with Oj. Schedule is conflict serializable iff the graph is acyclic

ransaction Locks

Shared Lock(S-Lock): Allow multiple transactions to read the same object at the same time. If a txn hold the lock, other can still access the lock
Exclusive Lcok(X-Lock): Prevent other txn from modifying the object. Other txn cannot get both S-Lock and X-Lock

Lock Manager is centralized and grants or block request based on what locks are currently held by other txn. NO-NEED to be duriable, since transaction abort when DBMS crash.

Two Phase Locking (2PL)

Phase1: Growing: Each txn acquires lock
Phase2: Shrinking: Enter this phase when the first txn free its lock. Txns only release lock.

2PL is sufficient to guarantee conflict serializability, but susceptible to cascading aborts, when a txn abort but another txn must roll-back.

Strong Strict Two Phase Locking:

Any value written by a txn is never read or overwrite by another txn until the txn first commit. Only release lock when commit. Fix cascading aborts but limit concurrency

Deadlock Handling

Deadlock Detection: Wait-for graph with direct edge from Ti –> Tj if Ti waits for Tj to release the lock.

DBMS pick a victim to abort, using several combinations of heuristic. When aborting a txn, DBMS determine how far to roll-back.

Deadlock Prevention: Kill a txn when two txns are trying to acquire the same lock:
- Wait-Die (Old wait for Young):If requesting transaction has higher priority than holding transaction, it waits.
- Wound-Die (Young wait for Old):If requesting transaction has higher priority than holding transaction, it acquire, the holding txn aborts.

Lock Granularities

Use hierarchy to implicitly assign lock to chidren objects.

Intention Locks:

Allow high level node to be locked in shared or exclusive mode, no check on descendent node

Intention-Shared (IS): Explicit locking at lower level with shared locks
Intention-Exclusive (IX): Explicit locking at lower level with shared or exclusive locks
Shared+Intention-Exclusive (SIX): Sub-tree rooted at the node is locked in shared mode, and explicit locking at lower level with exclusive mode

Timestamp Ordering Concurrency Control

Each transation Ti is assigned with a unique timestamp TS(Ti), monotonically increasing If TS(Ti) < TS(Tj) DBMS execute Ti before Tj

System-Clock: edge case like daylight saving
Logical-Counter: Overflow and maitanance between multiple machines.

Always a combination of these two.

Basic T/O

Each object X assigned with the successfully last execution of Read or Write(R-TS(X) & W-TS(X)).Any txn tries to violate the order will abort

Read Operation: If TS(Ti) < W-TS(X), violation, cannot read before write; Otherwise, valid, R-TS(X) update to max(R-TS(X), TS(Ti)) —> Copy of X to ensure repeat read for Ti
Write Operation If TS(Ti) < R-TS(X) or TS(Ti) < W-TS(X), Ti must restart; Otherwise, valid, update R-TS(X) and W-TS(X).

Optimization: Thomas Write Rule

If TS(Ti) < W-TS(X), ignore violation and let Ti continue, since no other txn will read Ti’s write. Conflict Serializable for Basic T/O, but may also starvation for long transactions if short transactions keep causing conflict, and permits not Recovery: if transactions commits only after all other txns whose change they read from, commit.

Optimistic Concurrency Control

Works best when conflicts are low:

All txns read-only
Access disjoint subset of data

Each txn has a Private Space for read and write. No other txns can access this private space. When commit, DBMS check for a txn’s write-set for conflicts —> install into global database.

Read-Phase: DBMS tracks read/write sets of txn and store in private space
Validation-Phase: DBMS checks conflict when a txn commits
Write-Phase: If valid, DBMS apply change to global. Otherwise abort & restart the txn.

Validation-Phase

Check whether all txns go from older change to younger change; If TS(Ti) < TS(Tj):

Ti completes all three phases before Tj begin
Ti completes before Tj’s write phase, and Ti no write to any object read by Tj.
Ti completes read phase before Tj does, and Ti no write to any object read/written by Tj.

Isolation Level

Anomalies:

Dirty Read: Reading uncommited data
Unrepeatable Read: Redoing a read result differently from another read
Phantoms Read: Insertion/Deletion result differently for same range of query scan.

Isolation Levels:

Level

Dirty-Read

Unrepeatable-Read

Phantom-Read

Serializable

Repeatable Read

Read-Committed

Read-Uncomitted

Above only focus on 2PL-based DBMS anomalies.

Two additional levels:

Cursor Stability:
- Between repeatable reads & read-committed.
- Prevent Lost Update
- Default in IBM DB2
Snapshot Isolation:
- Guarantee all txns read the same snapshot of database when start
- Txn commit only if write do not conflict with any concurrent updates made since the snapshot
- Susceptible to write skew anomaly.

Multi-Version Concurrency Control (MVCC)

Allows DBMS to maitain multiple physical versions of single logical object in database.

When txn write, DBMS create new version of that object
When txn read, read the newest version exist

Writer NO block write; Reader NO block read —> Allow modify while other txn read older version.

Time Travel Queries: Queries based on state of databse of some arbitrary time point.
Design Decision:
- Concurrency Control Protocol (2PL, T/O, OCC ….)
- Version Storgae
- Garbage Collection
- Index Management

Version Storage

Version Chain: Created by tuple’s pointer field —> Linked List of version by timestamp —> Index point to “head” (Newest/Oldest) —> Thread traverse the chain for correct version

Append-Only Storage: All physical versions in one table; Append new version & Update chain;
- Oldest-to-Newest(O2N) sort: chain traversal for look-ups;
- Newest-to-Oldest(N2O): update index pointer for new version;
Time-Travel Storage: Seperate table: Time-Travel table <— Older versions. Copy old to main table with new version; Tuple in main table point to older version in time-travel table
Delta Storage: Similar to Time-Travel, only store changes between version (Deltas); Fast in write but Slow in read

Garbage Collection:

Clean reclaimable: No active txn can see the version OR Made by txn aborted

Tuple-Level GC: Direct examine tuples: a. Background Vacuuming: Seperate thread periodically scan; Optimization: Dirty Page Bitmap: track modified since last scan to skip unchange.
b. Cooperative Cleaning: Worker thread scan when traverse chain; Only for O2N.
Transaction-Level GC: Each txn track their own old version. When txn complete and all older version from the txn not visible, trigger GC by DBMS.

Index Management

Primary Key (pkey) point to version chain head, If update, treat as DELETE –> INSERT.

Secondary Key Management:

Logical Pointer: Map logical ID (Indirection Layer) –> location of tuple: Update Mapping
Physical Pointer: Physical address –> version chain head; Update each index

History of DBMS

IDS: Integrated Data Store:
- Network data model
- Tuple-at-a-time queries
IMS: Information Management System:
- Hierarchical data model
- Programmer-defined physical storage format
- Tupe-at-a-time queries
Relational-Model:
- Store database in simple data structures
- Access data through high-level language
- Physical storgae left up to implementation
Object-Oriented Database:
- JSON & XML
HTAP: Hybrid Transactional-Analytical Processing:
- Distributed / Shared-Nothing
- Relational / SQL
- Open / Close source

On-Disk DBMS

Storage on non-volatile storgae(HDD, SSD……)
Organized as a set of fix-length pages
Buffer pool to cache pages–>Manage movement of pages between disk and memory
- If no page in memory: Retrive from disk & copy frame into its’ buffer pool
- If no free frame: Find a page to evict
- If evict page is dirty: Write the page back to disk
Buffer Pool:
- Tuple access always in memory
- Tuple record id–>Memory location
- Working thread pin pages–>No swap to disk
Concurrency Control:
- Txn stall at any time when accessed data not in memory, execute other txm
  - Set Lock ACID
  - Locks in seperate data structure, no swap to disk
Logging & Recovery:
- Steal + No-force buffer pool policy–>All commit flush to WAL
- Log read both before & after image
- Track LSNs in DBMS

In-Memory DBMSs

Concept and Data Organization

Storage location permanently in memory –> DRAM

Not store in slotted pages, but still manage in tuples in block/page
Direct memory pointer VS record id
Fixed length VS Variable length data pools
Use checksum to detect error from trashing the database

Blockid+offset—>Direct mem address of fix-length block—>attributes of tuple—>Variable length data blocks

Index

Memory-optimized indexes works worse than B+ tree since they were not cache aware.
Indexes usually rebuilt when restart to avoid logging overhead

Query Processing

Sequential scan are not significantly faster than random access
Tuple-at-a-time strategy is too slow.

Logging and Recovery

DMBS need WAL for non-volatile storage
- Use group commit to batch log entry and flush them together to amortize fsync cost
- Possible to use light-weight logging schema
No “dirty” page, no track on LSNs on whole system.

Concurrency Control

A protocol to allow txns to access a database in multi-programmed fashion. —>Group of txns=serial txns —>Atomicity+Isolation in ACID —>Mutex slow: Use Compare & Swap(CAS)

Two Phase Locking(2PL)

txns will conflict–>acquire locks before DB access

Deadlock Detection:
txn maitain a queure of txn holding locks
seperate thread to check the queue
if deadlock found, use heuristic(Least Amount of Work) kill txn

Deadlock Prevention:
check if the lock requested already hold
txn (1) wait, (2) suicide, (3) kill other txn

Timeastamp Ordering(T/O)

txns will rare conflict–>check conflict when commit

OCC:
Store all changes in private workspace
Check conflict at commit and merge
Copy to private->update private->check conflict->write to global

PreviousMax Points on a Line NextWeb Design

Last updated 3 years ago