DBMS Notes
Last updated
Last updated
Execution of a sequence of operations, basic unit of change in DBMS; No partial transaction is allowed.
Sequencial sacrifice parallel <—> Parallel cause possible danger:
Temporary Inconsistency
: Unavoidable, not an issue
Permanent Inconsistency
: Unacceptable, cause problem with correctness and integrity
Boundaries of transaction defined by the client:
BEGIN
: transaction start
COMMIT
: all transaction modifications are saved/DBMS overrides this and abort
ABORT
: all transaction modifications are undone, looks like the transaction never happens
ACID
:
Atomicity: Either all changes happen, or all not happen
Consistency: If transaction and database at the beginning is consistent, also consistent when transaction finish
Isolation: When a transaction is happening, it has illusion of being isolated from other transactions.
Durability: If transaction commit, its effect to DBMS is persistent.
Logging: Logs all actions for undo when abort. Maitain undo records on memory and disk
Shadow Paging: DBMS make copies of transaction and change the copy when modification. Copied pages be visible only when commit.
Database Consistency: Real world entities, like age not be negative. Transaction in the future can see effects on past within the databse
Transaction Consistency: If database is consistent before transaction, it should also be consistent after transaction
Concurrency Control:
Pessimistic: DBMS assumes conflict exist, so avoid conflict in the first place.
Optimistic: DBMS assumes conflict rare, so deal with conflict after commit
Execution Schedule:
Serial Schedule: No interleave action
Equivalent Schedule: 1st schedule effect = 2nd schedule effect
Serializable Schedule: == any serial execution of transaction. Different execution –> Different result, but all are correct.
Conflict:
Read-Write Conflict(Unrepeatable Reads): A transaction not able to get the same value when reading the same object multiple times
Write-Read Conflict(Dirty Reads): A transaction sees the write effect of another transaction before that other txn is commited.
Write-Write Conflict(Lost Update): A transaction overwrite the uncommited change of a concurrent txn
Conflict Serializability :
Conflict Equivalent: Involve same operation of same transaction and every pair of conflict operations are in same order. If Schedule A is Conflict Equivalent
to Schedule B, A can be Conflict Serialized
.
Dependency Graph: There exists an edge from Ti –> Tj iff an operation Oi conflict with Oj. Schedule is conflict serializable iff the graph is acyclic
ransaction Locks
Shared Lock(S-Lock): Allow multiple transactions to read the same object at the same time. If a txn hold the lock, other can still access the lock
Exclusive Lcok(X-Lock): Prevent other txn from modifying the object. Other txn cannot get both S-Lock and X-Lock
Lock Manager
is centralized and grants or block request based on what locks are currently held by other txn. NO-NEED to be duriable, since transaction abort when DBMS crash.
Phase1: Growing: Each txn acquires lock
Phase2: Shrinking: Enter this phase when the first txn free its lock. Txns only release lock.
2PL is sufficient to guarantee conflict serializability, but susceptible to
cascading aborts
, when a txn abort but another txn must roll-back.
Any value written by a txn is never read or overwrite by another txn until the txn first commit. Only release lock when commit.
Fix cascading aborts
but limit concurrency
Deadlock Detection: Wait-for
graph with direct edge from Ti –> Tj if Ti waits for Tj to release the lock.
DBMS pick a victim to abort, using several combinations of heuristic. When aborting a txn, DBMS determine how far to roll-back.
Deadlock Prevention: Kill a txn when two txns are trying to acquire the same lock:
Wait-Die (Old wait for Young):If requesting transaction has higher priority than holding transaction, it waits.
Wound-Die (Young wait for Old):If requesting transaction has higher priority than holding transaction, it acquire, the holding txn aborts.
Use hierarchy to implicitly assign lock to chidren objects.
Allow high level node to be locked in shared or exclusive mode, no check on descendent node
Intention-Shared (IS): Explicit locking at lower level with shared locks
Intention-Exclusive (IX): Explicit locking at lower level with shared or exclusive locks
Shared+Intention-Exclusive (SIX): Sub-tree rooted at the node is locked in shared mode, and explicit locking at lower level with exclusive mode
Each transation Ti is assigned with a unique timestamp TS(Ti), monotonically increasing If TS(Ti) < TS(Tj) DBMS execute Ti before Tj
Always a combination of these two.
Basic T/O
Each object X assigned with the successfully last execution of Read or Write(R-TS(X) & W-TS(X)).Any txn tries to violate the order will abort
Read Operation: If TS(Ti) < W-TS(X), violation, cannot read before write; Otherwise, valid, R-TS(X) update to max(R-TS(X), TS(Ti)) —> Copy of X to ensure repeat read for Ti
Write Operation If TS(Ti) < R-TS(X) or TS(Ti) < W-TS(X), Ti must restart; Otherwise, valid, update R-TS(X) and W-TS(X).
If TS(Ti) < W-TS(X), ignore violation and let Ti continue, since no other txn will read Ti’s write.
Conflict Serializable for Basic T/O, but may also starvation for long transactions if short transactions keep causing conflict, and permits not Recovery
: if transactions commits only after all other txns whose change they read from, commit.
Works best when conflicts are low:
All txns read-only
Access disjoint subset of data
Each txn has a Private Space
for read and write. No other txns can access this private space.
When commit, DBMS check for a txn’s write-set
for conflicts —> install into global database.
Read-Phase: DBMS tracks read/write sets of txn and store in private space
Validation-Phase: DBMS checks conflict when a txn commits
Write-Phase: If valid, DBMS apply change to global. Otherwise abort & restart the txn.
Check whether all txns go from older change to younger change; If TS(Ti) < TS(Tj):
Ti completes all three phases before Tj begin
Ti completes before Tj’s write phase, and Ti no write to any object read by Tj.
Ti completes read phase before Tj does, and Ti no write to any object read/written by Tj.
Anomalies:
Dirty Read: Reading uncommited data
Unrepeatable Read: Redoing a read result differently from another read
Phantoms Read: Insertion/Deletion result differently for same range of query scan.
Isolation Levels:
Serializable
X
X
X
Repeatable Read
X
X
O
Read-Committed
X
O
O
Read-Uncomitted
O
O
O
Above only focus on 2PL-based DBMS anomalies.
Two additional levels:
Cursor Stability:
Between repeatable reads & read-committed.
Prevent Lost Update
Default in IBM DB2
Snapshot Isolation:
Guarantee all txns read the same snapshot of database when start
Txn commit only if write do not conflict with any concurrent updates made since the snapshot
Susceptible to write skew anomaly.
Multi-Version Concurrency Control (MVCC)
Allows DBMS to maitain multiple physical
versions of single logical
object in database.
When txn write, DBMS create new version of that object
When txn read, read the newest version exist
Writer NO block write; Reader NO block read —> Allow modify while other txn read older version.
Time Travel Queries: Queries based on state of databse of some arbitrary time point.
Design Decision:
Concurrency Control Protocol (2PL, T/O, OCC ….)
Version Storgae
Garbage Collection
Index Management
Version Chain: Created by tuple’s pointer field —> Linked List of version by timestamp —> Index point to “head” (Newest/Oldest) —> Thread traverse the chain for correct version
Append-Only Storage: All physical versions in one table; Append new version & Update chain;
Oldest-to-Newest(O2N) sort: chain traversal for look-ups;
Newest-to-Oldest(N2O): update index pointer for new version;
Time-Travel Storage: Seperate table: Time-Travel table <— Older versions. Copy old to main table with new version; Tuple in main table point to older version in time-travel table
Delta Storage: Similar to Time-Travel, only store changes between version (Deltas); Fast in write but Slow in read
Clean reclaimable
: No active txn can see the version OR Made by txn aborted
Tuple-Level GC: Direct examine tuples:
a. Background Vacuuming: Seperate thread periodically scan; Optimization: Dirty Page Bitmap
: track modified since last scan to skip unchange.
b. Cooperative Cleaning: Worker thread scan when traverse chain; Only for O2N.
Transaction-Level GC: Each txn track their own old version. When txn complete and all older version from the txn not visible, trigger GC by DBMS.
Primary Key (pkey) point to version chain head, If update, treat as DELETE –> INSERT.
Secondary Key Management:
Logical Pointer: Map logical ID (Indirection Layer) –> location of tuple: Update Mapping
Physical Pointer: Physical address –> version chain head; Update each index
IDS: Integrated Data Store:
Network data model
Tuple-at-a-time queries
IMS: Information Management System:
Hierarchical data model
Programmer-defined physical storage format
Tupe-at-a-time queries
Relational-Model:
Store database in simple data structures
Access data through high-level language
Physical storgae left up to implementation
Object-Oriented Database:
JSON & XML
HTAP: Hybrid Transactional-Analytical Processing:
Distributed / Shared-Nothing
Relational / SQL
Open / Close source
Storage on non-volatile storgae(HDD, SSD……)
Organized as a set of fix-length pages
Buffer pool to cache pages–>Manage movement of pages between disk and memory
If no page in memory: Retrive from disk & copy frame
into its’ buffer pool
If no free frame: Find a page to evict
If evict page is dirty: Write the page back to disk
Buffer Pool:
Tuple access always in memory
Tuple record id–>Memory location
Working thread pin pages–>No swap to disk
Concurrency Control:
Txn stall at any time when accessed data not in memory, execute other txm
Set Lock ACID
Locks in seperate data structure, no swap to disk
Logging & Recovery:
Steal + No-force
buffer pool policy–>All commit flush to WAL
Log read both before & after image
Track LSNs in DBMS
Storage location permanently
in memory –> DRAM
Not store in slotted pages, but still manage in tuples in block/page
Direct memory pointer VS record id
Fixed length VS Variable length data pools
Use checksum to detect error from trashing the database
Blockid+offset—>Direct mem address of fix-length block—>attributes of tuple—>Variable length data blocks
Memory-optimized indexes works worse than B+ tree since they were not cache aware.
Indexes usually rebuilt when restart to avoid logging overhead
Sequential scan are not significantly faster than random access
Tuple-at-a-time strategy is too slow.
DMBS need WAL for non-volatile storage
Use group commit
to batch log entry and flush them together to amortize fsync
cost
Possible to use light-weight logging schema
No “dirty” page, no track on LSNs on whole system.
A protocol to allow txns to access a database in multi-programmed fashion. —>Group of txns=serial txns —>Atomicity+Isolation in ACID —>Mutex slow: Use Compare & Swap(CAS)
Two Phase Locking(2PL)
txns will conflict–>acquire locks before DB access
Deadlock Detection:
txn maitain a queure of txn holding locks
seperate thread to check the queue
if deadlock found, use heuristic(Least Amount of Work) kill txn
Deadlock Prevention:
check if the lock requested already hold
txn (1) wait, (2) suicide, (3) kill other txn
Timeastamp Ordering(T/O)
txns will rare conflict–>check conflict when commit
OCC:
Store all changes in private workspace
Check conflict at commit and merge
Copy to private->update private->check conflict->write to global