Personal Blog
  • đź’»Notes for Computer Science
  • Leetcode
    • Array
      • Container with most water
      • 3Sum
      • Next Permutation
      • Valid Sudoku
      • Permutation II
      • Combination Sum
      • Triangle
      • Maximal Square
      • Pairs of Songs with Total Duration Divisible by 60
      • Numbers At Most N Given Digit Set
      • Possible Sum
      • Swap Lex Order
      • Partition Equal Subset Sum
      • Domino and Tromino
      • Numbers At Most N Given Digits
      • Car Pooling
      • Surrounding Regions
      • Min Size Subarray Sum
      • Burst Balloons
      • Jump Game I
      • Jump Game II
      • House Robber II
      • Delete and Earn
      • Word Break
      • Decode Ways
      • Longest Increasing Subsequence
      • Cherry Pickup
      • Rotate Image
    • LinkedList
      • IsListPalindrome
      • Linked List Cycle
      • MergeTwoLinkedList
      • ReverseNodeInKGroup
      • RearrangeLastN
      • Remove Duplicates From Sorted List
      • RemoveKFromList
    • String
      • Generate Parentheses
      • Longest Valid Parentheses
      • Longest Common Subsequence
      • Count and Say
      • Decode String
      • Permutation in String
    • Tree
      • House Robber III
      • Convert Sorted Array to Binary Search Tree
      • Restore Binary Tree
      • Populating Next Right Pointers in Each Node II
      • Subtree of Another Tree
    • Graph
      • All Paths from Source to Target
      • Reorder Routes to Make All Paths Lead to the City Zero
      • Max Points on a Line
  • DBMS
    • DBMS Notes
  • Web App
    • Web Design
    • JavaScript
    • React.js
    • ReactNative
    • Mobile Design
    • Dialogue Flow
  • AnaplanIntern
    • Splunk
    • Docker
    • Kubernetes
  • đź’° Notes for Finance Concept
  • Analysis Concept
    • Volume Spread Analysis
    • Smart Money Concepts
Powered by GitBook
On this page
  • Transaction
  • Strawman System
  • Definition
  • Atomicity
  • Consistency
  • Isolation
  • Two Phase Locking (2PL)
  • Deadlock Handling
  • Lock Granularities
  • Timestamp Ordering Concurrency Control
  • Optimization: Thomas Write Rule
  • Optimistic Concurrency Control
  • Validation-Phase
  • Isolation Level
  • Version Storage
  • Garbage Collection:
  • Index Management
  • History of DBMS
  • On-Disk DBMS
  • In-Memory DBMSs
  1. DBMS

DBMS Notes

PreviousMax Points on a LineNextWeb Design

Last updated 3 years ago

Transaction

Execution of a sequence of operations, basic unit of change in DBMS; No partial transaction is allowed.

Strawman System

Sequencial sacrifice parallel <—> Parallel cause possible danger:

  • Temporary Inconsistency: Unavoidable, not an issue

  • Permanent Inconsistency: Unacceptable, cause problem with correctness and integrity

Definition

Boundaries of transaction defined by the client: BEGIN: transaction start COMMIT: all transaction modifications are saved/DBMS overrides this and abort ABORT: all transaction modifications are undone, looks like the transaction never happens ACID:

  • Atomicity: Either all changes happen, or all not happen

  • Consistency: If transaction and database at the beginning is consistent, also consistent when transaction finish

  • Isolation: When a transaction is happening, it has illusion of being isolated from other transactions.

  • Durability: If transaction commit, its effect to DBMS is persistent.

Atomicity

  1. Logging: Logs all actions for undo when abort. Maitain undo records on memory and disk

  2. Shadow Paging: DBMS make copies of transaction and change the copy when modification. Copied pages be visible only when commit.

Consistency

  1. Database Consistency: Real world entities, like age not be negative. Transaction in the future can see effects on past within the databse

  2. Transaction Consistency: If database is consistent before transaction, it should also be consistent after transaction

Isolation

Concurrency Control:

  1. Pessimistic: DBMS assumes conflict exist, so avoid conflict in the first place.

  2. Optimistic: DBMS assumes conflict rare, so deal with conflict after commit

Execution Schedule:

  1. Serial Schedule: No interleave action

  2. Equivalent Schedule: 1st schedule effect = 2nd schedule effect

  3. Serializable Schedule: == any serial execution of transaction. Different execution –> Different result, but all are correct.

Conflict:

  1. Read-Write Conflict(Unrepeatable Reads): A transaction not able to get the same value when reading the same object multiple times

  2. Write-Read Conflict(Dirty Reads): A transaction sees the write effect of another transaction before that other txn is commited.

  3. Write-Write Conflict(Lost Update): A transaction overwrite the uncommited change of a concurrent txn

Conflict Serializability :

  1. Conflict Equivalent: Involve same operation of same transaction and every pair of conflict operations are in same order. If Schedule A is Conflict Equivalent to Schedule B, A can be Conflict Serialized.

  2. Dependency Graph: There exists an edge from Ti –> Tj iff an operation Oi conflict with Oj. Schedule is conflict serializable iff the graph is acyclic

ransaction Locks

  1. Shared Lock(S-Lock): Allow multiple transactions to read the same object at the same time. If a txn hold the lock, other can still access the lock

  2. Exclusive Lcok(X-Lock): Prevent other txn from modifying the object. Other txn cannot get both S-Lock and X-Lock

Lock Manager is centralized and grants or block request based on what locks are currently held by other txn. NO-NEED to be duriable, since transaction abort when DBMS crash.

Two Phase Locking (2PL)

  1. Phase1: Growing: Each txn acquires lock

  2. Phase2: Shrinking: Enter this phase when the first txn free its lock. Txns only release lock.

2PL is sufficient to guarantee conflict serializability, but susceptible to cascading aborts, when a txn abort but another txn must roll-back.

Strong Strict Two Phase Locking:

Any value written by a txn is never read or overwrite by another txn until the txn first commit. Only release lock when commit. Fix cascading aborts but limit concurrency

Deadlock Handling

  1. Deadlock Detection: Wait-for graph with direct edge from Ti –> Tj if Ti waits for Tj to release the lock.

DBMS pick a victim to abort, using several combinations of heuristic. When aborting a txn, DBMS determine how far to roll-back.

  1. Deadlock Prevention: Kill a txn when two txns are trying to acquire the same lock:

    • Wait-Die (Old wait for Young):If requesting transaction has higher priority than holding transaction, it waits.

    • Wound-Die (Young wait for Old):If requesting transaction has higher priority than holding transaction, it acquire, the holding txn aborts.

Lock Granularities

Use hierarchy to implicitly assign lock to chidren objects.

Intention Locks:

Allow high level node to be locked in shared or exclusive mode, no check on descendent node

  • Intention-Shared (IS): Explicit locking at lower level with shared locks

  • Intention-Exclusive (IX): Explicit locking at lower level with shared or exclusive locks

  • Shared+Intention-Exclusive (SIX): Sub-tree rooted at the node is locked in shared mode, and explicit locking at lower level with exclusive mode

Timestamp Ordering Concurrency Control

Each transation Ti is assigned with a unique timestamp TS(Ti), monotonically increasing If TS(Ti) < TS(Tj) DBMS execute Ti before Tj

Always a combination of these two.

Basic T/O

Each object X assigned with the successfully last execution of Read or Write(R-TS(X) & W-TS(X)).Any txn tries to violate the order will abort

  • Read Operation: If TS(Ti) < W-TS(X), violation, cannot read before write; Otherwise, valid, R-TS(X) update to max(R-TS(X), TS(Ti)) —> Copy of X to ensure repeat read for Ti

  • Write Operation If TS(Ti) < R-TS(X) or TS(Ti) < W-TS(X), Ti must restart; Otherwise, valid, update R-TS(X) and W-TS(X).

Optimization: Thomas Write Rule

If TS(Ti) < W-TS(X), ignore violation and let Ti continue, since no other txn will read Ti’s write. Conflict Serializable for Basic T/O, but may also starvation for long transactions if short transactions keep causing conflict, and permits not Recovery: if transactions commits only after all other txns whose change they read from, commit.

Optimistic Concurrency Control

Works best when conflicts are low:

  • All txns read-only

  • Access disjoint subset of data

Each txn has a Private Space for read and write. No other txns can access this private space. When commit, DBMS check for a txn’s write-set for conflicts —> install into global database.

  1. Read-Phase: DBMS tracks read/write sets of txn and store in private space

  2. Validation-Phase: DBMS checks conflict when a txn commits

  3. Write-Phase: If valid, DBMS apply change to global. Otherwise abort & restart the txn.

Validation-Phase

Check whether all txns go from older change to younger change; If TS(Ti) < TS(Tj):

  1. Ti completes all three phases before Tj begin

  2. Ti completes before Tj’s write phase, and Ti no write to any object read by Tj.

  3. Ti completes read phase before Tj does, and Ti no write to any object read/written by Tj.

Isolation Level

Anomalies:

  1. Dirty Read: Reading uncommited data

  2. Unrepeatable Read: Redoing a read result differently from another read

  3. Phantoms Read: Insertion/Deletion result differently for same range of query scan.

Isolation Levels:

Level
Dirty-Read
Unrepeatable-Read
Phantom-Read

Serializable

X

X

X

Repeatable Read

X

X

O

Read-Committed

X

O

O

Read-Uncomitted

O

O

O

Above only focus on 2PL-based DBMS anomalies.

Two additional levels:

  1. Cursor Stability:

    • Between repeatable reads & read-committed.

    • Prevent Lost Update

    • Default in IBM DB2

  2. Snapshot Isolation:

    • Guarantee all txns read the same snapshot of database when start

    • Txn commit only if write do not conflict with any concurrent updates made since the snapshot

    • Susceptible to write skew anomaly.

Multi-Version Concurrency Control (MVCC)

Allows DBMS to maitain multiple physical versions of single logical object in database.

  • When txn write, DBMS create new version of that object

  • When txn read, read the newest version exist

Writer NO block write; Reader NO block read —> Allow modify while other txn read older version.

  • Time Travel Queries: Queries based on state of databse of some arbitrary time point.

  • Design Decision:

    • Concurrency Control Protocol (2PL, T/O, OCC ….)

    • Version Storgae

    • Garbage Collection

    • Index Management

Version Storage

  • Version Chain: Created by tuple’s pointer field —> Linked List of version by timestamp —> Index point to “head” (Newest/Oldest) —> Thread traverse the chain for correct version

  1. Append-Only Storage: All physical versions in one table; Append new version & Update chain;

    • Oldest-to-Newest(O2N) sort: chain traversal for look-ups;

    • Newest-to-Oldest(N2O): update index pointer for new version;

  2. Time-Travel Storage: Seperate table: Time-Travel table <— Older versions. Copy old to main table with new version; Tuple in main table point to older version in time-travel table

  3. Delta Storage: Similar to Time-Travel, only store changes between version (Deltas); Fast in write but Slow in read

Garbage Collection:

Clean reclaimable: No active txn can see the version OR Made by txn aborted

  1. Tuple-Level GC: Direct examine tuples: a. Background Vacuuming: Seperate thread periodically scan; Optimization: Dirty Page Bitmap: track modified since last scan to skip unchange.

    b. Cooperative Cleaning: Worker thread scan when traverse chain; Only for O2N.

  2. Transaction-Level GC: Each txn track their own old version. When txn complete and all older version from the txn not visible, trigger GC by DBMS.

Index Management

Primary Key (pkey) point to version chain head, If update, treat as DELETE –> INSERT.

Secondary Key Management:

  • Logical Pointer: Map logical ID (Indirection Layer) –> location of tuple: Update Mapping

  • Physical Pointer: Physical address –> version chain head; Update each index

History of DBMS

  • IDS: Integrated Data Store:

    • Network data model

    • Tuple-at-a-time queries

  • IMS: Information Management System:

    • Hierarchical data model

    • Programmer-defined physical storage format

    • Tupe-at-a-time queries

  • Relational-Model:

    • Store database in simple data structures

    • Access data through high-level language

    • Physical storgae left up to implementation

  • Object-Oriented Database:

    • JSON & XML

  • HTAP: Hybrid Transactional-Analytical Processing:

    • Distributed / Shared-Nothing

    • Relational / SQL

    • Open / Close source

On-Disk DBMS

  • Storage on non-volatile storgae(HDD, SSD……)

  • Organized as a set of fix-length pages

  • Buffer pool to cache pages–>Manage movement of pages between disk and memory

    • If no page in memory: Retrive from disk & copy frame into its’ buffer pool

    • If no free frame: Find a page to evict

    • If evict page is dirty: Write the page back to disk

  • Buffer Pool:

    • Tuple access always in memory

    • Tuple record id–>Memory location

    • Working thread pin pages–>No swap to disk

  • Concurrency Control:

    • Txn stall at any time when accessed data not in memory, execute other txm

      • Set Lock ACID

      • Locks in seperate data structure, no swap to disk

  • Logging & Recovery:

    • Steal + No-force buffer pool policy–>All commit flush to WAL

    • Log read both before & after image

    • Track LSNs in DBMS

In-Memory DBMSs

Concept and Data Organization

Storage location permanently in memory –> DRAM

  • Not store in slotted pages, but still manage in tuples in block/page

  • Direct memory pointer VS record id

  • Fixed length VS Variable length data pools

  • Use checksum to detect error from trashing the database

Blockid+offset—>Direct mem address of fix-length block—>attributes of tuple—>Variable length data blocks

Index

  • Memory-optimized indexes works worse than B+ tree since they were not cache aware.

  • Indexes usually rebuilt when restart to avoid logging overhead

Query Processing

  • Sequential scan are not significantly faster than random access

  • Tuple-at-a-time strategy is too slow.

Logging and Recovery

  • DMBS need WAL for non-volatile storage

    • Use group commit to batch log entry and flush them together to amortize fsync cost

    • Possible to use light-weight logging schema

  • No “dirty” page, no track on LSNs on whole system.

Concurrency Control

A protocol to allow txns to access a database in multi-programmed fashion. —>Group of txns=serial txns —>Atomicity+Isolation in ACID —>Mutex slow: Use Compare & Swap(CAS)

Two Phase Locking(2PL)

txns will conflict–>acquire locks before DB access

Deadlock Detection:

  • txn maitain a queure of txn holding locks

  • seperate thread to check the queue

  • if deadlock found, use heuristic(Least Amount of Work) kill txn

Deadlock Prevention:

  • check if the lock requested already hold

  • txn (1) wait, (2) suicide, (3) kill other txn

Timeastamp Ordering(T/O)

txns will rare conflict–>check conflict when commit

OCC:

  • Store all changes in private workspace

  • Check conflict at commit and merge

  • Copy to private->update private->check conflict->write to global