APS 重点复习课程之数据库系统
Overview
Main Concept
Candidate Key
Primary Key
- Primary Attribute
Foreign Key
The Structure of DBS
- Internal Schema
- Schema
- External Schema
SQL
Norm Form
- 1NF
- 2NF
- BCNF
- 3NF
ER Model
Transaction
- Atomic
- Durability
- Isolation
- Consistency
Failure and Recovery
- Transaction Failure: Backward Scan Log File
- System Failure: Forward Scan Log File
- Media Failure
Concurrency Control
- Lost Update
- Not-repeatable Read
- Dirty Read
Lock
- S-Lock
- X-Lock
Two Phase Locking
- in the former phase add locks only
- in the later phase release locks only
Deadlock
- Reasons
- Solutions
Main Concept
DBMS: abbr. of Data Base Management System, is a piece of software designed to store and manage databases. z.B. DB2, Oracle, MySQL, MS SQL Server, …
DBS: abbr. of Data Base System = DBMS + data + application programs
Entity, Attribute, Key
Domain: a collection of the same data type
Entity type, Relationship
Super keys: set of attributes of table for which every row has distinct set of values
Candidate Key: The value of an attribute group can uniquely identify a tuple, but its subset cannot. aka “minimal” super keys.
Primary Key: One of the candidate key, chosen by DBA
Primary Attribute: the attributes of candidate key
Foreign Key
The Structure of DBS
3-level schema structure of the DB
Internal Schema
A description of the physical structure and storage of the data.
Methods included: Hash map or B+ Tree
Schema
A description of the logical structure and characteristics of all data in the database, such as the type of data and the length of data, the relationships between data as well.
A specific value of schema is called instance.
External Schema
A description of the logical structure and characteristics of the local data that users can see a database have a lots of external schema.
It’s the view of users, an external schema can be used by multiple applications.
It’s effective precaution of database security, a user can only see the data in the external schema while other data out of external schema is invisible.
Secondary Image Function of the DB
External mode/mode image
- When the schema is changed, we can adjust the external mode/mode image to let the external schema remain unchanged. Because the application is designed based on external mode, if external mode remain unchanged, then the applications don’t need any changes as well.
- Each external mode have a image to the unique mode.
- This kind of image ensure the logical independence between data and program.
Mode/Internal mode image
- When the internal schema is changed, we can adjust the internal mode/mode image to let the schema remain unchanged, thus the applications don’t need any change.
- That ensure the physical independence between data and program.
Why 3-level Schema
- User can access the data without too much care about the storage and organization data.
- Simplify the design of application, the independence between data and program ensure that.
SQL
Create/Drop Schema
Create a schema can be seen as the definition of the namespace.
1 | CREATE SCHEMA XX Authorization <usrname> |
Drop schema:
1 | DROP SCHEMA XX CASCADE | RESTRICT |
CASCADE: drop all the related objects of the schema as well
RESTRICT: It can be dropped only when there is no existed objects of this schema.
Table
Create table:
1 | CREATE TABLE XX |
- Only one PRIMARY KEY, several UNIQUE attributes.
- No attribute of a PRIMARY KEY can ever be NULL in any tuple. But attributes declared UNIQUE may have NULL.
Alter and drop table:
1 | ALTER TABLE Persons |
Trigger
1 | CREATE TRIGGER trigger_name trigger_time trigger_event ON tb_name FOR EACH ROW trigger_stmt |
trigger_name: name of trigger
trigger_time: time to trigger, BEFORE
or AFTER
trigger_event: incident to trigger, INSERT
, DELETE
orUPDATE
.
tb_name: Name of table to set trigger on.
trigger_stmt: What does trigger do, can be a SQL instruction or multi-statement started after BEGIN
and end before END
.
There are 6 types of triggers in MySQL:
1 | BEFORE INSERT |
Query
select… from… where…
1 | SELECT AttributeName |
Joint Query
1 | SELECT student sno,sname |
exists
z.B. Find name of employee in department which only have only one employee.
1 | SELECT empname FROM employee e1 |
Nested Query
Nest a internal query inside the WHERE of external query.
any & all
Find the beers not sold for the lowest price.
1 | SELECT beer FROM sells WHERE price>ANY(SELECT price FROM sells) |
- x = ANY is true if and only if x equals at least one tuple in the relation.
- Similarly, = can be replaced by any comparison operator.
- x > ANY, means x is larger than at least one tuple in the relation.
Find the beers sold for the highest price.
1 | SELECT beer FROM sells WHERE price>=ANY(SELECT price FROM sells) |
- x > ALL, means x is larger than any one tuple in the relation.
union, intersection, difference
Find the drinkers and beers such that the drinker likes the beer and frequents a bar that serves it.
1 | (SELECT * FROM likes) INTERSECT (SELECT drink,beer FROM sells,frequents WHERE frequents.bar=sells.bar) |
distinct
Find the different price for beers.
1 | SELECT DISTINCT price FROM sells |
aggregations
sum, avg, min, max, count, count(*).
1 | SELECT AVG(price) |
grouping
Given a table workson(empno, projectno, job, enterdate)
How many projects does each employee participate in?
1 | SELECT empno,COUNT(*) |
How many employees participate in each project?
1 | SELECT projectno,COUNT(*) |
How many employees for each job?
1 | SELECT job,COUNT(*) |
having
HAVING clauses are selections on groups.
Find the average price of those beers that are either served in at least 3 bars or manufactured by Busch.
Beers(name(p-key),manf); Sells(bar(p-key), beer(p-key), price)
1 | SELECT beer,AVG(price) |
SQL Authorization
Privileges
Important privileges on a relation:
SELECT INSERT DELETE UPDATE
Grant and Revoke
GRANT:
1 | GRANT <list_of_privileges> ON <relation_or_object> TO <list_of_authorization_ID> |
if you want the recipients to be able to pass the privileges to others, add:
1 | WITH GRANT OPTION |
REVOKE:
1 | REVOKE <list_of_privileges> ON <relation_or_object> FROM <list_of_authorization_ID> |
REVOKE OPTIONS:
1 | CASCADE#any grants made by a revoke are also not in force. |
Grant Diagrams
…
Norm Form
Functional Dependency
X, Y is the subset of the attribute set U.
If there are no such 2 tuples (X, Y), if every X have only one corresponding Y, then , aka Y functional depends on X.
Fully functional dependency:
- if and to every true subset of X named X’, we have then Y fully functional depends on X, remarked as .
Partial functional dependency:
- if and Y is not fully functional depends on X, then Y partly depends on X, remarked as .
Transitive functional dependency:
- if , , , then Z transitive functional depends on X. remark as
Relational Patterns 非常重要⚠️
Schema design motivation:
- redundancy
- inconsistency
- update anomalies
- deletion anomalies
- insertion anomalies
1NF first normal form
every attributes are inseparable.
2NF second normal form
Every non-primary attribute fully functionally depends on any candidate key.
BCNF Boyce Codd normal form
A relation R is in BCNF iff:
- Whenever there is a nontrivial FD . (nontrivial means A is not a member of set X s.t. )
- for R, it is the case that is a super-key for R.
BCNF removes certain types of redundancy & avoids info loss.
3NF third normal form
violates 3NF iff X is not a super-key and A is not prime.
z.B. FD: AB→C and C→B, we have keys AB and AC. Thus A, B and C are each prime. Although C→B violates BCNF, it doesn’t violate 3NF.
A table is in 3NF iff both of the following conditions hold:
- The relation R (table) is in second normal form (2NF).
- Every non-prime attribute of R is non-transitively dependent on every key of R.
4NF fourth normal form
Multivalued dependencies (MVD) is an assertion that if 2 tuples of a relation agree on all the attributes of X, then their components in the set of attributes Y may be swapped, and the result will be 2 tuples that are also in the relation.
Data redundancy.
z.B. Drinkers(name, addr, phones, beersLiked). 2 MVD in this: name→→phones, name →→beersLiked.
A relation R is in 4NF if whenever X→→Y is a nontrivial MVD, then X is a super-key.
nontrivial: Y is not a subset of X, and X & Y are not, together, all the attributes.
ER Model
Entity-Relationship Model
ER Diagram
Relation Multiplicity
- one - one: ←菱形→
- many - one: —菱形→
- many - many —菱形—
Multiway Relationships
Converting a multiway one into many binary ones.
Constraints
Keys: unique identifier. z.B. social security number for a person
Referential integrity constraints: z.B. if you work for a company, it must exist in the database.
Domain constraints: z.B. people’s ages are not negative.
General constraints: all other constraints.
Design principles
Be faithful
Design should reflect ideas of the DB.
Avoid redundancy
- Redundancy occurs when we say the same thing in two different ways.
- Redundancy wastes space and encourage inconsistency.
Entity Sets VS Attributes
Don’t use an entity set when an attribute will do.
An entity set should satisfy at least one of the following:
- It has at least one non-key attribute.
- It is the “many” in a many - one or many - many relationships.
Limit the use of weak entity sets
weak entity set: a weak entity set has one or more many - one relationships to other entity sets. The key for a weak entity set is its own underlined attributes and the keys for the supporting entity sets.
Chain of Weakness
Translating Subclass Entities
3 approaches:
Object-oriented: each entity belongs to exactly one class; create relation for each class, with all its attributes.
ER style: create one relation for each entity set, with only the key attributes and attributes attached to that entity set; entity represented in all relations to whose entity set it belongs.
Use nulls: create one relation; entities have null in attributes that don’t belong to them.
space-wasting
Integrity Constraints
Entity Integrity
Attribute of primary key can’t be NULL.
Referential Integrity
It requires that if a value of one attribute (column) of a relation (table) references a value of another attribute (either in the same or a different relation), then the referenced value must exist.
User-defined Integrity
User define which attributes should be UNIQUE or NOT NULL.
Transaction
A sequence of operations, we can either accomplish all these operations or none of them.
Atomic
Either all transaction are executed or none.
Durability
Once after the commit of transactions, its operation to database should be permanently saved.
Isolation
The execution of transactions won’t disturb each others.
Consistency
The database can only contain the successfully executed transactions, otherwise will the database not in a consistent state.
z.B. After a bank transfer the total deposits should remain unchanged.
Failure and Recovery
3 Failure
Transaction Failure: Backward Scan Log File
The transaction doesn’t reach the commit or the rollback, then the database might not be in a consistent state, thereby we should undo the failed transaction without disturbing the other transactions.
System Failure: Forward Scan Log File
The system corrupt and every transaction in the memory is interrupted, which means the transactions haven’t reach to the commit or rollback need to be undo and the committed transactions maybe still store in the data buffer and haven’t write to the physical database, these transactions need to be redone.
Media Failure
The physical storage media corrupt, z.B. the disk is physical broken.
Log File
Record the operation of every transaction, each log record consists of the begin transaction mark and the commit or rollback mark, the new value after update and the old value before update, as well as the ID of transaction.
- The registration order should strictly correspond to the order of execution.
- Register on the log file first, then write to the database.
System Recovery Strategy
The recovery of transaction failure:
Backward scan log file, undo the operation of the transaction, delete the insert, undo the update (write the value before update back to the database)
The recovery of system failure:
Forward scan log file, put finished process before into redo-list, unfinished process into undo-list.
The recovery of media failure:
Load the latest backup of database and the backup of log file, do the same thing as system failure.
Concurrency Control
In order to ensure the isolation and consistency we need concurrency
control the concurrency can lead to following failures:
Lost Update
Transaction A and B write the same data back to the database then the later update is lost.
Not-repeatable Read
Transaction B update the data after transaction A have read it then the transaction A can’t reproduce the previous data.
Dirty Read
Transaction A write a data back to the disk and the same time transaction B read the updated data, but transaction A is undo by some reason and the data recover to the original value.
Lock
Share/Exclusive lock
S-Lock
aka share lock:
If transaction T gives a S lock to data A, then T can read A but can’t update A and the other transaction can only give S lock to A but not X locks.
X-Lock
aka exclusive lock:
If transaction T gives a X lock to data A, then only T can read or update A and all the other transactions can’t give any lock to A.
Serializable Scheduling Method
The concurrency of multiple transaction is correct only if the result is the same as the result of serial execution.
Precedence Graph
Top sort…
Two Phase Locking Protocol (2PL)
In order to ensure two transactions are serializable.
in the former phase add locks only
in the later phase release locks only
z.B. S-lock A S-lock B X-lock C Unlock A Unlock C Unlock B
Deadlock
Reasons
Refer to operating system
The condition of the deadlock
- mutual exclusion
- Hold and wait or resource holding
- no preemption 不剥夺条件
- circular wait
Solutions
- Visit object at the same order
- Avoid transactions between users.
- Keep transactions short, at the same batch.