Sunday 4 December 2011

KEY - DataBase


Alternate key
In the context of relational databases, an alternate key (or secondary key) is any candidate key which is not
selected to be the primary key (PK).
For example, a relational database with a table "employee" could have attributes like "employee_id",
"national_insurance_number", and so on. In this case, both "employee_id" and "national_insurance_number" serve
as unique identifiers for a given employee, and could thus arguably be used for a primary key. Hence, both of them
are called "candidate keys". If, for example, "national_insurance_number" was chosen as the primary key,
"employee_id" would become the alternate key.
In a practical RDMS, it is normal to declare any candidate keys / alternate keys not used as the primary key as
'unique', to prevent errors where an alternate key ends up with duplicate values due to typos (this would not happen
for the candidate key chosen to be the primary key, the RDMS would automatically detect it in that case). In the
example above, if "national_insurance_number" was chosen as the primary key, "employee_id" would be declared
unique, and vice versa. In some practical designs, a designer may use a surrogate key instead of a natural candidate
key - here too the unused alternate key must be declared as 'unique'.

Candidate key
In the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a
set of attributes such that
1. the relation does not have two distinct tuples with the same values for these attributes (which means that the set
of attributes is a superkey)
2. there is no proper subset of these attributes for which (1) holds (which means that the set is minimal).
Since a relation contains no duplicate tuples, the set of all its attributes is a superkey if NULL values are not used. It
follows that every relation will have at least one candidate key.
The candidate keys of a relation tell us all the possible ways we can identify its tuples. As such they are an important
concept for the design database schema.
For practical reasons RDBMSs usually require that for each relation one of its candidate keys is declared as the
primary key, which means that it is considered as the preferred way to identify individual tuples. Foreign keys, for
example, are usually required to reference such a primary key and not any of the other candidate keys.
Example
The definition of candidate keys can be illustrated with the following (abstract) example. Consider a relation variable
(relvar) R with attributes (A, B, C, D) that has only the following two legal values r1 and r2:
r1
A B C D
a1 b1 c1 d1
a1 b2 c2 d1
a2 b1 c2 d1

r2
A B C D
a1 b1 c1 d1
a1 b2 c2 d1
a1 b1 c2 d2
Here r2 differs from r1 only in the A and D values of the last tuple.
For r1 the following sets have the uniqueness property, i.e., there are no two tuples in the instance with the same
values for the attributes in the set:
{A,B}, {A,C}, {B,C}, {A,B,C}, {A,B,D}, {A,C,D}, {B,C,D}, {A,B,C,D}
For r2 the uniqueness property holds for the following sets;
{B,C}, {B,D}, {C,D}, {A,B,C}, {A,B,D}, {A,C,D}, {B,C,D}, {A,B,C,D}
Since superkeys of a relvar are those sets of attributes that have the uniqueness property for all legal values of that
relvar and because we assume that r1 and r2 are all the legal values that R can take, we can determine the set of
superkeys of R by taking the intersection of the two lists:
{B,C}, {A,B,C}, {A,B,D}, {A,C,D}, {B,C,D}, {A,B,C,D}
Finally we need to select those sets for which there is no proper subset in the list, which are in this case:
{B,C}, {A,B,D}, {A,C,D}
These are indeed the candidate keys of relvar R.
We have to consider all the relations that might be assigned to a relvar to determine whether a certain set of
attributes is a candidate key. For example, if we had considered only r1 then we would have concluded that {A,B} is
a candidate key, which is incorrect. However, we might be able to conclude from such a relation that a certain set is
not a candidate key, because that set does not have the uniqueness property (example {A,D} for r1). Note that the
existence of a proper subset of a set that has the uniqueness property cannot in general be used as evidence that the
superset is not a candidate key. In particular, note that in the case of an empty relation, every subset of the heading
has the uniqueness property, including the empty set.

Determining candidate keys
The previous example only illustrates the definition of a candidate key and not how these are determined in practice.
It is important to determine all superkeys, which is especially difficult if the relation represents a set of relationships
rather than a set of entities. Therefore it is often useful to attempt to find any "forgotten" superkeys by also
determining the functional dependencies. Consider for example the relation Marriage(Husband, Wife, Date) for
which it will trivially hold that {Husband, Wife, Date} is a superkey. If we assume that a certain person can only
marry once on a given date then this implies the functional dependencies {Husband,Date}Wife and
{Wife,Date}Husband. From this then we can derive more superkeys by applying the following rule:
if S is a superkey and XY a functional dependency
then (S-Y)+X is also a superkey
where '-' is the set difference and '+' the set union. In this case this leads to the derivation of the superkeys {Husband,
Date} and {Wife, Date}


Compound key
In database design, a compound key is a key that consists of 2 or more attributes that uniquely identify an entity
occurrence. Each attribute that makes up the compound key is a simple key in its own right.
This is often confused with a composite key whereby even though this is also a key that consists of 2 or more
attributes that uniquely identify an entity occurrence, at least one attribute that makes up the composite key is not a
simple key in its own right.
An example might be an entity that represents the modules each student is attending at University. The entity has a
studentId and a moduleCode as its primary key. Each of the attributes that make up the primary key are simple keys
because each represents a unique reference when identifying a student in one instance and a module in the other.
In contrast, using the same example, imagine we identified a student by their firstName + lastName. In our table
representing students on modules our primary key would now be firstName + lastName + moduleCode. Because
firstName + lastName represent a unique reference to a student, it is not a simple key, it is a combination of
attributes used to uniquely identify a student. Therefore the primary key for this entity is a composite key.
No restriction is applied to the attributes regarding their (initial) ownership within the data model. This means that
any one, none, or all, of the multiple attributes within the compound key can be foreign keys. Indeed, a foreign key
may itself be a compound key.
Compound keys almost always originate from attributive or associative entities (tables) within the model, but this is
not an absolute.
External links
• Composite Inverse Functional Properties [1]: for an equivalent notion in the Semantic Web
• Relation Database terms of reference, Keys [2]: An overview of the different types of keys in an RDBMS


Foreign key
In the context of relational databases, a foreign key is a referential constraint between two tables.[1]
A foreign key is a field in a relational table that matches the primary key of another table. The foreign key can be
used to cross-reference tables.
The foreign key identifies a column or a set of columns in one (referencing) table that refers to a set of columns in
another (referenced) table. The columns in the referencing table must be the primary key or other candidate key in
the referenced table. The values in one row of the referencing columns must occur in a single row in the referenced
table. Thus, a row in the referencing table cannot contain values that don't exist in the referenced table (except
potentially NULL). This way references can be made to link information together and it is an essential part of
database normalization. Multiple rows in the referencing table may refer to the same row in the referenced table.
Most of the time, it reflects the one (parent table or referenced table) to many (child table, or referencing table)
relationship.
The referencing and referenced table may be the same table, i.e. the foreign key refers back to the same table. Such a
foreign key is known in SQL:2003 as a self-referencing or recursive foreign key.
A table may have multiple foreign keys, and each foreign key can have a different referenced table. Each foreign key
is enforced independently by the database system. Therefore, cascading relationships between tables can be
established using foreign keys.
Improper foreign key/primary key relationships or not enforcing those relationships are often the source of many
database and data modeling problems.
Defining Foreign Keys
Foreign keys are defined in the ANSI SQL Standard, through a FOREIGN KEY constraint. The syntax to add such a
constraint to an existing table is defined in SQL:2003 as shown below. Omitting the column list in the
REFERENCES clause implies that the foreign key shall reference the primary key of the referenced table.
ALTER TABLE <table identifier>
ADD [ CONSTRAINT <constraint identifier> ]
FOREIGN KEY ( <column expression> {, <column expression>}... )
REFERENCES <table identifier> [ ( <column expression> {, <column expression>}... ) ]
[ ON UPDATE <referential action> ]
[ ON DELETE <referential action> ]
Likewise, foreign keys can be defined as part of the CREATE TABLE SQL statement.
CREATE TABLE table_name (
id INTEGER PRIMARY KEY,
col2 CHARACTER VARYING(20),
col3 INTEGER,
...
FOREIGN KEY(col3)
REFERENCES other_table(key_col) ON DELETE CASCADE,
... )
If the foreign key is a single column only, the column can be marked as such using the following syntax:
CREATE TABLE table_name (
id INTEGER PRIMARY KEY,
Foreign key 2
col2 CHARACTER VARYING(20),
col3 INTEGER REFERENCES other_table(column_name),
... )
Foreign keys can be defined with stored proc statement.
sp_foreignkey tabname, pktabname, col1 [, col2] ... [, col8]
tabname : is the name of the table or view that contains the foreign key to be defined. pktabname : is the name
of the table or view that has the primary key to which the foreign key applies. The primary key must already be
defined. col1 : is the name of the first column that makes up the foreign key. The foreign key must have at least one
column and can have a maximum of eight columns.
Referential Actions
Because the Database Management System enforces referential constraints, it must ensure data integrity if rows in a
referenced table are to be deleted (or updated). If dependent rows in referencing tables still exist, those references
have to be considered. SQL:2003 specifies 5 different referential actions that shall take place in such occurrences:
• CASCADE
• RESTRICT
• NO ACTION
• SET NULL
• SET DEFAULT
CASCADE
Whenever rows in the master (referenced) table are deleted, the respective rows of the child (referencing) table with
a matching foreign key column will get deleted as well. This is called a cascade delete.
Example Tables: Customer(customer_id, cname, caddress) and Order(customer_id, products, payment)
Customer is the master table and Order is the child table, where 'customer_id' is the foreign key in Order and
represents the customer who placed the order. When a row of Customer is deleted, any Order row matching the
deleted Customer's customer_id will also be deleted.
NOTE: In Microsoft SQL, a cascading delete to a self-referencing table is not allowed. You must either use a trigger,
create a stored procedure, or handle the cascading delete from the calling application. An example of this is where a
single table has an ID as identity and a ParentID with a relationship to ID in the same table.
RESTRICT
A value cannot be updated or deleted when a row exists in a foreign key table that references the value in the
referenced table.
Similarly, a row cannot be deleted as long as there is a reference to it from a foreign key table.
NO ACTION
NO ACTION and RESTRICT are very much alike. The main difference between NO ACTION and RESTRICT is
that with NO ACTION the referential integrity check is done after trying to alter the table. RESTRICT does the
check before trying to execute the UPDATE or DELETE statement. Both referential actions act the same if the
referential integrity check fails: the UPDATE or DELETE statement will result in an error.
Foreign key 3
In other words, when an UPDATE or DELETE statement is executed on the referenced table using the referential
action NO ACTION, the DBMS verifies at the end of the statement execution that none of the referential
relationships are violated. This is different from RESTRICT, which assumes at the outset that the operation will
violate the constraint. Using NO ACTION, the triggers or the semantics of the statement itself may yield an end state
in which no foreign key relationships are violated by the time the constraint is finally checked, thus allowing the
statement to complete successfully.
SET NULL
The foreign key values in the referencing row are set to NULL when the referenced row is updated or deleted. This is
only possible if the respective columns in the referencing table are nullable. Due to the semantics of NULL, a
referencing row with NULLs in the foreign key columns does not require a referenced row.
SET DEFAULT
Similar to SET NULL, the foreign key values in the referencing row are set to the column default when the
referenced row is updated or deleted.
Triggers
Referential actions are generally implemented as implied triggers (i.e. triggers with system-generated names, often
hidden.) As such, they are subject to the same limitations as user-defined triggers, and their order of execution
relative to other triggers may need to be considered; in some cases it may become necessary to replace the referential
action with its equivalent user-defined trigger to ensure proper execution order, or to work around mutating-table
limitations.
Another important limitation appears with transaction isolation: your changes to a row may not be able to fully
cascade because the row is referenced by data your transaction cannot "see", and therefore cannot cascade onto. An
example: while your transaction is attempting to renumber a customer account, a simultaneous transaction is
attempting to create a new invoice for that same customer; while a CASCADE rule may fix all the invoice rows your
transaction can see to keep them consistent with the renumbered customer row, it won't reach into another
transaction to fix the data there; because the database cannot guarantee consistent data when the two transactions
commit, one of them will be forced to rollback (often on a first-come-first-served basis.)
Example
As a first example to illustrate foreign keys, suppose an accounts database has a table with invoices and each invoice
is associated with a particular supplier. Supplier details (such as address or phone number) are kept in a separate
table; each supplier is given a 'supplier number' to identify it. Each invoice record has an attribute containing the
supplier number for that invoice. Then, the 'supplier number' is the primary key in the Supplier table. The foreign
key in the Invoices table points to that primary key. The relational schema is the following. Primary keys are marked
in bold, and foreign keys are marked in italics.
Supplier ( SupplierNumber, Name, Address, Type )
Invoices ( InvoiceNumber, SupplierNumber, Text )
The corresponding Data Definition Language statement is as follows.
CREATE TABLE Supplier (
SupplierNumber INTEGER NOT NULL,
Name VARCHAR(20) NOT NULL,
Address VARCHAR(50) NOT NULL,
Foreign key 4
Type VARCHAR(10),
CONSTRAINT supplier_pk PRIMARY KEY(SupplierNumber),
CONSTRAINT number_value CHECK (SupplierNumber > 0) )
CREATE TABLE Invoices (
InvoiceNumber INTEGER NOT NULL,
SupplierNumber INTEGER NOT NULL,
Text VARCHAR(4096),
CONSTRAINT invoice_pk PRIMARY KEY(InvoiceNumber),
CONSTRAINT inumber_value CHECK (InvoiceNumber > 0),
CONSTRAINT supplier_fk FOREIGN KEY(SupplierNumber)
REFERENCES Supplier(SupplierNumber)
ON UPDATE CASCADE ON DELETE RESTRICT )

Definition: 
The primary key of a relational table uniquely identifies each record in the table. It can either be a normal attribute that is guaranteed to be unique (such as Social Security Number in a table with no more than one record per person) or it can be generated by the DBMS (such as a globally unique identifier, or GUID, in Microsoft SQL Server). Primary keys may consist of a single attribute or multiple attributes in combination.
Examples:
Imagine we have a STUDENTS table that contains a record for each student at a university. The student's unique student ID number would be a good choice for a primary key in the STUDENTS table. The student's first and last name would not be a good choice, as there is always the chance that more than one student might have the same name. 

Primary Keys
The first type of key we’ll discuss is the primary key. Every database table should have one or more columns designated as the primary key. The value this key holds should be unique for each record in the database. For example, assume we have a table called Employees that contains personnel information for every employee in our firm. We’d need to select an appropriate primary key that would uniquely identify each employee. Your first thought might be to use the employee’s name. 

This wouldn’t work out very well because it’s conceivable that you’d hire two employees with the same name. A better choice might be to use a unique employee ID number that you assign to each employee when they’re hired. Some organizations choose to use Social Security Numbers (or similar government identifiers) for this task because each employee already has one and they’re guaranteed to be unique. However, the use of Social Security Numbers for this purpose is highly controversial due to privacy concerns. (If you work for a government organization, the use of a Social Security Number may even be illegal under the Privacy Act of 1974.) For this reason, most organizations have shifted to the use of unique identifiers (employee ID, student ID, etc.) that don’t share these privacy concerns. 

Once you decide upon a primary key and set it up in the database, the database management system will enforce the uniqueness of the key. If you try to insert a record into a table with a primary key that duplicates an existing record, the insert will fail. 

Most databases are also capable of generating their own primary keys. Microsoft Access, for example, may be configured to use the AutoNumber data type to assign a unique ID to each record in the table. While effective, this is a bad design practice because it leaves you with a meaningless value in each record in the table. Why not use that space to store something useful?

Foreign Keys

The other type of key that we’ll discuss in this course is the foreign key. These keys are used to create relationships between tables. Natural relationships exist between tables in most database structures. Returning to our employees database, let’s imagine that we wanted to add a table containing departmental information to the database. This new table might be called Departments and would contain a large amount of information about the department as a whole. We’d also want to include information about the employees in the department, but it would be redundant to have the same information in two tables (Employees and Departments). Instead, we can create a relationship between the two tables. 

Let’s assume that the Departments table uses the Department Name column as the primary key. To create a relationship between the two tables, we add a new column to the Employees table called Department. We then fill in the name of the department to which each employee belongs. We also inform the database management systemthat the Department column in the Employees table is a foreign key that references the Departments table. The database will then enforce referential integrity by ensuring that all of the values in the Departments column of the Employees table have corresponding entries in the Departments table. 

Note that there is no uniqueness constraint for a foreign key. We may (and most likely do!) have more than one employee belonging to a single department. Similarly, there’s no requirement that an entry in the Departments table have any corresponding entry in the Employees table. It is possible that we’d have a department with no employees.

Superkey
A superkey is defined in the relational model of database organization as a set of attributes of a relation variable for
which it holds that in all relations assigned to that variable, there are no two distinct tuples (rows) that have the same
values for the attributes in this set. Equivalently a superkey can also be defined as a set of attributes of a relvar upon
which all attributes of the relvar are functionally dependent.
Note that if attribute set K is a superkey of relvar R, then at all times it is the case that the projection of R over K has
the same cardinality as R itself.
Informally, a superkey is a set of columns within a table whose values can be used to uniquely identify a row. A
candidate key is a minimal set of columns necessary to identify a row, this is also called a minimal superkey. For
example, given an employee table, consisting of the columns employeeID, name, job, and departmentID, we could
use the employeeID in combination with any or all other columns of this table to uniquely identify a row in the table.
Examples of superkeys in this table would be {employeeID, Name}, {employeeID, Name, job}, and {employeeID,
Name, job, departmentID}.
In a real database we do not need values for all of those columns to identify a row. We only need, per our example,
the set {employeeID}. This is a minimal superkey that is, a minimal set of columns that can be used to identify a
single row. So, employeeID is a candidate key.
Example
English Monarchs
Monarch Name Monarch Number Royal House
Edward II Plantagenet
Edward III Plantagenet
Richard III Plantagenet
Henry IV Lancaster

Unique key
In relational database design, a unique key can uniquely identify each row in a table, and is closely related to the
Superkey concept. A unique key comprises a single column or a set of columns. No two distinct rows in a table can
have the same value (or combination of values) in those columns if NULL values are not used. Depending on its
design, a table may have arbitrarily many unique keys but at most one primary key.
Unique keys do not enforce the NOT NULL constraint in practice. Because NULL is not an actual value (it
represents the lack of a value), when two rows are compared, and both rows have NULL in a column, the column
values are not considered to be equal. Thus, in order for a unique key to uniquely identify each row in a table, NULL
values must not be used. However, a column defined as a unique key column allows only one NULL value, which in
turn can uniquely identify that row/tuple.
A unique key should uniquely identify all possible rows that exist in a table and not only the currently existing rows.
Examples of unique keys are Social Security numbers (associated with a specific person[1] ) or ISBNs (associated
with a specific book). Telephone books and dictionaries cannot use names, words, or Dewey Decimal system
numbers as candidate keys because they do not uniquely identify telephone numbers or words.
A table can have at most one primary key, but more than one unique key. A primary key is a combination of
columns which uniquely specify a row. It is a special case of unique keys. One difference is that primary keys have
an implicit NOT NULL constraint while unique keys do not. Thus, the values in unique key columns may or may not
be NULL, and in fact such a column may contain multiple NULL fields.[2] Another difference is that primary keys
must be defined using another syntax.
The relational model, as expressed through relational calculus and relational algebra, does not distinguish between
primary keys and other kinds of keys. Primary keys were added to the SQL standard mainly as a convenience to the
application programmer.
Unique keys as well as primary keys can be referenced by foreign keys.
Defining primary keys
Primary keys are defined in the ANSI SQL Standard, through the PRIMARY KEY constraint. The syntax to add
such a constraint to an existing table is defined in SQL:2003 like this:
ALTER TABLE <table identifier>
ADD [ CONSTRAINT <constraint identifier> ]
PRIMARY KEY ( <column expression> {, <column expression>}... )
The primary key can also be specified directly during table creation. In the SQL Standard, primary keys may consist
of one or multiple columns. Each column participating in the primary key is implicitly defined as NOT NULL. Note
that some DBMS require explicitly marking primary-key columns as NOT NULL.
CREATE TABLE table_name (
id_col INT,
col2 CHARACTER VARYING(20),
...
CONSTRAINT tab_pk PRIMARY KEY(id_col),
...
)
If the primary key consists only of a single column, the column can be marked as such using the following syntax:
Unique key 2
CREATE TABLE table_name (
id_col INT PRIMARY KEY,
col2 CHARACTER VARYING(20),
...
)
Defining unique keys
The definition of unique keys is syntactically very similar to primary keys.
ALTER TABLE <table identifier>
ADD [ CONSTRAINT <constraint identifier> ]
UNIQUE ( <column expression> {, <column expression>}... )
Likewise, unique keys can be defined as part of the CREATE TABLE SQL statement.
CREATE TABLE table_name (
id_col INT,
col2 CHARACTER VARYING(20),
key_col SMALLINT,
...
CONSTRAINT key_unique UNIQUE(key_col),
...
)
CREATE TABLE table_name (
id_col INT PRIMARY KEY,
col2 CHARACTER VARYING(20),
...
key_col SMALLINT UNIQUE,
...
)


Surrogate keys
In some design situations the natural key that uniquely identifies a tuple in a relation is difficult to use for software
development. For example, it may involve multiple columns or large text fields. A surrogate key can be used as the
primary key. In other situations there may be more than one candidate key for a relation, and no candidate key is
obviously preferred. A surrogate key may be used as the primary key to avoid giving one candidate key artificial
primacy over the others.
Since primary keys exist primarily as a convenience to the programmer, surrogate primary keys are often usedin
many cases exclusivelyin database application design.
Due to the popularity of surrogate primary keys, many developers and in some cases even theoreticians have come to
regard surrogate primary keys as an inalienable part of the relational data model. This is largely due to a migration of
principles from the Object-Oriented Programming model to the relational model, creating the hybrid
object-relational model. In the ORM, these additional restrictions are placed on primary keys:
• Primary keys should be immutable, that is, not changed until the record is destroyed.
• Primary keys should be anonymous integer or numeric identifiers.
Unique key 3
However, neither of these restrictions is part of the relational model or any SQL standard. Due diligence should be
applied when deciding on the immutability of primary key values during database and application design. Some
database systems even imply that values in primary key columns cannot be changed using the UPDATE SQL
statement.
Notes
[1] SSN uniqueness: Rare SSN duplicates do exist in the field, a condition that led to problems with early commercial computer systems that
relied on SSN uniqueness. Practitioners are taught that well-known duplications in SSN assignments occurred in the early days of the SSN
system. This situation points out the complexity of designing systems that assume unique keys in real-world data.
[2] MySQL 5.5 Reference Manual :: 12.1.14. CREATE TABLE Syntax (http:/ / dev. mysql. com/ doc/ refman/ 5. 5/ en/ create-table. html) "For
all engines, a UNIQUE index permits multiple NULL values for columns that can contain NULL."
External links
• Relation Database terms of reference, Keys (http:/ / rdbms. opengrass. net/ 2_Database Design/ 2.
1_TermsOfReference/ 2. 1. 2_Keys. html): An overview of the different types of keys in an RDBMS