Normalization
Normalization
theory is built the concept of normal forms. Normalizations reduces redundancy.
Redundancy is unnecessary repetition of data. It can cause problems with
storage and retrieval of data. During the process of normalization, dependencies
can be identified, which can cause problems during deletion and updation.
Normalization theory is based on the fundamental notation of functional
dependency. Normalization helps in simplifying the structure of tables.
Suppose for an entity customer if attributes like cunno,
name, address are required, then it can be perceived that for a particular
cusno, only one name and address is possible. Hence the attribute name, address
are said to be functionally dependent on the attribute cusno. There are four
levels of normalization.
Need of Normalization
Normalization is the aim of well design Relational Database
Management System (RDBMS). It is step by step set of rules by which data is put
in its simplest forms. We normalize the relational database management system because
of the following reasons:
Minimize data redundancy i.e. no unnecessarily duplication
of data.
To make database structure flexible i.e. it should be
possible to add new data values and rows without reorganizing the database
structure.
Data should be consistent throughout the database i.e. it
should not suffer from following anomalies.
Insert Anomaly - Due to lack of data i.e., all the data
available for insertion such that null values in keys should be avoided. This
kind of anomaly can seriously damage a database
Update Anomaly - It is due to data redundancy i.e. multiple
occurrences of same values in a column. This can lead to inefficiency.
Deletion Anomaly - It leads to loss of data for rows that
are not stored else where. It could result in loss of vital data.
Complex queries required by the user should be easy to
handle.
On decomposition of a relation into smaller relations with
fewer attributes on normalization the resulting relations whenever joined must
result in the same relation without any extra rows. The join operations can be
performed in any order. This is known as Lossless Join decomposition.
The resulting relations (tables) obtained on normalization
should possess the properties such as each row must be identified by a unique
key, no repeating groups, homogenous columns, each column is assigned a unique
name etc.
ADVANTAGES OF NORMALIZATION
- More
efficient data structure.
- Avoid
redundant fields or columns.
- More
flexible data structure i.e. we should be able to add new rows and data
values easily
- Better
understanding of data.
- Ensures
that distinct tables exist when necessary.
-
Easier to maintain data structure i.e. it is easy to perform
operations and complex queries can be easily handled.
-
Minimizes data duplication.
- Close modeling of real world entities, processes and their
relationships.
DISADVANTAGES OF NORMALIZATION
The following are disadvantages of normalization.
1. You
cannot start building the database before you know what the user needs.
2. On
Normalizing the relations to higher normal forms i.e. 4NF, 5NF the performance
degrades.
3.
It
is very time consuming and difficult process in normalizing relations of higher
degree.
4. Careless
decomposition may leads to bad design of database which may leads to serious
problems.
Normal forms-
First Normal Form
- Second Normal Form
- Third Normal Form
- Boyce-Codd Normal Form
- Fourth Normal Form
- Fifth Normal Form
First Normal Form (1st NF)
- The table cells must be of single value.
- Eliminate repeating groups in individual tables.
- Create a separate table for each set of related data.
- Identify each set of related data with a primary key.
Second Normal Form (2nd NF)
- Remove Partial
Dependencies.
- Functional
Dependency: The value of one attribute in a table is determined entirely
by the value of another.
- Partial
Dependency: A type of functional dependency where an attribute is
functionally dependent on only part of the primary key (primary key must
be a composite key).
- Create separate
table with the functionally dependent data and the part of the key on
which it depends. Tables created at this step will usually contain
descriptions of resources.
A relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on each candidate key of the relation.
Third
Normal Form (3rd NF)
- Remove
transitive dependencies.
- Transitive
Dependency A type of functional dependency where an attribute is
functionally dependent on an attribute other than the primary key. Thus
its value is only indirectly determined by the primary key.
- Create a
separate table containing the attribute and the fields that are
functionally dependent on it. Tables created at this step will usually
contain descriptions of either resources or agents. Keep a copy of the key
attribute in the original file.
A
relation is in third normal form, if it is in 2NF and every
non-key attribute of the relation is non-transitively dependent on each
candidate key of the relation.
Boyce-Codd
Normal Form (BCNF)
- When a relation
has more than one candidate key, anomalies may result even though the
relation is in 3NF.
- 3NF does not
deal satisfactorily with the case of a relation with overlapping candidate
keys
- i.e. composite
candidate keys with at least one attribute in common.
- BCNF is based on
the concept of a determinant.
- A determinant is
any attribute (simple or composite) on which some other attribute is fully
functionally dependent.
- A relation is in
BCNF is, and only if, every determinant is a candidate key.
Fourth
Normal Form (4th NF)
An entity is in Fourth Normal Form (4NF) when it meets the requirement of being
in Third Normal Form (3NF) and additionally:
- Has no multiple
sets of multi-valued dependencies. In other words, 4NF states that no
entity can have more than a single one-to-many relationship within an
entity if the one-to-many attributes are independent of each other.
- Fourth Normal
Form applies to situations involving many-to-many relationships.
In
relational databases, many-to-many relationships are expressed through
cross-reference tables.
A table is in fourth normal form (4NF) if and only if it is in BCNF and
contains no more than one multi-valued dependency.
Fifth Normal Form (5th NF)
- A relation that
has a join dependency cannot be decomposed by a projection into other
relations without spurious results
- A relation is in
5NF when its information content cannot be reconstructed from several
smaller relations i.e. from relations having fewer attributes than the
original relation
A
table is in fifth normal form (5NF) or Project-Join Normal Form (PJNF) if it is
in 4NF and it cannot have a lossless decomposition into any number of smaller
tables.
Denormalization
Normalization
helps in reducing redundancy. But at times when information is required from
more than one table, at a faster rate, it is wiser to introduce some amount of
redundancy in the table. This intentional introduction of redundancy in to the
table is called Denormalization.
For example, if the query were used to display the names of
students and their marks obtained in each subject, then it would be advisable
to maintain information about the students. Otherwise, every time such a query
is made, a join will have to be carried out on the two tables. For the sake of
better performance some amount of redundancy has to be introduced.
The language widely used in relational database is
Structured Query Language, more popularly known as SQL. It is the standard
relational database access language. They can be roughly divided in to three
categories based on their functions namely Data Definition Language, Data
Manipulation Language and Data Control Language.