Normalization
Normalization
theory is built the concept of normal forms. Normalizations reduces redundancy.
Redundancy is unnecessary repetition of data. It can cause problems with
storage and retrieval of data. During the process of normalization, dependencies
can be identified, which can cause problems during deletion and updation.
Normalization theory is based on the fundamental notation of functional
dependency. Normalization helps in simplifying the structure of tables.
Suppose for an entity customer if attributes like cunno,
name, address are required, then it can be perceived that for a particular
cusno, only one name and address is possible. Hence the attribute name, address
are said to be functionally dependent on the attribute cusno. There are four
levels of normalization.
Need of Normalization
Normalization is the aim of well design Relational Database
Management System (RDBMS). It is step by step set of rules by which data is put
in its simplest forms. We normalize the relational database management system because
of the following reasons:
Minimize data redundancy i.e. no unnecessarily duplication
of data.
To make database structure flexible i.e. it should be
possible to add new data values and rows without reorganizing the database
structure.
Data should be consistent throughout the database i.e. it
should not suffer from following anomalies.
Insert Anomaly - Due to lack of data i.e., all the data
available for insertion such that null values in keys should be avoided. This
kind of anomaly can seriously damage a database
Update Anomaly - It is due to data redundancy i.e. multiple
occurrences of same values in a column. This can lead to inefficiency.
Deletion Anomaly - It leads to loss of data for rows that
are not stored else where. It could result in loss of vital data.
Complex queries required by the user should be easy to
handle.
On decomposition of a relation into smaller relations with
fewer attributes on normalization the resulting relations whenever joined must
result in the same relation without any extra rows. The join operations can be
performed in any order. This is known as Lossless Join decomposition.
The resulting relations (tables) obtained on normalization
should possess the properties such as each row must be identified by a unique
key, no repeating groups, homogenous columns, each column is assigned a unique
name etc.
ADVANTAGES OF NORMALIZATION
- More efficient data structure.
- Avoid redundant fields or columns.
- More flexible data structure i.e. we should be able to add new rows and data values easily
- Better understanding of data.
- Ensures that distinct tables exist when necessary.
- Easier to maintain data structure i.e. it is easy to perform operations and complex queries can be easily handled.
- Minimizes data duplication.
- Close modeling of real world entities, processes and their relationships.
DISADVANTAGES OF NORMALIZATION
The following are disadvantages of normalization.
1. You
cannot start building the database before you know what the user needs.
2. On
Normalizing the relations to higher normal forms i.e. 4NF, 5NF the performance
degrades.
3.
It
is very time consuming and difficult process in normalizing relations of higher
degree.
4. Careless
decomposition may leads to bad design of database which may leads to serious
problems.
Normal forms-
First Normal Form
- Second Normal Form
- Third Normal Form
- Boyce-Codd Normal Form
- Fourth Normal Form
- Fifth Normal Form
First Normal Form (1st NF)
- The table cells must be of single value.
- Eliminate repeating groups in individual tables.
- Create a separate table for each set of related data.
- Identify each set of related data with a primary key.
Second Normal Form (2nd NF)
- Remove Partial Dependencies.
- Functional Dependency: The value of one attribute in a table is determined entirely by the value of another.
- Partial Dependency: A type of functional dependency where an attribute is functionally dependent on only part of the primary key (primary key must be a composite key).
- Create separate table with the functionally dependent data and the part of the key on which it depends. Tables created at this step will usually contain descriptions of resources.
Third
Normal Form (3rd NF)
- Remove transitive dependencies.
- Transitive Dependency A type of functional dependency where an attribute is functionally dependent on an attribute other than the primary key. Thus its value is only indirectly determined by the primary key.
- Create a separate table containing the attribute and the fields that are functionally dependent on it. Tables created at this step will usually contain descriptions of either resources or agents. Keep a copy of the key attribute in the original file.
A relation is in third normal form, if it is in 2NF and every non-key attribute of the relation is non-transitively dependent on each candidate key of the relation.
Boyce-Codd
Normal Form (BCNF)
- When a relation has more than one candidate key, anomalies may result even though the relation is in 3NF.
- 3NF does not deal satisfactorily with the case of a relation with overlapping candidate keys
- i.e. composite candidate keys with at least one attribute in common.
- BCNF is based on the concept of a determinant.
- A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent.
- A relation is in BCNF is, and only if, every determinant is a candidate key.
Fourth
Normal Form (4th NF)
An entity is in Fourth Normal Form (4NF) when it meets the requirement of being in Third Normal Form (3NF) and additionally:
An entity is in Fourth Normal Form (4NF) when it meets the requirement of being in Third Normal Form (3NF) and additionally:
- Has no multiple sets of multi-valued dependencies. In other words, 4NF states that no entity can have more than a single one-to-many relationship within an entity if the one-to-many attributes are independent of each other.
- Fourth Normal Form applies to situations involving many-to-many relationships.
In relational databases, many-to-many relationships are expressed through cross-reference tables.
A table is in fourth normal form (4NF) if and only if it is in BCNF and contains no more than one multi-valued dependency.
Fifth Normal Form (5th NF)
- A relation that has a join dependency cannot be decomposed by a projection into other relations without spurious results
- A relation is in 5NF when its information content cannot be reconstructed from several smaller relations i.e. from relations having fewer attributes than the original relation
A table is in fifth normal form (5NF) or Project-Join Normal Form (PJNF) if it is in 4NF and it cannot have a lossless decomposition into any number of smaller tables.
Denormalization
Normalization
helps in reducing redundancy. But at times when information is required from
more than one table, at a faster rate, it is wiser to introduce some amount of
redundancy in the table. This intentional introduction of redundancy in to the
table is called Denormalization.
For example, if the query were used to display the names of
students and their marks obtained in each subject, then it would be advisable
to maintain information about the students. Otherwise, every time such a query
is made, a join will have to be carried out on the two tables. For the sake of
better performance some amount of redundancy has to be introduced.
The language widely used in relational database is
Structured Query Language, more popularly known as SQL. It is the standard
relational database access language. They can be roughly divided in to three
categories based on their functions namely Data Definition Language, Data
Manipulation Language and Data Control Language.