Normalization of Database
Database Normalization is a technique of organizing the data in the database. Normalization is a systematic approach of decomposing tables to eliminate data redundancy(repetition) and undesirable characteristics like Insertion, Update and Deletion Anamolies. It is a multi-step process that puts data into tabular form, removing duplicated data from the relation tables.
Normalization is used for mainly two purposes,
- Eliminating reduntant(useless) data.
- Ensuring data dependencies make sense i.e data is logically stored.
The video below will give you a good overview of Database Normalization. If you want you can skip the video, as the concept is covered in detail, below the video.
Problems Without Normalization
If a table is not properly normalized and have data redundancy then it will not only eat up extra memory space but will also make it difficult to handle and update the database, without facing data loss. Insertion, Updation and Deletion Anamolies are very frequent if database is not normalized. To understand these anomalies let us take an example of a Student table.
In the table above, we have data of 4 Computer Sci. students. As we can see, data for the fields , (Head of Department) and is repeated for the students who are in the same branch in the college, this is Data Redundancy.
Suppose for a new admission, until and unless a student opts for a branch, data of the student cannot be inserted, or else we will have to set the branch information as NULL.
Also, if we have to insert data of 100 students of same branch, then the branch information will be repeated for all those 100 students.
These scenarios are nothing but Insertion anomalies.
What if Mr. X leaves the college? or is no longer the HOD of computer science department? In that case all the student records will have to be updated, and if by mistake we miss any record, it will lead to data inconsistency. This is Updation anomaly.
In our Student table, two different informations are kept together, Student information and Branch information. Hence, at the end of the academic year, if student records are deleted, we will also lose the branch information. This is Deletion anomaly.
Normalization rules are divided into the following normal forms:
- First Normal Form
- Second Normal Form
- Third Normal Form
First Normal Form (1NF)
For a table to be in the First Normal Form, it should follow the following 4 rules:
- It should only have single(atomic) valued attributes/columns.
- Values stored in a column should be of the same domain
- All the columns in a table should have unique names.
- And the order in which data is stored, does not matter.
In the next tutorial, we will discuss about the First Normal Form in details.
Second Normal Form (2NF)
For a table to be in the Second Normal Form,
- It should be in the First Normal form.
- And, it should not have Partial Dependency.
To understand what is Partial Dependency and how to normalize a table to 2nd normal for, jump to the Second Normal Form tutorial.
Third Normal Form (3NF)
A table is said to be in the Third Normal Form when,
- It is in the Second Normal form.
- And, it doesn't have Transitive Dependency.
Here is the Third Normal Form tutorial. But we suggest you to first study about the second normal form and then head over to the third normal form.
Boyce and Codd Normal Form (BCNF)
Boyce and Codd Normal Form is a higher version of the Third Normal form. This form deals with certain type of anomaly that is not handled by 3NF. A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF. For a table to be in BCNF, following conditions must be satisfied:
- R must be in 3rd Normal Form
- and, for each functional dependency ( X → Y ), X should be a super Key.
What is Normalization?
Normalization is a database design technique which organizes tables in a manner that reduces redundancy and dependency of data.
It divides larger tables to smaller tables and links them using relationships.
In this tutorial, you will learn-
The inventor of the relational model Edgar Codd proposed the theory of normalization with the introduction of First Normal Form, and he continued to extend theory with Second and Third Normal Form. Later he joined with Raymond F. Boyce to develop the theory of Boyce-Codd Normal Form.
Theory of Data Normalization in SQL is still being developed further. For example, there are discussions even on 6th Normal Form. However, in most practical applications, normalization achieves its best in 3rd Normal Form. The evolution of Normalization theories is illustrated below-
Database Normalization Examples -
Assume a video library maintains a database of movies rented out. Without any normalization, all information is stored in one table as shown below.
Here you see Movies Rented column has multiple values.
Database Normal Forms
Now let's move into 1st Normal Forms
1NF (First Normal Form) Rules
- Each table cell should contain a single value.
- Each record needs to be unique.
The above table in 1NF-
Before we proceed let's understand a few things --
What is a KEY?
A KEY is a value used to identify a record in a table uniquely. A KEY could be a single column or combination of multiple columns
Note: Columns in a table that are NOT used to identify a record uniquely are called non-key columns.
What is a Primary Key?
A primary is a single column value used to identify a database record uniquely.
It has following attributes
What is Composite Key?
A composite key is a primary key composed of multiple columns used to identify a record uniquely
In our database, we have two people with the same name Robert Phil, but they live in different places.
Hence, we require both Full Name and Address to identify a record uniquely. That is a composite key.
Let's move into second normal form 2NF
2NF (Second Normal Form) Rules
- Rule 1- Be in 1NF
- Rule 2- Single Column Primary Key
It is clear that we can't move forward to make our simple database in 2nd Normalization form unless we partition the table above.
We have divided our 1NF table into two tables viz. Table 1 and Table2. Table 1 contains member information. Table 2 contains information on movies rented.
We have introduced a new column called Membership_id which is the primary key for table 1. Records can be uniquely identified in Table 1 using membership id
Database - Foreign Key
In Table 2, Membership_ID is the Foreign Key
| Foreign Key references the primary key of another Table! It helps connect your Tables |
Why do you need a foreign key?
Suppose an idiot inserts a record in Table B such as
You will only be able to insert values into your foreign key that exist in the unique key in the parent table. This helps in referential integrity.
The above problem can be overcome by declaring membership id from Table2 as foreign key of membership id from Table1
Now, if somebody tries to insert a value in the membership id field that does not exist in the parent table, an error will be shown!
What are transitive functional dependencies?
A transitive functional dependency is when changing a non-key column, might cause any of the other non-key columns to change
Consider the table 1. Changing the non-key column Full Name may change Salutation.
Let's move into 3NF
3NF (Third Normal Form) Rules
- Rule 1- Be in 2NF
- Rule 2- Has no transitive functional dependencies
To move our 2NF table into 3NF, we again need to again divide our table.
We have again divided our tables and created a new table which stores Salutations.
There are no transitive functional dependencies, and hence our table is in 3NF
In Table 3 Salutation ID is primary key, and in Table 1 Salutation ID is foreign to primary key in Table 3
Now our little example is at a level that cannot further be decomposed to attain higher forms of normalization. In fact, it is already in higher normalization forms. Separate efforts for moving into next levels of normalizing data are normally needed in complex databases. However, we will be discussing next levels of normalizations in brief in the following.
Boyce-Codd Normal Form (BCNF)
Even when a database is in 3rd Normal Form, still there would be anomalies resulted if it has more than one Candidate Key.
Sometimes is BCNF is also referred as 3.5 Normal Form.
4NF (Fourth Normal Form) Rules
If no database table instance contains two or more, independent and multivalued data describing the relevant entity, then it is in 4th Normal Form.
5NF (Fifth Normal Form) Rules
A table is in 5th Normal Form only if it is in 4NF and it cannot be decomposed into any number of smaller tables without loss of data.
6NF (Sixth Normal Form) Proposed
6th Normal Form is not standardized, yet however, it is being discussed by database experts for some time. Hopefully, we would have a clear & standardized definition for 6th Normal Form in the near future...
That's all to Normalization!!!
- Database designing is critical to the successful implementation of a database management system that meets the data requirements of an enterprise system.
- Normalization helps produce database systems that are cost-effective and have better security models.
- Functional dependencies are a very important component of the normalize data process
- Most database systems are normalized database up to the third normal forms.
- A primary uniquely identifies are record in a Table and cannot be null
- A foreign key helps connect table and references a primary key