Database Management System
The related information when placed is an
organized form makes a database. In simple words, an organized collection of
related information is known as database. The organization of data/information
is necessary because unorganized information has no meaning. There are so many
examples of organized information, more precisely and the most common are, the
dictionary, the telephone directory, student record register, your own address
book and many more. In each of these the data is stored in some particular
order i.e. in an organized form.
Introduction to Database Management Systems
We tend to think of a
database as information stored on a computer that we can access and alter, but
the database is just information. It’s
a bit like a filing cabinet full of files in an office. Without procedures in
place, we wouldn’t know where to put the files in order to retrieve them when
we wanted them again. Without the forms, each file would be a random collection
of information, and even when we found it we wouldn’t be able to make sense of
it, or compare it to other files easily.
What is a Database Management System (DBMS)?
The database management system provides
ways to organize, store, retrieve and interact with the data in the database.
It consists of:
·
A
modeling language, used to define the database schema, or structure. Common
database structures are hierarchical, network, relational and object based.
Models differ in how they connect related information. The most widely used,
particularly in Web applications, is the relational database model, which we’ll
cover in more detail in another article.
·
A
database engine that manages the
database structure and optimizes the storage of data, whether that is fields,
records, files or objects, for a balance between quick retrieval and efficient
use of space.
·
A
database query language, such as SQL, that enables developers
to write programs that extract data from the database, present it to the user,
and save and store changes.
·
A
transaction mechanism that validates data entered against
allowed types before storing it, and also ensures multiple users cannot update
the same information simultaneously, potentially corrupting the data.
What Are Database Management Systems Used For?
Database
management systems are used to enable developers to create a database, fill it
with information and create ways to query and change that information without
having to worry about the technical aspects of data storage and retrieval.
Other features of database management systems include:
·
User access and security management systems
provide appropriate data access to multiple users while protecting sensitive
data.
·
Data backup to ensure consistent
availability of data.
·
Access logs, making it easier for a
database manager to see how the database is being used.
·
Rules enforcement to ensure only data of
the prescribed type is stored in each field, for example, date fields may be
set to only contain dates within a set range.
·
Formulas such as counting, averaging and
summing included in the DBMS make statistical analysis and representation of
the data simpler.
·
Performance monitoring and optimization
tools may also be included to allow the user to tweak the database settings for
speed and efficiency.
Many
Web applications rely on a DBMS, from search engines and article directories,
to social networks like Facebook and Twitter. Almost any site that offers a
user registration with personal logon details rather than a single shared
password will probably require a database, as will ecommerce systems, blogs and
collaborative sites such as Wikis and multiple user content management systems.
Relational
Database Management Systems
As we discussed in
our last article “An Introduction to Database Management Systems,” (DMS), one
of the key components of any DMS is the modeling language used to define the
database schema, or structure. Common
database structures are hierarchical, network, relational and object. Models
differ in how they connect related information. The most widely used,
particularly in Web applications, is the relational database model, which we’re
going to look at in more detail here.
What
is a Relational Database?
At
its simplest, a database is an electronic store of data, for example a
company’s sales records. In a file based system, each sale would have its own
file, where the recording clerk would have to enter every piece of required
information. That might include sales order number, items ordered and
quantities, customer name with billing and shipping address, contact name and
phone number, plus various other details. Obviously, entering all of these
details for every order in full would be both time consuming, and open to human
error, such as spelling mistakes. It would also mean if a detail, like a
customer phone number changed, it would have to be changed in multiple places.
Relational
databases are designed to make data entry and management simpler, quicker and
less prone to data corruption by storing information so that each piece of data
is stored only once, and referenced by other pieces of data when needed. This
is achieved by creating relations between data sets, hence the term relational
database.
To
see how some of the key principles of relational databases work, let’s take our
example order records through the process of database development.
Tables,
Rows and Columns
The
basic unit of a relational database is a table, also referred to as a relation
or base relvar. Within a table, each column represents an attribute, and each
row an entry. So for example, our order records might consists of columns
labeled “Order Number, Customer Name, Customer Contact, Contact Phone Number,
Order Date, Delivery Date, Shipping Address, Billing Address, Item Ordered,
Qty, Price,” and so on. Each row in this table would then represent an order.
Details might be entered in a table, much like a spread sheet, and the
resulting flat file database would contain the output for all the files, and
all of their details. Although this might prevent a lot of user errors by
constraining the data input, each piece of information might appear multiple
times. For example, if an order contains multiple items, the whole line might
be duplicated except for the item ordered columns. Clearly, this is not an
efficient use of storage space, and updating data, though quicker than
searching multiple files, would still be time consuming.
Normalizing
The
process of normalizing a relational database consists of separating data into
smaller, more closely related information. In this example, initially, we might
separate the customer information from the order information so that all the
customer details are stored in one table, and a single reference, Customer
Number, included in the order table. Through a series of logical iterations,
each designed to prevent extensive data duplication, we might end up with
several tables. For example, we might have a list of customers, a list of
contacts, a list of addresses, a list of orders and a list of products.
Keys and Primary Keys
Having
separated all these types of information, we need a way to identify records
within each table in order to reference them in other tables. Since the main
reason for maintaining smaller tables is to ensure integrity of the data and
avoid duplication, we also need to specify which column within a table must be
unique. We do this by assigning a column within a table as a “key.” With some
tables, such as orders, there are “natural keys,” like order numbers, which
have to be unique for the system to work anyway. For other tables, such as
contacts, these natural keys do not exist. We may have multiple contacts with
the same name, so we can’t use them as a key. Instead, we assign a computer
generated “primary key,” that creates an artificial, unique column in the table
and assigns each row an incrementally increasing number.
Relationships
Now
that we have a way to identify the entries in a table, we can pull the
information from the table by simply referencing that column in another table.
This is done by “joining” tables, and creating resulting relationships, which
may be one to one, meaning each unique entry in one table must also be unique
in the other, one to many, where an entry must be unique in one table but may
be duplicated in the other, or many to many, where entries need not be unique
in either table. So for example, we might join the customer table primary key
to the order table customer number column by instructing the database to
INNER
JOIN Customer_Id
ON Customers.Customer_Id=Orders.Customer_Id
Database Languages
A DBMS must provide appropriate languages and
interfaces for each category of users to express database queries and updates.
Database Languages are used to create and maintain
database on computer.
Data
Definition Language (DDL)
It is a language that allows
the users to define data and their relationship to other types of data. It is
mainly used to create files, databases, data dictionary and tables within
databases.
It is also used to specify
the structure of each table, set of associated values with each attribute,
integrity constraints, security and authorization information for each table and physical
storage structure of each table on disk.
Data
Manipulation Language (DML)
It is a language that
provides a set of operations to support the basic data manipulation operations
on the data held in the databases. It allows users to insert, update, delete
and retrieve data from the database. The part of DML that involves data
retrieval is called a query language.
Data Control
Language (DCL)
DCL statements control access
to data and the database using statements such as GRANT and REVOKE. A privilege
can either be granted to a User with the help of GRANT statement. The
privileges assigned can be SELECT, ALTER, DELETE, EXECUTE, INSERT, INDEX etc.
In addition to granting of privileges, you can also revoke (taken back) it by
using REVOKE command.
DBMS Architecture
DBMS architecture is the way in which
the data in a database is viewed (or represented to) by users. It helps you
represent your data in an understandable way to the users, by hiding the
complex bits that deal with the working of the system. Remember, DBMS
architecture is not about how the DBMS software operates or how it
processes data.
We’re going to take a look at the
ANSI-SPARC DBMS standard model. ANSI is the acronym for American National
Standards Institute. It sets standards for American goods so that they can be
used anywhere in the world without compatibility problems. In the case of DBMS
software, ANSI has standardized SQL, so that most DBMS products use SQL as the
main query language. The ANSI has also standardized a three level DBMS
architecture model followed by most database systems, and it’s known as the
abstract ANSI-SPARC design standard.
The ANSI-SPARC Database Architectture
is set up into three tiers. Let’s take a closer look at them.
The Internal Level (Physical Representation of Data)
The internal level is the lowest level
in a three tiered database. This level deals with how the stored data on the
database is represented to the user. This level shows exactly how the data is
stored and organized for access on your system. This is the most technical of
the three levels. However, the internal level view is still abstract –even
if it shows how the data is stored physically, it will not show how the
database software operates on it.
So how exactly is data stored on this
level? There are several considerations to be made when storing data. Some of
them include figuring out the right space allocation techniques, data
compression techniques (if necessary), security and encryption and the access
paths the software can take to retrieve the data. Most DBMS software products
make sure that data access is optimized and that data uses minimum storage
space. The OS you’re running is actually in charge of managing the physical
storage space.
The Conceptual Level (Holistic Representation of Data)
The conceptual level tells you how the
database was structured logically. This level tells you about the relationship
between the data members of your database, exactly what data is stored in it
and what a user will need to use the database. This level does not concern
itself with how this logical structure will actually be implemented. It’s
actually an overview of your database.
The conceptual level acts as a sort of
a buffer between the internal level and the external level. It helps hide the
complexity of the database and hides how the data is physically stored in it.
The database administrator will have to
be conversant with this layer, because most of his operations are carried out
on it. Only a database administrator is allowed to modify or structure this
level. It provides a global view of the database, as well as the hardware and
software necessary for running it .
The External Level (User Representation of Data)
This is the uppermost level in the
database. It implements the concept of abstraction as much as possible. This
level is also known as the view level because it deals with how a user views
your database. The external level is what allows a user to access a customized
version of the data in your database. Multiple users can work on a database on
the same time because of it.
The external level also hides the
working of the database from your users. It maintains the security of the
database by giving users access only to the data which they need at a
particular time. Any data that is not needed will not be displayed.
Three “schemas” (internal, conceptual
and external) show how the database is internally and externally structured,
and so this type of database architecture is also known as the “three-schema”
architecture.
Advantages of the Three Tiered ANSI-SPARC Architecture
The ANSI-SPARC standard architecture is
three tiered- that is, there are three levels through which it is represented.
This three tiered representation offers several advantages, which as follows:
- It
makes the database abstract. It hides the details of how the data is
stored physically in an electronic system, which makes it easier to
understand and easier to use for an average user. It also allows the user
to concentrate on the data rather than worrying about how it should be
stored.
- It
lets users view the same data, but it makes the data customizable to fit
each user. Each user should be able to access a source of data, and any
change made to their customized data source should not affect data sources
being used by other users. This means that the data in your database is
independent.
- The
three tiered architecture model also allows migration to another system to
be seamless. The database appears the same on different systems, even if
the physical storage is changed. Its structure remains the same, which
makes it portable.
- The
model also allows a database admin to make changes to the database
structure or make upgrades to it without disturbing a user currently on
the system.
- The
model allows a database admin to change the storage medium of the database
without disturbing a user who is currently on the system.
What
are flat file and relational databases?
A database is a
collection of data, which is organized into files called tables. These tables
provide a systematic way of accessing, managing, and updating data. A
relational database is one that contains multiple tables of data that relate to
each other through special key fields. Relational databases are far more
flexible (though harder to design and maintain) than what are known as flat
file databases, which contain a single table of data.
To understand the
advantages of a relational database, imagine the needs of two small companies
that take customer orders for their products. Company A uses a flat file
database with a single table named orders to record orders they
receive, while Company B uses a relational database with two tables: orders and customers.
When a customer places
an order with Company A, a new record (or row) in the tableorders is created. Because Company A has only one
table of data, all the information pertaining to that order must be put into a
single record. This means that the customer's general information, such as name
and address, is stored in the same record as the order information, such as
product description, quantity, and price. If customers place more than one
order, their general information will need to be re-entered and thus duplicated
for each order they place.
Whenever there is
duplicate data, as in the case above, many inconsistencies may arise when users
try to query the database. Additionally, a customer's change of address would
require the database manager to find all records in orders that the customer placed, and change the
address data for each one.
Company B is much
better off with its relational database. Each of its customers has one and only
one record of general information stored in the table customers. Each customer's record is identified by a
unique customer code which will serve as the relational key. When a customer
orders from Company B, the record in orders need contain only a
reference to the customer's code, because all of the customer's general
information is already stored in customers.
This approach to
entering data solves the problems of duplicate data and making changes to
customer information. The database manager need change only one record in customers if someone changes addresses.
Entity Types
The Entiy Relationship (ER) model consists of different
types of entities. The existence of an entity may depends on the existence of
one or more other entities, such an entity is said to be existence
dependent.Entities whose existence not depending on any other entities is
termed as not existence dependent.
Entities based on their characteristics are classified
as follows.
- Strong Entities
- Weak Entities
- Recursive Entities
- Composite Entities
Strong Entity Vs Weak Entity
An entity set that has a primary key is termed as strong entity set. An
entity set that does not have sufficient attributes to form a primary key is
termed as a weak entity set.
A weak entity is existence dependent. That is the existence
of a weak entity depends on the existence of an identifying entity set.
The discriminator (or partial key) is used to identify other
attributes of a weak entity set.
The primary key of a weak entity set is formed by primary
key of identifying entity set and the discriminator of weak entity set.
The existence of a weak entity is indicated by a double
rectangle in the ER diagram.
We underline the discriminator of a weak entity set with a
dashed line in the ER diagram.
Recursive Entity
A recursive entity is one in which a relation can exist between occurrences of
the same entity set. This occurs in a unary relationship.
Composite Entities
If a Many to Many relationship exist we must create a bridge
entity to convert it into 1 to Many. Bridge entity composed of the primary keys
of each of the entities to be connected. The bridge entity is known as a
composite entity. A composite entity is represented by a diamond shape with in
a rectangle in an ER Diagram.
Normalization is the process
of removing redundant data from your tables in order to improve storage
efficiency, data integrity and scalability. This improvement is balanced
against an increase in complexity and potential performance losses from the
joining of the normalized tables at query-time. There are two goals of the
normalization process: eliminating redundant data (for example, storing the
same data in more than one table) and ensuring data dependencies make sense
(only storing related data in a table). Both of these are worthy goals as they
reduce the amount of space a database consumes and ensure that data is logically
stored.
WHY WE NEED NORMALIZATION?
Normalization is the aim of
well design Relational Database Management System (RDBMS). It is step by step
set of rules by which data is put in its simplest forms. We normalize the
relational database management system because of the following reasons:
· Minimize data redundancy i.e. no unnecessarily duplication of data.
· To make database structure flexible i.e. it should be possible to
add new data values and rows without reorganizing the database structure.
·
Data should be consistent throughout the database i.e. it should
not suffer from following anomalies.
- · Insert Anomaly - Due to lack of data i.e., all the data available for insertion such that null values in keys should be avoided. This kind of anomaly can seriously damage a database
- · Update Anomaly - It is due to data redundancy i.e. multiple occurrences of same values in a column. This can lead to inefficiency.
- · Deletion Anomaly - It leads to loss of data for rows that are not stored elsewhere. It could result in loss of vital data.
·
Complex queries required by the user should be easy to handle.
·
On decomposition of a relation into smaller relations with fewer
attributes on normalization the resulting relations whenever joined must result
in the same relation without any extra rows. The join operations can be
performed in any order. This is known as Lossless Join decomposition.
·
The resulting relations (tables) obtained on normalization should
possess the properties such as each row must be identified by a unique key, no
repeating groups, homogenous columns, each column is assigned a unique name
etc.
ADVANTAGES
OF NORMALIZATION
The
following are the advantages of the normalization.
·
More efficient data structure.
·
Avoid redundant fields or columns.
·
More flexible data structure i.e. we should be able to add new
rows and data values easily
·
Better understanding of data.
·
Ensures that distinct tables exist when necessary.
ü
Easier to maintain data structure i.e. it is easy to perform
operations and complex queries can be easily handled.
ü
Minimizes data duplication.
ü
Close modeling of real world entities, processes and their
relationships.
DISADVANTAGES
OF NORMALIZATION
The
following are disadvantages of normalization.
ü
You cannot start building the database before you know what the
user needs.
ü
On Normalizing the relations to higher normal forms i.e. 4NF, 5NF
the performance degrades.
ü
It is very time consuming and difficult process in normalizing
relations of higher degree.
ü
Careless decomposition may leads to bad design of database which
may leads to serious problems.
How many
normal forms are there?
They are
·
First Normal Form
·
Second Normal Form
·
Third Normal Form
·
Boyce-Codd Normal Form
·
Fourth Normal Form
·
Fifth Normal Form
·
Sixth or Domain-key Normal form
No comments:
Post a Comment