Database Management System

The related information when placed is an organized form makes a database. In simple words, an organized collection of related information is known as database. The organization of data/information is necessary because unorganized information has no meaning. There are so many examples of organized information, more precisely and the most common are, the dictionary, the telephone directory, student record register, your own address book and many more. In each of these the data is stored in some particular order i.e. in an organized form.

Introduction to Database Management Systems

We tend to think of a database as information stored on a computer that we can access and alter, but the database is just information. It’s a bit like a filing cabinet full of files in an office. Without procedures in place, we wouldn’t know where to put the files in order to retrieve them when we wanted them again. Without the forms, each file would be a random collection of information, and even when we found it we wouldn’t be able to make sense of it, or compare it to other files easily.

What is a Database Management System (DBMS)?

The database management system provides ways to organize, store, retrieve and interact with the data in the database. It consists of:

· A modeling language, used to define the database schema, or structure. Common database structures are hierarchical, network, relational and object based. Models differ in how they connect related information. The most widely used, particularly in Web applications, is the relational database model, which we’ll cover in more detail in another article.

· A database engine that manages the database structure and optimizes the storage of data, whether that is fields, records, files or objects, for a balance between quick retrieval and efficient use of space.

· A database query language, such as SQL, that enables developers to write programs that extract data from the database, present it to the user, and save and store changes.

· A transaction mechanism that validates data entered against allowed types before storing it, and also ensures multiple users cannot update the same information simultaneously, potentially corrupting the data.

What Are Database Management Systems Used For?

Database management systems are used to enable developers to create a database, fill it with information and create ways to query and change that information without having to worry about the technical aspects of data storage and retrieval. Other features of database management systems include:

· User access and security management systems provide appropriate data access to multiple users while protecting sensitive data.

· Data backup to ensure consistent availability of data.

· Access logs, making it easier for a database manager to see how the database is being used.

· Rules enforcement to ensure only data of the prescribed type is stored in each field, for example, date fields may be set to only contain dates within a set range.

· Formulas such as counting, averaging and summing included in the DBMS make statistical analysis and representation of the data simpler.

· Performance monitoring and optimization tools may also be included to allow the user to tweak the database settings for speed and efficiency.

Many Web applications rely on a DBMS, from search engines and article directories, to social networks like Facebook and Twitter. Almost any site that offers a user registration with personal logon details rather than a single shared password will probably require a database, as will ecommerce systems, blogs and collaborative sites such as Wikis and multiple user content management systems.

Relational Database Management Systems

As we discussed in our last article “An Introduction to Database Management Systems,” (DMS), one of the key components of any DMS is the modeling language used to define the database schema, or structure. Common database structures are hierarchical, network, relational and object. Models differ in how they connect related information. The most widely used, particularly in Web applications, is the relational database model, which we’re going to look at in more detail here.

What is a Relational Database?

At its simplest, a database is an electronic store of data, for example a company’s sales records. In a file based system, each sale would have its own file, where the recording clerk would have to enter every piece of required information. That might include sales order number, items ordered and quantities, customer name with billing and shipping address, contact name and phone number, plus various other details. Obviously, entering all of these details for every order in full would be both time consuming, and open to human error, such as spelling mistakes. It would also mean if a detail, like a customer phone number changed, it would have to be changed in multiple places.

Relational databases are designed to make data entry and management simpler, quicker and less prone to data corruption by storing information so that each piece of data is stored only once, and referenced by other pieces of data when needed. This is achieved by creating relations between data sets, hence the term relational database.

To see how some of the key principles of relational databases work, let’s take our example order records through the process of database development.

Tables, Rows and Columns

The basic unit of a relational database is a table, also referred to as a relation or base relvar. Within a table, each column represents an attribute, and each row an entry. So for example, our order records might consists of columns labeled “Order Number, Customer Name, Customer Contact, Contact Phone Number, Order Date, Delivery Date, Shipping Address, Billing Address, Item Ordered, Qty, Price,” and so on. Each row in this table would then represent an order. Details might be entered in a table, much like a spread sheet, and the resulting flat file database would contain the output for all the files, and all of their details. Although this might prevent a lot of user errors by constraining the data input, each piece of information might appear multiple times. For example, if an order contains multiple items, the whole line might be duplicated except for the item ordered columns. Clearly, this is not an efficient use of storage space, and updating data, though quicker than searching multiple files, would still be time consuming.

Normalizing

The process of normalizing a relational database consists of separating data into smaller, more closely related information. In this example, initially, we might separate the customer information from the order information so that all the customer details are stored in one table, and a single reference, Customer Number, included in the order table. Through a series of logical iterations, each designed to prevent extensive data duplication, we might end up with several tables. For example, we might have a list of customers, a list of contacts, a list of addresses, a list of orders and a list of products.

Keys and Primary Keys

Having separated all these types of information, we need a way to identify records within each table in order to reference them in other tables. Since the main reason for maintaining smaller tables is to ensure integrity of the data and avoid duplication, we also need to specify which column within a table must be unique. We do this by assigning a column within a table as a “key.” With some tables, such as orders, there are “natural keys,” like order numbers, which have to be unique for the system to work anyway. For other tables, such as contacts, these natural keys do not exist. We may have multiple contacts with the same name, so we can’t use them as a key. Instead, we assign a computer generated “primary key,” that creates an artificial, unique column in the table and assigns each row an incrementally increasing number.

Relationships

Now that we have a way to identify the entries in a table, we can pull the information from the table by simply referencing that column in another table. This is done by “joining” tables, and creating resulting relationships, which may be one to one, meaning each unique entry in one table must also be unique in the other, one to many, where an entry must be unique in one table but may be duplicated in the other, or many to many, where entries need not be unique in either table. So for example, we might join the customer table primary key to the order table customer number column by instructing the database to

INNER JOIN Customer_Id

ON Customers.Customer_Id=Orders.Customer_Id

Database Languages

A DBMS must provide appropriate languages and interfaces for each category of users to express database queries and updates. Database Languages are used to create and maintain database on computer.

Data Definition Language (DDL)

It is a language that allows the users to define data and their relationship to other types of data. It is mainly used to create files, databases, data dictionary and tables within databases.

It is also used to specify the structure of each table, set of associated values with each attribute, integrity constraints, security and authorization information for each table and physical storage structure of each table on disk.

Data Manipulation Language (DML)

It is a language that provides a set of operations to support the basic data manipulation operations on the data held in the databases. It allows users to insert, update, delete and retrieve data from the database. The part of DML that involves data retrieval is called a query language.

Data Control Language (DCL)

DCL statements control access to data and the database using statements such as GRANT and REVOKE. A privilege can either be granted to a User with the help of GRANT statement. The privileges assigned can be SELECT, ALTER, DELETE, EXECUTE, INSERT, INDEX etc. In addition to granting of privileges, you can also revoke (taken back) it by using REVOKE command.

DBMS Architecture

DBMS architecture is the way in which the data in a database is viewed (or represented to) by users. It helps you represent your data in an understandable way to the users, by hiding the complex bits that deal with the working of the system. Remember, DBMS architecture is not about how the DBMS software operates or how it processes data.

We’re going to take a look at the ANSI-SPARC DBMS standard model. ANSI is the acronym for American National Standards Institute. It sets standards for American goods so that they can be used anywhere in the world without compatibility problems. In the case of DBMS software, ANSI has standardized SQL, so that most DBMS products use SQL as the main query language. The ANSI has also standardized a three level DBMS architecture model followed by most database systems, and it’s known as the abstract ANSI-SPARC design standard.

The ANSI-SPARC Database Architectture is set up into three tiers. Let’s take a closer look at them.

The Internal Level (Physical Representation of Data)

The internal level is the lowest level in a three tiered database. This level deals with how the stored data on the database is represented to the user. This level shows exactly how the data is stored and organized for access on your system. This is the most technical of the three levels. However, the internal level view is still abstract –even if it shows how the data is stored physically, it will not show how the database software operates on it.

So how exactly is data stored on this level? There are several considerations to be made when storing data. Some of them include figuring out the right space allocation techniques, data compression techniques (if necessary), security and encryption and the access paths the software can take to retrieve the data. Most DBMS software products make sure that data access is optimized and that data uses minimum storage space. The OS you’re running is actually in charge of managing the physical storage space.

The Conceptual Level (Holistic Representation of Data)

The conceptual level tells you how the database was structured logically. This level tells you about the relationship between the data members of your database, exactly what data is stored in it and what a user will need to use the database. This level does not concern itself with how this logical structure will actually be implemented. It’s actually an overview of your database.

The conceptual level acts as a sort of a buffer between the internal level and the external level. It helps hide the complexity of the database and hides how the data is physically stored in it.

The database administrator will have to be conversant with this layer, because most of his operations are carried out on it. Only a database administrator is allowed to modify or structure this level. It provides a global view of the database, as well as the hardware and software necessary for running it .

The External Level (User Representation of Data)

This is the uppermost level in the database. It implements the concept of abstraction as much as possible. This level is also known as the view level because it deals with how a user views your database. The external level is what allows a user to access a customized version of the data in your database. Multiple users can work on a database on the same time because of it.

The external level also hides the working of the database from your users. It maintains the security of the database by giving users access only to the data which they need at a particular time. Any data that is not needed will not be displayed.

Three “schemas” (internal, conceptual and external) show how the database is internally and externally structured, and so this type of database architecture is also known as the “three-schema” architecture.

Advantages of the Three Tiered ANSI-SPARC Architecture

The ANSI-SPARC standard architecture is three tiered- that is, there are three levels through which it is represented. This three tiered representation offers several advantages, which as follows:

It makes the database abstract. It hides the details of how the data is stored physically in an electronic system, which makes it easier to understand and easier to use for an average user. It also allows the user to concentrate on the data rather than worrying about how it should be stored.
It lets users view the same data, but it makes the data customizable to fit each user. Each user should be able to access a source of data, and any change made to their customized data source should not affect data sources being used by other users. This means that the data in your database is independent.
The three tiered architecture model also allows migration to another system to be seamless. The database appears the same on different systems, even if the physical storage is changed. Its structure remains the same, which makes it portable.
The model also allows a database admin to make changes to the database structure or make upgrades to it without disturbing a user currently on the system.
The model allows a database admin to change the storage medium of the database without disturbing a user who is currently on the system.

What are flat file and relational databases?

A database is a collection of data, which is organized into files called tables. These tables provide a systematic way of accessing, managing, and updating data. A relational database is one that contains multiple tables of data that relate to each other through special key fields. Relational databases are far more flexible (though harder to design and maintain) than what are known as flat file databases, which contain a single table of data.

To understand the advantages of a relational database, imagine the needs of two small companies that take customer orders for their products. Company A uses a flat file database with a single table named orders to record orders they receive, while Company B uses a relational database with two tables: orders and customers.

When a customer places an order with Company A, a new record (or row) in the tableorders is created. Because Company A has only one table of data, all the information pertaining to that order must be put into a single record. This means that the customer's general information, such as name and address, is stored in the same record as the order information, such as product description, quantity, and price. If customers place more than one order, their general information will need to be re-entered and thus duplicated for each order they place.

Whenever there is duplicate data, as in the case above, many inconsistencies may arise when users try to query the database. Additionally, a customer's change of address would require the database manager to find all records in orders that the customer placed, and change the address data for each one.

Company B is much better off with its relational database. Each of its customers has one and only one record of general information stored in the table customers. Each customer's record is identified by a unique customer code which will serve as the relational key. When a customer orders from Company B, the record in orders need contain only a reference to the customer's code, because all of the customer's general information is already stored in customers.

This approach to entering data solves the problems of duplicate data and making changes to customer information. The database manager need change only one record in customers if someone changes addresses.

Entity Types

The Entiy Relationship (ER) model consists of different types of entities. The existence of an entity may depends on the existence of one or more other entities, such an entity is said to be existence dependent.Entities whose existence not depending on any other entities is termed as not existence dependent.

Entities based on their characteristics are classified as follows.

Strong Entities
Weak Entities
Recursive Entities
Composite Entities

Strong Entity Vs Weak Entity

An entity set that has a primary key is termed as strong entity set. An entity set that does not have sufficient attributes to form a primary key is termed as a weak entity set.

A weak entity is existence dependent. That is the existence of a weak entity depends on the existence of an identifying entity set.

The discriminator (or partial key) is used to identify other attributes of a weak entity set.

The primary key of a weak entity set is formed by primary key of identifying entity set and the discriminator of weak entity set.

The existence of a weak entity is indicated by a double rectangle in the ER diagram.

We underline the discriminator of a weak entity set with a dashed line in the ER diagram.

Recursive Entity

A recursive entity is one in which a relation can exist between occurrences of the same entity set. This occurs in a unary relationship.

Composite Entities

If a Many to Many relationship exist we must create a bridge entity to convert it into 1 to Many. Bridge entity composed of the primary keys of each of the entities to be connected. The bridge entity is known as a composite entity. A composite entity is represented by a diamond shape with in a rectangle in an ER Diagram.

Database Normalization

Normalization is the process of removing redundant data from your tables in order to improve storage efficiency, data integrity and scalability. This improvement is balanced against an increase in complexity and potential performance losses from the joining of the normalized tables at query-time. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

WHY WE NEED NORMALIZATION?

Normalization is the aim of well design Relational Database Management System (RDBMS). It is step by step set of rules by which data is put in its simplest forms. We normalize the relational database management system because of the following reasons:

· Minimize data redundancy i.e. no unnecessarily duplication of data.

· To make database structure flexible i.e. it should be possible to add new data values and rows without reorganizing the database structure.

· Data should be consistent throughout the database i.e. it should not suffer from following anomalies.

· Insert Anomaly - Due to lack of data i.e., all the data available for insertion such that null values in keys should be avoided. This kind of anomaly can seriously damage a database
· Update Anomaly - It is due to data redundancy i.e. multiple occurrences of same values in a column. This can lead to inefficiency.
· Deletion Anomaly - It leads to loss of data for rows that are not stored elsewhere. It could result in loss of vital data.

· Complex queries required by the user should be easy to handle.

· On decomposition of a relation into smaller relations with fewer attributes on normalization the resulting relations whenever joined must result in the same relation without any extra rows. The join operations can be performed in any order. This is known as Lossless Join decomposition.

· The resulting relations (tables) obtained on normalization should possess the properties such as each row must be identified by a unique key, no repeating groups, homogenous columns, each column is assigned a unique name etc.

ADVANTAGES OF NORMALIZATION

The following are the advantages of the normalization.

· More efficient data structure.

· Avoid redundant fields or columns.

· More flexible data structure i.e. we should be able to add new rows and data values easily

· Better understanding of data.

· Ensures that distinct tables exist when necessary.

ü Easier to maintain data structure i.e. it is easy to perform operations and complex queries can be easily handled.

ü Minimizes data duplication.

ü Close modeling of real world entities, processes and their relationships.

DISADVANTAGES OF NORMALIZATION

The following are disadvantages of normalization.

ü You cannot start building the database before you know what the user needs.

ü On Normalizing the relations to higher normal forms i.e. 4NF, 5NF the performance degrades.

ü It is very time consuming and difficult process in normalizing relations of higher degree.

ü Careless decomposition may leads to bad design of database which may leads to serious problems.

How many normal forms are there?

They are

· First Normal Form

· Second Normal Form

· Third Normal Form

· Boyce-Codd Normal Form

· Fourth Normal Form

· Fifth Normal Form

· Sixth or Domain-key Normal form

COMPUTER SCIENCE MATERIAL

Pages

Database Management System