by
Matthew Martin
Data management is a key function for many organisations. The data may cover a range of areas: customers, suppliers, accounts, manufacturing, in fact almost any area of an organisations activities. The collection of data and its assessment has become increasingly important in the last few decades. Organisations find it extremely useful to convenient to store data digitally. Also, with the development of advanced techniques for processing data, notably data mining, it has come to play an important part in the competitiveness of companies.
The key tool for the management of data is the database (DB). Databases are designed and administered by a Database Administrator (DBA).
Data is organised hierarchically (do not confuse this with the hierarchical data model, see below).
Data is organised using the basic unit of the field. A field holds a single data entry, such as a name, a line of an address, a date of birth (DoB), etc. These form the columns of the table (see below), although they are usually not referred to as columns..
A record is a collection of fields. For example, a the details of a customer may form a record, each detail being a field containing information about the customer’s name, address, phone number, etc. Records form rows in a database table. A record is also sometimes called a tuple in a relational database system.
The records for a particular type of data are held together in a table. Often the tables are each held in a separate file, so sometimes the term file is used instead.
The database is a collection of tables (files). In some systems the term database can be used to mean a collection of related files, so it is possible to create more than one database on the same system. e.g. one for sales and marketing data and a separate one for accounts.
As stated above a record is a collection of fields, forming a row in a table. New rows are added sequentially. As a result this means that the data is not in alphabetic, or any other order.
In order to make finding a record easier, the records are each assigned a unique number, called the key.
Key |
First Name |
Family Name |
DoB |
Start Date |
3551 |
John |
Smith |
11/05/1960 |
01/11/1990 |
The above example employee record shows a single row in a table. The key field is unique for this entry.
Usually the key field is assigned automatically by the database system as an incriminated number when a new record is created. It is normally the case that it is up to the designer of the database to make sure that the key field is included in each table.
In order to allow cross-referencing between tables the keys can be used. Often this is achieved through a bridging table or simply by including the cross-reference key as a part of the record entry. For example, if we want to cross-reference the payroll records with the employee record for the employee, John Smith, we can include the key field reference in the payroll record as an additional field. Alternatively we could build a bridging table, which is simply a list of two or more keys, listing matching keys as a single row (record) in the table.
An index is a list if keys that relates the record location to a physical location on the hard drive. Using the Indexed Sequential Access Method (ISAM), the index is used to identify the cylinder and the track where the record is located. This makes disc access many times faster both for reading records and updating records.
In designing a database consideration needs to be given to the nature of the data, what it will be used for and what database system is being used.
There are three storage models for the logical structure of data.
The data is organised in a tree-like structure. The principle record is called the root and subsequent layers are referred to as children. Each record is divided into segments. The segments are connected in one-to many relationships. The hierarchical data model is used by earlier databases, such as Information Management System (IMS) by IBM. This type of data model is good for high through put systems, such as those used by banks for transaction processing.
This is similar to the hierarchical model, but allows for the representation of many-to-many relationships.
Data is treated as existing in tables (of two dimensions). The majority of database systems today work on this basis, giving them the name Relational Database Management Systems (RDBMS).
There is talk of object databases but the tendency is to simply convert between the objects used by client software and the relational tables of an RDBMS. True object oriented databases are slowly appearing in the market place, e.g. Objectivity.
One of the key tools in the design of database systems is the entity relationship model (ERM). The ERM describes the relationships between different aspects of the data, being able to represent one-to-one, one-to-may and many-to-many relationships.
The normalisation of data involves the streamlining of data, removing redundant data as far as is possible to increase the efficiency of data processing by the database.
RDBMS refers to the software system for managing the data. The relational aspect refers to being able to make relationships between different data records. Sometimes the term Database Management System (DBMS) is used, this is a more general term and omits the relational nature of modern database systems.
RDBMS have three major components:
Data mining allows for the statistical analysis of large quantities of data from multiple perspectives. Sometimes the term on-line analytical processing (OLAP) is used.
A data warehouse is a database system that stores both current and historical data, allowing for analysis of data for reporting. Due to the historical data held, the quantities of data tens to be very large. When combined with data mining techniques, valuable information can often be uncovered.
by
Matthew Martin