The concept of the ER model. The concept of an entity. attributes. Attribute types. Database subject area and its models Types of entities in the database

An entity is a real or abstract entity that is essential to the domain. The entity must have a name expressed by a noun in the singular

An informal way to identify entities is to look for abstractions that describe objects, processes, roles, and other concepts. A formal way to identify entities is to analyze textual descriptions of the subject area, extract nouns and select them as abstractions.

An entity instance is a specific instance of that entity. For example, employee Ivanov can be an instance of the Employee entity.

Each entity must have the following properties:

have a unique name;

have one or more attributes that either belong to an entity or are inherited through a relationship;

have one or more attributes that uniquely identify each entity instance.

Attribute - a characteristic of an entity that is significant for the subject area under consideration and is intended to identify, classify, quantify or express the state of the entity.

There are the following types of attributes:

simple - consists of one data element;

composite - consists of several data elements;

unambiguous - contains one value for one entity;

multivalued - contains several values for one entity;

optional - may have an empty (undefined) value;

derived - a value derived from the value of another attribute.

A unique identifier is a set of attributes whose values together are unique for each entity instance. Removing any attribute from an identifier breaks its uniqueness. Unique identifiers are underlined in the diagram.

Each entity can have any number of relationships with other entities.

Relationships between entities

A relationship is a named association between entities that is meaningful to the domain in question.

The degree of relationship is the number of entities participating in the relationship.

Link power - the number of entity instances participating in the link.

Depending on the power value, the connection can have one of three types:

one-to-one (denoted 1:1).

one-to-many (denoted 1:N).

many-to-many (denoted by M:N).

One to one. Means that in such a relationship, an entity with one role always corresponds to no more than one entity with another role. Since the degree of connection for each entity is 1, they are connected by one line.

One-to-many. An entity with one role can be matched by any number of entities with a different role.

Many-to-many. In this case, each of the associated entities can be represented by any number of instances.

The documentary approach, based on the classical consideration of the subject area, involves the creation of entity types based on the attributes of each document, forming a general set of such entity types (attribute entities) that their union represents a relation in the first normal form (Fig. 4.2).

In this example, a preliminary analysis of documents revealed a set of attributes that must be presented when building a database model. Let us need to build a database model for the formation of the document D1 "Order". The document contains 10 attributes, each of which is represented by a separate entity type containing the corresponding simple type attribute. However, this representation cannot be considered absolutely correct, since there is one attribute in the document that is inappropriate to consider, given that it makes sense only within the document and characterizes some instance of the relationship in its representation in the document. This attribute is "Item number". In fact, its representation in the document, this shows the analysis of the document, is implemented in the process of reflecting the ordered goods, and the numbering is formed in relation to the document as an object of the subject area, and not the area for storing information about the ordered goods. Thus, since the example does not have the task of storing information about the "Document" object and its presentation to the user, the "Item number" attribute is meaningless from the point of view of the relation to the order and the product. However, now we will not remove it from the list of considered attributes and entity types, considering it as an attribute that characterizes the order of goods in the order.

When selecting entity types from document attributes, the developer adds one more characteristic to those already in the document, and the description of entity types will become a little more complete (Table 4.5), taking into account their subsequent representation in the model.

Table 4.5

Description of entity types

	entity type	Note
Order number	Order number
order date	order date

Item Number	Item Number
	Name
The price of the product	The price of the product
Price	Price	Calculated:
Price	Price	Calculated: 511M(<7>) on request<1>
Symbolic price	Symbolic order value	Formed by translating a numeric value into a symbolic expression
Quantity	Quantity

This description represents the characteristics that are important for the database model:

data type - a description in terms of the database model of the types of stored information, defining at the level of physical implementation of the principles of representation and processing in the database;
dimension - a characteristic used for some data types to specify the number of characters (bytes) to be used when storing the corresponding value.

In the presented entity type description table, the "Dimension" column for some data types contains a numeric value enclosed in curly brackets. This is done for the reason that integer, logical data, as well as date and time in databases are represented by standard representation mechanisms, and the size in bytes is always known for storing data of these types.

Numeric data types and the logical data type equated to them always have a fixed dimension:

Boolean, Logical, Tinylnt, Bit - logical (Boolean, Logical) data tin containing the values "True" (true) and "False" (false), identical to a small integer tin (Tinylnt) or bit (Bit) with a minimum dimension of 1 byte (1 bit when identical to the Bit type) and the values "O" or "1";
Bit is an integer data type representing the values "0" or "1", with a dimension of 1 bit;
Tinylnt is an integer data type with a dimension of 1 byte;
SmallInt is an integer data type with a dimension of 2 bytes;
Integer - an integer data type with a dimension of 4 bytes;
BigInt, Long - integer type of large dimension (8 bytes);
Date - date data type, represented in a character version, with a dimension of 8 characters (bytes);
Time - time data type, represented in character form, with a dimension of 8 characters (bytes).

For data types with a fixed dimension, the "Dimension" column is usually not filled in, implying that the indication of the data type itself already defines it and no additional indication is required. For all other data types, it is important to specify the maximum dimension, taking into account the peculiarities of the values \u200b\u200bthat may be present in the document. For numeric data types, you specify the number of bytes that the number should occupy and the number of digits after the decimal point, defining the precision of the real number. Character data types specify the number of characters that should be stored for the corresponding attribute. A character type can be represented in three different ways:

Text, CLOB - a text data type that represents large text information stored in a special way, and in the database table is shown by a set of first characters, the number of which is the dimension of the attribute of this type;

- Character - a string data type that stores exactly the number of characters that is specified as a dimension, filling in the missing characters at the end of the string with spaces;
- Varchar - string data type of variable length, for which the dimension determines the maximum number of characters that a stored string can have.

The Character and Varchar types are very similar in essence, since they define the type of string data, but the particulars of the data representation determine the conditions under which one or another type is used. So, to store data that has a fixed size (for example, TIN (taxpayer identification number), BNK (bank identification code) of a bank, order number, product article, etc.), the Character type is usually used, since such data cannot there may be options when the number of characters may differ from those specified in the dimension of the attribute. The Varchar type is used in all other cases where it is not necessary to store the exact number of characters.

The term "relational" means "relationship-based". A relational database consists of entities (tables) that have some relationship with each other. The name came from English word relation.
Database design consists of two main phases: logical and physical modeling.
During logical modeling, you collect requirements and develop a database model that is independent of a particular DBMS (relational database management system). It's like creating blueprints for your house. You could think over and draw everything: where the kitchen, bedrooms, living room will be. But this is all on paper and in layouts.
During physical modeling, you create a model that is optimized for a specific application and DBMS. It is this model that is implemented in practice. If we return to the house from the previous paragraph, at this stage you will have to build a house somewhere - carry logs, bricks ...

The database design process consists of the following steps:

collection of information;
definition of entities;
defining attributes for each entity;
defining relationships between entities;
normalization;
transformation to a physical model;
database creation.

The first 5 stages form the logical design phase and the remaining two form the physical modeling phase.

Logic phase

The logical phase consists of several stages. They are all discussed below.

Gathering Requirements

At this stage, you need to determine exactly how the database will be used and what information will be stored in it. Gather as much information as possible about what the system should and shouldn't do.

Entity definition

At this stage, you need to define the entities that the database will consist of.

An entity is an object in a database that stores data. An entity can be something real (a house, a person, an object, a place) or an abstract thing (a banking transaction, a department of a company, a bus route). In the physical model, an entity is called a table.

Entities are made up of attributes (columns in a table) and records (rows in a table).

Typically, databases are made up of several primary entities associated with a large number of subordinate entities. Core entities are called independent: they do not depend on any other entity. Subordinate entities are called dependent: in order for one of them to exist, the main table associated with it must exist.
In diagrams, entities are usually represented as rectangles. The name of the entity is indicated inside the rectangle:

Any table has the following characteristics:

there are no identical lines in it;
all columns (attributes) in the table must have different names;
elements within the same column have the same type (string, number, date);
the order of the rows in the table can be arbitrary.

At this stage, you need to identify all categories of information (entities) that will be stored in the database.

Attribute Definition

An attribute represents a property that describes an entity. Attributes are often a number, date, or text. All data stored in an attribute must be of the same type and have the same properties.
In the physical model, attributes are called columns.
After defining the entities, it is necessary to define all the attributes of these entities.
In diagrams, attributes are usually listed within the entity rectangle. In the figure you will find an example of the "Houses" database, only now some attributes are defined for the entities from this database.

Each attribute defines the data type, size, allowed values, and any other rules. These include mandatory, mutable, and uniqueness rules.
The mandatory rule determines whether an attribute is a required part of an entity. If the attribute is an optional part of the entity, then it can be NULL, otherwise not.
You must also determine if the attribute is mutable. Some attribute values cannot change after the entry is created.
And finally, you need to determine if the attribute is unique. If so, then the attribute values cannot be repeated.

Keys

A key is a set of attributes that uniquely identifies an entry. Keys are divided into two classes: simple and compound.
A simple key consists of only one attribute. For example, in the "Passports of the country's citizens" database, the passport number will be a simple key: after all, there are no two passports with the same number.
A composite key consists of several attributes. In the same database "Passports of citizens of the country" there can be a composite key with the following attributes:
surname, name, patronymic, date of birth. This is just an example, since this composite key, in theory, does not provide guaranteed uniqueness of the record.
There are also several types of keys, which are described below.

Possible key

A candidate key is any set of attributes that uniquely identifies an entry in a table. The candidate key can be simple or compound.
Each entity must have at least one possible key, although there may be more than one possible key. None of the primary key attributes can have a NULL value.
A candidate key is also called a surrogate key.

Primary Keys

A primary key is a set of attributes that uniquely identify a record in a table (entity). One of the possible keys becomes the primary key. In diagrams, primary keys are often shown above the main list of attributes or are highlighted with special symbols. The entity in the figure has both key and regular attributes.

Alternative Keys

Any possible key that is not the primary key is called an alternate key. An entity can have multiple alternate keys.

Foreign keys

A foreign key is a collection of attributes that refer to the primary or alternate key of another entity. If a foreign key is not associated with a primary entity, then it can only contain null values. If the key is also composite, then all attributes of the foreign key must be undefined.
In diagrams, attributes that are combined into foreign keys are denoted by special characters. The figure shows two related entities (Houses and their Owners) and the foreign keys formed by them (after all, one person can own more than one house).

Keys are logical constructs, not physical objects. Relational databases have mechanisms to store keys.

Defining Relationships Between Entities

Relational databases allow you to combine information belonging to different entities.
A relationship is a situation in which one entity refers to the primary key of a second entity. Like, for example, the entities House and Master in the previous figure.
Relationships are defined during the base design process. To do this, you should analyze the entities and identify the logical relationships that exist between them.
The relationship type determines the number of entity records associated with another entity record. Relationships are divided into three main types, which are described below.

One to one

Each entry of the first entity corresponds to only one entry from the second entity. And each record of the second entity corresponds to only one record from the first entity. For example, there are two entities: People and Birth Certificates. And one person can only have one birth certificate.

One-to-many

Each record of the first entity can correspond to several records from the second entity. However, each entry of the second entity corresponds to only one entry from the first entity. For example, there are two entities: Order and Order Item. And there can be many items in one order.

many-to-many

Each record of the first entity can correspond to several records from the second entity. However, each record of the second entity can correspond to several records from the first entity. For example, there are two entities: Author and Book. One author can write many books. But a book can have multiple authors.
According to the criterion of mandatory relations are divided into mandatory and optional.

A mandatory relationship means that for each entry from the first entity, there must be related entries in the second entity.
An optional relationship means that a record from the first entity may not have a record in the second entity.

Normalization

Normalization is the process of removing redundant data from a database. Each data element must be stored in the database in one and only one instance. There are five common forms of normalization. As a rule, the database is reduced to the third normal form.
During the normalization process, certain actions are performed to remove redundant data. Normalization improves performance, speeds up sorting and index building, reduces the number of indexes per entity, and speeds up insert and update operations.
A normalized database is usually more flexible. When modifying queries or persisted data, a normalized database typically requires fewer changes, and changes have fewer consequences.

First normal form

To convert an entity to first normal form, you must eliminate duplicate groups of values and ensure that each attribute contains only one value, lists of values are not allowed.
In other words, each attribute in an entity should only be stored in one instance.
For example, in the figure, the House entity is not normalized. It contains several attributes for storing data about the owners of the house (the House entity does not correspond to the first normal form).

To bring the House entity to the first normal form, it is necessary to remove the repeating groups of values, that is, remove the Owner 1-3 attributes, placing them in a separate entity. Result (Entity House reduced to first normal form):

Second normal form

A table in second normal form contains only the data that applies to it. Values of non-key entity attributes depend on the primary key. More precisely, attributes depend on the primary key, on the entire primary key, and only on the primary key.
Entities must be in first normal form to conform to second normal form.
For example, the entity House in the figure has an attribute Price per liter of gasoline, which has nothing to do with houses. This attribute is removed (or you can move it to another entity). And also we move the Mayor attribute to a separate entity - this attribute depends on the city where the house is located, and not on the house.
The figure shows the essence House in the second normal form (the Essence House reduced to the second normal form).

third normal form

Third normal form excludes attributes that do not depend on the entire key. Any entity that is in third normal form is also in second normal form. This is the most common form of a database.
In third normal form, every attribute depends on the key, on the whole key, and on nothing but the key.
For example, the House Owner entity in the figure has a Zodiac sign attribute that depends on the date of birth of the owner of the house, and not on his name (which is the key).
To cast the entity Owner of the house, you need to create the entity Signs of the Zodiac and transfer the attribute Sign of the Zodiac there (Entity Owner of the house, reduced to the third normal form):

Restrictions

Constraints are the rules enforced by the database management system. Constraints define the set of values that can be entered in a column or columns.
For example, you do not want the order amount in your very cool store to be less than 500 rubles. You simply set a limit on the Order Amount column.

Stored procedures

Stored procedures are precompiled procedures stored in a database. Stored procedures can be used to define business rules, with their help it is possible to perform more complex calculations than with the help of constraints alone.
Stored procedures can contain program flow logic as well as database queries. They can take parameters and return results as tables or single values.
Stored procedures are just like regular procedures or functions in any program.

NOTE
Stored procedures reside in the database and run on the database server. They are generally faster than SQL statements because they are stored in compiled form.

Data integrity

By organizing the data into tables and defining the relationships between them, we can assume that a model has been created that correctly reflects the business environment. Now we need to ensure that the data entered into the database gives a correct idea of the state of the matter. In other words, you need to enforce business rules and maintain the integrity of the database.
For example, your company is engaged in the delivery of books. You are unlikely to accept an order from an unknown client, because then you will not even be able to deliver the order. Hence the business rule: orders are accepted only from customers whose information is in the database.
The correctness of data in relational databases is ensured by a set of rules. Data integrity rules fall into four categories.

Entity Integrity- each entity record must have a unique identifier and contain data. After all, you need to somehow distinguish between all these records in the database.
Attribute Integrity- each attribute accepts only valid values. For example, the purchase amount can definitely not be less than zero.
Referential Integrity- a set of rules that ensure the logical consistency of primary and foreign keys when inserting, updating and deleting records. Referential integrity ensures that for every foreign key there is a corresponding primary key. Let's take the previous example with the entities Home Owner and Home. Let's say you are Vasya Ivanov and own a house. You changed your last name to Sidorov and made the appropriate changes to the House owner entity. Definitely you would like your house to continue to be yours under your new name, and not belong to a certain Vasya Ivanov, who no longer exists.
Custom Integrity Rules- any integrity rules that do not belong to any of the listed categories.

triggers

Trigger is an analogue of a stored procedure, which is called automatically when the data in the table changes.
Triggers are a powerful mechanism for maintaining database integrity. Triggers are called before or after data changes in the table.
With the help of triggers, you can not only undo these changes, but also change the data in any other table.
For example, you are creating an Internet forum and you want to make sure that the forum list shows the latest forum post. Of course, you can take a message from the Forum Posts entity, but this will increase the complexity of your request and its execution time. It's easier to add a trigger to the Forum Posts entity that records the last post added to the Forums entity, in the Last Post attribute. This will greatly simplify the query.

Business rules

Business rules define the restrictions placed on the data according to the requirements of the business (those for whom you are creating the base). Business rules may consist of a set of steps required to complete a specific task, or they may simply be checks that verify that the data entered is correct. Business rules may include data integrity rules. Unlike other rules, their main purpose is to ensure that business transactions are conducted correctly.
For example, in the Very Tough Guys company, it may be customary that only white, blue, and black cars are purchased for official use.
The business rule for the Vehicle Color attribute of the Company Vehicles entity would then be that the vehicle can only be white, blue, or black.
Most DBMSs provide the means to:

to specify default values;
to check the data before entering it into the database;
to maintain relationships between tables;
to ensure the uniqueness of values;
for storing stored procedures directly in the database.

All of these features can be used to implement business rules in a database.

Physical model

The next step, after creating the logical model, is to build the physical model. The physical model is the practical implementation of the database. The physical model defines all the objects that you have to implement.
When moving from a logical model to a physical entity, they are converted to tables, and attributes to columns.
Relationships between entities can be converted to tables or left as foreign keys.
Primary keys are converted to primary key constraints. Possible keys are in uniqueness constraints.

Denormalization

Denormalization- this is a deliberate change in the structure of the base that violates the rules of normal forms. This is usually done to improve database performance.
Theoretically, one should always strive for a fully normalized base, but in practice, a fully normalized base almost always means a performance drop. Over-normalizing a database can result in multiple tables being accessed each time data is retrieved. Typically, four tables or fewer must participate in a query.
Standard denormalization techniques are: combining several tables into one, storing the same attributes in several tables, and storing summary or calculated data in a table.

3 . Data Model Components

Entity, entity definition, sources of information about entities

The data model - a conceptual description of the subject area - is the most abstract level of database design. The data model consists of entities, attributes, domains, and relationships. Further - about each of the elements in detail.

3.1 Entities

An entity is something about which information needs to be stored in a database.

When designing databases, it is enough to describe the current situation - and most nouns and some verbs will be candidates for entities. For example: "Customers buy goods. Employees sell goods to customers. Suppliers supply goods" - customers, goods, employees and suppliers are entities. The verbs "buy" and "sell" are also entities (although they can be the same entity, different from the point of view of the buyer and seller).

When designing a database, the main source of information about entities is a conversation with the customer in order to understand his business processes. In addition, standard documents used in business processes are analyzed: forms, reports, instructions, etc. After receiving such a list, it is necessary to check it for completeness and coherence, as well as to identify duplicates - the same entities, which are called different words, and entities that are actually different but are described by the same term.

Entities can model specific concepts (customers, goods, calls) and abstract ones (the agent is responsible for the client, the student is enrolled in the course).

The basic concepts of the DB model "entity-relationship" (ER-model): entities, relationships between them and their attributes (properties).

Essence- any concrete or abstract object in the subject area under consideration. Entities are the basic types of information that are stored in the database (in a relational database, each entity is assigned a table). Entities can include: students, clients, departments, etc. Entity instance and entity type are different concepts. The concept of an entity type refers to a set of homogeneous persons, objects or events acting as a whole (for example, a student, a client, etc.). An entity instance refers, for example, to a particular person in a set. An entity type can be a student, and an instance can be Petrov, Sidorov, etc.

Attribute is a property of an entity in the subject area. Its name must be unique for a particular entity type. For example, for the entity student, the following attributes can be used: last name, first name, patronymic, date and place of birth, passport data, etc. In a relational database, attributes are stored in table fields.

Connection– the relationship between entities in the subject area. Relationships are connections between parts of the database (in a relational database, this is a connection between table records).

Essences is data that is classified by type, and relationships show how these types of data relate to each other. If we describe a certain subject area in terms of an entity - a relationship, then we get an entity - relationship model for this database.

arrow is symbol relationships: one-to-many.

The main advantages of ER-models: * visibility; * models allow you to design databases with a large number of objects and attributes;

The main elements of ER-models: * objects (entities); * object attributes; * links between objects.

Relationship between entities is characterized by: * relationship type (1:1, 1:N, N:M); * membership class. A class can be required or optional. If each instance of an entity participates in a relationship, then the membership class is mandatory, otherwise it is optional.

The concept of data normalization. functional dependency.

Normalization is a formal method for analyzing relationships based on their primary key and existing relationships. Its task is to replace one database schema (or set of relationships) with another schema in which the relationships have a simpler and more regular structure.

functional dependency. Let X and Y be two attributes of some relation. Y is said to be functionally dependent on X if, at any given time, each value of X corresponds to at most one value of attribute Y.

Functional dependence is denoted as X -\u003e Y.

Relationship student S(Ns, Fio, Ngr, Addr, Tel). Each of the attributes Fio, Ngr, Addr, Tel is functionally dependent on the attribute Ns.

So, in a normalized relation, all non-key attributes are functionally dependent on the key of the relation. The key of the relation S is the Ns attribute.