THE ETL-SYSTEMS AS INNOVATIVE SOLUTION CONTRIBUTING TO COMPANIES’ COMPETITIVENESS - Студенческий научный форум

XI Международная студенческая научная конференция Студенческий научный форум - 2019

THE ETL-SYSTEMS AS INNOVATIVE SOLUTION CONTRIBUTING TO COMPANIES’ COMPETITIVENESS

Аванесян Н.Л. 1, Ермолаева Л.Д. 1
1Влгу
 Комментарии
Текст работы размещён без изображений и формул.
Полная версия работы доступна во вкладке "Файлы работы" в формате PDF

Technology and the IT-industry have reached a level of development that affects all types of human activity. With incredible speed, the volume of processed data, a variety of tools for creating and improving systems and applications is constantly growing. At the same time, the number of data sources and consumers is increasing.Sophisticated software and information systems created for working with data, challenge IT staff requiring to improve their skills and qualification.

Data downloaded from any sources, as a rule, is required not only to be stored within one system, but to be transferred to other systems for processing and analysis. To do this, there are data warehouses (HD or DWH - Data Warehouse). These are databases for collecting and processing various information designed and oriented specifically for preparing reports and business analysis in order to support decision-making at the enterprise [1].

So, given the above, we can distinguish three stages in the process of working with data [2]:

Extracting;

Transforming;

Loading.

These three stages constitute the abbreviation ETL - one of the main processes in data management when receiving them from a variety of source systems and loading into CD, in order to obtain reliable information.

Let us consider the selected stages [3]:

1. Data extract

The initial phase of the ETL process is the process of extracting records from data sources and preparing them for the conversion process. When developing a procedure for extracting data, it is first necessary to determine the frequency of data upload from OLTP systems or individual sources. Uploading data takes a certain amount of time, which is called the upload window.

The data extraction procedure can be implemented in two ways:

data extraction using specialized software;

data extraction by means of the system in which they are stored.

After extraction, the data is placed in the so-called “intermediate area”, where for each data source a separate table or a separate file, or both, is created.

2. Data transform

The purpose of this stage is to prepare the data for placement in the data warehouse and bring it into a form more convenient for subsequent analysis. This should take into account some of the requirements put forward by the analyst, in particular, to the level of data quality. Therefore, the transformation process can involve a wide variety of tools, starting with the simplest means of manual data editing and ending with systems that implement complex data processing and cleaning methods. In the process of converting data within the ETL, the following operations are most often performed:

data structure conversion;

data aggregation;

translation of values;

creation of new data;

data cleansing.

3. Data load

The loading process consists in transferring data from intermediate tables to the data storage structure. During the next loading, not all information from sources is transferred to the data storage, but only that which was changed during the intermediate time elapsed from the previous loading. In this case, there are two streams:

add stream - new, previously non-existing information is transferred to the data warehouse;

update (add-on) stream - information that existed previously but was changed or added is transferred to the data warehouse.

Data is used to distribute the loaded data on the stream. They record the state of the data at certain points in time and determine which data has been changed or added.

To implement the ETL process, you can use most modern programming languages. In addition, most BI solution component providers provide data conversion between their products. However, if we are not talking about a simple one-time conversion of a small amount of data between the two systems, but about building the process of continuous integration of data from several different sources, then it makes sense to consider using specialized utilities that facilitate the automation of typical operations, support the main formats used and most common information systems. In this case, the parameters of scalability, speed and extensibility of such utilities should be taken into account.

The most common developers of ETL tools today are Oracle, Informatica and IBM [2]. As a rule, the systems of such manufacturers satisfy the needs of the business so that most customers do not use half of the existing functionality. Therefore, it makes sense to choose ETL tools based on the required solution tasks and the existing enterprise platform. For example, if a company is dominated by systems from IBM, then IBM's Data Stage solution can be considered.

When using Unix / Linux of similar operating systems, solutions from Informatica or Oracle are often used. The advantages of the PowerCenter and PowerMart solutions from Informatica include the compliance with new technologies, regular releases and, as a result, the largest set of tools for data integration. Flexibility of the system adds the ability to program in the Java language. Oracle Data Integrator is characterized by the ability to scale from a small organization to a whole enterprise, the presence of special technologies for manipulating large amounts of data. Also, to support developers, the system has special knowledge modules for many other systems containing code patterns [4].

If the company is dominated by systems from Microsoft, then ETL can be built on the SSIS product from Microsoft. This product has a developed user interface, provides a wide library of standard data stream components. In addition to these, SSIS provides the infrastructure for creating custom components. This allows companies to develop highly specialized, highly efficient data processing components. But the product is not cross-platform and is focused on interaction with Miscosoft products [5].

Thus, at the moment, ETL systems are considered not only as a tool for creating reports or correcting errors that may occur at the input level, data transfer, due to system errors or differences in data between adjacent IT systems. These tasks are indeed extremely important for business, since in the availability of a huge number of different systems in a company with conflicting data, it is very difficult to obtain reliable and qualitative data, reduced to a single system of values and details. Business realized the need to implement ETL-systems for making timely decisions and successful competition. And this means that developers and integrators of such solutions strive to offer businesses new, innovative solutions with even more developed and intellectual functionality.

Literature:

1. ETL: review and role in the development of companies [Electronic resource]. - Electron. data. URL: https://moluch.ru/conf/tech/archive/286/13149/

2. ETL (Extract Transform Load) [Electronic resource]. - Electron. data. URL:http://www.tadviser.ru/index.php/%D0%A1%D1%82%D0%B0%D1%82%D1%8C%D1%8F: ETL_ (Extract_Transform_Load)

3. What is ETL? [Electronic resource]. - Electron. data.URL: http://datascientist.one/chto-takoe-etl/

4. Oracle Data Integrator Enterprise Edition // ORACLE. [Electronic resource]. - Electron. data. URL: http://www.oracle.com/us/products/middleware/data-integration/odi-ee-11g-ds-168065.pdf

5. A. Berger. Microsoft SQL Server 2005 Analysis Services. OLAP and multidimensional data analysis / Berger A. B., Gorbach I. V., Melomed E. L., Scherbinin V. A., Stepanenko V. P. / Ed. Ed. A. B. Berger, I. V. Gorbach. - SPb .: BHV-Petersburg, 2007.

Просмотров работы: 2