Data management in banking

Banks today are responsible for protecting and storing huge amounts of valuable information within their firewalls. This information relates both to customers and to the changing financial landscape. In many cases it goes unused because it is not easily accessible or searchable, even though the data could improve decision making across multiple banking businesses.

With this data, banks could more quickly identify who is at risk of defaulting on a loan, decide what market portfolio valuation adjustments are needed, and gain a clearer view of how data is stored and managed to meet regulatory requirements. In this way, data can be leveraged, stored, archived, or disposed of to meet compliance requirements.

Thousands of decisions, large and small, are needed to meet daily banking function requirements. As a result, data is becoming increasingly important. Banks also have strict regulatory requirements and financial crime obligations. They need the ability to trace the results of any data analysis process right back to the initial entry of the information into a data repository. This need for traceability requires transparency from the ingestion stage through to the production of actionable insights.

To manage their many accounts and client companies, banks need to extract information from data quickly and conveniently. As banks mature digitally, both the amount of data and the opportunities to apply it grow exponentially. This growth enables banks to pursue new business models and customer-centric areas of opportunity.

It is critical to have an appropriate data archiving strategy to achieve operational efficiency, good application performance, and regulatory compliance. A sound data warehousing strategy is also the starting point for having data in formats that can be used with business intelligence tools and for obtaining actionable insights.

The following is a common data management model:

In this model, "Data Services" refers to any transformation, join, or other operation performed on data other than storage. Data services are the main activity required to turn data into informed decisions.

All banks and financial institutions ingest, move, and store data. This article focuses on ingesting data into Azure. The solution enables a move away from traditional on-premises data storage, processing, archiving, and deletion. By moving data to Azure, banks and financial institutions gain key benefits, including:
  • Cost control through unlimited global scalability, using compute resources and data capacity only when and where they are needed.
  • Reduced capital expenditures and operating costs by retiring on-premises physical servers.
  • Integrated backup and disaster recovery, reducing the cost and complexity of data protection.
  • Automated archiving of infrequently accessed data to low-cost storage while still ensuring that compliance requirements are met.
  • Access to advanced, integrated data services to process data for learning, prediction, transformation, or other needs.
This article presents recommended techniques for efficiently ingesting data into Azure and key data management techniques to use once the data is in the cloud.

Data ingestion

Financial institutions already hold captured data that is used by current applications. Several options are available to move this data into Azure. In many cases, the institution's existing applications can connect to data in Azure as if it were on-premises, with minimal changes. This is especially true when using Microsoft SQL databases in Azure, but solutions for Oracle, Teradata, MongoDB, and others can be found through the Azure Marketplace.

Several data migration strategies are available to move data from on-premises to Azure, with varying levels of latency. All of the following techniques offer transparency and data security.

Virtual network service endpoints

For those working with customers' financial information, security is a top priority. Protecting resources (such as a database) in Azure often depends on setting up a network infrastructure within Azure itself and then accessing that network through a specific endpoint.

Before transferring data into Azure, consider the network topology that protects both the Azure resources and the connection from on-premises. Virtual network service endpoints provide a secure, direct connection to a virtual network defined in Azure.

Virtual networks are defined in Azure to contain Azure resources within a bounded virtual network. A virtual network service endpoint then provides secure access to strategic Azure service resources, limited to those in the defined virtual network.

Database lift and shift

The "lift-and-shift" database migration model is one of the most common scenarios for using Azure SQL Database. Lift and shift means simply taking existing on-premises databases and moving them directly to the cloud. This approach is recommended in the following scenarios:
  • Moving out of a current data center because of higher prices or other operational reasons
  • Current on-premises SQL Server database hardware has reached or is nearing end of life
  • Supporting a generic "move to the cloud" business strategy
  • Taking advantage of the availability and disaster recovery capabilities of Azure SQL Database

For smaller databases, the first step of data ingestion is usually to create the necessary data stores and structures, such as tables, through the Azure portal, the Azure CLI, or an Azure SDK. For these smaller data stores, the next steps can be performed by a custom application that copies the correct data to the most appropriate Azure data storage resource. For larger data migrations, restoring backups to Azure is typically the fastest route.
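The custom copy application mentioned above usually amounts to a batched read-and-write loop. The following is a minimal sketch of that pattern in Python; the reader and writer callbacks are hypothetical stand-ins for, say, a local database cursor and an insert against an Azure data store.

```python
from typing import Callable, Iterable, List, Sequence


def copy_in_batches(
    read_rows: Callable[[], Iterable[Sequence]],
    write_batch: Callable[[List[Sequence]], None],
    batch_size: int = 1000,
) -> int:
    """Copy rows from a source reader to a destination writer in batches.

    `read_rows` yields source rows (for example, from an on-premises
    database cursor); `write_batch` persists one batch (for example, an
    insert against Azure SQL Database or an upload to Blob storage).
    Returns the total number of rows copied.
    """
    batch: List[Sequence] = []
    copied = 0
    for row in read_rows():
        batch.append(row)
        if len(batch) >= batch_size:
            write_batch(batch)
            copied += len(batch)
            batch = []
    if batch:  # flush the final, partial batch
        write_batch(batch)
        copied += len(batch)
    return copied
```

Batching keeps memory use bounded and maps naturally onto bulk-insert APIs, which are far faster than row-by-row writes over a network connection to Azure.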

Several ways are available to transfer data into Azure quickly and securely. The following sections discuss the standard techniques, with the advantages and disadvantages of each.

Azure Database Migration Service

When applying the lift-and-shift approach to SQL Server databases, the Azure Database Migration Service can be used to move databases to Azure. The service uses the Data Migration Assistant to verify that on-premises databases are compatible with the functionality offered by Azure SQL Database and to surface any changes required before migration. Use of the service requires site-to-site connectivity between the on-premises network and Azure.

Bulk copy program (BCP) for SQL Server

If SQL Server currently runs on-premises and the goal is to move to Azure SQL Database, another good technique is to use SQL Server Management Studio and the BCP utility. After scripting the schema from the original on-premises server and creating the databases in Azure SQL Database, BCP can be used to quickly move the data into Azure SQL Database.

BLOB and Azure Files storage

Individual branch offices often have their own file stores on local servers. This can cause problems with file sharing between branches and leave no single source of truth for a given file. In more complex cases, the institution may have an "official" file repository that branches access, but suffer intermittent or other connectivity problems when accessing the file share.

Azure offers services to mitigate these problems. Bringing this data into Azure provides a single source of truth for all data: a universally accessible repository with centralized access controls and permissions.

For some data formats, particular storage solutions are more appropriate. For example, data stored on-premises in SQL Server is a good fit for Azure SQL Database, while data in CSV or Excel files is better suited to Azure Blob storage or to Azure Files, the managed file share service in Azure Storage.
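This kind of format-to-store routing is often encoded as a simple lookup during migration planning. The sketch below is purely illustrative; the extension-to-service mapping is an assumption for this example, since the right store ultimately depends on the workload, not just the file format.

```python
import os

# Hypothetical mapping for illustration only; the appropriate target
# depends on how the data will be used, not just its file extension.
TARGET_BY_EXTENSION = {
    ".csv": "Azure Blob storage",
    ".xlsx": "Azure Blob storage",
    ".bak": "Azure SQL Database",  # SQL Server backup, restored into Azure SQL
    ".txt": "Azure Files",
}


def suggest_store(filename: str, default: str = "Azure Blob storage") -> str:
    """Suggest an Azure data store for a file based on its extension."""
    _, ext = os.path.splitext(filename.lower())
    return TARGET_BY_EXTENSION.get(ext, default)
```

A migration inventory script could run every file through such a router to produce a first-pass placement plan for review.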

Much of the data flowing to and from Azure passes through Blob storage at some point in the move.

Blob storage is built on the following core principles:
  • Durable & available
  • Secure & compliant
  • Manageable & cost-efficient
  • Scalable & performant
  • Open & interoperable
Connecting all branches to the same file share in Azure is often done through the bank's existing data center, as shown in Figure 1. The corporate data center connects to the Azure Files store over an SMB (Server Message Block) connection. From the perspective of the branch network, the file share appears to be located in the corporate data center and can be mounted like any other file share on the network. When using this technique, data is encrypted at rest and in transit between the data center and Azure.

Figure 1

Companies frequently use Azure Files storage to consolidate and protect large volumes of files, allowing them to retire legacy file servers or repurpose the hardware. Another benefit of moving to Azure Files storage is the centralization of data recovery and management services.

Azure Data Box

Banks can have terabytes, if not petabytes, of information to move into Azure. Azure data stores are very flexible and highly scalable.

Azure Data Box is a service dedicated to migrating large volumes of data into Azure. It is designed to migrate data without transferring it over a network connection to Azure. Suitable for terabytes of data, Azure Data Box is an appliance that can be ordered from the Azure portal. It is shipped to your location, where it can be connected to the network and loaded with data via standard NAS protocols, protected by standard 256-bit AES encryption. Once the data is on the appliance, the appliance is returned to the Azure data center, where the data is hydrated into Azure. The data on the appliance is then securely erased.

Azure Information Protection

Azure Information Protection is a cloud-based solution that enables an organization to classify, label, and protect its documents and e-mail messages. Classification can be applied automatically through rules and conditions defined by administrators, manually by users, or by a combination in which users receive recommendations.
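Administrator-defined rules of this kind are essentially pattern conditions mapped to labels. The sketch below illustrates the idea with a toy rule-based classifier; the labels and patterns are invented for the example, and this is not the Azure Information Protection API.

```python
import re

# Hypothetical rules in the spirit of administrator-defined conditions.
# The label names and patterns are assumptions for illustration only.
RULES = [
    # A 16-digit, card-like number suggests sensitive payment data.
    ("Confidential", re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")),
    # A mention of IBAN suggests internal account information.
    ("Internal", re.compile(r"\bIBAN\b", re.IGNORECASE)),
]


def classify(text: str, default: str = "General") -> str:
    """Return the first label whose condition matches the text."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return default
```

In the real service, matching is combined with user recommendations and with protection actions (encryption, rights restrictions) applied alongside the label.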

Data services

The main data problems banks encounter relate to master data management and to metadata conflicts caused by the presence of different core banking systems, alongside data from origination systems, onboarding, offer management, CRM, and more. Azure offers tools to mitigate these and other common data problems.

Financial services firms perform many operations on data. When writing data into Azure data stores, they may need to transform it or augment it with other data elements.

Azure Data Factory

Microsoft Azure Data Factory is a fully managed service that facilitates the ingestion, processing, and monitoring of data movement in a Data Factory pipeline. Data Factory activities form the structure of the data management pipeline.

Data Factory enables the transformation or augmentation of data as it moves into Azure and between other Azure services. Data Factory is a managed cloud service designed for complex hybrid extract, transform and load (ETL), extract, load and transform (ELT) and data integration projects.

For example, data could be fed into pipelines or analysis tools that generate actionable insights, transmitted to machine learning solutions, or transformed into another format for later downstream processing. Examples include converting CSV files into Parquet files, which are better suited to machine learning systems, and storing those Parquet files in Blob storage.

Data can also be sent to downstream compute services such as Azure HDInsight, Spark, Azure Data Lake Analytics, and Azure Machine Learning. In this way, data can be fed directly into systems that generate intelligent analytics and reports. Figure 2 illustrates a common data ingestion model, in which data is stored in a common Data Lake repository for later use by downstream analytics services.

Figure 2

Data Factory pipelines consist of activities that acquire and return data sets. Activities can be assembled into pipelines by defining where data should be acquired, how it should be processed, and where the results should be stored. Creating pipelines from activities is at the heart of Data Factory, and composing a visual workflow directly in the Azure portal simplifies pipeline creation. For a complete list of activities, see the Data Factory documentation.
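The "activities that acquire and return data sets" model can be pictured as function composition. The following pure-Python sketch, with invented example activities, mirrors that structure; it is a conceptual illustration, not Data Factory's actual programming model.

```python
from typing import Callable, List

# An activity consumes a data set and returns a (possibly transformed) one.
Activity = Callable[[list], list]


def run_pipeline(dataset: list, activities: List[Activity]) -> list:
    """Run each activity in order, feeding each one's output to the next,
    mirroring how pipeline activities acquire and return data sets."""
    for activity in activities:
        dataset = activity(dataset)
    return dataset


# Hypothetical activities: drop rows with no amount, then project two columns.
def drop_nulls(rows: list) -> list:
    return [r for r in rows if r.get("amount") is not None]


def project(rows: list) -> list:
    return [{"id": r["id"], "amount": r["amount"]} for r in rows]
```

In Data Factory the equivalent composition is declared visually or in JSON, with the service handling scheduling, retries, and movement between stores.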

Azure Databricks

Azure Databricks is an Apache Spark-based analytics platform managed in Azure. It is highly scalable, and Spark jobs run on compute clusters as needed. Databricks runs from a notebook that provides a single place for collaboration between data scientists, data engineers, and business analysts.

Databricks is a logical choice in a processing pipeline when data needs to be transformed or analyzed. Data can be fed into it directly from Data Factory, whether in machine learning contexts where turnaround time on deep analysis is critical or for simple file transformations.

Data archiving

When data no longer needs to be kept in an active data store, it can be archived for audit-trail or compliance purposes, in accordance with national and local banking regulations. Azure has suitable options for archiving rarely accessed data. Privacy issues often arise with data that must be kept in storage resources for years.

Storage costs can be high, particularly for on-premises databases. These databases are accessed infrequently, and only to write newly archived data or to purge data that no longer needs to be retained. Maintaining on-premises machines for such infrequent access implies a high total cost of hardware ownership.

Azure Storage

For unstructured data, such as files or images, Azure Blob storage offers several access tiers: hot, cool, and archive. The hot tier is suitable for active data used by applications where the greatest efficiency is needed. The cool tier suits short-term backup and disaster-recovery data sets, as well as data that is available to applications but not accessed frequently. The archive tier is the most economical and is intended for offline data.

Data in the archive tier can be rehydrated to the hot or cool tier, although this operation can take several hours. The archive tier is an appropriate choice if you do not plan to access the data for at least 180 days. While a blob is in the archive tier it cannot be read, but other operations, such as list, delete, and retrieve metadata, are still available. Archive is the least expensive tier for blob storage.
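Tiering decisions like these are often automated from access recency. The sketch below encodes the 180-day archive guidance above; the 30-day cool threshold is an illustrative assumption for this example, not an Azure rule.

```python
def recommend_tier(days_since_last_access: int) -> str:
    """Recommend a Blob storage access tier from access recency alone.

    The 180-day archive threshold follows the guidance in the text;
    the 30-day cool threshold is an assumed example value.
    """
    if days_since_last_access >= 180:
        return "archive"  # offline, cheapest; hours to rehydrate
    if days_since_last_access >= 30:
        return "cool"     # online but infrequently accessed
    return "hot"          # active application data
```

In practice, Azure Blob storage lifecycle management policies express the same logic declaratively, moving blobs between tiers based on last-modified or last-accessed times.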

Azure SQL Database long-term retention

When using Azure SQL Database, a long-term backup retention service is available to keep backups for up to ten years. Long-term retention can be configured to keep weekly, monthly, or even yearly backups.

To restore a database from long-term retention storage, select a specific backup based on its timestamp. The database can be restored to an existing server in the same subscription as the original database.
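Selecting a backup by timestamp means finding the most recent backup taken at or before the desired restore point. A minimal sketch of that selection, assuming the retention service has been queried for the available backup timestamps:

```python
from datetime import datetime
from typing import List, Optional


def select_backup(backups: List[datetime],
                  restore_point: datetime) -> Optional[datetime]:
    """Pick the most recent long-term-retention backup taken at or
    before the requested restore point, or None if none qualifies."""
    candidates = [b for b in backups if b <= restore_point]
    return max(candidates, default=None)
```

The actual restore is then issued against that backup through the Azure portal, CLI, or SDK, targeting a server in the same subscription.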

Deleting unwanted data

To remain compliant with banking regulations and data retention policies, data must often be deleted when it is no longer wanted. Before implementing a technical solution, it is important to agree on a deletion plan so that compliance criteria are not violated. Data can be deleted from storage resources or other data stores in Azure at any time.

An effective and popular strategy for deleting unwanted data is to run the operation at a specified interval; daily or weekly intervals are the most common. A time-triggered Azure Function can be written to perform this process. When data is deleted, Azure deletes it, including any cached or backup copies.
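The core of such a scheduled cleanup is a retention check: given each item's last-written time and the retention period, decide what has expired. A minimal sketch of that selection logic, which in practice would run inside a time-triggered Azure Function that then deletes the expired blobs or rows:

```python
from datetime import datetime, timedelta
from typing import Dict, List


def expired_items(last_written: Dict[str, datetime],
                  retention: timedelta,
                  now: datetime) -> List[str]:
    """Return the names of items whose age exceeds the retention period.

    `last_written` maps item names (for example, blob names) to the time
    they were last written; `now` is passed in so the logic is testable.
    """
    return sorted(name for name, ts in last_written.items()
                  if now - ts > retention)
```

Separating the "what to delete" decision from the deletion call itself makes the policy easy to review against the agreed deletion plan before anything is removed.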


Getting started

There are several ways to start, depending on current usage and the maturity of the data models in use. In all cases, it is an ideal time to examine the storage, processing, and retention model needed for each data repository. This step is critical for building data management systems in regulatory compliance scenarios. The cloud offers new opportunities here that are not available on-premises, but it may require updating existing data models.

After becoming familiar with the new data model, determine the data ingestion strategy. What are the data sources? Where will the data reside in Azure? How and when will it be moved to Azure? Many resources are available to facilitate migration, broken down by content type, size, and more. One example is the Azure Database Migration Service.

After moving data to Azure, create a cleanup plan for data that is no longer useful or has reached the end of its lifecycle. Although long-term archiving is always a good option for sporadically accessed data, purging expired data reduces the footprint and overall storage costs. Azure's backup and archiving solution architectures are good resources for planning the overall strategy.


Components

The following technologies are relevant to this solution:
  • Azure Functions is a serverless service that runs scripts and small programs in response to a system event or on a timer.
  • Azure Storage client tools provide access to data stores and include much more than the Azure portal.
  • Blob storage is suitable for storing files such as text or images and other types of unstructured data.
  • Databricks is a fully managed service that offers a simple implementation of a Spark cluster.
  • Data Factory is a cloud data integration service used to compose data storage, movement, and processing services into automated data pipelines.

Conclusions

As the digital landscape for banking and finance evolves rapidly, customers are looking for solutions and partners that are immediately available, without initial slowdowns. Hand in hand with the exponential growth of data, banks are looking for fast, secure, and innovative ways to store, analyze, and use their most important data.

Azure meets diverse needs for data ingestion, processing, storage, and deletion through a range of technologies and strategies. Data ingestion into Azure is straightforward, and various data stores are available to hold data according to its type, structure, and so on. Data solutions are available beyond SQL Server and Azure SQL Database, including third-party databases.

Some Azure services, such as Azure Databricks and Azure Data Factory, make it possible to run activities on the data. Archive storage is available for long-term retention of rarely accessed data, and data can be deleted on a recurring cycle as needed.