Secure Data Copy from Google BigQuery to Azure SQL Database using Azure Data Factory

Introduction:

As organizations increasingly rely on multi-cloud strategies, the ability to move data seamlessly between cloud platforms becomes essential. A common use case is transferring data from Google BigQuery, a fully managed, serverless data warehouse on Google Cloud, to Azure SQL Database, a fully managed relational database service from Microsoft Azure. In this article, we will explore using Azure Data Factory (ADF), Microsoft’s cloud-based ETL and data integration service, to copy data from Google BigQuery to Azure SQL Database using service authentication for secure and efficient data movement.

Cloud Services used:
  • Google BigQuery is a high-performance enterprise data warehouse that allows scalable analysis over petabytes of data using SQL-like queries. It is serverless, meaning users don’t have to manage infrastructure, and supports real-time analytics with built-in machine learning capabilities.
  • Azure SQL Database is a fully-managed relational database service that supports high availability, automated backups, scaling, and security features such as threat detection and advanced data protection. It is ideal for building data-driven applications with a resilient and scalable backend.
  • Azure Data Factory (ADF) is Azure’s cloud-based data integration service. It enables data engineers and analysts to build complex ETL and ELT workflows to ingest, transform, and orchestrate data movement across various data stores. ADF provides a rich UI, code-free pipeline creation options, and support for over 90 connectors, including Google BigQuery.
[Figure: Google BigQuery integration with Azure Data Factory]

Step-by-Step: Copy Data from Google BigQuery to Azure SQL Database

Google BigQuery

Google provides a sandbox environment for creating and experimenting with Google BigQuery, with some limitations.

I created a Google BigQuery environment using the sandbox. Before creating a resource, make sure you are in the right project, or create a project using the console. In the console, you can search for BigQuery or select it from the menu, as shown in the figure below. This launches BigQuery Studio.

[Figure: Google BigQuery service]

Next, add data from BigQuery Studio. For demo purposes, I used a sample employee CSV file (a local file) with 100 records and four fields (ID, FirstName, LastName, Age) to keep it simple.

[Figure: BigQuery Employee table schema]
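For reference, the Employee table’s schema can be written as a BigQuery JSON schema file. The sketch below is an assumption based on the four fields described above; the actual types and modes in your table may differ:

```json
[
  { "name": "ID",        "type": "INTEGER", "mode": "REQUIRED" },
  { "name": "FirstName", "type": "STRING",  "mode": "NULLABLE" },
  { "name": "LastName",  "type": "STRING",  "mode": "NULLABLE" },
  { "name": "Age",       "type": "INTEGER", "mode": "NULLABLE" }
]
```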

The next step is to set up service authentication to establish the connection from Azure Data Factory.

Service Account:

Service authentication is a secure method of connecting applications and services without user credentials. In the context of Azure Data Factory, service authentication allows integration with Google BigQuery through a service account, a special kind of Google account intended to represent a non-human user who needs to authenticate and access data. This method uses a JSON key file containing the service account’s credentials, ensuring that access is tightly scoped and managed through Google Cloud IAM roles. By leveraging service authentication, organizations can automate data workflows securely and avoid hardcoding sensitive credentials, aligning with best practices for cloud security and identity management.
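For context, the JSON key file that Google Cloud issues for a service account follows a well-known structure, sketched below with redacted and illustrative values (the project and account names are hypothetical):

```json
{
  "type": "service_account",
  "project_id": "my-demo-project",
  "private_key_id": "<key-id>",
  "private_key": "-----BEGIN PRIVATE KEY-----\n<redacted>\n-----END PRIVATE KEY-----\n",
  "client_email": "adf-bigquery-reader@my-demo-project.iam.gserviceaccount.com",
  "client_id": "<numeric-client-id>",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token"
}
```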

Select IAM & Admin from the menu (or search for it in the console), and then select Service accounts.

[Figure: Service accounts option]

Add a new service account, as shown in the figure below.

[Figure: Service account creation]

For the role, select the least-privileged role needed to establish a connection and read the data from the Employee dataset (see the policy sketch after the figure below).

[Figure: Google Cloud service account role]
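As a sketch, a least-privileged setup for read access commonly pairs the BigQuery Data Viewer role (to read table data) with the BigQuery Job User role (to run the queries the connector issues). The IAM policy binding below is illustrative; the service account and project names are hypothetical, and your exact roles may differ:

```json
{
  "bindings": [
    {
      "role": "roles/bigquery.dataViewer",
      "members": [ "serviceAccount:adf-bigquery-reader@my-demo-project.iam.gserviceaccount.com" ]
    },
    {
      "role": "roles/bigquery.jobUser",
      "members": [ "serviceAccount:adf-bigquery-reader@my-demo-project.iam.gserviceaccount.com" ]
    }
  ]
}
```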

After creating the service account, click on it to add a key. Create a key of type JSON. Once the key is created, the JSON file will be downloaded. Save it; we will use it later to establish the connection from Azure Data Factory.

[Figure: Service account key creation]

Azure Data Factory

Connection:

Launch Azure Data Factory Studio.

Click Manage in the menu and select Linked services, as shown in the figure below.

[Figure: Linked services]

Click Add and search for Google BigQuery in the Add Linked Services window. Select it and click Continue. This opens the New Linked Service window.

[Figure: Google BigQuery connector]

Name – Provide a unique name.

Project ID – Enter your Google Cloud project ID.

Authentication Type – You can choose between user authentication and service authentication; service authentication is the method to use for production configurations.

Key file – Upload the service account JSON key file here. The recommended approach is to reference the key from Azure Key Vault, but for demo purposes, I uploaded it from the local system.

Test Connection – Click Test Connection at the bottom right of the screen, and click Create once the test succeeds.

[Figure: Google BigQuery service connection]
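Behind the UI, ADF stores the linked service as JSON. A minimal sketch is shown below, assuming the Google BigQuery V2 connector; the names, project ID, and key content are placeholders:

```json
{
  "name": "GoogleBigQueryLinkedService",
  "properties": {
    "type": "GoogleBigQueryV2",
    "typeProperties": {
      "projectId": "my-demo-project",
      "authenticationType": "ServiceAuthentication",
      "keyFileContent": {
        "type": "SecureString",
        "value": "<contents of the service account JSON key file>"
      }
    }
  }
}
```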
Add Dataset:

Select Pipelines from the menu and add a new dataset, as shown in the figure below. Then search for the Google BigQuery service.

[Figure: New dataset – Google BigQuery]

Select the linked service and the table on the properties screen, as shown in the figure below, and click Save.

[Figure: Google BigQuery dataset]
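The saved dataset is also represented as JSON. A minimal sketch, again assuming the V2 connector, with hypothetical dataset and table names:

```json
{
  "name": "GoogleBigQueryEmployeeDataset",
  "properties": {
    "type": "GoogleBigQueryV2Object",
    "linkedServiceName": {
      "referenceName": "GoogleBigQueryLinkedService",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "dataset": "demo_dataset",
      "table": "Employee"
    }
  }
}
```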
Add Pipeline:

Select Pipelines and create a new pipeline, as shown in the figure below.

[Figure: ADF pipeline]

Drag and drop the Copy data activity from the Move and transform section.

Name the activity and select the source and sink.

Source -> Select the Google BigQuery dataset.

[Figure: Source – Google BigQuery dataset]

Sink -> Select the Azure SQL dataset. Adding the Azure SQL Database dataset is similar to what we did for Google BigQuery; the only difference is the authentication method. The recommended approach is to use a managed identity for authentication.

[Figure: Sink – Azure SQL dataset]
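For comparison, a minimal JSON sketch of the Azure SQL sink dataset, assuming a dbo.Employee target table and a hypothetical linked service name:

```json
{
  "name": "AzureSqlEmployeeDataset",
  "properties": {
    "type": "AzureSqlTable",
    "linkedServiceName": {
      "referenceName": "AzureSqlLinkedService",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "schema": "dbo",
      "table": "Employee"
    }
  }
}
```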

Modify the source and sink properties based on your requirements. In this demo, I’m going with the default options; the only change is that I set the write behavior to Upsert in the sink properties.
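In JSON terms, the upsert setting lives in the copy activity’s sink section. The sketch below assumes ID is the unique key used to match rows; it is illustrative rather than the exact JSON ADF generates:

```json
{
  "name": "CopyEmployeeFromBigQuery",
  "type": "Copy",
  "inputs":  [ { "referenceName": "GoogleBigQueryEmployeeDataset", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "AzureSqlEmployeeDataset", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": { "type": "GoogleBigQueryV2Source" },
    "sink": {
      "type": "AzureSqlSink",
      "writeBehavior": "upsert",
      "upsertSettings": {
        "useTempDB": true,
        "keys": [ "ID" ]
      }
    }
  }
}
```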

Mappings -> Click Import schemas, and map the fields as shown in the figure below.

[Figure: Field mapping]
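Importing schemas populates a translator section inside the copy activity’s typeProperties. A minimal sketch, assuming the source and sink columns share the same names:

```json
{
  "translator": {
    "type": "TabularTranslator",
    "mappings": [
      { "source": { "name": "ID" },        "sink": { "name": "ID" } },
      { "source": { "name": "FirstName" }, "sink": { "name": "FirstName" } },
      { "source": { "name": "LastName" },  "sink": { "name": "LastName" } },
      { "source": { "name": "Age" },       "sink": { "name": "Age" } }
    ]
  }
}
```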

Finally, validate the settings and click Debug. The records from Google BigQuery will be imported into the Azure SQL database.

According to the Microsoft Learn documentation, you can use either Azure Data Factory or Synapse Analytics for this integration and data copy; however, for our use case, a typical ETL operation, Azure Data Factory is the better option from a cost standpoint.

Conclusion:

Azure Data Factory with service authentication provides a secure and scalable approach to integrating data across cloud platforms. This step-by-step integration of Google BigQuery and Azure SQL Database via ADF allows organizations to streamline analytics and reporting by centralizing data in a preferred data store. With its robust connectors and easy-to-use interface, Azure Data Factory is a powerful tool for hybrid and multi-cloud data integration scenarios.

Gowtham K

Gowtham K has been named a Microsoft MVP (Most Valuable Professional) nine times for his exceptional contributions to Microsoft technologies in the category “Developer Technologies & Security”. He has more than 12 years of experience with Microsoft technologies such as C#, ASP.NET MVC, ASP.NET Web API, ASP.NET Core, MS SQL Server, Azure, Microsoft Entra ID, and Azure AD B2C, as well as JavaScript, jQuery, HTML, and CSS. He is also a blogger and an author of articles on various technologies, and a speaker who has delivered talks on topics such as ASP.NET MVC, Azure, and Azure DevOps at public events.
