Home  > Resources  > Blog

Using PolyBase in SQL Server

 
May 27, 2022 by Faheem Javed
Category: Microsoft

1.1 Understanding PolyBase

PolyBase enables a SQL Server instance to query data with T-SQL directly from SQL Server, Oracle, Teradata, MongoDB, Hadoop clusters, and Cosmos DB. Separate installation of client connection software is not needed. You can also use the generic ODBC connector to connect to additional providers using third-party ODBC drivers. PolyBase allows T-SQL queries to join the data from external sources to relational tables in an instance of SQL Server.

1.2 PolyBase Example

1.3 Data Virtualization with PolyBase

With PolyBase, data stays in its original location and format. You can virtualize the external data through the SQL Server instance so that it can be queried in a place like any other table in SQL Server. This process minimizes the need for ETL processes for data movement. The query author does not need any knowledge about the external source.

1.4 PolyBase Use-cases

Query data stored in Hadoop from a SQL Server instance. Query data stored in Azure blob storage. Import data from Hadoop, Azure blob storage, and Azure Data Lake Store. There is no need for a separate ETL or import tool. Export data to Hadoop and Azure blob storage, or Azure Data Lake Store. Integrate with BI tools. Use PolyBase with Microsoft’s business intelligence and analysis stack, or use any third-party tools that are compatible with SQL Server.

1.5 PolyBase and Massive Parallel Processing

You may configure a PolyBase cluster that lets you set up scaled-out groups. Scaled-out groups allow PolyBase to provide enormous parallel processing to handle huge quantities of data at a quick speed.

1.6 Supported SQL Products and Services

  • SQL Server 2016 (13.x) and later versions (Windows only)
  • SQL Server 2019 (15.x) and later versions (Linux)
  • Azure Synapse Analytics

1.7 PolyBase Connectors

SQL Server 2016 (13.x) introduced PolyBase with support for connections to Hadoop and Azure blob storage. SQL Server 2019 (15.x) introduced additional connectors, including SQLServer, Oracle, Teradata, and MongoDB.

1.8 PolyBase vs. Linked Server

1.9 PolyBase Limitations

If the sum of the column schema is greater than 32 KB, PolyBase can’t query the data.

The following data types cannot be used in PolyBase external tables:

geography, geometry, hierarchy id, image, text, nText, xml, Any user-defined type. Oracle synonyms are not supported for usage with PolyBase.1

1.10 Using PolyBase in SQL Server Instance

  • Install PolyBase on Windows or Install PolyBase on Linux.
  • Starting with SQL Server 2019 (15.x), enable PolyBase in sp_configure, if necessary.
  • Create an external data source.
  • Create an external table.

1.11 PolyBase Installation

In the installation wizard, select PolyBase Query Service for External Data.

1.12 Enable PolyBase

After the installation, use the following T-SQL command.

exec sp_configure @configname = ‘polybase enabled’, @configvalue = 1;

RECONFIGURE;

Run the following command and verify the return value is 1 to ensure PolyBase is installed.

SELECT SERVERPROPERTY (‘IsPolyBaseInstalled’) AS IsPolyBaseInstalled;

1.13 Create an external data source to reference a SQL Server Instance

Create a database scoped credential.

— Create a Master Key

CREATE MASTER KEY ENCRYPTION BY PASSWORD = ‘S0me!nfo’;

GO

CREATE DATABASE SCOPED CREDENTIAL SqlServerCredentials

WITH IDENTITY = ‘username’, Secret = ‘password’;

GO

Use a named instance to connect to a SQL Server instance

CREATE EXTERNAL DATA SOURCE SQLServerInstance2

WITH (

LOCATION = ‘sqlserver://YOUR_HOST_NAME’ ,

CONNECTION_OPTIONS = ‘Server=%s\YOUR_INSTANCE_NAME’ ,

CREDENTIAL = SQLServerCredentials

) ;

Use a port to connect to a SQL Server instance.

CREATE EXTERNAL DATA SOURCE SQLServerInstance2

WITH (

LOCATION = ‘sqlserver://YOUR_HOST_NAME:YOUR_PORT_NUMBER’ ,

CREDENTIAL = SQLServerCredentials

) ;

1.14 Create an external data source in SQL Server 2019 to reference Oracle

To create an external data source that references Oracle, ensure you have a database scoped credential.

— Create a database master key

CREATE MASTER KEY ENCRYPTION BY PASSWORD = ” ;

— Create a database scoped credential with Azure storage account key as the

secret.

CREATE DATABASE SCOPED CREDENTIAL OracleProxyAccount

WITH

IDENTITY = ‘oracle_username’,

SECRET = ‘oracle_password’ ;

Create external data source to reference Oracle.

CREATE EXTERNAL DATA SOURCE MyOracleServer

WITH

( LOCATION = ‘oracle://145.145.145.145:1521’,

CREDENTIAL = OracleProxyAccount

) ;

1.15 Create an external data source to access data in Azure Storage

— Create a database master key if one does not already exist, using your own

password. This key is used to encrypt the credential secret in the next step.

CREATE MASTER KEY ENCRYPTION BY PASSWORD = ” ;

— Create a database scoped credential with Azure storage account key as the

secret.

CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential

WITH

IDENTITY = ” ,

SECRET = ” ;

— Create an external data source with CREDENTIAL option.

CREATE EXTERNAL DATA SOURCE MyAzureStorage

WITH

( LOCATION = ‘wasbs://daily@logs.blob.core.windows.net/’ ,

CREDENTIAL = AzureStorageCredential ,

TYPE = HADOOP

) ;

Windows Azure Storage Blob (WASB) is a file system implemented as an extension built on top of the HDFS APIs

1.16 Create an external data source to reference Hadoop

CREATE EXTERNAL DATA SOURCE MyHadoopCluster

WITH

( LOCATION = ‘hdfs://10.10.10.10:8020’ ,

TYPE = HADOOP ,

RESOURCE_MANAGER_LOCATION = ‘10.10.10.10:8050’

) ;

1.17 Create an external table for SQL Server

CREATE EXTERNAL TABLE dbo.customer(

C_CUSTKEY INT NOT NULL,

C_NAME VARCHAR(25) NOT NULL,

C_ADDRESS VARCHAR(40) NOT NULL,

C_PHONE CHAR(15) NOT NULL,

)

WITH (

LOCATION=’your_remote_database.dbo.customer’,

DATA_SOURCE=SqlServerInstance

);

1.18 Create an External Table for Azure Blob Storage

Create an external file format.

— In this example, the files are comma (,) delimited

CREATE EXTERNAL FILE FORMAT TextFileFormat WITH (

FORMAT_TYPE = DELIMITEDTEXT,

FORMAT_OPTIONS (FIELD_TERMINATOR =’,’,

USE_TYPE_DEFAULT = TRUE)

Create an external table.

— LOCATION: path to file or directory that contains the data

CREATE EXTERNAL TABLE [dbo].[customer] (

[customerId] int NOT NULL,

[customerName] varchar(20) NOT NULL

)

WITH (LOCATION=’/data/’,

DATA_SOURCE = AzureStorage,

FILE_FORMAT = TextFileFormat

);

1.19 Importing Data with CTAS

CTAS is the CREATE TABLE AS SELECT syntax.

For example:

CREATE TABLE Fast_Customers

WITH

(CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = HASH (CustomerKey))

AS

SELECT DISTINCT

Insured_Customers.CustomerKey, Insured_Customers.FirstName,

Insured_Customers.LastName,

Insured_Customers.YearlyIncome, Insured_Customers.MaritalStatus

from Insured_Customers INNER JOIN

(

SELECT * FROM CarSensor_Data where Speed > 35

) AS SensorD

ON Insured_Customers.CustomerKey = SensorD.CustomerKey

1.20 Exporting Data with CETAS

CETAS is the CREATE EXTERNAL TABLE AS SELECT syntax.

For example, archiving/exporting relational data to Azure Blob Storage.

CREATE EXTERNAL TABLE [dbo].[FastCustomers2009]

WITH (

LOCATION=’/archive/customer/2009′,

DATA_SOURCE = AzureStorage,

FILE_FORMAT = TextFileFormat

)

AS

SELECT T.* FROM Insured_Customers T1 JOIN CarSensor_Data T2

ON (T1.CustomerKey = T2.CustomerKey)

WHERE T2.YearMeasured = 2009 and T2.Speed > 40;

1.21 View PolyBase Objects in SSMS

External tables are displayed in External Tables.

External data sources and external file formats are in the External Resources.

1.22 Summary

In this tutorial, you learned the following:

◊ PolyBase

◊ PolyBase vs. Linked Server

Follow Us

Blog Categories