Nidhi Gupta
3 min read · Dec 9, 2023


Azure Synapse Series 1: Demystifying the Power of Azure Synapse Analytics

A cloud-based integrated analytics service that brings together big data and data warehousing.

With the increasing demand for big data processing, Azure Synapse brings Apache Spark, SQL, and Azure Data Factory-style pipelines together in one common workspace, called the Azure Synapse Analytics workspace.

Before my readers get loaded with a lot of questions on Synapse, hold on!!! Let’s understand Synapse in more detail: how it works, how we query data, how data is stored, and so on.

  • A Synapse workspace is created the same way as any other service in Azure.
  • While creating the workspace we have to provide Azure Data Lake Storage Gen2 details, so the workspace has its storage from the start.
  • Synapse storage is an ordinary Azure storage account.

Types of workload in Azure Synapse

  1. Synapse pipeline (ADF): A Synapse pipeline works the same way as an Azure Data Factory pipeline; the only difference is that it lives in the Synapse workspace.
  2. Synapse Spark (Notebook + Monitoring): Synapse Spark provides the facility for big data processing.
  3. SQL Pool (Serverless/Built-in & Dedicated): We can query our data with SQL using a SQL pool.

Dedicated SQL Pool

  1. Cost is based on the provisioned compute (Data Warehouse Units, DWUs), billed while the pool is running, plus the storage consumed.
  2. Ability to create both tables and external tables.
  3. Uses an MPP engine.

MPP (Massively Parallel Processing)-

  1. Primarily emphasizes parallel processing for large-scale data analytics.
  2. Tasks are executed concurrently across multiple nodes.
  3. Suited to complex queries on large datasets.
  4. Scalability, parallelization, and efficient handling of large datasets are the key characteristics of an MPP engine.
  5. Examples: Azure Synapse Analytics, Amazon Redshift, and Google BigQuery
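To make the MPP idea concrete, here is a minimal Python sketch, not how Synapse is actually implemented. It hash-distributes rows across a few "nodes", lets each node aggregate only its own rows, and then combines the partial results. The node count, the `sales` data, and all function names are assumptions made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4  # hypothetical node count, chosen only for illustration


def distribute(rows, key, num_nodes=NUM_NODES):
    """Hash-distribute rows so each node owns a disjoint slice of the data."""
    shards = [[] for _ in range(num_nodes)]
    for row in rows:
        shards[hash(row[key]) % num_nodes].append(row)
    return shards


def local_sum(shard, column):
    """Work done independently on one node: aggregate only its own rows."""
    return sum(row[column] for row in shard)


def mpp_sum(rows, key, column):
    """Run the partial aggregates concurrently, then combine them."""
    shards = distribute(rows, key)
    # Threads stand in for separate compute nodes in this toy model.
    with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
        partials = list(pool.map(lambda shard: local_sum(shard, column), shards))
    return sum(partials)


# Made-up sample data: 100 sales rows spread over 10 customers.
sales = [{"customer_id": i % 10, "amount": i} for i in range(1, 101)]
total = mpp_sum(sales, key="customer_id", column="amount")
print(total)
```

The combined result is the same as summing every row on a single machine; the difference is that each node only ever touches its own share of the data.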

Serverless SQL Pool

  1. No reserved capacity; cost is pay-as-you-go, based on the amount of data each query processes.
  2. Ability to create external tables only.
  3. Uses a Distributed Query Processing engine.
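As a rough illustration of pay-as-you-go, the sketch below estimates the cost of one serverless query from the amount of data it processes. The price per TB is a hypothetical placeholder, not a quoted Azure price, and treating 1 TB as 1024^4 bytes is also an assumption; check the Azure pricing page for real figures.

```python
BYTES_PER_TB = 1024 ** 4  # assumption: binary terabytes

# Hypothetical rate, used only to make the arithmetic visible.
HYPOTHETICAL_PRICE_PER_TB = 5.0


def serverless_query_cost(bytes_processed, price_per_tb):
    """Cost grows with the data a query processes, not with reserved capacity."""
    return (bytes_processed / BYTES_PER_TB) * price_per_tb


# A query that processes 250 GB of files in the data lake.
cost = serverless_query_cost(250 * 1024 ** 3, HYPOTHETICAL_PRICE_PER_TB)
print(round(cost, 4))
```

Under this model, a month with no queries costs nothing for compute, which is the practical meaning of "no reserved capacity".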

Distributed Query Processing Engine-

  1. It involves distributing query tasks across multiple nodes, but the emphasis may be on broader distributed computing aspects beyond analytics, including data storage and retrieval.
  2. Handles data distribution, storage, and retrieval in a distributed manner.
  3. Example: Apache Spark

Note: MPP engines can be seen as a subset of distributed query processing engines, with a specific focus on parallel processing for analytics workloads. The terms are sometimes used interchangeably, but the distinction lies in the emphasis on analytics and data processing efficiency within the MPP context.

Tables in Synapse

There are two types of table in Azure Synapse:

(i) Table

Tables in Synapse are the same as tables in a SQL database: data is arranged in rows and columns in tabular format.

(ii) External Table

External tables are tables whose structure/metadata is defined in the Synapse SQL pool, while the data itself stays in external storage (for example, Azure Data Lake Storage Gen2).
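The split between "definition here, data elsewhere" can be sketched in a few lines of Python. This is only a toy model: a temporary folder stands in for the external storage, a dictionary stands in for the table definition, and the names `sales_ext` and `query` are made up for illustration.

```python
import csv
import os
import tempfile

# Stand-in for external storage (for example, a folder in a data lake).
data_dir = tempfile.mkdtemp()
data_file = os.path.join(data_dir, "sales.csv")
with open(data_file, "w", newline="") as f:
    csv.writer(f).writerows([["1", "laptop", "1200.50"], ["2", "mouse", "25.00"]])

# Stand-in for the external table: only metadata is kept here, no rows.
external_table = {
    "name": "sales_ext",  # hypothetical table name
    "location": data_file,
    "columns": [("id", int), ("product", str), ("price", float)],
}


def query(table):
    """Read the external file at query time and apply the declared schema."""
    with open(table["location"], newline="") as f:
        for raw in csv.reader(f):
            yield {
                name: cast(value)
                for (name, cast), value in zip(table["columns"], raw)
            }


rows = list(query(external_table))
print(rows)
```

Because the rows are read from the file on every query, replacing the file changes what the next query returns, without touching the table definition.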

There are two types of external table:

  1. Hadoop: Selected by setting the parameter TYPE = HADOOP on the external data source.
  2. Native: The default type, used when TYPE is not specified.

  • In Serverless SQL Pool we can create external tables only.
  • In Dedicated SQL Pool we can create both tables and external tables.
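The Hadoop/native choice shows up as a single option on the external data source. The Python helper below is a hedged sketch that only renders the T-SQL text so the two variants can be compared side by side; it does not connect to Synapse. The helper name, the data source names, and the `<container>`/`<account>` placeholders are hypothetical, and a real setup typically also needs a credential.

```python
def external_data_source_ddl(name, location, hadoop=False):
    """Render a CREATE EXTERNAL DATA SOURCE statement as text.

    With hadoop=True the statement carries TYPE = HADOOP; otherwise the
    TYPE option is left out, which gives the default (native) behaviour.
    """
    options = [f"LOCATION = '{location}'"]
    if hadoop:
        options.append("TYPE = HADOOP")
    body = ",\n    ".join(options)
    return f"CREATE EXTERNAL DATA SOURCE {name}\nWITH (\n    {body}\n);"


# Placeholder locations: fill in your own container and storage account.
hadoop_ddl = external_data_source_ddl(
    "lake_hadoop",  # hypothetical data source name
    "abfss://<container>@<account>.dfs.core.windows.net",
    hadoop=True,
)
native_ddl = external_data_source_ddl(
    "lake_native",  # hypothetical data source name
    "https://<account>.dfs.core.windows.net/<container>",
)

print(hadoop_ddl)
print(native_ddl)
```

The only difference between the two rendered statements, apart from the location style, is the TYPE = HADOOP line.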

Note: Azure Synapse Series 2 will be published soon, with a practical implementation of creating external tables with distribution.

Thanks for the read 🙏🙏. Do clap 👏👏 if you find it useful 😊😊.

“Keep learning and keep sharing knowledge”

Nidhi Gupta

Azure Data Engineer 👨‍💻. Heading towards cloud technologies expertise ✌️.