This article is Phase 1 of 4 in the Azure Synapse Spark to Microsoft Fabric migration best practices series.
Start here before you migrate any notebooks, Spark job definitions, pools, or lake metadata. This article helps you assess the scope of your Synapse Spark estate, choose a migration approach that matches your risk tolerance and delivery timeline, and understand the Fabric differences that affect planning.
By the end of this step, you should know what needs to move, which migration pattern to use, where the main compatibility risks are, and what rollback or parallel-run constraints you need to account for.
In this article, you learn how to:
- Assess your Synapse Spark footprint.
- Choose between lift-and-shift, phased modernization, and parallel run.
- Account for rollback and synchronization constraints.
- Review key feature and architecture differences between Synapse Spark and Fabric Spark.
Assess your Synapse Spark footprint
Azure Synapse Analytics encompasses multiple workload types. This guide focuses on migrating Spark pools, notebooks, Spark job definitions, lake databases, and Hive Metastore metadata to Microsoft Fabric. For dedicated SQL pool, pipeline, Data Explorer, and security migration guidance, refer to the companion guides.
| Synapse Workload | Fabric Destination | Migration Tool/Path |
|---|---|---|
| Spark Pools | Fabric Spark (Lakehouse) | Spark Migration Assistant (preview); manual pool/env migration |
| Notebooks | Fabric Notebooks | Spark Migration Assistant; code refactoring for Synapse-specific APIs |
| Spark job definitions | Fabric Spark job definitions | Spark Migration Assistant (recommended); manual recreation if needed |
| Lake Databases | Fabric Lakehouse catalog | Spark Migration Assistant (Delta tables via shortcuts); HMS export/import for non-Delta |
| Hive Metastore | Fabric Lakehouse catalog | HMS export/import notebooks; OneLake shortcuts for data |
| Linked Services | Fabric Connections / Key Vault | Create Fabric Connections; migrate secrets to Key Vault; refactor notebook code |
Run the Fabric Assessment Tool
Before planning your migration, run the Fabric Assessment Tool to generate a comprehensive report of your Synapse source workspace. The tool scans your workspace and aggregates a summary of all objects — Spark pools, notebooks, Spark job definitions, lake databases, linked services, and their configurations — giving you a clear picture of the migration scope.
1. Download the tool. The Fabric Assessment Tool is available in the microsoft/fabric-toolbox GitHub repository.
2. Run the assessment. Point the tool at your Azure Synapse workspace. It scans all Spark-related items and produces a report with object counts, configurations, dependencies, and potential compatibility issues.
3. Review the report. Use the assessment output to understand the scope of your migration: how many notebooks, pools, Spark job definitions (SJDs), and databases need to be migrated, which linked services are in use, and which potential blockers exist (GPU pools, unsupported features, and others).
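If you also want a scriptable inventory to supplement the assessment report, you can enumerate Spark-related items with the Synapse artifacts SDK. The following is a minimal sketch, assuming the azure-synapse-artifacts and azure-identity Python packages and a placeholder workspace endpoint; it complements, not replaces, the assessment tool.

```python
# pip install azure-synapse-artifacts azure-identity
# The caller needs an appropriate Synapse RBAC role on the workspace.
from azure.identity import DefaultAzureCredential
from azure.synapse.artifacts import ArtifactsClient

# Placeholder endpoint -- replace with your Synapse workspace dev endpoint.
client = ArtifactsClient(
    endpoint="https://<your-workspace>.dev.azuresynapse.net",
    credential=DefaultAzureCredential(),
)

# Enumerate the item types this guide covers.
notebooks = [nb.name for nb in client.notebook.get_notebooks_by_workspace()]
sjds = [j.name for j in client.spark_job_definition.get_spark_job_definitions_by_workspace()]
linked_services = [ls.name for ls in client.linked_service.get_linked_services_by_workspace()]
pools = [p.name for p in client.big_data_pools.list().value]

print(f"Notebooks: {len(notebooks)}, SJDs: {len(sjds)}, "
      f"Linked services: {len(linked_services)}, Spark pools: {len(pools)}")
```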
Tip
Run the assessment tool early in your planning process. The report helps you estimate effort, identify blockers, and prioritize which workloads to migrate first. It also serves as the baseline inventory for Phase 1 of the migration checklist.
Migration patterns
Choose your migration pattern based on your organizational constraints, risk tolerance, and timeline.
Lift-and-shift pattern
Migrate all Spark workloads at once using the Migration Assistant with minimal changes. Focus on getting notebooks and jobs running in Fabric as quickly as possible — refactor only what breaks (linked services, file paths, unsupported APIs). Accept the current architecture as-is.
Use lift-and-shift when:
- Your Synapse workspace is being decommissioned on a fixed deadline and you need to move fast.
- Your Spark workloads are already well-architected (Delta-first, clean code, few linked service dependencies).
- Your workspace footprint is manageable for a one-shot migration and your team can handle the refactoring effort in a single sprint.
- Downstream consumers (Power BI, APIs) can tolerate a brief switchover window.
Phased modernization
Migrate workloads incrementally by priority, re-architecting as you go. Start with the highest-value or lowest-risk workloads first. As you migrate each batch, consolidate Spark pools into fewer Environments, adopt Lakehouse best practices (Delta-first, V-Order for BI consumers), enable the Native Execution Engine (NEE), and redesign for Direct Lake.
Use phased modernization when:
- You have a large or complex Synapse environment with multiple teams and diverse workloads that can't be migrated in one shot.
- Your current architecture has technical debt you want to address (non-Delta formats, mount-point dependencies, sprawling Spark pools).
- You have flexibility on timeline and want to improve performance and cost efficiency during migration.
- Different workloads have different owners and need independent migration schedules.
Parallel run pattern
Run both environments simultaneously during transition. Route new Spark workloads to Fabric while legacy workloads continue on Synapse. Validate migrated workloads by comparing results side-by-side before cutting over. Gradually decommission Synapse as confidence builds.
Use a parallel run when:
- Your workloads have strict SLAs or regulatory requirements that demand extended validation before cutover.
- You need to prove Fabric performance meets or exceeds Synapse before stakeholders approve decommission.
- Your downstream consumers (dashboards, APIs, ML models) can't tolerate any discrepancy during transition.
- You're migrating production pipelines where incorrect results have high business impact (financial reporting, compliance).
Parallel run introduces a data synchronization problem that you must design for up front. Choose one of these patterns:
- Shared storage layer: Have both Synapse and Fabric read and write to the same ADLS Gen2 storage through OneLake shortcuts (a creation sketch follows this list). This keeps both platforms on the same Delta files, but you must prevent write conflicts by ensuring only one platform writes to a given table at a time.
- Write-once, read-both: Keep Synapse as the primary writer during transition and let Fabric read the same data through shortcuts. After you validate the migrated notebooks in Fabric, switch the write-to path to Fabric and make Synapse the read-only consumer until decommission. This is the safest option for most migrations.
- Dual-write: Avoid running the same ETL in both environments at the same time unless you already have automated comparison and reconciliation tooling. Dual-write tends to create divergence, duplication, and operational overhead.
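The shared storage and write-once patterns both rely on OneLake shortcuts that point Fabric at the ADLS Gen2 data Synapse already uses. You can create shortcuts in the Lakehouse UI, or programmatically through the Fabric REST API. The following is a minimal sketch, assuming placeholder workspace, lakehouse, and connection GUIDs and an existing cloud connection to the storage account.

```python
# A minimal sketch of creating an ADLS Gen2 shortcut via the Fabric REST API.
# All IDs below are placeholders; the cloud connection must already exist.
import requests
from azure.identity import DefaultAzureCredential

token = DefaultAzureCredential().get_token("https://api.fabric.microsoft.com/.default").token

workspace_id = "<workspace-guid>"
lakehouse_id = "<lakehouse-guid>"

body = {
    "path": "Tables",    # create the shortcut under the lakehouse Tables section
    "name": "sales",     # shortcut (table) name in the lakehouse
    "target": {
        "adlsGen2": {
            "location": "https://<account>.dfs.core.windows.net",
            "subpath": "/<container>/delta/sales",  # must point at a Delta table folder
            "connectionId": "<connection-guid>",
        }
    },
}

resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts",
    headers={"Authorization": f"Bearer {token}"},
    json=body,
)
resp.raise_for_status()
print(resp.json())
```

Because shortcuts are metadata only, deleting one during rollback leaves the underlying ADLS Gen2 files untouched.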
Parallel run also affects change management. While Synapse remains the active development environment, any notebook, Spark job definition, Spark pool configuration, or lake database schema changes made in Synapse aren't reflected automatically in Fabric. You must re-migrate the affected assets to keep both environments aligned.
- Notebook code changes: Re-run the Spark Migration Assistant or manually re-export and re-import the updated notebooks. Reapply any Fabric-specific code refactoring, including notebookutils calls, file path updates, and Key Vault secret retrieval.
- Spark job definition changes: Re-migrate through the Migration Assistant or manually recreate the updated SJDs in Fabric.
- Spark pool configuration changes: Update the corresponding Fabric Environment to match the revised node size, autoscale settings, and libraries.
- Lake database schema changes: Re-run the HMS export/import notebooks, or manually create or alter the affected tables in the Fabric lakehouse.
To reduce re-migration overhead, establish a change freeze on the Synapse side once migration begins. If changes are unavoidable, keep a change log so you can replay them in Fabric before cutover.
Rollback considerations
Synapse-to-Fabric migration is a copy operation — it doesn't modify or delete your source Synapse workspace. Your original Spark pools, notebooks, and data remain intact throughout the process. This makes rollback straightforward:
- If migration results are unsatisfactory, continue using your existing Synapse workspace. No changes need to be reverted.
- Delete the migrated Fabric artifacts (notebooks, environments, Spark job definitions) and retry after addressing issues.
- OneLake shortcuts point to your existing ADLS Gen2 storage — removing shortcuts doesn't affect the underlying data.
- Don't decommission your Synapse workspace until all migrated workloads are validated in Fabric and downstream consumers are rerouted.
Tip
Start small and prove viability quickly. Pick a representative Spark workload and migrate it end-to-end — from pool setup through notebook refactoring to validation. Choose something that exercises your most common patterns (data access, linked services, catalog operations) but is low-risk enough to iterate on. Document the steps, issues encountered, and resolutions to build a repeatable process for subsequent migrations.
Feature parity and key differences
Understanding the architectural differences between Synapse and Fabric is critical for planning. The following tables highlight key differences in compute architecture and Spark capabilities.
For the full comparison, see Compare Fabric and Azure Synapse Spark: Key Differences.
Compute and architecture
| Capability | Azure Synapse | Microsoft Fabric |
|---|---|---|
| Deployment model | PaaS (configure and manage resources) | SaaS (capacity-based, no infrastructure management) |
| Compute model | Spark pools (node-based); requires minimum 3 nodes | Capacity Units (CU) shared across all workloads; Spark pools as config templates; single-node execution supported; Autoscale Billing for Spark (pay-per-use, similar to Synapse model) |
| Spark engine | Synapse Spark pools (Spark 3.4, 3.5); GPU pools supported | Fabric Spark (Runtime 1.2/1.3/2.0: Spark 3.4–4.0); no GPU support; runs on latest-generation hardware for improved performance |
| Scaling | Node autoscale for Spark (min 3 nodes) | Node autoscale for Spark (single-node minimum); capacity-based scaling |
| Session startup | Pool-based; cold start for new clusters | Starter Pools (seconds-level startup); Custom Live Pools; High Concurrency mode |
| Cost model | Per-node-hour (Spark); pause/resume | Two options: (1) Fabric Spark uses a Capacity Unit (CU)-based shared consumption model, or (2) Autoscale Billing for Spark (pay-as-you-go Spark mode) |
Spark: Synapse Spark vs. Fabric Spark
| Capability | Synapse Spark | Fabric Spark |
|---|---|---|
| Spark versions | Spark 3.4 (EOL), 3.5 (Preview) | Spark 3.4 (Runtime 1.2, EOL), 3.5 (Runtime 1.3, GA), 4.0 (Runtime 2.0, Preview) |
| Query acceleration | No native acceleration engine | Native Execution Engine (Velox/Gluten, up to 4x on TPC-DS) |
| Pool model | Fixed pools with max node count per pool; minimum 3 nodes | Starter Pools (seconds-level startup, no configuration needed); Custom Pools for specific node sizes and custom libraries; single-node execution supported |
| Security (network) | Managed virtual network; Private Endpoints | Managed Private Endpoints (MPE); Outbound Access Policies (OAP); Customer-Managed Keys (CMK) |
| GPU support | GPU-accelerated pools available | Not supported |
| High concurrency | Not supported | Supported: multiple notebooks share one Spark session |
| Library management | Pool-level and workspace-level libraries; manual upload of wheels, JARs, tar.gz | Environment-based library management: public feeds (PyPI/Conda) + custom uploads (wheels, JARs). To replicate Synapse workspace-level libraries, create an Environment with the required libraries and set it as the workspace default. All notebooks and SJDs in the workspace inherit it automatically. |
| V-Order | Not available | Write-time Parquet optimization; 40–60% improvement for Power BI Direct Lake and ~10% for SQL analytics endpoint; no Spark read benefit; 15–33% write overhead |
| Optimize Write | Disabled by default | Enabled by default |
| Default table format | Parquet (Delta optional) | Delta Lake (default and required for Lakehouse tables) |
| Hive Metastore | Built-in HMS; external HMS via Azure SQL DB or MySQL (deprecated after Spark 3.4) | Fabric Lakehouse catalog; HMS migration via export/import scripts |
| DMTS in notebooks | Supported | Supported in notebooks; not yet supported in Spark job definitions |
| Managed identity for KV | Supported | Supported in notebooks and Spark job definitions |
| mssparkutils | Full library (fs, credentials, notebook, env, lakehouse) | notebookutils (similar API; some differences in method names) |
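The mssparkutils-to-notebookutils rename is the most common code change in notebook migration. The following is an illustrative before/after, using hypothetical vault, secret, and storage names; note that Fabric's getSecret drops the Synapse linked-service argument and authenticates with the calling user's identity.

```python
# Synapse notebook (before): mssparkutils with a Key Vault linked service.
from notebookutils import mssparkutils

secret = mssparkutils.credentials.getSecret("contoso-kv", "sql-password", "AKVLinkedService")
files = mssparkutils.fs.ls("abfss://raw@contosolake.dfs.core.windows.net/sales")

# Fabric notebook (after): notebookutils is built into the runtime; no linked service needed.
import notebookutils

secret = notebookutils.credentials.getSecret("https://contoso-kv.vault.azure.net/", "sql-password")
files = notebookutils.fs.ls("Files/sales")  # lakehouse-relative path, or an abfss:// OneLake path
```

Because the V-Order and Optimize Write defaults differ between the platforms (see the table above), write-heavy ETL jobs may need explicit session settings after migration. A short sketch follows; the exact V-Order property name varies by Fabric runtime version, so verify it against your runtime's documentation.

```python
# Session-level toggles in a Fabric notebook (spark is predefined in notebooks).
# Verify the property names against your runtime version's documentation.
spark.conf.set("spark.sql.parquet.vorder.enabled", "false")             # skip V-Order for ETL-only tables
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")  # Optimize Write (Fabric default)
```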