This article is Phase 2 of 4 in the Azure Synapse Spark to Microsoft Fabric migration best practices series.
Use this article to migrate your Spark workloads from Azure Synapse to Microsoft Fabric. This article covers running the Migration Assistant, refactoring code patterns that can't be automatically converted, and migrating Spark pool configurations, environments, and libraries.
In this article, you learn how to:
- Understand the migration workflow for standard (non-Git) and Git-enabled Synapse workspaces.
- Use the Spark Migration Assistant to migrate notebooks, Spark job definitions, and pools.
- Refactor Synapse-specific code patterns for Fabric compatibility.
- Migrate Spark pool settings, environments, and libraries.
- Identify and resolve library compatibility gaps between Synapse and Fabric.
Migrate with the Migration Assistant
The Spark Migration Assistant automates the migration of notebooks, Spark job definitions, pools, and lake database metadata from Synapse to Fabric. The assistant copies and transforms your items, but doesn't complete the migration—you still need to refactor code, reconcile configuration gaps, and validate the results.
For step-by-step instructions on running the assistant, see Spark Synapse to Fabric Spark Migration Assistant (Preview).
The assistant migrates the following items:
- Spark pools are migrated to Fabric Pools and corresponding Environment artifacts.
- Notebooks and their associated environments are migrated.
- Spark job definitions are migrated with associated environments.
- Lake databases are mapped to Fabric schemas; managed Delta tables are migrated via OneLake catalog shortcuts.
Important
Spark configurations, custom libraries, and custom executor settings aren't migrated by the assistant. You must configure these manually in Fabric Environments. Synapse workspaces under a VNet can't be migrated with the assistant.
Standard (non-Git) workspace migration
For workspaces where notebooks and Spark job definitions (SJDs) are stored directly in Synapse (not in a Git repository):
1. Run the Spark Migration Assistant from your Fabric workspace (Migrate > Data engineering items). Select the source Synapse workspace and migrate all Spark items.
2. Validate dependencies: ensure the same Spark version is used. If notebooks reference other notebooks via `mssparkutils.notebook.run()`, verify those notebooks were also migrated. The Migration Assistant preserves folder structure (Fabric supports up to 10 levels of nesting).
3. Refactor code: replace `mssparkutils` with `notebookutils`, replace linked service references with Fabric Connections, and update file paths. See the Refactor Spark code section for details.
Git-enabled workspace migration
For workspaces where notebooks and SJDs are stored in an Azure DevOps or GitHub repository, note that Synapse and Fabric use different Git serialization formats. Synapse stores notebooks as JSON; Fabric uses source format .py/.scala or .ipynb. You can't point a Fabric workspace at the same Synapse Git branch directly.
1. Migrate items. Use the Spark Migration Assistant to migrate notebooks and SJDs from the Synapse workspace to a Fabric workspace. This converts the items to a Fabric-compatible format.
2. Refactor code. Apply the same code refactoring as the standard scenario: replace `mssparkutils`, update file paths, and replace linked services. See the Refactor Spark code section for details.
3. Connect the Fabric workspace to Git. Connect your Fabric workspace to a new branch or folder in your repository (Workspace Settings > Source Control > Git Integration). Use a separate branch or folder from your Synapse content to avoid conflicts. Commit the Fabric workspace content to populate the new branch.
4. Set up deployment pipelines (optional). Configure Fabric deployment pipelines (Dev → Test → Prod) for ongoing CI/CD. Fabric supports auto-binding for default lakehouses and attached environments when deploying across stages.
Tip
Keep your Synapse Git branch intact as a historical reference. Create a new branch or folder for Fabric content. Fabric stores notebooks as source files (.py for PySpark) rather than JSON, which provides cleaner Git diffs for code review.
Refactor Spark code
After migrating your notebooks and Spark job definitions, you need to fix code patterns that the Migration Assistant can't convert automatically. This section guides you through replacing Synapse-specific APIs, updating file paths, and changing credential patterns to work with Fabric.
Pre-refactoring audit
Before addressing individual refactoring patterns, run a codebase-wide search across all notebooks to identify Synapse-specific code that needs changes.
| Search Pattern | Category | Action Required |
|---|---|---|
| `spark.synapse.linkedService` | Linked Services | Remove; replace with direct endpoint auth or Key Vault secrets |
| `getSecretWithLS` | Credentials | Replace with `getSecret(vaultUrl, secretName)` |
| `TokenLibrary` | Token/Auth | Remove; use direct OAuth config or `notebookutils` |
| `synapsesql` | SQL Connector | Replace `spark.read.synapsesql()` with Delta format reads |
| `mssparkutils` | Spark Utils | Replace with `notebookutils` (most APIs identical) |
| `spark.catalog.listDatabases` | Catalog API | Replace with `spark.sql("SHOW DATABASES")` |
| `spark.catalog.currentDatabase` | Catalog API | Replace with `spark.sql("SELECT CURRENT_DATABASE()")` |
| `spark.catalog.getDatabase` | Catalog API | Replace with `spark.sql("DESCRIBE DATABASE ...")` |
| `spark.catalog.listFunctions` | Catalog API | Not supported in Fabric — remove |
| `spark.catalog.registerFunction` | Catalog API | Not supported — use `spark.udf.register()` instead |
| `spark.catalog.functionExists` | Catalog API | Not supported in Fabric — remove |
| `LinkedServiceBasedTokenProvider` | Auth Provider | Replace with `ClientCredsTokenProvider` |
| `getPropertiesAsMap` | Linked Services | Remove; configure storage account directly |
| `spark.storage.synapse` | Linked Services | Remove — not supported in Fabric |
| `/user/trusted-service-user/` | File Paths | Replace with OneLake path or shortcut path |
| `cosmos.oltp` | Cosmos DB | Update to use Key Vault for secrets instead of a linked service |
| `kusto.spark.synapse` | Kusto/ADX | Replace linked service auth with an `accessToken` obtained via `getToken()` |
Tip
Run these searches across your entire notebook repository before migration. Notebooks with zero matches are safe to migrate as-is. Notebooks with matches should be prioritized for code refactoring using the detailed guidance in the following sections.
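To speed up this audit, you can script the search. The following sketch assumes a local clone of your notebook repository at a hypothetical path; it scans `.py`, `.ipynb`, and `.scala` files for the patterns in the preceding table and prints the files that need attention.

```python
from pathlib import Path

# Synapse-specific patterns from the audit table above.
PATTERNS = [
    "spark.synapse.linkedService", "getSecretWithLS", "TokenLibrary",
    "synapsesql", "mssparkutils", "spark.catalog.listDatabases",
    "spark.catalog.currentDatabase", "spark.catalog.getDatabase",
    "spark.catalog.listFunctions", "spark.catalog.registerFunction",
    "spark.catalog.functionExists", "LinkedServiceBasedTokenProvider",
    "getPropertiesAsMap", "spark.storage.synapse",
    "/user/trusted-service-user/", "cosmos.oltp", "kusto.spark.synapse",
]

def audit(repo_root: str) -> None:
    """Print every notebook or script that contains a Synapse-specific pattern."""
    for path in Path(repo_root).rglob("*"):
        if path.suffix not in {".py", ".ipynb", ".scala"}:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits = sorted({p for p in PATTERNS if p in text})
        if hits:
            print(f"{path}: {', '.join(hits)}")

audit("./synapse-notebooks")  # hypothetical path to your local repository clone
```

Files with no output are candidates to migrate as-is; files with hits map directly to the refactoring sections that follow.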
File path usage
Update Synapse notebooks that use relative paths or Synapse-managed storage paths to use direct abfss:// paths or OneLake paths in Fabric.
| Before (Synapse) | After (Fabric) |
|---|---|
| `"abfss://...@<synapse_storage>.dfs.core.windows.net/user/trusted-service-user/deltalake"` | `"abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Tables/deltalake"` |
| `spark.read.synapsesql("<pool>.<schema>.<table>")` | `spark.read.format("delta").load("abfss://.../<lakehouse>/Tables/<table>")` |
Tip
Replace all Synapse-managed storage paths with OneLake paths (abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<item_id>/...). For ADLS Gen2 data, create OneLake shortcuts and reference the shortcut paths instead.
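As a minimal before-and-after sketch (assuming a notebook-attached `spark` session; the workspace, lakehouse, and table identifiers are placeholders), a `synapsesql` read becomes a Delta read against a OneLake path:

```python
# Before (Synapse): read via the built-in SQL pool connector.
# df = spark.read.synapsesql("<pool>.<schema>.<table>")

# After (Fabric): read the migrated Delta table directly from OneLake.
onelake_path = (
    "abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/"
    "<lakehouse_id>/Tables/<table>"
)
df = spark.read.format("delta").load(onelake_path)
df.show(5)
```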
Spark Catalog API
Fabric doesn't support several spark.catalog methods. Replace them with Spark SQL equivalents.
| Before (Synapse) | After (Fabric) |
|---|---|
| `spark.catalog.listDatabases()` | `spark.sql("SHOW DATABASES").show()` |
| `spark.catalog.currentDatabase()` | `spark.sql("SELECT CURRENT_DATABASE()").first()["current_database()"]` |
| `spark.catalog.getDatabase(db_name)` | `spark.sql(f"DESCRIBE DATABASE {db_name}").show()` |
| `spark.catalog.listFunctions()` | Not supported in Fabric — remove or skip |
| `spark.catalog.registerFunction(name, fn)` | Not supported in Fabric — use `spark.udf.register()` instead |
| `spark.catalog.functionExists(name)` | Not supported in Fabric — remove or skip |
Note
spark.catalog table methods such as createTable(), tableExists(), and listTables() work normally in Fabric. Only database-level and function-level catalog methods require refactoring.
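The following sketch shows these replacements in a Fabric notebook, assuming the session's built-in `spark` object; the database and function names are placeholders:

```python
# List databases: spark.catalog.listDatabases() -> Spark SQL
spark.sql("SHOW DATABASES").show()

# Current database: spark.catalog.currentDatabase() -> Spark SQL
current_db = spark.sql("SELECT CURRENT_DATABASE()").first()[0]

# Describe a database: spark.catalog.getDatabase(db_name) -> Spark SQL
db_name = "my_lakehouse"  # placeholder database name
spark.sql(f"DESCRIBE DATABASE {db_name}").show()

# Register a UDF: spark.catalog.registerFunction(...) -> spark.udf.register(...)
spark.udf.register("to_upper", lambda s: s.upper() if s else None)
spark.sql("SELECT to_upper('fabric')").show()
```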
MSSparkUtils and NotebookUtils
Replace mssparkutils calls with the Fabric notebookutils equivalents. The most common credential-related changes are:
| Before (Synapse) | After (Fabric) |
|---|---|
| `mssparkutils.credentials.getSecretWithLS("sampleLS", secretKey)` | `notebookutils.credentials.getSecret("https://<vault>.vault.azure.net/", secretKey)` |
| `TokenLibrary.getSecret("foo", "bar")` | `notebookutils.credentials.getSecret("https://foo.vault.azure.net/", "bar")` |
In Fabric, linked service-based secret retrieval (getSecretWithLS) isn't supported. Instead, reference the Key Vault URL directly by using notebookutils.credentials.getSecret(vaultUrl, secretName). The same pattern applies to TokenLibrary.getSecret() calls.
Note
Most mssparkutils.fs methods (for example, ls, cp, mv, rm, mkdirs, head) work identically as notebookutils.fs in Fabric. The primary changes are credential and secret methods, and notebook.run() path references.
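A minimal sketch of the credential change, assuming a Fabric notebook where `notebookutils` is available; the vault URL and secret name are placeholders:

```python
# Before (Synapse): secret retrieved through a linked service.
# secret = mssparkutils.credentials.getSecretWithLS("sampleLS", "mySecret")

# After (Fabric): reference the Key Vault URL directly.
vault_url = "https://<vault>.vault.azure.net/"
secret = notebookutils.credentials.getSecret(vault_url, "mySecret")

# File system utilities keep the same signatures under the new namespace.
notebookutils.fs.ls("Files/")  # lists files in the attached default lakehouse
```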
Azure Data Explorer (Kusto) connector
Synapse notebooks that connect to Azure Data Explorer (Kusto) via linked services must be refactored to use direct endpoint authentication.
| Before (Synapse) | After (Fabric) |
|---|---|
| `.option("spark.synapse.linkedService", "AzureDataExplorer1")` | Remove linked service reference |
| Read with linked service option set | `.option("accessToken", notebookutils.credentials.getToken("https://<cluster>.kusto.windows.net"))` |
Replace the linked service option with an accessToken option. Use notebookutils.credentials.getToken() to obtain a token for your Kusto cluster endpoint. The rest of the query options (kustoDatabase, kustoQuery) remain unchanged.
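The following sketch illustrates the refactored read. It keeps whatever connector format and query options your Synapse code already uses and changes only the authentication option; the cluster, database, and query values are placeholders:

```python
# Obtain a token for the Kusto cluster endpoint instead of using a linked service.
kusto_uri = "https://<cluster>.kusto.windows.net"
access_token = notebookutils.credentials.getToken(kusto_uri)

df = (
    spark.read
    .format("com.microsoft.kusto.spark.synapse.datasource")  # keep the format your Synapse code already uses
    .option("kustoCluster", kusto_uri)
    .option("kustoDatabase", "<database>")        # unchanged from Synapse
    .option("kustoQuery", "MyTable | take 100")   # unchanged from Synapse
    .option("accessToken", access_token)          # replaces .option("spark.synapse.linkedService", ...)
    .load()
)
```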
Cosmos DB connector
Update Cosmos DB connections in Synapse that use linked services or getSecretWithLS.
| Before (Synapse) | After (Fabric) |
|---|---|
| `.option("spark.synapse.linkedService", "CosmosDbLS")` | Remove linked service reference |
| `mssparkutils.credentials.getSecretWithLS("cosmosKeyLS", "cosmosKey")` | `notebookutils.credentials.getSecret("https://<vault>.vault.azure.net/", "cosmosKey")` |
Replace the linked service reference with direct Cosmos DB endpoint configuration. Store the Cosmos DB account key in Azure Key Vault and retrieve it by using notebookutils.credentials.getSecret(vaultUrl, secretName) instead of getSecretWithLS().
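A sketch of the refactored connection, assuming the `cosmos.oltp` connector format your Synapse code already uses; the account endpoint, vault, database, and container names are placeholders:

```python
# Retrieve the Cosmos DB account key from Key Vault instead of a linked service.
cosmos_key = notebookutils.credentials.getSecret(
    "https://<vault>.vault.azure.net/", "cosmosKey"
)

df = (
    spark.read
    .format("cosmos.oltp")
    .option("spark.cosmos.accountEndpoint", "https://<account>.documents.azure.com:443/")
    .option("spark.cosmos.accountKey", cosmos_key)   # replaces getSecretWithLS / linked service
    .option("spark.cosmos.database", "<database>")
    .option("spark.cosmos.container", "<container>")
    .load()
)
```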
Linked service references
Replace all Synapse linked service references in Fabric.
| Before (Synapse) | After (Fabric) |
|---|---|
| `spark.conf.set("spark.storage.synapse.linkedServiceName", ls_name)` | Remove — not supported in Fabric |
| `spark.conf.set("fs.azure.account.oauth.provider.type", "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider")` | `spark.conf.set("fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")` |
| `TokenLibrary.getPropertiesAsMap(linked_service_cfg)` | Remove — use direct connection string or service principal config |
In Fabric, there are no linked services. Replace the Synapse token provider with standard OAuth client credentials (service principal). Configure fs.azure.account.auth.type, oauth.provider.type, client.id, client.secret, and client.endpoint directly by using spark.conf.set().
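A sketch of the replacement configuration for ADLS Gen2 access with a service principal, using account-scoped ABFS OAuth settings. The storage account, tenant, container, and Key Vault secret names are placeholders, and the client ID and secret are read from Key Vault rather than hardcoded:

```python
storage_account = "<storage_account_name>"
tenant_id = "<tenant_id>"
vault_url = "https://<vault>.vault.azure.net/"

# Service principal credentials stored as Key Vault secrets (names are placeholders).
client_id = notebookutils.credentials.getSecret(vault_url, "sp-client-id")
client_secret = notebookutils.credentials.getSecret(vault_url, "sp-client-secret")

suffix = f"{storage_account}.dfs.core.windows.net"
spark.conf.set(f"fs.azure.account.auth.type.{suffix}", "OAuth")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{suffix}",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
)
spark.conf.set(f"fs.azure.account.oauth2.client.id.{suffix}", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{suffix}", client_secret)
spark.conf.set(
    f"fs.azure.account.oauth2.client.endpoint.{suffix}",
    f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
)

# Read from the storage account directly; no linked service is involved.
df = spark.read.format("delta").load(f"abfss://<container>@{suffix}/path/to/table")
```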
Token library
Synapse's TokenLibrary for getting tokens and reading linked service properties isn't available in Fabric. Replace it with equivalent patterns.
| Before (Synapse) | After (Fabric) |
|---|---|
| `TokenLibrary.getPropertiesAsMap(serviceConnection)` | Remove — configure storage account directly |
| `val my_account = conexion("Endpoint").toString.substring(8)` | `val my_account = "<storage_account_name>" // Hardcode or retrieve via notebookutils` |
| `mssparkutils.fs.head(internalPath, Int.MaxValue)` | `notebookutils.fs.head(internalPath, Int.MaxValue)` |
For OAuth-based ADLS Gen2 access, configure the service principal credentials directly by using spark.conf.set() with the storage account-specific keys (for example, fs.azure.account.auth.type.<account>.dfs.core.windows.net) instead of relying on linked service token providers.
Important
Review all notebooks for linked service references before cutover. Any remaining spark.synapse.linkedService, TokenLibrary, or getSecretWithLS calls fail at runtime in Fabric.
Spark job definition migration
Spark job definitions (SJDs) are batch job configurations that reference a main executable file (.py, .jar, or .R), optional reference libraries, command-line arguments, and a lakehouse context. While the Spark Migration Assistant handles SJD migration automatically, important differences between Synapse and Fabric SJDs require attention.
Key differences between Synapse and Fabric SJDs
Lakehouse context required. In Fabric, every SJD must have at least one lakehouse associated with it. This lakehouse serves as the default file system for Spark runtime. Any code that uses relative paths reads and writes from the default lakehouse. In Synapse, SJDs use the workspace default storage (ADLS Gen2) as the default file system.
Supported languages. Fabric supports PySpark (Python), Spark (Scala/Java), and SparkR. .NET for Spark (C#/F#) isn't supported in Fabric. You must rewrite these workloads in Python or Scala before migration.
Retry policies. Fabric SJDs support built-in retry policies, such as max retries and retry interval. This feature is useful for Spark Structured Streaming jobs that need to run indefinitely.
Environment binding. In Synapse, SJDs bind to a Spark pool. In Fabric, SJDs bind to an Environment, which contains pool configuration, libraries, and Spark properties. The Migration Assistant automatically maps Synapse pool references to Fabric Environments.
Scheduling. Fabric SJDs have built-in scheduling (Settings > Schedule) without requiring a separate pipeline. In Synapse, SJD scheduling requires a pipeline with a Spark Job activity. If you have Synapse pipelines that only trigger SJDs, consider using Fabric's built-in SJD scheduling instead of migrating the pipeline.
Import/export. Synapse supports UI-based JSON import and export for SJDs. Fabric doesn't support UI import or export. Use the Spark Migration Assistant or the Fabric REST API to create or update SJDs programmatically.
Refactor SJD code
The same code refactoring patterns in this article apply to SJD main files. Changes fall into two categories.
Source code changes (inside the .py, .jar, or .R main file):
- Replace `mssparkutils` with `notebookutils` for credential and file system operations.
- Update hardcoded file paths in code to OneLake `abfss://` paths or shortcut paths, when needed. SJDs that use only relative paths against the default lakehouse might not require changes.
- Replace linked service references in code with Key Vault secrets or Fabric Connections.
Note
DMTS Connections aren't yet supported in Fabric Spark job definitions (supported in notebooks only). If your SJD code uses DMTS, refactor to use direct endpoint authentication.
SJD configuration changes (in the Fabric SJD item settings):
- Verify that ADLS Gen2 paths referenced by main definition files are still accessible from the Fabric workspace. If files were stored in Synapse workspace-internal storage, re-upload them to the Fabric SJD or move them to an accessible ADLS Gen2 location.
- Verify all reference files (`.py`, `.R`, `.jar`) are accessible after migration. Re-upload any files that were stored in Synapse workspace-internal storage.
- If command-line arguments contain Synapse-specific paths or connection strings, update them to Fabric equivalents.
Migrate pools, environments, and libraries
After your notebooks and Spark job definitions are migrated, you need to decide on pool and environment strategy. This section explains when you can use Fabric Starter Pools (instead of migrating), when to create custom environments, and how to identify and resolve library compatibility gaps.
Spark pool migration
Fabric Starter Pools
Fabric Starter Pools provide seconds-level Spark session startup, a significant improvement over Synapse Spark pools, which need minutes-long cold starts to bring up a cluster. Starter Pools are available out of the box and require no configuration.
Tip
If your Synapse Spark pool has no custom configurations, no custom libraries, and no specific node size requirements beyond Medium—don't migrate the pool. Instead, let your notebooks and Spark job definitions use the Fabric workspace default Starter Pool settings. This approach gives you the fastest startup times and zero pool management overhead. Only create a Custom Pool or Environment when you have a specific need.
When to create a custom pool or environment
Create a Fabric custom pool and/or environment only when your workload requires:
- A specific node size (Small, Large, XLarge, XXLarge) different from the default Medium.
- Custom libraries (pip packages, conda packages, JARs, wheels) that aren't in the Fabric built-in runtime.
- Custom Spark properties (for example, `spark.sql.shuffle.partitions`, `spark.executor.memory`) beyond the defaults.
- Managed Private Endpoints for accessing private data sources (requires Custom Pools).
- A specific Spark runtime version different from the workspace default.
Configuration and library migration
Migrate Spark configurations and libraries to Fabric Environments.
For detailed steps on migrating libraries to Fabric Environments, see Migrate Spark Libraries from Azure Synapse to Fabric.
1. Export Spark configs. In Synapse Studio, go to Manage > Spark Pools > select pool > Configurations + Libraries > download as `.yml`/`.conf`/`.json`.
2. Import to Environment. In Fabric, create an Environment artifact. Go to Spark Compute > Spark Properties and upload the exported `Sparkproperties.yml` file.
3. Migrate libraries. For pool-level libraries, upload packages (wheels, JARs, tars) to the Environment's library section. For PyPI/Conda packages, add them to the Environment's public library configuration.
Important
Workspace-level library settings in Fabric are deprecated. Migrate all libraries to Environment artifacts. The migration permanently removes existing workspace-level configurations—download all settings before enabling Environments.
Library compatibility: Synapse vs. Fabric
Fabric Runtime 1.3 (Spark 3.5) ships with 223 Python, 183 Java/Scala, and 135 R libraries built-in. Most Synapse libraries are available in Fabric, but there are gaps that can cause runtime failures if not addressed before migration.
To identify which libraries your notebooks actually use, run these checks before reviewing the gap tables:
- Python notebooks: Search for `import` and `from ... import` statements across all `.py`/`.ipynb` files.
- Java/Scala notebooks and SJDs: Search for `import` statements and Maven coordinates; look for packages like `com.azure.cosmos.spark` or `com.microsoft.kusto.spark`.
- Export the full dependency list: Run `pip freeze` in a Synapse notebook and compare against the Fabric Runtime 1.3 manifest. Only libraries that appear in both your `pip freeze` output and the gap tables below need action (a comparison sketch follows this list).
- Pool-level and workspace-level custom libraries: In Synapse Studio, go to Manage > Apache Spark Pools > select pool > Packages to see custom libraries that need to be re-uploaded to a Fabric Environment.
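A minimal comparison sketch, assuming you've exported the Synapse `pip freeze` output and the Fabric Runtime 1.3 package list to local text files (the file names are hypothetical):

```python
def load_packages(path: str) -> dict[str, str]:
    """Parse 'name==version' lines into a {name: version} map."""
    packages = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if "==" in line:
                name, version = line.split("==", 1)
                packages[name.lower()] = version
    return packages

synapse = load_packages("synapse_pip_freeze.txt")   # hypothetical export from a Synapse notebook
fabric = load_packages("fabric_runtime_1_3.txt")    # hypothetical export of the Fabric manifest

missing = sorted(set(synapse) - set(fabric))
version_drift = sorted(
    name for name in set(synapse) & set(fabric) if synapse[name] != fabric[name]
)

print("Missing in Fabric:", missing)
print("Version differences:", version_drift)
```

Libraries reported as missing are candidates to pre-install in your Fabric Environment; libraries with version drift should be checked against the version-difference table later in this section.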
Python libraries missing from Fabric
| Category | Libraries | Action |
|---|---|---|
| CUDA / GPU (10 libs) | libcublas, libcufft, libcufile, libcurand, libcusolver, libcusparse, libnpp, libnvfatbin, libnvjitlink, libnvjpeg | Not available—Fabric doesn't support GPU pools. Refactor GPU workloads to use CPU-based alternatives or keep on Synapse. |
| HTTP / API clients | httpx, httpcore, h11, google-auth, jmespath | Install via Environment: pip install httpx google-auth jmespath |
| ML / Interpretability | interpret, interpret-core | Install via Environment: pip install interpret |
| Data serialization | marshmallow, jsonpickle, frozendict, fixedint | Install via Environment if needed: pip install marshmallow jsonpickle |
| Logging / Telemetry | fluent-logger, humanfriendly, library-metadata-cooker, impulse-python-handler | fluent-logger: install if used. Others are Synapse-internal—likely not needed. |
| Jupyter internals | jupyter-client, jupyter-core, jupyter-ui-poll, jupyterlab-widgets, ipython-pygments-lexers | Fabric manages Jupyter infrastructure internally. These libraries are usually not needed in user code. |
| System / C libraries | libgcc, libstdcxx, libgrpc, libabseil, libexpat, libnsl, libzlib | Low-level system libs. Usually not imported directly. Only install if you have C extensions that depend on them. |
| File / concurrency | filelock, fsspec, knack | Install via Environment if used: pip install filelock fsspec |
Java/Scala libraries missing from Fabric
| Library | Synapse Version | Action |
|---|---|---|
| azure-cosmos-analytics-spark | 2.2.5 | Install as a custom JAR in the Fabric Environment if your Spark jobs use the Cosmos DB analytics connector. |
| junit-jupiter-params | 5.5.2 | Test-only library. Not needed in production notebooks. |
| junit-platform-commons | 1.5.2 | Test-only library. Not needed in production notebooks. |
R libraries
There are two differences. Synapse includes the lightgbm R package (v4.6.0), which isn't in Fabric; install it via an Environment if needed. Fabric adds FabricTelemetry (v1.0.2), which is Fabric-internal and requires no action.
Notable version differences
68 Python libraries exist on both platforms but with different versions. Most are minor version differences, but 17 have major version jumps that could affect behavior.
| Library | Fabric Version | Synapse Version | Impact |
|---|---|---|---|
| libxgboost | 2.0.3 | 3.0.1 | XGBoost API changes between v2 and v3. Test model training/prediction code. |
| flask | 2.2.5 | 3.0.3 | Flask 3.x has breaking changes. If serving Flask APIs from notebooks, test thoroughly. |
| lxml | 4.9.3 | 5.3.0 | Minor API changes. Test XML parsing workflows. |
| libprotobuf | 3.20.3 | 4.25.3 | Protobuf 4.x has breaking changes for custom proto definitions. |
| markupsafe | 2.1.3 | 3.0.2 | MarkupSafe 3.x drops Python 3.7 support but API is compatible. |
| libpq | 12.17 | 17.4 | PostgreSQL client library. Major version jump — test DB connections. |
| libgcc-ng / libstdcxx-ng | 11.2.0 | 15.2.0 | GCC runtime. Might affect C extension compatibility. |
Note
Synapse generally ships newer versions of system-level libraries (GCC, protobuf, libpq) while Fabric ships newer versions of data/ML libraries (more Python packages overall). If you need a specific version, pin it in your Fabric Environment configuration.
Tip
Run a quick compatibility check: export your Synapse pool's library list (pip freeze), compare against the Fabric Runtime 1.3 manifest, and pre-install any missing libraries in your Fabric Environment before running migrated notebooks. For a line-by-line comparison of every built-in library and version between Fabric and Synapse Spark runtimes, see the microsoft/synapse-spark-runtime GitHub repository.