Adding new requirements to a Synapse Spark pool can surface dependency conflicts or library-installation errors that cause previously working packages to fail. In Synapse, Python libraries for a Spark pool are resolved as a single Conda environment from the requirements file; if any package (including the new one) fails to install or conflicts with existing ones, the pool silently falls back to the base runtime and your previous requirements appear to “break.”
To add new requirements without breaking existing ones, follow these steps:
- Recreate the Synapse runtime environment locally
  - Download the base Synapse runtime environment template YAML for the Spark version in use (for example, from the runtime documentation such as Apache Spark 3.4 GA).
  - Create a local Conda environment from that YAML and activate it:

    ```bash
    conda env create -n myenv -f environment.yml
    conda activate myenv
    ```

  - This approximates the Synapse base runtime where the pool requirements are applied; a snapshot tip follows below.
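  As a convenience (not a documented Synapse step), snapshotting the base environment right after creating it makes later conflicts easier to diagnose; the file name here is arbitrary:

  ```bash
  # Record what the base runtime ships with, for diffing after pip installs
  pip list --format=freeze > base-runtime-packages.txt
  ```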
- Test the combined requirements locally
  - Take the exact requirements file used by EV2 (the one that includes both `azure.kusto` and the new `azure.cosmos` requirement) and install it into the local environment:

    ```bash
    pip install -r requirements.txt
    ```

  - If installation fails locally, there is a dependency conflict between the base runtime and the requirements, or between `azure.kusto` and `azure.cosmos` (or their transitive dependencies). Adjust versions or remove conflicting packages until `pip install -r requirements.txt` succeeds, as in the sketch below.
  - Only when the combined requirements install cleanly locally should they be deployed to the Synapse Spark pool.
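  A minimal validation loop, assuming the PyPI package names are `azure-kusto-data` and `azure-cosmos` (the actual names and pins come from your EV2 requirements file); `pip check` catches incompatibilities that a seemingly successful install can still leave behind:

  ```bash
  # Fresh throwaway env built from the runtime template, then the combined requirements
  conda env create -n synapse-test -f environment.yml
  conda activate synapse-test

  # Hypothetical file containing both the existing and the new packages,
  # e.g. azure-kusto-data and azure-cosmos
  pip install -r requirements.txt

  # A clean install can still leave broken dependency metadata; pip check reports it
  pip check
  ```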
- Deploy the corrected requirements to the Spark pool
  - Update the pool-level requirements file (via EV2/ARM) with the validated set of packages; a CLI alternative is sketched below.
  - After updating, force the Spark pool to pick up the new libraries by using Force new settings on the pool. This ends all current sessions and restarts the pool so the new environment is applied.
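  For pools managed outside EV2, the same update can be sketched with the Azure CLI. The workspace, resource group, and pool names here are placeholders, and you should verify the `--library-requirements` parameter of `az synapse spark pool update` against your CLI version:

  ```bash
  # Hypothetical names; adjust to your workspace and pool
  az synapse spark pool update \
    --workspace-name my-synapse-ws \
    --resource-group my-rg \
    --name mysparkpool \
    --library-requirements requirements.txt

  # Then apply "Force new settings" from Synapse Studio so running sessions
  # end and the pool restarts with the new environment.
  ```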
- If using a DEP-enabled workspace
  - Installing packages directly from public repositories is not supported. Upload all required wheels (for example, for `azure.cosmos` and any additional dependencies) as workspace libraries and attach them to the Spark pool instead of pulling from PyPI.
  - Use the same local-environment approach to discover all required wheel dependencies before uploading; see the sketch below.
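  One way to collect the full closure of wheels for offline upload is `pip download` run inside the local environment built earlier; `--only-binary=:all:` refuses source distributions, which a DEP-enabled pool could not build anyway:

  ```bash
  # Gather every wheel the requirements pull in; the directory name is arbitrary
  pip download -r requirements.txt -d wheels/ --only-binary=:all:

  # If your local platform differs from the pool's Linux runtime, add
  # --platform and --python-version (both require --only-binary=:all:).

  # Upload everything in wheels/ as workspace libraries, then attach them to the pool
  ls wheels/
  ```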
- General guidance
  - Any error in the requirements file (invalid package, incompatible version, or dependency conflict) causes the pool to revert to the base runtime, which can look like previously working requirements suddenly failing.
  - Always validate the full, combined requirements set in a local environment that mirrors the Synapse runtime before updating the pool.
References:
- Manage libraries for Apache Spark in Azure Synapse Analytics
- Azure Synapse runtimes
- Troubleshoot library installation errors
- LibraryRequirements Class
- Microsoft.Synapse workspaces/bigDataPools 2021-06-01 (deployment-language-terraform)
- Microsoft.Synapse workspaces/bigDataPools 2021-06-01-preview (deployment-language-terraform)