Discover data

Azure Databricks provides a suite of tools and products that simplify the discovery of data assets that are accessible through the Databricks Data Intelligence Platform. This article provides an opinionated overview of how you can discover and preview data that has already been configured for access in your workspace.

Topics in this section focus on exploring data objects and data files. If you're looking for information about working with assets such as notebooks, SQL queries, libraries, and models, see Workspace UI.

If you're looking for guidance on generating summary statistics for datasets or on other tasks associated with exploratory data analysis (EDA), see Exploratory data analysis on Azure Databricks: Tools and techniques.

How can you discover data assets?

Data discovery tools on Azure Databricks fall into the following general categories:

  • AI-assisted insights, summary, and search.
  • Keyword search.
  • Catalog exploration using the UI.
  • Programmatic listing and metadata exploration.

Data discovery tools are optimized for data governed by Unity Catalog. Data assets that have not been registered as Unity Catalog objects might not be discoverable using some of these approaches.

Find data using the UI

  • Genie: Browse assets shared with you, search by name, ask data questions in natural language, and filter by domain. See Use the Genie interface.
  • Discover page: A curated browsing experience that lets you explore data assets organized by domains. Curators can highlight key assets for their organization, and consumers can browse by domain or asset type. See Discover page and domains.
  • Catalog Explorer: Provides tools for exploring and governing data assets. To open Catalog Explorer, click Catalog in the workspace sidebar. Use the Insights tab to learn how data is being used in your workspace. See What is Catalog Explorer? and View frequent queries and users of a table.
  • Notebooks and SQL editor: Both include a catalog navigator for exploring database objects. Click the Catalog icon in the editor sidebar to expand or collapse the catalog navigator without leaving your code editor.

Explore data programmatically

You can run SHOW commands against database objects, such as catalogs, schemas, and tables, to discover assets registered to Unity Catalog. To list files, use the LIST command, the %fs magic command, or Databricks Utilities (dbutils).

See Explore storage and find data files and Explore database objects.
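As a rough illustration of the commands described above, the following Python sketch builds the SHOW and LIST statements you might run from a notebook. The helper names (quote_ident, show_statements, list_files_statement), the catalog and schema names (main, default), and the volume path are hypothetical and chosen only for the example. The sketch only constructs SQL strings. Running them assumes a Databricks notebook, where a spark session is predefined.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def show_statements(catalog: str, schema: str) -> List[str]:
    """Build SHOW statements that walk from catalogs down to tables and volumes."""
    cat = quote_ident(catalog)
    sch = f"{cat}.{quote_ident(schema)}"
    return [
        "SHOW CATALOGS",
        f"SHOW SCHEMAS IN {cat}",
        f"SHOW TABLES IN {sch}",
        f"SHOW VOLUMES IN {sch}",
    ]


def list_files_statement(path: str) -> str:
    """Build a LIST statement for a storage or volume path."""
    # Keep the sketch simple: refuse paths that would break the string literal.
    if "'" in path:
        raise ValueError("path must not contain single quotes")
    return f"LIST '{path}'"


if __name__ == "__main__":
    # "main", "default", and the volume path are placeholder names.
    for statement in show_statements("main", "default"):
        print(statement)
    print(list_files_statement("/Volumes/main/default/raw"))
    # In a Databricks notebook (assumption: `spark` and `dbutils` are predefined):
    #   spark.sql(statement).show()
    #   dbutils.fs.ls("/Volumes/main/default/raw")
```

In a notebook, each generated statement can be passed to spark.sql, and the same path can be listed with dbutils.fs.ls or the %fs ls magic command.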

Review data comments

You can review comments to learn about the contents of datasets available in your lakehouse. Comments can be set on data objects including catalogs, schemas, tables, and columns. You can view comments in Catalog Explorer or using the DESCRIBE command for an object.
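To make the DESCRIBE approach concrete, here is a minimal Python sketch that builds a DESCRIBE statement and extracts column comments from the result rows. It assumes that DESCRIBE TABLE returns rows with col_name and comment fields, and that section-header rows begin with "#". The helper names and the sample rows are hypothetical, so verify the row shape in your own workspace.

```python
from typing import Dict, Iterable, Mapping, Optional


def describe_statement(kind: str, name: str) -> str:
    """Build a DESCRIBE statement for a catalog, schema, or table.

    `name` is expected to be an already-qualified (and, if needed, quoted) name.
    """
    kind = kind.upper()
    if kind not in {"CATALOG", "SCHEMA", "TABLE"}:
        raise ValueError(f"unsupported object kind: {kind}")
    return f"DESCRIBE {kind} {name}"


def column_comments(rows: Iterable[Mapping[str, Optional[str]]]) -> Dict[str, str]:
    """Collect non-empty column comments from DESCRIBE TABLE-style rows."""
    comments: Dict[str, str] = {}
    for row in rows:
        name = (row["col_name"] or "").strip()
        # Skip blank separator rows and "# ..." section headers (assumed shape).
        if not name or name.startswith("#"):
            continue
        comment = row["comment"]
        if comment:
            comments[name] = comment
    return comments


if __name__ == "__main__":
    print(describe_statement("table", "main.default.orders"))  # placeholder name
    # Hand-written sample rows standing in for a real DESCRIBE TABLE result.
    sample_rows = [
        {"col_name": "order_id", "data_type": "bigint", "comment": "Unique order key"},
        {"col_name": "created_at", "data_type": "timestamp", "comment": None},
        {"col_name": "", "data_type": "", "comment": ""},
        {"col_name": "# Partition Information", "data_type": "", "comment": ""},
    ]
    print(column_comments(sample_rows))
```

In a Databricks notebook, the rows could instead come from spark.sql(describe_statement("table", "main.default.orders")).collect(), assuming the predefined spark session and a table you have permission to read.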

Catalog Explorer can provide AI-generated comments for tables, which makes it easy for data asset owners to provide a rich overview of datasets. See Add AI-generated comments to Unity Catalog objects.

Users can also add comments to tables and other database objects using markdown, which is rendered in Catalog Explorer. See Add comments to data and AI assets.

Search for tables in your lakehouse

You can use the search bar in Azure Databricks to find tables registered to Unity Catalog. You can either perform a keyword search or use semantic search to find datasets or columns that relate to your search query. Search matches against table names, column names, table comments, and column comments, and only returns results for tables that you have permission to see. See Search for workspace objects.