Databricks End to End Project

Azure Setup

To kick off this end-to-end project on Azure Databricks, I first created a dedicated resource group named Databricks_ETE1 to keep all related services organized and manageable. Within this resource group, I provisioned the core infrastructure required to support a medallion architecture and enable secure, scalable data processing.

Resources Created

Databricks_ETE1
  1. daivikdatabricksete1 – Azure Storage Account:

    • This is the backbone of the project’s storage layer.

    • Used to create separate containers for raw (source), staging (bronze, silver), and curated (gold) data following the medallion architecture.

  2. databricks_ete_con – Azure Databricks Access Connector:

    • A managed identity resource that provides secure, identity-based access from Databricks to the Data Lake.

    • Essential for setting up Unity Catalog and assigning roles without manually managing keys or secrets.

  3. databricks_ete – Azure Databricks Workspace:

    • The primary compute and analytics environment for building notebooks, managing data workflows, and running Spark-based processing jobs.

By grouping these resources under Databricks_ETE1, I ensured streamlined access control, cost management, and regional consistency — laying a solid foundation for building an enterprise-grade data lakehouse solution.

Data Lake Containers (Medallion Architecture)

To support a structured data flow, I organized my Data Lake into the following containers:

  • source – Raw, unprocessed external data for lineage and auditability.

  • bronze – Ingested raw data with minimal transformation.

  • silver – Cleaned and enriched data ready for analytics.

  • gold – Aggregated, business-ready data for reporting and ML.

  • metastore – Stores Unity Catalog metadata and managed table data. (This container was created while setting up Unity Catalog.)

This layered structure improves data reliability, security, and scalability in line with lakehouse best practices.
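
As a minimal sketch of how these containers map to ADLS Gen2 paths, the helper below builds the abfss:// URI for each layer. The storage account and container names are the ones used in this project; the function name container_uri and the CONTAINERS tuple are illustrative assumptions, not part of any Databricks or Azure API.

```python
# Storage account and containers as named in this project.
STORAGE_ACCOUNT = "daivikdatabricksete1"
CONTAINERS = ("source", "bronze", "silver", "gold", "metastore")


def container_uri(container: str, path: str = "") -> str:
    """Return the abfss:// URI for a container, optionally with a sub-path.

    Hypothetical helper for illustration only.
    """
    if container not in CONTAINERS:
        raise ValueError(f"Unknown container: {container!r}")
    base = f"abfss://{container}@{STORAGE_ACCOUNT}.dfs.core.windows.net/"
    # Strip a leading slash so the sub-path is not doubled up.
    return base + path.lstrip("/")


if __name__ == "__main__":
    for name in CONTAINERS:
        print(f"{name:<10} {container_uri(name)}")
```

Keeping the account name in one constant means a notebook can refer to a layer by name instead of repeating the full path.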

Databricks Admin Setup

To manage workspace and catalog configurations, I granted myself admin access via accounts.azuredatabricks.net.
Access was configured using my Azure user principal name (UPN) to ensure proper authentication and role assignment.

Unity Catalog Configuration

To enable secure, centralized data governance, I configured a custom Unity Catalog metastore in Azure Databricks using the following steps:

Steps Taken:

  1. Create a Metastore:

    • Logged into accounts.azuredatabricks.net → Navigated to the Catalog section → Created Metastore.

    • Named: databricks_ete_metastore

    • Region: eastus (must match the Databricks workspace region)

  2. Provide ADLS Gen2 Path for Managed Tables:

    • Created a new container named metastore in my storage account daivikdatabricksete1.

    • Set the storage path to:
      abfss://metastore@daivikdatabricksete1.dfs.core.windows.net/

  3. Assign IAM Role to Access Connector:

    • In the Azure Portal, navigated to Storage Account → daivikdatabricksete1 → Access Control (IAM).

    • Selected Add Role Assignment → Storage Blob Data Contributor.

    • Assigned to: Managed Identity

    • Selected identity: databricks_ete_con (Azure Databricks Access Connector)

  4. Attach the Metastore to Databricks Workspace:

    • Copied the Access Connector’s Resource ID from the Azure Portal.

    • Used it to complete metastore setup and linked it to the databricks_ete workspace in the same region.
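
The Resource ID copied in step 4 and the storage account scope used for the role assignment in step 3 both follow the standard Azure resource ID layout. The sketch below assembles them and adds a simple region check. The subscription ID is a placeholder, and the function names are my own illustrative assumptions; the resource group, connector, and storage account names come from this project.

```python
# Placeholder only: substitute the real subscription ID from the Azure Portal.
SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP = "Databricks_ETE1"


def access_connector_resource_id(subscription_id: str, resource_group: str,
                                 connector_name: str) -> str:
    """Build the Azure resource ID of a Databricks Access Connector."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Databricks/accessConnectors/{connector_name}"
    )


def storage_account_scope(subscription_id: str, resource_group: str,
                          account_name: str) -> str:
    """Build the scope at which Storage Blob Data Contributor is assigned."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )


def regions_match(metastore_region: str, workspace_region: str) -> bool:
    """Compare two region names, ignoring case and spaces ("East US" vs "eastus")."""
    def normalize(region: str) -> str:
        return region.replace(" ", "").lower()
    return normalize(metastore_region) == normalize(workspace_region)


if __name__ == "__main__":
    print(access_connector_resource_id(SUBSCRIPTION_ID, RESOURCE_GROUP, "databricks_ete_con"))
    print(storage_account_scope(SUBSCRIPTION_ID, RESOURCE_GROUP, "daivikdatabricksete1"))
```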

External Location & Credential Setup

To enable Unity Catalog to securely access data stored in ADLS Gen2 containers, I configured external credentials and locations as follows:

Steps Taken:

  1. Create External Credential:

    • In the Catalog section of Databricks, I created a new Credential (Catalog → External Location → Credential).

    • Used the Resource ID of the Databricks Access Connector (databricks_ete_con) to authenticate access to the storage account securely via managed identity.

  2. Register External Locations:

    • Linked each ADLS container to Unity Catalog using the credential:

      • bronze_ext_location → abfss://bronze@daivikdatabricksete1.dfs.core.windows.net

      • silver_ext_location → abfss://silver@daivikdatabricksete1.dfs.core.windows.net

      • gold_ext_location → abfss://gold@daivikdatabricksete1.dfs.core.windows.net

      • source_ext_data → abfss://source@daivikdatabricksete1.dfs.core.windows.net

This setup ensures all data layers are securely accessible for querying and processing through Unity Catalog, with fine-grained permission control and no need for key management.
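
I registered these locations through the UI, but the same registration can also be expressed in SQL. As a rough sketch, the snippet below generates one CREATE EXTERNAL LOCATION statement per container. The location names and URLs are the ones listed above; the credential name databricks_ete_cred is an assumption (the actual credential name is not recorded here), as is the helper function itself.

```python
STORAGE_ACCOUNT = "daivikdatabricksete1"
# Assumed name: replace with the storage credential actually created in Unity Catalog.
CREDENTIAL_NAME = "databricks_ete_cred"

# External location name -> ADLS Gen2 container, as registered in this project.
EXTERNAL_LOCATIONS = {
    "bronze_ext_location": "bronze",
    "silver_ext_location": "silver",
    "gold_ext_location": "gold",
    "source_ext_data": "source",
}


def create_external_location_sql(location_name: str, container: str) -> str:
    """Return a Unity Catalog SQL statement registering one external location."""
    url = f"abfss://{container}@{STORAGE_ACCOUNT}.dfs.core.windows.net"
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS {location_name} "
        f"URL '{url}' "
        f"WITH (STORAGE CREDENTIAL {CREDENTIAL_NAME})"
    )


if __name__ == "__main__":
    for name, container in EXTERNAL_LOCATIONS.items():
        # In a Databricks notebook, each statement could be passed to spark.sql(...).
        print(create_external_location_sql(name, container))
```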

Description