To kick off this end-to-end project on Azure Databricks, I first created a dedicated resource group named Databricks_ETE1 to keep all related services organized and manageable. Within this resource group, I provisioned the core infrastructure required to support a medallion architecture and enable secure, scalable data processing.
Resources Created
daivikdatabricksete1 – Azure Storage Account:
An ADLS Gen2 account that forms the backbone of the project’s storage layer.
Used to create separate containers for raw (source), staging (bronze, silver), and curated (gold) data, following the medallion architecture.
databricks_ete_con – Azure Databricks Access Connector:
A managed identity resource that facilitates secure, credential-based access between Databricks and the Data Lake.
Essential for setting up Unity Catalog and assigning roles without manually managing keys or secrets.
databricks_ete – Azure Databricks Workspace:
The primary compute and analytics environment for building notebooks, managing data workflows, and running Spark-based processing jobs.
By grouping these resources under Databricks_ETE1, I ensured streamlined access control, cost management, and regional consistency — laying a solid foundation for building an enterprise-grade data lakehouse solution.
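For reference, this provisioning can also be scripted. Below is a minimal sketch of creating the resource group with the Azure SDK for Python (azure-identity and azure-mgmt-resource); the subscription ID and region are placeholders, not the values from my environment.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

subscription_id = "<your-subscription-id>"  # placeholder
client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)

# Create (or update) the resource group that holds all project resources.
rg = client.resource_groups.create_or_update(
    "Databricks_ETE1",
    {"location": "eastus"},  # assumed region; match your workspace's region
)
print(f"Provisioned resource group: {rg.name} in {rg.location}")
```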
Data Lake Containers (Medallion Architecture)
To support a structured data flow, I organized my Data Lake into the following containers:
source – Raw, unprocessed external data for lineage and auditability.
bronze – Ingested raw data with minimal transformation.
silver – Cleaned and enriched data ready for analytics.
gold – Aggregated, business-ready data for reporting and ML.
metastore – Stores Unity Catalog metadata and managed table data. (This container was created later, while setting up Unity Catalog.)
This layered structure improves data reliability, security, and scalability in line with lakehouse best practices.
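For anyone who prefers to script this step, here is a minimal sketch of creating the containers with the azure-storage-file-datalake package. It assumes the caller's identity already holds a data-plane role (such as Storage Blob Data Contributor) on the account.

```python
from azure.core.exceptions import ResourceExistsError
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://daivikdatabricksete1.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)

# One container per medallion layer, plus the Unity Catalog metastore root.
for container in ["source", "bronze", "silver", "gold", "metastore"]:
    try:
        service.create_file_system(file_system=container)
    except ResourceExistsError:
        pass  # container already exists; nothing to do
```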
Databricks Admin Setup
To manage workspace and catalog configurations, I granted myself admin access via accounts.azuredatabricks.net. Access was configured using my Azure User Principal Name (UPN) to ensure proper authentication and role assignment.
Unity Catalog Configuration
To enable secure, centralized data governance, I configured a custom Unity Catalog metastore in Azure Databricks using the following steps:
Copied the Access Connector’s Resource ID from the Azure Portal.
Used it to complete the metastore setup, then attached the metastore to the databricks_ete workspace in the same region.
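I performed these steps in the account console UI, but the equivalent can be sketched with the Databricks SDK for Python (databricks-sdk). The metastore name below is hypothetical, and the storage root points at the metastore container described earlier.

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # authenticates via a Databricks CLI profile or env vars

# Create a metastore rooted in the dedicated "metastore" container.
metastore = w.metastores.create(
    name="databricks_ete_metastore",  # hypothetical name
    storage_root="abfss://metastore@daivikdatabricksete1.dfs.core.windows.net/",
)

# Attach the metastore to the workspace (both must be in the same region).
w.metastores.assign(
    workspace_id=w.get_workspace_id(),
    metastore_id=metastore.metastore_id,
    default_catalog_name="main",
)
```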
External Location & Credential Setup
To enable Unity Catalog to securely access data stored in the ADLS Gen2 containers, I configured a storage credential and external locations as follows:
Steps Taken:
Create Storage Credential:
In the Catalog section of Databricks, I created a new credential (Catalog -> External Location -> Credential).
Used the Resource ID of the Databricks Access Connector (databricks_ete_con) to authenticate access to the storage account securely via managed identity.
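The same credential can also be created from a notebook with SQL. A minimal sketch, assuming the credential name databricks_ete_cred (hypothetical) and with the subscription portion of the Resource ID left as a placeholder:

```python
# Run from a Databricks notebook attached to a Unity Catalog-enabled cluster.
spark.sql("""
    CREATE STORAGE CREDENTIAL IF NOT EXISTS databricks_ete_cred
    WITH (
      AZURE_MANAGED_IDENTITY (
        ACCESS_CONNECTOR_ID = '/subscriptions/<sub-id>/resourceGroups/Databricks_ETE1/providers/Microsoft.Databricks/accessConnectors/databricks_ete_con'
      )
    )
    COMMENT 'Managed identity credential for the project data lake'
""")
```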
Register External Locations:
Linked each ADLS container (source, bronze, silver, gold) to Unity Catalog as an external location using the credential:
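A minimal sketch of this step from a notebook, reusing the databricks_ete_cred name assumed above (the external location names are illustrative):

```python
# Register one external location per medallion container.
account = "daivikdatabricksete1"
for layer in ["source", "bronze", "silver", "gold"]:
    spark.sql(f"""
        CREATE EXTERNAL LOCATION IF NOT EXISTS {layer}_ext
        URL 'abfss://{layer}@{account}.dfs.core.windows.net/'
        WITH (STORAGE CREDENTIAL databricks_ete_cred)
    """)
```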
This setup ensures all data layers are securely accessible for querying and processing through Unity Catalog, with fine-grained permission control and no need for manual key management.