Context and Challenge
For over 17 years, our client has specialized in providing third-party logistics (3PL) services to industries across Europe. Since its founding, the company has invested heavily in advanced warehouse automation, becoming a disruptor in the market, especially with the rise of e-commerce. To maintain their high business standards and keep their market leadership unmatched, they required a modern analytics and BI system that would enable them to deliver even more optimized and personalized services to their clients.
As a highly technology-driven business, the company ran numerous systems that tracked nearly every aspect of warehouse operations, client service, and billing. Over time, however, the massive volume of data became increasingly difficult to manage and analyze. The existing solutions they relied on were unable to effectively address the following challenges:
- Scattered Data: Key operational data was distributed across multiple systems, including Warehouse Management, ERP, and HR systems, making unified reporting a cumbersome process.
- Complex Invoicing: Ensuring each service was correctly billed required cross-referencing different systems, leading to potential errors.
- Need for Advanced Insights: To optimize operations and make data-driven decisions, the company needed high-level analytics and real-time dashboards.
To address these issues, the company needed to:
- Centralize Data: Create a single data lake environment that pulls, stores, and organizes all relevant data.
- Enhance Data Quality: Improve data accuracy and consistency to avoid billing inaccuracies and potential revenue leakage.
- Enable Advanced BI: Provide management and operational teams with user-friendly, insightful dashboards built on Power BI, a best-in-class analytics platform.
Our Approach (Solution Overview)
After a thorough analysis of the clientʼs needs, we designed the following solution to address their challenges and provide a robust BI system:
Incremental Data Ingestion
- Data from all source systems was fetched and converted into Parquet files, an open-source, column-oriented file format offering high-performance compression and encoding for efficient bulk data storage and retrieval.
- ETL pipelines were established using Databricks notebooks and orchestration to handle incremental data loads from each source system, ensuring optimal performance and scalability (a sketch of such a load follows).
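To make the pattern concrete, here is a minimal sketch of a watermark-based incremental load in PySpark. The JDBC source, the `last_modified` column, and all paths and table names are assumptions for illustration, not the clientʼs actual configuration.

```python
# Minimal watermark-based incremental load sketch (illustrative names only).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

WATERMARK_PATH = "abfss://lake@storageacct.dfs.core.windows.net/_watermarks/wms_orders"
BRONZE_PATH = "abfss://lake@storageacct.dfs.core.windows.net/bronze/wms_orders"

# Read the high-water mark left by the previous run; fall back to epoch
# on the very first run (empty-batch handling is simplified here).
try:
    last_ts = spark.read.parquet(WATERMARK_PATH).first()["last_ts"]
except Exception:
    last_ts = "1970-01-01 00:00:00"

# Pull only the rows changed since the last run from the source system.
incremental = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://wms-db;databaseName=wms")  # hypothetical source
    .option("query", f"SELECT * FROM orders WHERE last_modified > '{last_ts}'")
    .load()
)

# Append the new slice as Parquet, partitioned by ingestion date so each
# daily load stays small and prunable.
(incremental
    .withColumn("ingest_date", F.current_date())
    .write.mode("append")
    .partitionBy("ingest_date")
    .parquet(BRONZE_PATH))

# Persist the new high-water mark for the next run.
(incremental
    .agg(F.max("last_modified").alias("last_ts"))
    .write.mode("overwrite")
    .parquet(WATERMARK_PATH))
```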
Data Lakehouse
The storage solution implemented Medallion Architecture on Azure Data Lake Gen2 and Databricks, organizing data into:
- Bronze Layer: Raw data ingested directly from source systems. This layer provides a historical archive of source data and supports data lineage, auditability, and reprocessing when needed.
- Silver Layer: Data from the Bronze layer is matched, merged, conformed, and cleansed so that the Silver layer provides an “Enterprise view” of all key business entities, concepts, and transactions (see the Bronze-to-Silver sketch after this list).
- Gold Layer: Business-ready, enriched data optimized for advanced analytics and reporting.
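To illustrate the Bronze-to-Silver step, the following PySpark sketch deduplicates raw records and upserts them into a Delta table. The `silver.orders` table and its columns are assumptions for the example, not the clientʼs actual schema.

```python
# Illustrative Bronze-to-Silver step on Databricks: deduplicate, conform,
# and merge into a Delta table (all names are assumptions).
from pyspark.sql import SparkSession, Window, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

bronze = spark.read.parquet(
    "abfss://lake@storageacct.dfs.core.windows.net/bronze/wms_orders"
)

# Keep only the latest version of each order and standardize key columns.
latest = Window.partitionBy("order_id").orderBy(F.col("last_modified").desc())
silver_batch = (
    bronze
    .withColumn("rn", F.row_number().over(latest))
    .filter("rn = 1")
    .drop("rn")
    .withColumn("order_status", F.upper(F.trim("order_status")))
)

# Upsert into the Silver table so it always holds the conformed
# "enterprise view" of the entity.
target = DeltaTable.forName(spark, "silver.orders")
(target.alias("t")
    .merge(silver_batch.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```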
Materialized Views and Prepared Datasets
- Using dbt, an industry-standard data transformation tool, most of the business logic, calculations, and transformations were executed within Databricks.
- Materialized views and pre-aggregated datasets were created, reducing query complexity in Power BI and significantly enhancing dashboard performance (a sketch follows).
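In the project this logic lived in dbt models; the `spark.sql` sketch below shows the kind of pre-aggregation a Gold-layer model might produce. The table and column names (`silver.billing_lines`, `gold.client_service_monthly`) are invented for illustration.

```python
# Pre-aggregate billing lines per client, service, and month so Power BI
# reads a small summary table instead of scanning raw transactions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    CREATE OR REPLACE TABLE gold.client_service_monthly AS
    SELECT
        client_id,
        service_code,
        date_trunc('month', service_date) AS service_month,
        SUM(quantity)                     AS total_quantity,
        SUM(amount)                       AS billed_amount
    FROM silver.billing_lines
    GROUP BY client_id, service_code, date_trunc('month', service_date)
""")
```

Because dashboards then query this compact summary rather than raw transactions, visuals stay responsive as data volumes grow.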
Power BI for Advanced Reporting
- Reports and dashboards were built in Power BI Service directly on the prepared Gold Layer datasets, giving management and operational teams user-friendly, real-time insights.
- Reports were tailored by type and purpose, with selected reports shared directly with the clientʼs customers.
Detailed Execution (Process + Timeline)
This complex project required substantial involvement from business representatives, who acted as key interpreters of business flows, objects, statuses, and other definitions. Fortunately, our client had a clear vision of their needs, supported by comprehensive documentation, which enabled us to kickstart the project efficiently.
Key Project stages:
- Requirements Walk-Through
Our team reviewed the clientʼs requirements in detail to ensure a complete understanding of their needs. This thorough analysis provided all the information necessary to start the project with confidence.
- Infrastructure and Platform
Since the client was already using Azure, we built the solution leveraging Azure tools, including Azure DevOps for CI/CD and Azure Data Lake Gen2 for data storage, ensuring seamless integration with their existing ecosystem.
- Data Capturing, Transformation and Storage
Data was updated daily by capturing changes from all source systems and storing them as Parquet files in the data lake. Using Databricks Jobs and Notebooks, raw data was processed and moved through the Medallion Architecture layers (Bronze, Silver, and Gold). Final datasets were prepared using dbt, transforming and storing them in the Gold Layer, ready for analytics and reporting (a sketch of the daily orchestration appears after this list).
- Power BI Service and Reports
Data visualization and reports were built using Power BI Service, leveraging the prepared datasets in the Gold Layer for easy and efficient reporting. Reports were tailored based on type and purpose, with some shared directly with the clientʼs customers. Customers, impressed by the high-quality reports, placed additional orders for custom reports to address their specific needs.
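As context for the daily flow above, a driver notebook on Databricks might chain the stage notebooks in sequence, as in this sketch; the notebook paths and the `run_date` argument are illustrative, and `dbutils` is the handle the Databricks notebook runtime provides rather than an import.

```python
# Daily orchestration sketch for a Databricks driver notebook
# (paths and arguments are illustrative).
from datetime import date

run_date = date.today().isoformat()

# Run each stage notebook in order: path, timeout in seconds, arguments.
# A failure raises and stops the chain, so the day can be re-run end to end.
for step in [
    "/pipelines/01_ingest_to_bronze",
    "/pipelines/02_bronze_to_silver",
    "/pipelines/03_build_gold_with_dbt",
]:
    dbutils.notebook.run(step, 3600, {"run_date": run_date})
```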
The project was completed in 30 weeks and continues to evolve, with ongoing efforts to add more complex reports for the client and build custom reports for their customers.
Results and Impact
Quantitative Results
- Data Consistency & Accuracy
Achieved a 98–100% consistency rate in financial records, reducing invoice disputes by 30%.
- Eliminated Revenue Loss
Our client previously faced challenges in accurately accounting for all services and invoicing them. With the implementation of the new BI system, 100% of services are now accurately accounted for and billed.
- Faster Insights
BI reports that previously took days to compile can now be generated on-demand in seconds.
- Reduced Manual Effort
Automated data feeds saved each department 5–10 hours per week.
Qualitative Benefits
- Improved Confidence in Billing
The finance team can accurately invoice clients, mitigating revenue leakage.
- Enhanced Decision-Making
Operational managers can pinpoint inefficiencies in warehouse processes and address them promptly.
- Scalable Foundation
The medallion-based lakehouse architecture supports future data sources and advanced analytics (e.g., predictive, ML-based insights).
Key Takeaways and Lessons Learned
Success Factors
- Robust Data Architecture
Adopting the Medallion approach ensured data was refined incrementally, reducing errors early.
- Close Stakeholder Collaboration
Frequent feedback loops with finance and operations minimized misaligned requirements.
- Security & Compliance
Proactive focus on data access controls and encryption prevented governance issues.
Challenges Overcome
- Complex Legacy Systems
Some older WMS/ERP modules required custom connectors and data format adjustments.
- High Data Volume
Optimizations in Databricks and careful partitioning in Azure Data Lake handled large daily data ingestions; the sketch below shows the kind of techniques involved.
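As one example of these optimizations, the sketch below (with assumed table and column names) date-partitions a large Delta table and compacts it with Databricksʼ OPTIMIZE/ZORDER so reads prune partitions and skip unrelated files.

```python
# Illustrative volume optimizations on Databricks: date-partitioned storage
# plus Delta file compaction (table and column names are assumptions).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Partitioning by event date keeps daily loads append-only and lets
# downstream queries prune to just the partitions they touch.
(spark.table("silver.warehouse_events")
    .write.mode("overwrite")
    .partitionBy("event_date")
    .saveAsTable("silver.warehouse_events_partitioned"))

# Compact small files and co-locate rows that are filtered together,
# so large scans read fewer, better-organized files.
spark.sql("OPTIMIZE silver.warehouse_events_partitioned ZORDER BY (client_id)")
```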
Industry Insights
- Importance of Data
In todayʼs market, staying competitive is impossible without using your data to continuously analyze opportunities for greater efficiency and effectiveness.
- Data Lakehouse Maturity
Combining the best of data lakes and data warehouses (the lakehouse) gives logistics companies a flexible yet structured approach to data.
- Business Intelligence Adoption
Power BIʼs user-friendly interface accelerates analytics adoption across non-technical teams.