
Enter the Power of Databricks



I'm not going to lie, when I first watched an introductory video on Databricks, my jaw nearly hit the floor. Yes, the heart of a software developer beats in my chest, but even with that caveat acknowledged, this is still very cool stuff for anyone. Let's take a quick dive in!


In technology, as with choices in any other industry, there have always been tradeoffs that keep us from having our cake and eating it too. Traditionally in the data warehouse world, the tradeoff has been between performance and flexibility. In other words, it may be faster to keep the data on the processing server, but then the amount we can hold is limited by server capacity. Or, vice versa, we can have more flexibility in data size but slower processing due to longer read times.


Believe it or not, data warehouses are not new. They go back to the 1980s! The original data warehouse design essentially took disparate data sources and used an Extract, Transform and Load (ETL) process to transform the data into a standard form before saving it into a database called a Data Mart. From there, reporting tools would display the data in forms such as grids, or through data visualizations such as charts and graphs. You can think of Data Marts like bottled water: the purification process is already done and the water is ready to be consumed, but the cost comes in the form of size and scalability. In other words, the bottle stays the same size no matter what, and you can quickly outgrow it.
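
To make the ETL idea a little more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made up for illustration: the two sample sources, the field names, the `sales` table, and the use of an in-memory SQLite database as a stand-in for a Data Mart. A real pipeline would look quite different, but the three steps are the same.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical "disparate" sources: a CSV export and a JSON feed.
CSV_SOURCE = "customer,amount\nAcme,100.50\nGlobex,75\n"
JSON_SOURCE = '[{"client": "Initech", "total_cents": 2500}]'


def extract():
    """Extract: read raw records from each source in its native shape."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Transform: convert both shapes into one standard form (customer, amount)."""
    standard = [(r["customer"], float(r["amount"])) for r in csv_rows]
    standard += [(r["client"], r["total_cents"] / 100) for r in json_rows]
    return standard


def load(rows):
    """Load: save the standardized rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def run_etl():
    """Run the three steps in order and return the loaded database."""
    return load(transform(*extract()))
```

Calling `run_etl()` hands back a connection that a reporting tool could query, for example with `SELECT customer, amount FROM sales`. The purification happens once, up front, which is exactly the bottled water tradeoff.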


The next leap for data warehouses came around 2011, by which time the cloud had made data storage space much more affordable. A concept called a Data Lake emerged, where the disparate data was streamed into and stored en masse in a very large storage space. The key benefit here was that the data was now essentially right where we wanted it and scalability (bottle size) was not an issue. One big challenge was the ability to make ACID (Atomic, Consistent, Isolated and Durable) changes to the data. Imagine trying to read the page of a book whilst someone else is changing the words at the same time. Or consider a failure that happens mid-change, so that only part of the page has been updated. What was done, and where did we leave off? In the world of data, those are big issues!
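
The half-changed page problem is easier to see in code. The sketch below is only an illustration of the "Atomic" part of ACID, and it uses Python's built-in SQLite module rather than a data lake. The `page` table and the function names are hypothetical. The point is that a transaction either applies every change or none of them, so a failure mid-change leaves the page exactly as it was.

```python
import sqlite3


def make_book():
    """Create an in-memory 'book' holding one page with two lines."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (line_no INTEGER PRIMARY KEY, words TEXT)")
    conn.executemany(
        "INSERT INTO page VALUES (?, ?)",
        [(1, "old line one"), (2, "old line two")],
    )
    conn.commit()
    return conn


def rewrite_page(conn, fail_midway):
    """Rewrite both lines as one transaction; optionally fail after the first."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE page SET words = 'new line one' WHERE line_no = 1")
            if fail_midway:
                raise RuntimeError("simulated failure mid-change")
            conn.execute("UPDATE page SET words = 'new line two' WHERE line_no = 2")
    except RuntimeError:
        pass  # the half-finished change has already been rolled back


def read_page(conn):
    """Return the words on the page, in line order."""
    return [words for (words,) in conn.execute("SELECT words FROM page ORDER BY line_no")]
```

After `rewrite_page(conn, fail_midway=True)`, reading the page still returns both old lines rather than one new and one old. A plain pile of files in a Data Lake gives no such guarantee on its own.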


Flash forward again to today, where the cloud has matured enough to allow Databricks to come to be. This is a group of technologies that is revolutionizing the Data Warehouse world. Rather than just a Data Lake, we have a Delta Lake, delta being the math symbol for change and often used within software to mean the set of changes made to something such as data. Delta Lakes offer the same scalability as a Data Lake but with massively increased speed and the ability to perform ACID transactions. Another key component is Apache Spark. Contrary to traditional thinking, where one would want the data as close to the data processing as possible to increase speed, Spark on Databricks keeps the data and compute separated. This has actually increased speed and flexibility. As we know, one of the key benefits of the cloud is the ability to very quickly create new instances of servers and processes. Apache Spark takes full advantage of this by scaling the number of worker processes it employs as needed, processing data in parallel and combining this with innovative caching. Caching is where you identify data that is used more often and keep a copy of that data closer to the data processing. The end result is blazing fast data processing that is very flexible yet secure and reliable.
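
As a rough picture of the worker and caching ideas, here is a small sketch using only Python's standard library. To be clear, this is not Spark's API and not how Databricks is implemented. The `STORAGE` dictionary, the partition names and the functions are all hypothetical stand-ins: the dictionary plays the role of storage kept separate from compute, a thread pool plays the role of workers, and `lru_cache` plays the role of keeping frequently used data close to the processing.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# Hypothetical "remote storage": four partitions of ten numbers each,
# kept apart from the code that processes them.
STORAGE = {f"part-{i}": list(range(i * 10, i * 10 + 10)) for i in range(4)}


@lru_cache(maxsize=None)
def read_partition(name):
    """Fetch one partition from storage; repeat reads are served from the cache."""
    return tuple(STORAGE[name])


def partition_sum(name):
    """The work one worker does on one partition."""
    return sum(read_partition(name))


def total(workers):
    """Spread the partitions across a pool of workers and combine the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partition_sum, STORAGE))
```

Calling `total(workers=2)` and then `total(workers=4)` gives the same answer with a different number of workers, which is the scaling idea in miniature. On the second call every partition comes from the cache instead of storage, which `read_partition.cache_info()` will confirm.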


Not only have Data Warehouses become far more powerful over time, but they have become more affordable too. Thanks to technologies such as the cloud platforms and Databricks, we are entering a golden age for the Data Warehouse. It may sound dramatic, but small and medium-sized businesses are now able to take advantage of enhanced decision-making ability that was previously exclusive to bigger companies. This, in turn, will drive revenue and ROI for them.


TMH Solutions provide cloud solutions to small and medium-sized businesses. We would love to explore how we can help your business realize its Business Intelligence potential! Contact us to discuss your vision!
