Welcome to the lake house
ENTERPRISE IT
data is complete, unaltered, and has a traceable history.
“In the world of data management, the question of whether to use one data lake for all data or multiple lakes is a common one,” says James Serra, Data & AI Solution Architect at Microsoft. “While using a single data lake is ideal, there are many reasons why organisations may opt for multiple data lakes.”
Serra is an expert in data warehousing and data management, with over 35 years of experience in data modelling, data governance, and development methodologies. Alongside his role at Microsoft, he is a popular blogger, author, and speaker, having presented at PASS Summit, SQLBits, Enterprise Data World conference, Big Data Conference Europe, and dozens of SQL Saturdays.
One main reason is organisational structure, where each department or team owns its own data. Multiple lakes may also be necessary to comply with multi-regional data residency requirements, to work within subscription or service limits, quotas, and constraints, or to implement different policies for each data lake. Another important factor is security: confidential or sensitive data may need to be kept separate from other data.
Multiple data lakes can also help to improve latency by having a data lake in the
90%: Effective performance of some lake houses in comparison to traditional data warehouses
Data lake houses promise to combine the best of both worlds: the economies of scale and flexibility of the data lake, with the reliability and control of the data warehouse.
The cloud has revolutionised data storage and processing, enabling data lakes to deliver data warehouse-level performance. Open-source technologies like Spark, Drill, and Trino provide data transformation capabilities, while open-source data lakehouse table formats like Delta Lake and Apache Iceberg help structure the data.
These lakehouses are designed to behave and perform like data warehouses and can reach 80-90% of the performance of warehouses. The commercial ecosystem around these open-source formats is becoming the main battleground between big vendors.
“The data lake house will not replace data lakes or purpose-built data warehouses, but in the long run they will co-opt enterprise data warehouses,” says Tony Baer, Principal at dbInsight, a company that has carried out extensive research in this area. “Data lake houses will enable data lakes to perform and be controlled, governed, and secured like data warehouses.”
technologymagazine.com