Wednesday, February 17, 2021

Databricks Drives Google Cloud to the Data Lakehouse

Databricks rounded out its public cloud support with the launch of Databricks on Google Cloud Platform (GCP), with integrations to Google's BigQuery and AI Platform. The addition completes the trifecta of major public cloud platforms within Databricks' offerings.

The GCP extension allows Databricks customers to use GCP's management, security, and login capabilities, and to access GCP analytics from within Databricks. It also lets Databricks integrate with Google BigQuery's open platform and leverage Google Kubernetes Engine (GKE), giving Databricks customers the ability to deploy in a containerized cloud environment for the first time.

This means Databricks can now instantiate a data "lakehouse" capable of data engineering, data science, machine learning, and analytics uniformly across the big three, providing customers with a "single source of truth" for all of their data workloads, the company claims.

Joel Minnick, VP of marketing at Databricks, called the partnership a "natural fit," citing the companies' shared passion for and commitment to open source. "We've long both been very data focused organizations and are very passionate about helping customers get the most out of their data," Minnick said.

Seventy-two percent of companies using the public cloud use more than one cloud provider, according to a 451 Research survey. "As data is increasingly stored in multiple locations, enterprises need to process and analyze data across those multiple locations and there is an argument to be made in favor of a cloud-agnostic data platform layer," said Matt Aslett, research VP of data, AI, and analytics at 451 Research.
"Oftentimes customers don't necessarily want to teach their data teams how to do the same thing in multiple different ways, and instead want to begin to standardize on one platform with one set of tools that can span all the different environments where their data lives," Minnick explained. "As customers are looking to adopt a multi-cloud strategy, finding ways to do things consistently across clouds is becoming increasingly important."

Minnick stressed the importance of having a system flexible enough to accommodate all types of data as different sides of the data world begin to collaborate. Bearing that in mind, Databricks' evolving strategy blurs the line between data lakes and data warehousing, combining the best of both approaches into what the company has coined the data lakehouse. The simplest description of a data lakehouse, Aslett writes, is an environment designed to combine the data structure and data management features of a data warehouse with the low-cost storage of a data lake.

One of the key enablers of the lakehouse concept is a structured transactional layer. Databricks delivered key capabilities toward making the concept a reality with the launch of Delta Lake, which provides atomicity, consistency, isolation, and durability (ACID) transactions and brings in serializability, the strongest level of isolation, to ensure data in the data lake remains usable for downstream data science, machine learning, and business analytics, even in the event of errors or power failures. It also launched Delta Engine, a high-performance query engine for query acceleration.

Minnick said the idea of the data lakehouse is to build one system to store and manage data, and then run all data workloads, from analytics through machine learning, on top of that system, giving different teams access to the same set of data and tools to enhance collaboration.
"That vision of this unification that's happening in the data world is one that I think our partners in the cloud space are seeing just as clearly as we are," Minnick added.

The announcement comes on the close of a $1 billion Series G funding round that put the company at a $28 billion post-investment valuation. The latest round attracted an A-list of venture investors and the cloud computing elite, including Franklin Templeton, BlackRock, Microsoft, Amazon Web Services (AWS), Salesforce Ventures, and CapitalG, the growth fund for Google parent company Alphabet. Databricks' funding was the latest for data lake-focused companies.

"While all the cloud providers have offerings that compete with Databricks, to some extent, they are also keen to not only partner with, but also make strategic investments in data and analytics specialists to support customer choice," Aslett said.
