Interview Mike Olson - Cloudera: Four Reasons why Open Source is Ready for the Enterprise
"If you go back in time 10 years, you would see that a lot of CIOs believed there was something bad about open source. In general, we now see a diminishing fear of open source in the market. I use that word intentionally. Executives would bring up that open source wasn't professionally developed, that it was not developed for companies. That is flatly no longer true. Nowadays, open source software is compliant with regulations, and allows CIOs to take full advantage of the pace of innovation." In this article we share four reasons why open source software will rule the enterprise software market.
Recently, Mike Olson, the co-founder of Cloudera, took some time for an interview about the developments within the Hadoop ecosystem. In 2008, Cloudera released the first commercial distribution of Hadoop. A lot has happened since then, and for Mike there is no doubt about the future of open source.
1. Open Source Is Secure and Compliant
Enterprises will object that open source software is not regulatory compliant, not secure, and therefore not safe to use. Mike Olson: "We can point to very large scale, very secure implementations of the open source platform in mission critical applications, compliant with rigorous regulatory regime requirements. Cloudera is the only Hadoop platform that passed PCI Data Security Standard certification to store personally identifiable data worldwide in compliance with regulatory requirements." Mike emphasized that there is no reason at all for CIOs to be concerned about security and compliance. They are not so much concerned with open source and how it is developed; they are more concerned with the enterprise requirements that they put on data management platforms.
2. Avoiding Single Vendor Proprietary Lock-in
If there is one thing that CIOs detest, it is single-vendor proprietary lock-in. They want to be able to choose the vendor they work with while continuously taking advantage of the pace of innovation that a global open source developer community drives. Nowadays, CIOs demand open source solutions, because with open source tooling a CIO is able to change software when something better becomes available.
3. Platform Software Needs to be Open Source
Mike Olson has been active in the data management space since the early '80s. He has seen the market grow and the way software is developed change. "In the last 10 years, no meaningful proprietary platform software (database, operating system) has emerged," Mike says. "I am confident that the law of physics now is that it is only possible to successfully launch platform software if it is open source. What used to be proprietary becomes open source. This is a trend across every single category: databases, operating systems, middleware. Think about it: JBoss for middleware, Linux for operating systems, MySQL, Postgres and Hadoop for data management." So no more proprietary software? No, not quite. Open source offers great opportunity, but there is certainly room for proprietary software, especially as an important driver of innovation. While open source communities have been great at building platform software, they generally have not been that great at building business applications. If you think of great analytics products or ERP products, in general these are proprietary applications built on top of open source platforms.
4. Open Source Enables New Use Cases
Mike Olson is not predicting that Hadoop will be the one database to rule them all. "In some cases, existing workloads from other systems can move in, that is possible. Typically, Global 8000 enterprises have been data driven for years; these are companies that have been using data well. There are good data warehouse, OLTP and other systems that enterprises have been relying on. Large enterprises have been using dashboards and reports for many, many years. There is now an opportunity for them to not only reactively report on historic and current state, but to become predictive." In this process, Hadoop is enabling great new use cases: predicting what is going to happen next, and how organizations need to change their behavior to take advantage of these opportunities. Hadoop was designed to handle these advanced analytic and large-scale data processing workloads. "Integrating with existing systems is important, so you are not forced to move all of your existing infrastructure."
The Future of Hadoop
Hadoop has now been available for around 8 years, and the developments within the platform have been enormous. What will happen with Hadoop in the future? According to Mike Olson, the future will be determined by the further development of the technologies and applications within the ecosystem.
Platform Developments
"At the beginning of Cloudera's life, Hadoop was really just a storage layer, HDFS, and a processing and compute layer, MapReduce. This offered only one single way to work with data. A lot of innovation has happened since then, so these days when we talk about Hadoop, we mean a collection of processing and analytics capabilities on a shared store. Examples are HBase taking a substantial share of the NoSQL workload market, Cloudera Impala (now Apache Impala) as an open source massive-scale analytics data processing engine, and Cloudera Search, which has been built on Lucene and Solr technology. We have seen innovation like Apache Spark in the market; what is the next Spark that is going to emerge in the Hadoop ecosystem?" At the moment, Cloudera sees equivalent innovation at the storage layer, and this excites Mike a lot: HDFS and HBase are both widely available for data storage, and Apache Kudu is now a fast-growing project addressing a new kind of workload in that market.
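The original model Mike describes, a storage layer feeding a single map-and-reduce compute path, can be sketched in a few lines. The word-count example below is a toy, in-memory illustration in plain Python of the MapReduce programming model, not Hadoop's actual Java API; the function names and sample documents are my own.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every input record
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # would do between the map and reduce stages
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values per key
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["open source wins", "open platforms win"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["open"])  # 2
```

The point of the later ecosystem innovation he mentions (HBase, Impala, Spark, Kudu) is that this is no longer the only way to process data sitting in the shared store; multiple engines now run over the same data.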
Another important thing is the innovation from applications on top of Hadoop. For Cloudera it is very important to encourage partners to provide services, applications, and hardware that make it easier for customers to consume the platform. "We see partners building solutions ranging from mobile telephony systems to cybersecurity or new analytics and reporting solutions. Cloudera and Hadoop themselves don't offer these kinds of solutions. The role of Cloudera is to supply a stable and highly scalable platform to run these applications on top of. Our partners educate customers, help them identify use cases that are best suited to the big data platform, and really roll out successful deployments."