Season 1: Episode #1

Storage Wars: Database Edition

Rahul declared our old show boring, and in this episode he fixes that. Join Rahul as he welcomes new co-host Hilary Doyle, declares Moore’s Law dead as a doorknob, and enjoys a heated (but friendly) debate about NoSQL, Postgres, and the future of distributed and relational DBs in the cloud. Today’s guest: Postgres expert Ed Boyajian, CEO of EnterpriseDB.

Guest

Ed Boyajian

CEO of EnterpriseDB

Transcript

Rahul Subramaniam: So I actually disagree with the fact that noSQL hasn’t taken off.

Ed Boyajian: I say follow the money in the $80 billion market.

Rahul Subramaniam: I don’t know of anyone who has said, “I love my Oracle Database.”

Hilary Doyle: Apologies, Larry.

Ed Boyajian: Developers should never use Oracle, SQL Server, DB2, Sybase again for new applications.

Hilary Doyle: Yeah. Mic drop.

Rahul Subramaniam: Completely agree.

Hilary Doyle: This is AWS Insiders, an original podcast by CloudFix about the services, patterns, and future of cloud computing at AWS. CloudFix is a tool that finds and implements 100% safe, AWS recommended account fixes. I’m Hilary Doyle, joined by Rahul Subramaniam. Hi, Rahul.

Rahul Subramaniam: Hey, Hilary. How are you doing?

Hilary Doyle: I’m well, thanks. In this episode: Storage Wars. Database Storage Wars.

Rahul, you’re going head to head with a PostgreSQL evangelist who also happens to be a US Army veteran. He knows how to get people to fall in line, and I know that your opinions differ, so this is going to be a lively discussion. Stay strong. By the way, noSQL – I saw the first one. The Sequel’s even better. Ha! Okay. We got a lot to cover today. Something for everyone or everyone who likes to store data. This is kind of our Marie Kondo episode. Which data sparks joy and where should it live? Traditional, relational, distributed database, a mixture of all of the above – we’ll cover all the flavors and their use cases. But Rahul, this begs the question, how are you thinking about data storage now? How are you advising enterprise clients to structure for it?

Rahul Subramaniam: So modern computing really has three pillars, Hilary. I mean, there’s compute, there’s data storage, and then there’s networking. Data storage is seeing a revolution just the same way energy is having a solar revolution. It baffles me that a large part of electricity generation around the world today still happens by boiling water to generate steam that then runs turbines. Seems incredibly inefficient. But we didn’t have a better way till the advent of scalable solar arrays. Similarly, the cloud providers are changing the game on data storage with their next generation of purpose-built, distributed data storage.

Hilary Doyle: That’s such helpful context. Rahul, we’re just scratching the surface. Coming up, we’ve got hot takes, use cases, helpful tips and tricks, plus a conversation with Ed Boyajian. But first, our AWS headlines.

Rahul, as always I give you the news, you give us your take. First, in with the new. Cyber threats we know are everywhere and now they are extremely sophisticated. Recently, AWS and Splunk, along with a group of other companies, launched a new open-source effort to identify and stop cyber attacks. It’s called the Open Cybersecurity Schema Framework Project. Just rolls off the tongue. Rahul, what can you tell us about it?

Rahul Subramaniam: Well, Hilary, I think cybersecurity is really the Wild West right now, but this might be a good first step to fix that. We have to get rid of bespoke security solutions at all costs. They’re just too costly and unsustainable.

Hilary Doyle: Okay, great. We like cheap and sustainable, when it comes to security. Next up, out with the old. Google Cloud saying goodbye to one part of its cloud IoT core services. All customers now have one year to move to a partner; AWS perhaps. Rahul, why is Google Cloud stepping away from this? What do they know about the future that we don’t?

Rahul Subramaniam: Can you hear that?

Hilary Doyle: Hm?

Rahul Subramaniam: That is the sound of AWS cheering. It’s really a giant gift in terms of potential new customers for them. And for Google, I’m not really sure they really care. This is a fringe use case for them. And AWS has a pretty large portfolio of IoT specific services. So migration’s going to be really simple.

Hilary Doyle: Happy holidays, AWS. And finally, Clorox is headed into the cloud with an ERP system. Their current system hadn’t been updated in 20 years. And this change could mean up to $100 million in savings in the next two years. Why do you think this took them so long? And what is it going to change for their business, Rahul?

Rahul Subramaniam: So they say that the easiest way to run an enterprise to the ground is to rewrite your core solution from scratch. Okay? And it appears Clorox is trying to do just that. I would’ve suggested a completely different strategy for them.

Hilary Doyle: Of course you would’ve.

Rahul Subramaniam: They should figure out a way to take their data and move it efficiently to the cloud and then build new value added point solutions that are driven by that data. Now, leverage the cloud for new ML Insights or better operational scale. Just don’t try to rebuild that old functionality. It is never going to be complete.

Hilary Doyle: Well, you can’t blame them for wanting to clean things up. For Clorox, that’s called brand building.

Rahul, data storage has a long history. We could start with the 1880 US census, or the birth of IBM, or we could skip ahead to, say, 1970 and the rise of SQL, followed by the explosion of big data and all that comes next. In other words, would you please give us a 50-year overview of data storage in the next 45 seconds?

Rahul Subramaniam: 50 years in 45 seconds, Hilary?

Hilary Doyle: No problem.

Rahul Subramaniam: So here’s the short version. For decades, Moore’s Law predicted that the number of transistors in a chip would double every two years, which indirectly meant that computing power also doubled every two years. And I believe that had a significant impact on how we built and used databases. We just kept building enormously compute-intensive monolithic databases, thinking that the principles of Moore’s Law would just keep up.
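The doubling Rahul describes can be worked through as simple arithmetic. The sketch below is only an illustration: the function name and the starting count of 1,000 transistors are hypothetical, and the two-year doubling period is the figure quoted above.

```python
def projected_transistors(
    initial_count: float,
    years: float,
    doubling_period_years: float = 2.0,
) -> float:
    """Project a transistor count, assuming it doubles every doubling period."""
    doublings = years / doubling_period_years
    return initial_count * 2 ** doublings


# Hypothetical starting point: a chip with 1,000 transistors.
start = 1_000
after_20_years = projected_transistors(start, years=20)

# Twenty years at one doubling every two years is ten doublings.
print(f"Growth factor over 20 years: {after_20_years / start:.0f}x")
```

Ten doublings compound to roughly a thousandfold increase, which is why databases designed around that expectation could lean so heavily on a single ever-faster machine.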

Hilary Doyle: Ooh, I like big buts and I sense one’s coming! But first, okay, how did this factor into your work and how you were building?

Rahul Subramaniam: So I remember a time when I had these databases that stored millions of rules that were at the heart of our configuration engine. Right? Memory was scarce at that time and network bandwidth was so slow that my natural instinct was to move all of our processing into that database. If the UI needed data in a certain way, we just built views. If we needed some kind of joins or ETLs or any kind of processing of that data, we just wrote tons of stored procedures. In short, we relied on that one large database to do most of our heavy lifting.

Hilary Doyle: And what’s happened to Moore’s law?

Rahul Subramaniam: Moore’s Law has been dead for at least a few years.

Hilary Doyle: What? You’re kidding me.

Rahul Subramaniam: And the reality is that that dramatically changes things, including how we should be approaching databases, because until a few years ago, relational databases and SQL have really been the most efficient solution around when it came to dealing with slow storage and network.

Hilary Doyle: Wow. Okay. So what’s next? What’s changing?

Rahul Subramaniam: Well, it’s changed everything, especially our approach to data storage. I mean, DynamoDB and DocumentDB are awesome examples of OLTP databases that are at least 10X more performant and available at a fraction of the cost today.

Hilary Doyle: All right. Move over Moore’s Law.

Rahul Subramaniam: Moore’s Law is really the zombie, the undead driving their teams. It’s really time to kill the zombie.

Group: 3, 2, 1. Happy New Year’s.

Hilary Doyle: Rahul, where were you on New Year’s Eve 2017?

Rahul Subramaniam: I think I was in Dubai watching some of the world’s most stunning fireworks.

Hilary Doyle: Okay. Well, I was in New Orleans and I’m pretty sure I was in bed by 11:00. So you’ve got me beat. But no one’s ever going to know that I was in bed by 11:00 for the reasons I’m about to share. You’re a hip guy. What are your social media platforms of choice? Are you live streaming compute musings on Twitch?

Rahul Subramaniam: I’m glad you called me a hip guy. And yes, I am on Twitch.

Hilary Doyle: No way. Okay, that’s very cool. I only go in for the social platforms made for old people. But it didn’t start out that way because back in 2017 I was younger and so were those old platforms. Take for example, Snapchat, one of the most popular social media apps in the world still. They’ve got over 332 million daily active users, 3 billion snaps generated every day. Obviously they were storing a lot of data, but they were built with a monolithic architecture. So all of this came to a head on that fateful New Year’s Eve 2017 when Snapchat went down. Begging the question: if a New Year’s Eve happens without a VSCO filter, has it even happened at all? Rahul, please give us the snapshot of the challenge here.

Rahul Subramaniam: So as Snap grew to millions of customers, they really started running into problems of being able to manage those millions of queries per second. Of course, they needed consistent access and response times to those queries. Relational data stores just aren’t capable of that kind of scale. So Snap really needed to rethink their app architecture with the data store at its core.

Hilary Doyle: In order to allow the rest of us to make our New Year’s Eve queries like, “Hey, do you come here often?” Anyway, we’ll come back to our Snap chat shortly, but first we get to introduce our featured guest.

Rahul Subramaniam: Yay!

Hilary Doyle: Ed Boyajian is an open-source veteran and a US Army veteran. Thank you for your service. But you may know him best as CEO of EnterpriseDB, a.k.a. EDB. Among the many things that this company does, they help migrate workloads from Oracle to PostgreSQL. No doubt many will agree that this is humanitarian work. Ed, welcome. How are you?

Ed Boyajian: I’m great, Hilary. Thank you.

Hilary Doyle: I want to step in for the CTOs who are breaking out into hives listening to this conversation because as an enterprise company locked into a seven-year contract with Oracle, what does a company need to do to make this work while they’re there while also planning their jail break?

Ed Boyajian: Yeah. It’s a great question, Hilary, and it can be an incremental process. And so our most successful customers have done that over multiple years. I mean, I think some of our biggest customers, they’re still on that journey 5, 6, 7 years later.

Rahul Subramaniam: Wow, that long?

Hilary Doyle: Got it.

Ed Boyajian: Think about it this way, Rahul. People got entrenched in Oracle over decades. And I think that in the kind of spirit of parity, it’s going to take some time to unwind those arrangements – both in terms of just the application of the technology, but also in terms of contracts. They have to re-engineer those agreements just as much as they have to re-engineer those applications.

Rahul Subramaniam: Yeah.

Hilary Doyle: Oracle overlords, man.

Rahul Subramaniam: When you look at your customers, are they primarily on-premise?

Ed Boyajian: Rahul, the way we look at it, 40% of our customers are using EDB in self-managed public clouds. So still mostly as you would’ve described on-prem, but a lot in self-managed public cloud.

Rahul Subramaniam: For decades, Oracle has pushed PL/SQL as a way of getting lock-in and kind of encouraged the terrible practice of bringing business logic into your database. Now, you could say it made sense back then when storage and network was slow. But despite all that open-source goodness of Postgres, it is really still enforcing or reinforcing the same bad practices, almost trying to be the next Oracle, isn’t it?

Hilary Doyle: Ooh. Don’t sit down and take that, Ed.

Ed Boyajian: I don’t see it quite that way. I think today many users of modern open-source databases like Postgres will use those stored procedures selectively. I think the trick is really getting away from the legacy applications where there was such a heavy reliance, particularly with Oracle. But I don’t see the same patterns evolving with Postgres as it relates to new workloads. Setting it in context, there are really two patterns, and we see both these patterns in the business. One is migration away from legacy, but then there’s the new applications. About half of our new business comes from enterprises writing new applications on Postgres.

Rahul Subramaniam: I feel like most of the folks who are building new applications are still stuck with the old mental model for how data should be managed and used. We have this structured database that we are so comfortable with and have been using for, what, the last 50 years, but then we are shoving in this old mechanism of dealing with it for data structures that are really not designed for it. It looks like a lot of relational data stores have built hacks over the years on top of their core structured relational data model to be able to satisfy all of those constraints. Isn’t it better to rethink how we now manage our data stores by separating out the way you use different data stores for different use cases or different access patterns?

Ed Boyajian: I think that topic is layered because we know this: More applications are being written in enterprise than ever before. The creation of value around data has been brilliantly distributed into the organization. I think development is no longer kind of the domain of IT. But I think that what’s made relational databases and technologies like Postgres continue to be sticky and popular is, some of those structures are really well known. I mean, I still think that SQL is still probably the most prolific programming language on the planet of all programming languages.

So I do think that as long as applications are diverse and complex, databases inherently can’t really be commodities. People have to kind of adhere to some foundation of standards. It doesn’t mean there’s not a role for specialty databases. I think there’s a really important role for specialty databases. But the idea that general purpose databases aren’t important, I just think there’s an intellectual drive to want to get away from that, and we don’t see it that way.

Hilary Doyle: Postgres for you is a one-size-fits-all?

Ed Boyajian: I think Postgres is a one-size-fits-many.

Hilary Doyle: Ed, where does noSQL live in your database philosophy? Does it have a role?

Ed Boyajian: I think you follow the money around workloads, Hilary, and it gets at the heart of your question about noSQL. I would grossly break the application workload arena into roughly three big buckets. I would say there is what we think of as enterprise applications, I would call those systems of record where transactions and a single source of truth really matters. I think that’s the biggest part of the market today, in excess of $60 billion. That’s where all the big vendors, the traditional proprietary, the guys that we’re all hating on today live.

Hilary Doyle: Yes.

Ed Boyajian: And then there’s the systems of analysis, the analytics database where Snowflake lives, Teradata lives there. I think that’s a smaller market but still important. Then you got systems of engagement. I think if you follow the money there, look at that. That’s a… Call it a billion five tied up with Mongo, Couchbase, throw Redis in there. So I say follow the money in the $80 billion market. It’s not that those noSQL databases aren’t important. They are. They’re just important specialty databases.

Rahul Subramaniam: I generally agree with the way you broke it down. I just take it one level deeper where I break it up as the transactional databases, which is basically OLTP stuff. And there, I think given that more often than not we end up using custom fields and custom tables and stuff like that, it makes more sense to use something like a noSQL database. Then you have OLAP, where I actually believe that the structured column data processing lends itself better to the traditional data stores like Postgres because Postgres is so damn good at processing and aggregating all this very structured columnar data. And it’s one of the most performant engines you can find out there. So it just naturally makes sense to use that for OLAP-like scenarios or use cases.

Hilary Doyle: Going by Ed’s numbers, Rahul, it seems like adoption of noSQL is low?

Rahul Subramaniam: I actually disagree with the fact that noSQL hasn’t taken off. In fact, for most modern applications where teams have actually been able to separate out the use cases and the access patterns, they’ve actually gravitated towards noSQL data stores because they need faster response times. Take a look at the likes of Amazon themselves when they’re able to hit hundreds of millions of transactions a second. When you have use cases like that, no, Postgres cannot do that. Not even Oracle can do it. The maximum throughput you can get out of an Oracle system is maybe 80,000 transactions per second on a single database. With DynamoDB, you can hit hundreds of millions of transactions per second without worrying about it or without even worrying about scaling requirements. So where there are needs to build a purely OLTP system, and where developers have been able to extricate themselves from this old-world affinity to relational data stores, they automatically gravitate towards noSQL.

Ed Boyajian: The number of applications that have that kind of user demand is relatively small. The question isn’t, “Are noSQL databases good for those workloads?” I think they are. I just don’t think they’re prolific in most enterprises. And I think that’s where this conversation gets confused. And that’s why I use the expression, “Follow the money.”

Rahul Subramaniam: I’ll give you an example of a scenario where folks are still in the old world. So if you look at telecom, okay, I mean telecom is a massive use case. The reason why most of these companies have to split all of their management of telecom operations is the fact that their Oracle databases don’t scale beyond 80,000 TPS. It’s not that they don’t have a requirement for it. They just don’t have a system that can handle more transactions than that. And they’ve been married to Oracle for so long that they can’t think of moving to anything else. Right?

Ed Boyajian: But if you looked inside that same telco, they got thousands of applications, of which some are the ones that Rahul describes, and thousands of others don’t look like that. Thousands of others are databases that are smaller than a terabyte that don’t have hundreds of thousands of concurrent users processing 80,000 transactions per second. So that’s our reality. That’s just our experience. As I’m thinking about it, the noSQL argument we’ve listened to has, to me, lived in the arena of the extreme use cases, and appropriately so. I think the argument then gets into, “And then that should apply to everything.” That’s where we just see the world differently because our experience is different.

Hilary Doyle: What I think I hear you both saying in technical terms is that customers are locked into Oracle, Oracle sucks, and it is time to find another alternative. That needs to happen. It’s going to take time. But for you, Ed, the final point in that journey is a full migration to Postgres. And for you, Rahul, Postgres is an intermediate solution.

Rahul Subramaniam: Yeah. I think we can emphatically agree on the proprietary databases, especially the ones like Oracle. Over the years, I don’t know of anyone that I’ve ever met who has said, “I love my Oracle Database.”

Hilary Doyle: Apologies, Larry. Done so many great things.

Ed Boyajian: No, I agree. Let me say it this way. Developers should never use Oracle, SQL Server, DB2, Sybase again for new applications.

Hilary Doyle: Yeah. Mic drop.

Rahul Subramaniam: Completely agree.

Hilary Doyle: Well, Ed, we are so delighted to have had you joining our journey today. Great to finally meet you, and thanks for being here.

Ed Boyajian: It was my pleasure. Hilary, Rahul, thank you both.

Rahul Subramaniam: Thanks so much for being here, Ed. This is an absolutely awesome conversation.

Hilary Doyle: Okay, Rahul. Let’s talk about that conversation with Ed. He did a great job defending Postgres, and I think you’ve got to bulk up your defense of noSQL a little. Think of this as bootcamp. You got to get down to the business. Get in the ring, Rahul. Speak your truth!

Rahul Subramaniam: Let’s start with costs. Okay?

Hilary Doyle: Sure.

Rahul Subramaniam: Why would you want to pick an expensive relational data store that must be up at all times and therefore cost a bomb versus a noSQL store like DynamoDB where you pay only a fraction of the cost? Because you pay per transaction.
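The cost argument here comes down to comparing a flat always-on charge with a charge that scales with traffic. The sketch below works through that comparison. Every price in it is a made-up placeholder, not an actual AWS, DynamoDB, or Postgres rate, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # common approximation for monthly billing


def always_on_monthly_cost(hourly_rate: float) -> float:
    """Cost of a database instance that runs every hour of the month."""
    return hourly_rate * HOURS_PER_MONTH


def per_request_monthly_cost(requests: int, price_per_million: float) -> float:
    """Cost of a store that bills only for the requests it serves."""
    return requests / 1_000_000 * price_per_million


def break_even_requests(hourly_rate: float, price_per_million: float) -> float:
    """Monthly request volume at which both pricing models cost the same."""
    return always_on_monthly_cost(hourly_rate) / price_per_million * 1_000_000


# Hypothetical prices, chosen only to make the arithmetic easy to follow.
HOURLY_RATE = 0.50        # placeholder: dollars per instance-hour
PRICE_PER_MILLION = 1.25  # placeholder: dollars per million requests

print(always_on_monthly_cost(HOURLY_RATE))
print(per_request_monthly_cost(10_000_000, PRICE_PER_MILLION))
print(break_even_requests(HOURLY_RATE, PRICE_PER_MILLION))
```

Under these placeholder prices, per-request billing is cheaper for any workload below the break-even volume, and the gap is widest for spiky or low-traffic applications that would otherwise pay for idle hours.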

Hilary Doyle: Right. But what about queries? Dynamo doesn’t have a query planner, do they?

Rahul Subramaniam: But you don’t need one with DynamoDB.

Hilary Doyle: Ah, okay.

Rahul Subramaniam: Right? Its architecture means that it has consistent query performance at any scale. So because of that, you don’t need a query planner or need to even optimize that stuff. And we haven’t even talked about the circus act that dev teams go through while managing custom fields in their relational data stores. That’s a non-issue with DynamoDB.

Hilary Doyle: Got it. I’m juggling while I ask these questions. What are the trade offs? Postgres versus noSQL – list them for me.

Rahul Subramaniam: Okay. So do you pick a Postgres data store that is flat and structured, expensive, with the bet that you won’t ever grow, and designed for laziness? Versus a noSQL store like DynamoDB, where you have to put in some thought right up front about what your data architecture looks like, and which gives you the same performance no matter what your scale? What would you pick? I pick the second option any day of the year. And a huge portion of the industry is starting to discover the same.

Hilary Doyle: Got it. All right. Well, I won’t call this a knockout, but we will call it a tie.

Okay. So Rahul, you’re leading the charge for mass migration to noSQL. Help us get there. What are the tips and tricks we all need to know?

Rahul Subramaniam: Okay. So let me just start by saying that we need to get out of our traditional relational data store mindset and think about our data very, very differently. Spend some time deciding and figuring out your data access patterns. That is probably the most important thing you can do for your data. And this is not something that we’ve done historically with our relational databases; we just dumped it all in and figured it out as we went. But it helps a ton to figure this stuff out early because it can save you millions down the line.

Hilary Doyle: You say save millions down the line. But when you talk about leaving a relational mindset at the door now, that means for enterprise companies, they’re leaving millions of dollars in prior development at the door as well. How do you address that?

Rahul Subramaniam: But Hilary, that’s the whole sunk cost syndrome. You don’t make your bets on the future on the money that you’ve spent in the past.

Hilary Doyle: Well said.

Rahul Subramaniam: You want to make your future efficient, and therefore you have to move to a different mindset.

Hilary Doyle: And is that different mindset a database called, perhaps, Dynamo?

Rahul Subramaniam: Absolutely. So that’s tip number two.

Hilary Doyle: Got it.

Rahul Subramaniam: Just use DynamoDB. It is awesome. And of course that would depend on your data access pattern, which moves to my third solution, or the third tip rather. Don’t try to do aggregations or joins, which again is the traditional mindset in how you design your data storage. Learn how these noSQL DBs work. They are great for certain kinds of access patterns.

Now, if you absolutely need aggregations and joins, which is typically what happens in your analytics use cases, then use something like Redshift. However, if you have hierarchical data, go use a graph database like Neptune. So doing the work of understanding how you’re accessing your data allows you to optimize the way you store them in different places, and in the long run, gives you the best performance at the cheapest possible cost.
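Rahul’s three tips amount to a lookup from access pattern to store. The sketch below encodes only the pairings he names (DynamoDB, Redshift, Neptune). The enum, the mapping, and the function name are hypothetical, and a real decision would weigh many more factors.

```python
from enum import Enum


class AccessPattern(Enum):
    """Coarse access-pattern categories taken from the tips above."""
    KEY_LOOKUP = "key_lookup"                    # fetch items by a known key
    AGGREGATION_OR_JOIN = "aggregation_or_join"  # analytics-style queries
    HIERARCHICAL = "hierarchical"                # hierarchical relationships


# Pairings as stated in the conversation; illustrative, not a recommendation engine.
STORE_FOR_PATTERN = {
    AccessPattern.KEY_LOOKUP: "DynamoDB",
    AccessPattern.AGGREGATION_OR_JOIN: "Redshift",
    AccessPattern.HIERARCHICAL: "Neptune",
}


def suggest_store(pattern: AccessPattern) -> str:
    """Return the store named in the episode for the given access pattern."""
    return STORE_FOR_PATTERN[pattern]


for pattern in AccessPattern:
    print(f"{pattern.value}: {suggest_store(pattern)}")
```

The point of writing it down this way is that the access pattern is the input: the store is chosen per use case rather than once for the whole application.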

Group: Happy New Year’s. Yeah!

Hilary Doyle: Let’s get back to New Year’s Eve 2017. Rahul, you’re on a roof watching transporting fireworks in Dubai. I’m probably in bed reading a book in New Orleans. Let’s relive that magical night. There we all were testing out our aviator filters, rocking flower crowns, shooting lasers from cat eyes, Snapping like the world might end tomorrow. And then all of a sudden there was darkness.

Rahul Subramaniam: On the phone, to be clear.

Hilary Doyle: Yeah. To be clear. No, you hadn’t had too many French 75s. It was Snapchat’s monolithic architecture buckling under the weight of all of that increased New Year’s Eve traffic. Those are make or break stakes for a social app that’s just making its way in the world. So Rahul, how did Snap fix this?

Rahul Subramaniam: One word. DynamoDB.

Hilary Doyle: What a shocker.

Rahul Subramaniam: Yep. I’m always advocating for DynamoDB.

Hilary Doyle: I know. They should send you swag.

Rahul Subramaniam: Yeah, I deserve it. So the median latency of sending snaps dropped by over 20% once they started using DynamoDB, and they can handle more than 10 million queries per second. Snap found flexible pricing and reliable throughput with provisioned capacity and auto scaling, and so they were able to reduce costs significantly. But back to what I was saying earlier, consistent performance no matter what scale you operate at. So backup, restore, point in time recovery – all of those things really were things that Snap benefited from. And with all their point in time recovery, they were able to go back to any point of recovery in the preceding 35 days. So once they refactored the monolithic architecture into a microservices one, Snap was basically able to carry out live migrations without even disrupting their service at all. It just-

Hilary Doyle: It had the power.

Rahul Subramaniam: Absolutely.
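The 35-day point-in-time recovery window mentioned above can be expressed as a simple date check. This is a sketch of the window arithmetic only. It does not call any AWS API, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Recovery window as described in the episode: the preceding 35 days.
RECOVERY_WINDOW = timedelta(days=35)


def earliest_restorable_time(now: datetime) -> datetime:
    """Oldest point in time that still falls inside the recovery window."""
    return now - RECOVERY_WINDOW


def is_restorable(target: datetime, now: datetime) -> bool:
    """True if `target` lies within the window that ends at `now`."""
    return earliest_restorable_time(now) <= target <= now


now = datetime(2018, 1, 1, tzinfo=timezone.utc)
print(earliest_restorable_time(now).date())
print(is_restorable(datetime(2017, 12, 31, 23, 59, tzinfo=timezone.utc), now))
print(is_restorable(datetime(2017, 11, 1, tzinfo=timezone.utc), now))
```

A restore target a minute before midnight on New Year’s Eve sits comfortably inside the window, while one from early November of the same year would already have aged out.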

Hilary Doyle: Well, in a world full of challenges, thank you, Snap, for solving the problem of panda filters for everyone all at once. That’s it for us right now. You’ve been listening to AWS Insiders from CloudFix. My name is Hilary Doyle.

Rahul Subramaniam: And I’m Rahul Subramaniam.

Hilary Doyle: CloudFix is an AWS cost optimization tool. Learn more about them at cloudfix.com and check out the show notes for Ed Boyajian’s information and more. EDB contributes more than 30% of all Postgres code.

Rahul Subramaniam: Please leave us a review and follow us.

Hilary Doyle: We’ll see you next time.

Rahul Subramaniam: Goodbye.

Hilary Doyle: Dubai? How about those fireworks?

Meet your hosts

Rahul Subramaniam

Host

Rahul is the Founder and CEO of CloudFix. Over the course of his career, Rahul has acquired and transformed 140+ software products in the last 13 years. More recently, he has launched revolutionary products such as CloudFix and DevFlows, which transform how users build, manage, and optimize in the public cloud.

Hilary Doyle

Host

Hilary Doyle is the co-founder of Wealthie Works Daily, an investment platform and financial literacy-based media company for kids and families launching in 2022/23. She is a former print journalist, business broadcaster, and television writer and series developer working with CBC, BNN, CTV, CTV NewsChannel, CBC Radio, W Network, Sportsnet, TVA, and ESPN. Hilary is also a former Second City actor, and founder of CANADA’S CAMPFIRE, a national storytelling initiative.
