Not quite three years ago I wrote an article called “What If?”.
What I wanted to do was go back and rethink decisions we had made during the years, especially decisions that we made for MySQL 5.0.
5.0 exists because of the MySQL/SAP alliance. SAP wanted to replace Oracle with MySQL, and to do that MySQL was going to need to run SAP R3 in order to do it. We didn’t just pay lip service to SAP, there was an effort to make this happen. Somewhere in the middle of all this there was also a very odd “we were going to adopt SAPDB as the next MySQL”. Which of course was never going to happen. There were countless meetings over this, and attempts to somehow sprinkle even an ounce of the SAPDB code into MySQL, but that never happened.
As far as making R3 work on MySQL? That was incredibly unlikely, and it was damaging to the product in the end. We ended up with a lot of features that the database was never designed to have. We created an unrealistic set of expectations. We had a source base which had too little testing.
So part of the goal with Drizzle was to cut it back to the core and build modules that we could then create better testing for. So for that reason Stored Procedures, Views, and Triggers were out. None of them were well designed, and all of them had/have major bugs. We tossed out the monolithic kernel design and moved to microkernel design.
MySQL 5.1 made an attempt to patch the replication system that had been written a decade ago. MySQL replication works, but it works with a lot of exceptions. Anyone who has ever put it into production is aware of these. The good thing about MySQL replication is that it mostly works out of the box, and that is something that was a bit of a revolution when it was created. Today? With the notable exception of SQL Server, the rest of the major databases still have replication systems which are difficult to use, install, or deploy.
We initially looked at using 5.1’s replication. We were only going to refactor it such that we were going to beef up its file format and switch to just using the row based replication that was added in 5.1. We were unsuccessful in refactoring it. About 9 months in we figured this out, and we began a rewrite.
The rewrite was the right answer. The original code had too little testing for us to ever know whether or not a change we made created bugs or not.
A big lesson learned, if you are going to refactor code, make sure you have plenty of testing up front.
Internally we have “new code” and “old code”. If we want to make a change to “new code” we can typically do it very rapidly. The rate at which we can extend it is pretty amazing. The MySQL code base is not friendly to anyone who knows C++. Pretty much all of the warnings have been disabled and there are a lot of tricky bits.
We have fixed all the warnings in Drizzle. This is something that isn’t sexy work, and the only way it is justified is because cleaning up warnings fixes bugs. If you are starting a new code base let me implore upon on you the necessity of doing this from the beginning.
Today our replication is pretty spiffy, and it answers a couple of the big “What If” statements I have wanted answered:
1) We use an entirely open message format.
2) We store our replication records directly in Innodb.
The open message format comes with a penalty, it is more verbose than a native format. It takes up more space than if we just shipped the block records created in the transactional engine. Running a point in time recovery on block records is tricky and very limiting. You can’t take the data from one database and push it to another. ETL? Forget it.
We used Google’s Protobuffer for the message format. There are other libraries available but they were either license incompatible or were not widely known. At the time we hadn’t made a decision to go with boost so using its serialization library wasn’t an option. The disadvantage has been that the Google library created a dependency for installing Drizzle. Dependencies are a pain, and when we started Drizzle I had thought that the different Linux distributions had a good handle on this, I don’t really believe this any longer. Avoid dependencies.
Storing the records in Innodb has always seemed to be an easy win in my mind. It solves a lot of the two phase commit problems that plague users and it gives you instant recovery. Storing the log in a separate file can possibly give you a win in that you can do some tricks with IO, but in the end it just complicates everything.
With MySQL you always need to keep in mind the question of “What would MyISAM do?”
MyISAM’s design, and limitations, are scattered throughout the program. In all cases MySQL has to ask “how will this be handled if we need to store data in a storage engine that can’t handle failure, handles all of its own IO, and needs to be locked at the Table level?”.
We dropped MyISAM support about a year into our work, and relegated it to a support only roll for temporary tables. We didn’t hide it completely before we GA’ed Drizzle, but we won’t support it long term. I’ve heard users say “but I want its performance!”. Trading performance for reliability works out for some people, but certainly not everyone. What I find is that when someone wants this, what they really want is a different sort of database all together. Typically it is some sort of analytics problem which creates this need.
Which gets us to the storage engine interface. It was within MySQL the first attempt to create an interface that we could plug different solutions in. I had proposed it in MySQL because I had written different engines and knew what a nightmare it was to make it work.
That engine interface has generated millions of dollars. When I wanted to make it available at MySQL the backlash was significant. Some of sales freaked out, some of marketing thought we were going to let others take over the product, and alliances wanted to know how we were going to limit it to “select partners”. On top of that, half of engineering wanted to go and re-engingeer it immediately.
In Drizzle we have spent a significant amount of time reworking the interface, but it is far from perfect. We redesigned it so that engines now own their own meta data and federate that data to the microkernel. We also designed the interface to require that all new engines have ACID like qualities, know how to handle their own recovery, and can handle failure gracefully. Our core engine is Innodb. We have had others propose new engines, and we have even supported other engines, but at the end of the day we know people want a transactional engine mainly because they don’t want to find that their data has been trashed.
Our Innodb is a little different. We have more views into the state of the engine, and we fixed our version to compile with a C++ compiler. We cleaned up warnings and fixed the bugs that popped up from that. We have begun to refactor it so that it is more integrated with Drizzle’s thread scheduler.
Innodb would have been the default engine for MySQL long ago if not for some “not invented here” mentality, mixed with a flopped buy out attempt. Heikki, the inventor of Innodb, came out quite well in all of this. Good for him.
I don’t believe we will spend much more energy on the storage engine interface going forward. It is a dead business, and while there are a couple of companies that have built brand and product enough to make a go of the business I don’t expect any additional ones will show up. The storage engine business made money for MySQL, but it was a big distraction. While with Drizzle is easier to integrate an engine, I’m not sure that a business exists for storage engine vendors with it. I’ll write more about this at a later date.
Speaking of dates, Drizzle’s internal format for timestamps is 64bit. There is still some work to be done to allow to use all 64bits, but you won’t need to recompile or change your disk format for them. Right now we need to fix some tests, and make sure a couple of functions will handle the formatting, but we store your data such that going forward, or backwards, you are in good shape. Unlike MySQL we store time as GMT, so there is no screwing around with “well I stored my data in my local time zone, but we had the machine set too…”. I have personally spent over a month of time just fixing bugs in that code.
We have spent a lot of time fixing bugs. We get a big collective smile on our faces when we read about new bugs that have been discovered in MySQL, when we discover that we don’t have the bug. We have spent a lot of time fixing bugs, and a considerable amount of our time has been spent on finding new ways to test Drizzle. I am sure plenty of bugs exist to be found.
We also support storing/comparing/displaying time with microseconds. We also have a real BOOL type, which I have been told is handy for the SQL Alchemy folks, and a native ISO UUID type. The UUID is interesting in that it stores time as well as being unique. It isn’t as fast as “please give me the next number”, but I believe it will be useful for a lot of applications. We have refactored all of the types, and the only one that was not size related that we dropped was SET. If you wonder why we dropped it read the section in the MySQL manual about its limitations and bugs.
Why do we allow only DELETE against a single table at a time, like pretty much every other RDBMS? Beyond the conceptual issue that few can wrap their heads around how to form, let alone feel like they know what the query will do, we hit the problem of the “multiple execution path”. There were a lot of one off execution paths in MySQL. In a lot of cases I know these were dead refactoring projects that were never completed. The “multiple execution path” problem is particularly disturbing when you think about fixing a bug. If you fix a bug in DELETE you need to know that there is an execution path for a single table that is different then the path used for multiple tables. This leads to odd behaviors, and a much richer set of bugs that exist.
SQL modes? Those are gone. If you wonder about what sorts of problems they create inside of the server, I’d suggest reading about the “Legend of the Ambalappuzha Paal Paayasam”.
In general in Drizzle we have tried to get rid of Gotchas that we have found. Things like declaring a column NOT NULL and discovering that somehow the database still stored a NULL is gone. Altering a table and adding a field that would violate the structure of the table? That is gone.
It is amazing that ALTER TABLE works, as the code there is Byzantine. We have made some effort to clean it up, but it is still way too tricky. I wish we could have done more there, but it is what it is. Are you using partitions? Make sure you back up your data before doing an ALTER TABLE. Wrapping partitions into the system in the way that was done at the time was simple, but it is far from robust.
I had hoped that with 5.1 we would have created a single logging API, but instead we ended up with multiple logging API internally in the server. With Drizzle we ripped all of them out and installed a single API. It is crazy simple to write a new logging plugin.
Which gets into the philosophy of plugins in general. Writing plugins should be low hanging fruit. Whenever possible we have tried to make it the case.
We have an entirely new INFORMATION_SCHEMA in Drizzle. It is based in table functions, which is a new concept in Drizzle. We keep a separate schema called DATA_DICTIONARY, in it we put whatever we like. Our INFORMATION_SCHEMA is only what the SQL standard has specified. We do zero vendor modification to it. Another hats off to SQL Server, their INFORMATION_SCHEMA is the closest to complying with the standard.
Drizzle’s drivers are BSD. They were written outside of Sun, and Sun signed off on contributions to them under a BSD license. They speak Drizzle and MySQL’s protocol. There is a JDBC version that was written. Their adoption is becoming wide spread. Licensing clarity around them is a big win for us, and for ISV who want to integrate. MySQL’s licensing mess was related to a lot of hand waving involving them. Recently I noticed that MongoDB had written up a clear licensing policy with regards to their own drivers. Awesome.
We never got to finish all, or really much of any, of what we wanted to do with the Drizzle protocol. I believe this is an area where we will see change in the near future. Internally inside of drizzle we have a C++ interface that resembles JDBC that lets us execute queries. We will be doing a lot more with that interface going forward.
What about performance? With Drizzle we began doing benchmarks early on, using a few different benchmarks. The benchmark generated by sysbench has always been the benchmark we have used as our bellwether. Unlike a lot of databases we test Drizzle with up to 1024 concurrent executing queries. Most of the benchmarks I see people run are for far less connections. We have chosen time and time again to favor performance gains at the high end, over gains on the low end. We are roughly double in performance from where we began. We could still do a lot better. MySQL 5.5 has a new meta data locking system which should do well in a number of situations, we could do a bit better in some of these cases. Our lack of a MyISAM would make it simple for us to move forward in this direction if we want too.
There has been a lot asked about our claim on scaling with lots of cores. Our process there is simple, eliminate locks, favor performance gains when we find them that favor of additional CPU, and try whenever possible to remove strong ownership that require waits for locks. MySQL relies on MyISAM, and MyISAM has significant locks, especially around the keycache, we got rid of those by freeing ourselves from MyISAM. We have had some gains with our new scheduler and we have done some to improve how IO is handled.
I am sure we have a lot of tuning still to do. We won’t be publishing benchmarks which compare us to others though. I’ve yet to see a comparison benchmark which wasn’t completely flawed, and even when they are not, few people really understand them. They fall into the classic “how many angles does it take to dance on the head of pin” conversations.
Our authentication system is modular, and we need to iron out more of the authorization system.
I’ve seen someone say that Drizzle is designed for Google and Facebook. This is not the case at all. We built it so that the next Facebook, Google, etc would have a platform to build on. Facebook and Google have their own forks of MySQL, they aren’t going to be using Drizzle. The pieces are there for the next company who needs to innovate, it is just a matter of someone making use of them. We speak the MySQL protocol, so the typical MySQL application runs just fine on Drizzle without change. We designed Drizzle to work as a piece of someone’s current infrastructure, not be yet another application which has a costly integration. We have a NoSQL sort of solution via the blob streaming module, but we are first and foremost a relational database.
What will the next Google or Facebook find? A much more friendly platform than what MySQL provided to develop on and with. The big success for Drizzle has been in the people that have been involved. We are without a doubt the descendant of MySQL that has the largest contributor base, and we have long passed MySQL with regards to contributors. We are well into the hundreds when it comes to developers who have contributed code. We have had more then 921 commits in the last month across 20 people. Our numbers go up and down, but we are consistently more then double anyone else in size. If you just walked out of college, or skipped it all together, you are going to have a much easier time adjusting Drizzle to your needs . At least we believe this :)
The codebase is C++, we make use of Boost, and while we are cautious, we tend to favor more forward thought in how we code. Readability is the key to creating code that others will use. Because in the end? We can scale silicon, but carbon? People are much harder to scale.
The people to thank for the code:
Olaf van der Spek
Other people to thank?
David Douglas who at Sun supported us initially, and when we didn’t think our internal support initially at Sun could get any better? Bob Brewen worked with us till the end came for Sun. An extra mention should be made for Lee Bieber, he has been working with the project from nearly the beginning as well. He has handled project management, done code refactoring, made flyers, organized dinners, and did everything else in between.
Mike Shadle for getting us machines, and making sure everything runs. Adrian Otto at Rackspace should be thanked (along with a number of other people as well).
A thank you should go to Chris Dibona for the Google Summer of Code project. We have a number of students who now work on databases for a living thanks to that program. While with MySQL we constantly failed at getting student’s code into the server, with Drizzle we have had a lot of success.
There is an entire channel of people who have been involved with Drizzle on Freenode in #drizzle who should be thanked as well. IRC is how we communicate.
There are a lot of other people I am forgetting to thank, sorry about that.
So what next? There is a lot more to Drizzle then what I have written above. Having worked on this for years I often forget what the differences are anymore. There are lots of new features, plenty of new enhancements, and new bugs just waiting to be found. I’m giving a talk at Web 2.0 Expo in a couple of weeks in San Francisco were I will talk about some of what we have done and are doing for virtualization.
I will being giving a keynote at the O’Reilly MySQL Conference & Expo, and there are a handful of talks there on Drizzle as well. The MySQL Ecosystem is a radically different place then what it was a year ago, I’ll be commenting on it in the future online and at the conference.
About a week ago Monty Taylor and I sat down and talked about what we wanted to do with Drizzle going forward. Monty has been working on this since the beginning with me, and he has been a lot of fun to work with. One conclusion that we both came to was that we want to see where people will take Drizzle before we determine too much about its future. It is easy to get caught up in new features, and we are interested in seeing how others use it before too many decisions are made about what to do next.
Thursday, March 17, 2011
Not quite three years ago I wrote an article called “What If?”.
Saturday, March 12, 2011
Wakeup at 7:08
The movie I was watching last night, a semi-documentary on the early punk music scene leads me to remember that there was a character called the “Brood” in the Marvel Universe.
So I lookup “Brood”.
Next I read up on the Acanti. Who doesn’t like space traveling wales that do harmony?
Which leads me to read up on Acheron Empire because of a slight reference to to the Acanti.
This of course leads me to read up on the Hyborian Age, and read the entry on Robert E. Howard. I briefly read the entry on Red Sonja, which leads me to the entry on the film, ad makes me wonder what was up with the copyright on Conan at the time. I also wonder what the X governor of California is up too, and wonder if the movement to change who can become president is moving forward at all. Which makes me wonder if he wouldn’t make for a somewhat palatable republican candidate. I can only imagine him on stage with Palin.
I then retreat from this entire tangent.
I look to see whatever happened to Brigitte Nielsen and discover that celebrity drug hab TV exists, and then have to look up Jamiee Foxworth because I have no idea who this child actress was. If you aren’t Gary Coleman or Drew Barrymore. I have no idea who you are.
Sometime in the last year I’ve read the article on Drew Barrymore, so I can skip that.
Jumping back I read the entry on “Kull of Atlantis”. I do not get the appeal of barbarian fiction.
The apeman article is just a jumping off point for a number of topics. I read up on the concept of “Person” and read about Humanezee. Parahumans? Check.
Did you know that there is a movement called the “Great Ape Project”?
Give rights to our fellow apes.
The DNA difference is quite small. I then read about the Soviet project to breed human hybrids. I know the germans had one as well, but I don’t notice it linked into any of the articles. The story of “Oliver (chimpanzee)” is pretty sad. We humans suck. I read up on Karl Pinkington, and from there…
Ancient kingdom of the Picts, which requires me to understand Bede the historian.
The modern movement for different countries in the United Kingdom to gain some level of independence is fascinating (especially since in the end it disenfranchises the English (deserves you right…).
Transhumance has nothing to do with transhumanity. I did briefly see an article on that, but I have heard enough about it in life. On the other hand reading up on transhumance gives me a better picture about the legalize structure of a nation.
Want to start a movement to do away with counties in Washington?
I find myself reading up on Wales which leads to articles on the End of Roman Rule in Britain. Hadrian’s Wall? Sounds like it was a taxing structure. Built in seven years and 80 miles long!
I have doubts that we could do that today. It was historically saved by a plan of purchasing land around it, using that land for sheep, and using the proceeds to buy more land.
Devolution, Padani, and Nunavut all follow. I hadn’t realized how much the structure of Canada had changed in the last could of decades.
BTW Canada? You suck for shipping eight families off to the great white north and not letting them come back south when they realize that you had sold them a bill of goods. That is just awful. Who would have thought Canadians did this shit?
Grise Fiord? At least let them rename it to a language that is spoken in Canada.
The Welsh at least have “Snowdonia”. Way more pleasant name.
Time spent on the above? About two hours.
Tuesday, March 8, 2011
I was glancing through Boing Boing tonight and noticed an article about an iPhone Application which is yet another “this is how to organize your life”.
There was a point in my life where every time I had an idea I would open up a window, put in a little bit of code, save, and move on. Despite the small amount of effort that this took I found it stressful.
I had all of these directories scattered around that had ideas in them.
Sometimes, if I had a window already open, I would just go work out the idea if it was small enough in the current code I had opened. Most of the time this was ok, but only ok. I’m not religious about code when it comes to a patch being only one thing (I rollup patches when they are related so there isn’t a lot of difference). It is not the best practice but it isn’t the worst either.
The problem was, sometimes this would go wrong. The idea I had required more work then what I thought, or the idea was a distraction from what I was working on.
So how did I solve this?
I just keep a simple note in my email where I tack on new ideas as I have them. There is no organization, no tags, and I just simply delete a line once I have completed it (and I doubt I will ever complete all of it). I scan it from time to time to see if I can remove anything from it, but for the most part I just leave it alone.
Once I have added an idea to the note I find that all of my stress goes away. I’ve recorded the thought, it will be there later to look at.
I find that most of the stress I have is not about completing what I need to complete, but it is about losing the knowledge of what I might like to complete.
Monday, March 7, 2011
Many of our students have went on to get jobs in the database industry after GSOC, and we have a high rate of “you wrote the code, it will end up the main release”.
Are you a student and databases are not your thing? I’d go look at other projects which have been successful with GSOC and see if there are anything that interests you. Open source is an awesome way to get real world experience developing software.