You're viewing the archives for July 2015

Transactions fully restored and more clarity about what happened

7/9/2015 in ClearCheckbook News

All of your transactions have been restored to your accounts and they will show up as they normally do on both the web and mobile apps.

What the heck happened?
Now, a little more explanation about what happened and what we're doing to future-proof the site. On the morning of Tuesday, June 30th, the database started getting hung up on certain queries, which caused a queue of queries to form. When this happens, the server tries its best to get through the queue as fast as possible, but in doing so it overloads, and the queue essentially stops in its tracks. On your end, this shows up as a bunch of errors and an inability to log into the site.

We were able to get things running again for the rest of the day, but two days later, on the morning of Thursday, July 2nd, the database had the same issue. We scrambled to get things running again, but everything that had worked in the past simply wasn't working now. We narrowed the problem down to a single table in our database: the Transactions table. This table stores all of the transactions that ClearCheckbook users add to the site, all 75+ million of them. Whenever this table changes (adding, editing or deleting transactions), the indexes that help the database run efficiently are updated as well. Normally this is a quick process, but on 75M rows of data it can take a long time.

To try to speed up reading from and writing to this table, we archived transactions for users who hadn't logged into the site for a few months. Unfortunately, this didn't help much, and after a few hours we were back to the database overloading. By Friday, July 3rd, we were still scrambling to figure things out, and nothing we did seemed to help. That's when we made some calls and, fortunately, even on a Friday afternoon before Independence Day in the United States, we were able to reach some outside help who could meet with us on Saturday, July 4th.
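The archiving step described above amounts to filtering users by their last login and moving their rows out of the main table. Here is a minimal Python sketch of that idea, not ClearCheckbook's actual code: the `User` record, the in-memory list of `(user_id, amount)` rows, and the 90-day cutoff standing in for "a few months" are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class User:
    # Hypothetical user record: only the fields this sketch needs.
    user_id: int
    last_login: datetime


def users_to_archive(users, now, inactive_days=90):
    """Return IDs of users who have not logged in since the cutoff.

    inactive_days=90 is an assumed stand-in for "a few months".
    """
    cutoff = now - timedelta(days=inactive_days)
    return [u.user_id for u in users if u.last_login < cutoff]


def split_transactions(rows, archive_ids):
    """Split hypothetical (user_id, amount) rows into live and archived lists."""
    archive_ids = set(archive_ids)
    live, archived = [], []
    for row in rows:
        if row[0] in archive_ids:
            archived.append(row)
        else:
            live.append(row)
    return live, archived


if __name__ == "__main__":
    now = datetime(2015, 7, 3)
    users = [User(1, datetime(2015, 7, 1)), User(2, datetime(2015, 1, 15))]
    inactive = users_to_archive(users, now)
    live, archived = split_transactions([(1, 10.0), (2, 5.0), (1, -3.0)], inactive)
    print(inactive, len(live), len(archived))
```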

The first thing we did was back up all transactions and require a manual restore when you logged back in. This worked fairly well and at least got the site operational. The problem was that many people reported that not all of their transactions had been restored, and they were concerned that their data was lost.

What we're doing in the short term
This obviously wasn't a long-term solution, and the confusion it caused made us speed up our plans to break the giant Transactions table into about 30 smaller tables based on user ID. This will be our mid- to long-term solution as far as the database schema goes. The downtime tonight was necessary so we could move all the transactions from the giant table into the smaller ones. In the short term, this will help the site stay up on our current hardware, since the database won't have to search through 75M transactions regularly.
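Splitting one table into about 30 by user ID means every query first has to pick the right table. The post doesn't say how users are assigned to tables, so this Python sketch assumes a simple modulo rule and hypothetical table names like `transactions_07`:

```python
NUM_TABLES = 30  # "about 30" per the post; the exact count is an assumption


def transactions_table_for(user_id: int, num_tables: int = NUM_TABLES) -> str:
    """Pick the (hypothetical) table that holds a given user's transactions."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    # Modulo keeps all of one user's rows in a single table.
    return f"transactions_{user_id % num_tables:02d}"


def average_rows_per_table(total_rows: int, num_tables: int = NUM_TABLES) -> float:
    """Average table size, assuming rows spread evenly across tables."""
    return total_rows / num_tables


if __name__ == "__main__":
    print(transactions_table_for(42))           # transactions_12
    print(average_rows_per_table(75_000_000))   # 2500000.0
```

Because every query for a user goes to one table, a lookup only deals with roughly 2.5 million of the 75 million rows, assuming an even spread. The trade-off with modulo is that changing the table count later reassigns most users to different tables, so the data would have to be moved again; assigning ranges of user IDs is a common alternative.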

What's next
The next step, and what we're working on now, is migrating to a cloud-based solution. The outside help we mentioned earlier also excels at these transitions and has done them for other companies in the past. We're working with them now to optimize our site code for its next home in the cloud. A cloud solution will let us expand and scale up as the site grows without having to maintain our own servers.

A heartfelt apology
I have to admit that this is the single most stressful situation I've ever been in. When I took ClearCheckbook on as my full-time job back in 2009, I knew there would be challenges as the site grew. I use the site daily myself for both personal and business accounts, so when the site is having problems, I know how frustrating it can be.

What compounds the frustration is receiving threatening, profane and hate-filled messages on my personal cell phone and email accounts, on our social media outlets and on the ClearCheckbook blog while we're working our hardest to resolve the problems. At the height of the issues, we were receiving several hundred emails an hour. There's no possible way we could respond to each of your messages individually and still have time to work on the database problems.

I can't express how sorry I am that this happened the way it did. Please know that we do have a plan of action to prevent this kind of mess from happening in the future, and I'll be posting more information about it as the plan and timeline solidify.

If you're a premium member who contacted us to cancel/refund your membership, we tried to search through all the messages and fulfill your requests, but there are probably some that slipped through the cracks. If you haven't heard back from us and still wish to stop using the site, please contact us again through the Contact Us link at the bottom right side of the page.

Again, from the bottom of my heart, I'm so extremely sorry for the inconveniences the downtime caused you.

Brandon O'Brien
Founder, ClearCheckbook

Update about restoring transactions

7/8/2015 in ClearCheckbook News
We're finishing up some updates that will restore everyone's transactions without having to click the restore button. This will also fix any issues that some people are having where it looks like a subset of your transactions is still missing even after clicking the restore button. These updates will be pushed out by the end of the day today.

Again, we're sorry we can't respond to every email coming in. If you're concerned that all of your data is lost, it isn't. This confusion will be resolved after we make the above updates live.

An explanation about the server troubles this week

7/4/2015 in ClearCheckbook News
First off, we want to say we're extremely sorry for all the trouble the site and app have had this week. We received thousands of emails during the outages, and there's no easy way to respond to everyone. If you contacted us to express your confusion or frustration, we're sorry we didn't get back to you. We spent that time working on getting the site back up and running.

The site went down because our database has been getting overloaded lately. Imagine each page load as someone walking through a turnstile to enter a subway or stadium. Under normal circumstances, the line moves steadily as everyone goes through one at a time. Now imagine rush hour, when many more people are trying to get through that same set of turnstiles. Things start to back up and people get impatient. In the web world, this leads to people refreshing the page, which just makes the situation worse, since each page load sends more requests to the database (essentially adding more and more people to the line). This snowballs until the wait is so long and so many requests are pending that the server can't handle them and starts throwing errors.
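The turnstile analogy can be made concrete with a toy simulation. This Python sketch is only an illustration with made-up numbers, not a model of ClearCheckbook's servers: each tick, new requests arrive, the server finishes up to `capacity` of them, and every request that has waited `patience` ticks causes its user to refresh once, adding a duplicate request while the original stays in line.

```python
from collections import deque


def simulate(arrivals: int, capacity: int, patience: int, ticks: int) -> list[int]:
    """Return the queue length at the end of each tick (toy model)."""
    # Each entry is [tick it joined the queue, whether its user already refreshed].
    queue = deque()
    history = []
    for t in range(ticks):
        # New visitors join the back of the line.
        for _ in range(arrivals):
            queue.append([t, False])
        # The server handles the oldest requests first, up to its capacity.
        for _ in range(min(capacity, len(queue))):
            queue.popleft()
        # Impatient users refresh once; the old request is NOT removed,
        # so each refresh only adds load.
        refreshes = 0
        for req in queue:
            if not req[1] and t - req[0] >= patience:
                req[1] = True
                refreshes += 1
        for _ in range(refreshes):
            queue.append([t, False])
        history.append(len(queue))
    return history


if __name__ == "__main__":
    print(simulate(10, 12, 2, 5))           # normal load: [0, 0, 0, 0, 0]
    print(simulate(15, 12, 10**9, 30)[-1])  # rush hour, nobody refreshes: 90
    print(simulate(15, 12, 2, 30)[-1])      # rush hour with refreshing
```

With these numbers the no-refresh backlog grows by a steady 3 requests per tick, while the refreshing version pulls ahead once requests start waiting past `patience`, because each refresh adds work without removing any.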

The main culprit is the Transactions table in our database. Our users have entered somewhere around 75 million transactions, with around 50,000 new ones added daily. That's a lot of data to constantly look up, add to and delete from. The hardware we're running right now was built and set up around 2008, and the truth is, I think the site has gotten too big for it to handle. Back then, we had outgrown several other hosting solutions, and cloud-based hosting was still in its infancy. We purchased hardware that we thought would last us a long time, and it has done so pretty well for the last 7 or so years.

For an immediate fix, we've backed up all transactions and are having you restore them when you log in by clicking a button at the top of the page. This process could take a few minutes, but don't worry: your data is still secure and will be restored shortly (a lot of other people are trying to do the same thing).

To help prevent these downtimes and database problems from happening in the future, we're working with an outside party to help us migrate into newer hosting solutions that aren't bound by a physical machine (namely, cloud based solutions that can expand with demand). This is the next logical step in our site's growth, but it's not something we want to jump into blindly. We're going to do our research and find the best solution to keep ClearCheckbook running smoothly for years to come.

If you're accessing the site through one of the mobile apps, you'll need to log into the website to manually restore your transactions for now. We'll work on updating this soon though.

Again, I can't express the depth of sorrow, stress and frustration I feel over these issues. Compounding it, the 4th of July holiday weekend in the US has made it extremely hard to find outside help since everyone is out on vacation.

We'll keep you updated with the progress of the migration to newer hardware over the next week.

Brandon O'Brien
Founder of ClearCheckbook
