$5000
$5000. It was a morning just like any other - I made my coffee, read my Wall Street Journal, and logged into StandardData's Amazon Web Services (AWS) account. Then it hit me: $5000 from one night of running distributed processors on AWS. In horror, I realized an SQS error had kept every distributed processor running all night. Yikes! After a moment of concern came a moment of reflection: this didn't have to happen, so why did it? For one, StandardData works on big datasets, often in the 100 TB+ range - a scale that amplifies existing issues and makes costs skyrocket. This bill is not StandardData's only mistake, but it is the biggest to date, and it prompted a re-evaluation of our internal processes to make sure it doesn't happen again.
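For the curious, here is a minimal sketch of the kind of guard that would have capped that overnight run: a polling loop that shuts down after a few consecutive empty receives instead of spinning forever. The names `drain_queue`, `receive_fn`, and `process_fn` are hypothetical stand-ins for an SQS `receive_message` call and your message handler, not our actual pipeline code.

```python
def drain_queue(receive_fn, process_fn, max_empty_polls=3):
    """Poll for message batches via receive_fn and handle each message
    with process_fn. Stop once max_empty_polls consecutive polls come
    back empty, so an idle worker exits instead of running all night.
    Returns the number of messages processed."""
    empty = 0
    processed = 0
    while empty < max_empty_polls:
        messages = receive_fn()  # e.g. wraps sqs.receive_message(...)
        if not messages:
            empty += 1  # queue looked empty; one strike closer to exit
            continue
        empty = 0  # got work, reset the idle counter
        for msg in messages:
            process_fn(msg)
            processed += 1
    return processed


# Usage sketch with a fake queue: two batches of work, then silence.
batches = iter([["a", "b"], [], ["c"]])
handled = []
count = drain_queue(lambda: next(batches, []), handled.append)
```

The design choice is deliberate: the worker's default behavior is to stop, and only fresh messages keep it alive - the opposite of a loop whose default is to keep billing you.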
Our mantra is Business First, Technology Second, because we care about business outcomes above all else. As the AWS bill made clear, something in our prior strategy did not adhere to our core principles. It didn't take long to identify the root problem: our engineering processes were not following our business processes. As a business, we usually take the smallest number of steps possible to avoid overdevelopment and cost overruns; with these data pipeline mistakes, however, we deviated from that by chasing the high-scale solution before we were ready for it. With that, the fix was simple: start small, iterate quickly, and scale when the time is right. As a rule of thumb, sleeping on a change for one night is a good start - the next morning, it's obvious whether something ran all night unnecessarily.
Technical firms and technical people love technology: we are no different. We get excited to work on big pools of data, and we must constantly keep ourselves in check to ensure that we are doing right by our own business, as well as the businesses of our clients. The takeaway: start small, iterate fast, and don't rush to scale - or you might scale your AWS bill accordingly!