Big data vs. a lot of data

These days, nearly everyone in business has heard about “big data”. It’s one of the hottest topics in high tech, and many companies are spending big money on “big data solutions”. But is it for you?

You’ll learn here that “big data” is more than a buzzword: it’s a very real technology. But if you have a lot of data that you aren’t using effectively, “big data” is not the solution. Spend your energy instead on learning from, and getting value from, the data you already have.

So what is Big Data anyway?

Let’s start with what isn’t “big data”, at least for most companies. Your financial transactions are stored and processed in your accounting software, and whether you use QuickBooks or SAP R/3, that isn’t big data: it fits in an ordinary database, generally on a single machine. (There are exceptions: if you’re Walmart and you’re processing 1 million transactions every hour, that likely qualifies as big data.)

For most companies, “big data” arises from the multitude of human interactions occurring every hour of every day that can affect your business. Examples are website page views, Google ad impressions, phone calls and text messages, tweets and Facebook posts, Yahoo Finance comments on your stock, or data streaming in from equipment in the field. Big data is commonly characterized by three “V”s: volume (how much data there is), variety (how many different forms it takes), and velocity (how fast it arrives).

If you have “a lot of data”, especially if it isn’t organized or you aren’t making use of it, “big data” is not what you need. A data warehouse or business intelligence system might be. Try reading Enda Ridge’s Guerrilla Analytics: A Practical Approach to Working with Data, to learn how to deal with your data.

Big Data technology

In terms of technology, what separates big data from ‘ordinary’ data is that a large dataset is spread across many computers and hard disks, and processed in parallel. Why do this? It’s a response to the physical limits of hard disks: over the last 20 years, storage capacity has increased 30 times faster than disk read speed, so the time taken to read the full contents of a typical hard drive has grown roughly 30-fold, from about 5 minutes to 2.5 hours. Just reading, say, 10 or 20 terabytes of data sequentially from a single disk can take all day or longer.
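The arithmetic behind those read times can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: the 150 MB/s read speed and the 100-disk cluster are assumed figures chosen for illustration, not numbers from this article, and the function name is our own.

```python
def read_time_hours(capacity_tb, throughput_mb_s, n_disks=1):
    """Hours needed to read capacity_tb terabytes sequentially.

    Assumes decimal units (1 TB = 1e12 bytes, 1 MB = 1e6 bytes) and that
    the data is spread evenly over n_disks drives that are read in parallel.
    """
    total_bytes = capacity_tb * 1e12
    bytes_per_second = throughput_mb_s * 1e6 * n_disks
    return total_bytes / bytes_per_second / 3600


# Hypothetical sustained read speed of one drive (an assumption, not a measurement).
ASSUMED_MB_PER_S = 150

single = read_time_hours(10, ASSUMED_MB_PER_S)                # one drive
cluster = read_time_hours(10, ASSUMED_MB_PER_S, n_disks=100)  # 100 drives in parallel

print(f"One drive:  {single:.1f} hours")
print(f"100 drives: {cluster * 60:.1f} minutes")
```

Under these assumptions, 10 terabytes takes about 18.5 hours to read from one drive, but only about 11 minutes when the same data is split across 100 drives. That gap is the whole case for spreading data over many machines.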

But if you use many computers and hard disks, what if one of them fails, as it will sooner or later? Hadoop software manages a cluster of computers, divides up and schedules processing work in parallel, and automatically recovers from failures by ensuring the lost work is re-done. Apache Spark software, a very hot topic today, can run on Hadoop clusters (or by itself) and do more flexible processing, often much faster, largely because it keeps working data in memory rather than writing it to disk between steps. With a Hadoop or Spark cluster, you can store and process all that website, advertising, phone, text, Twitter or Facebook data affecting your company.
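To make the "divide the work, run it in parallel, re-do what fails" idea concrete, here is a minimal single-machine sketch using only Python's standard library. It is not the Hadoop or Spark API: the function names are hypothetical, threads stand in for cluster nodes, and a simple retry loop stands in for the scheduler's failure recovery.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_chunks):
    """Divide the records into roughly equal chunks, one per worker."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def count_words(chunk):
    """The per-chunk step: count the words in one chunk of text lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def run_with_retry(task, chunk, max_attempts=3):
    """Re-run a task that fails, much as a scheduler re-schedules lost work."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk)
        except RuntimeError:
            if attempt == max_attempts:
                raise  # give up after the last attempt


def parallel_word_count(records, n_workers=4, task=count_words):
    """Split the records, process the chunks concurrently, merge the results."""
    chunks = split_into_chunks(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda c: run_with_retry(task, c), chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # combine the per-chunk counts
    return total


posts = ["big data is big", "a lot of data", "data data"]
print(parallel_word_count(posts, n_workers=2).most_common(2))
```

A real cluster does the same three things at much larger scale: it splits the data across machines, runs the per-chunk step where the data lives, and re-runs any chunk whose machine failed before merging the results.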

Getting ‘lots of data’ organized

If you have “a lot of data”, you likely do have at least some of it in an organized and accessible form, such as your accounting system, a SQL database, or even Excel spreadsheets. Start there, and ask what business questions you can answer now, what questions you’d most like to answer, and what additional data you need to answer those questions. Focus on getting just that data, following the maxim “do something small, simple, now”. Consider using Power Query and Power Pivot, included free in modern versions of Excel, to extract, transform and summarize your data.
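For readers who prefer to see the "extract, transform, summarize" pattern spelled out, the following is a small sketch in plain Python. It is an illustration of the idea only, not of Power Query or Power Pivot themselves; the sample data, the column names (date, region, amount) and the function name are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export from an accounting system or spreadsheet.
SAMPLE_CSV = """date,region,amount
2024-01-05,East,120.50
2024-01-06,West,80.00
2024-01-09,East,45.25
2024-02-02,West,200.00
"""


def total_by_region(csv_text):
    """Extract rows from CSV text, clean them up, and sum amounts per region."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):  # extract
        region = row["region"].strip()                 # transform: tidy the label
        amount = float(row["amount"])                  # transform: text to number
        totals[region] += amount                       # summarize
    return dict(totals)


for region, total in sorted(total_by_region(SAMPLE_CSV).items()):
    print(f"{region}: {total:.2f}")
```

The point is the shape of the work rather than the tool: one small business question ("what are sales by region?") answered from data you already have.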

Getting value from your data

OK, you’ve got some data organized – now what? Return to the business questions you’d most like to answer. Data visualization can help give you insights – consider using Power BI or Tableau.

Instead of spending big dollars on “enterprise” analytics software, spend your time (and small dollars) learning. We believe any business analyst can learn to do what the “data scientists” do. To get started with predictive analytics – forecasting, data mining or text mining – we recommend the book Data Mining for Business Intelligence, an excellent text that uses our own software, XLMiner. You can use this software to build practical predictive analytics models in Microsoft Excel, or on the Web at xlminer.com.

What if you do have big data? Then go for Apache Spark. Our XLMiner software has a built-in link to Spark clusters that makes it super-easy to analyze big data just like the pro data scientists, and start getting value right away.

Whether you have big data, a lot of data, or just some data organized, you can get value from it, starting now. And you can start building your organization’s analytics expertise and your personal skills today.
