So, you’ve succumbed to the buzz and now you’re looking around trying to make heads or tails of the mass amounts of information out there hyped up as “big data.” Or perhaps you’re even ready to start your own internal project to get your existing applications on the bandwagon. In either case, terrific! Your decision is a good one.
Unfortunately, now comes the flurry of potentially overwhelming questions:
- Where do I start?
- What are my expectations?
- What does big data mean to my company?
- What does big data mean in the context of our applications?
- How do I assess my application needs?
- How do I know or determine if big data solutions will work for us?
After some online research, you’ll quickly find that most folks are merely picking a place at the edge of the pool, dipping their toes in here and there to test the water temperature.
The reality is that the term big data is incredibly difficult to define. Its meaning covers much more than storing and using large data sets. When you hear people talk about big data, what they’re often actually referring to is the use of NoSQL database implementations to store and process large amounts of information.
Don’t be discouraged! In the following sections we’ll get into what those NoSQL databases are and how to identify which, if any, is best for your projects. That’s right: the goal here is to provide you with information that will allow you to draw the correct conclusions for your organization and your specific applications or projects.
NoSQL databases aren’t really databases. In fact, they are nothing like a traditional relational database management system (RDBMS). Instead, they are implementations of various data stores that do not have fixed schemas, referential integrity, defined joins, or a common storage model. They also typically do not adhere to the ACID principles (atomicity, consistency, isolation, and durability), and the technologies behind them sometimes vary widely. The term NoSQL (or Not only SQL) is intended to imply that many of these implementations also support SQL-like query capabilities.
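To make the "no fixed schema" point concrete, here is a minimal sketch in Python. It is an illustration only: the relational side uses the standard library's sqlite3 module, and the schemaless side is a plain dictionary standing in for a document-style store, not any particular NoSQL product. The table name, field names, and sample values are all made up for the example.

```python
import sqlite3

# Relational side: the table's columns are fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (1, "Ada"))


def insert_with_unknown_column():
    """Try to store a field the schema does not define; report whether it worked."""
    try:
        conn.execute(
            "INSERT INTO customers (id, name, twitter) VALUES (?, ?, ?)",
            (2, "Grace", "@grace"),
        )
        return True
    except sqlite3.OperationalError:
        # SQLite rejects the statement because the column does not exist.
        return False


relational_accepted = insert_with_unknown_column()

# Schemaless side: a dict keyed by id stands in for a document store (assumption).
# Each record carries whatever fields it happens to have.
documents = {}
documents[1] = {"name": "Ada"}
documents[2] = {"name": "Grace", "twitter": "@grace", "tags": ["vip"]}

# Collect the field names present in each record to show that they differ.
fields_per_document = {key: sorted(doc) for key, doc in documents.items()}

print(relational_accepted)
print(fields_per_document)
```

The relational insert fails until someone alters the table, while the stand-in store accepts records of different shapes without complaint. The flip side is that nothing in the store enforces consistency between records, so that job falls to the application.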
In this big data market where the NoSQL database is king, there are more than 100 different offerings available under various licensing models. The fact that these non-databases vary is no accident. Each distinct implementation has different strengths, weaknesses, and generally accepted uses. However, the bulk of these break down into four major categories based on some common underlying characteristics, as shown in this chart:
Choosing the Right Path
Place heavy emphasis on defining your requirements. What are those? Well, that’s a large discussion all by itself. However, I’ll quickly paraphrase for the purpose of this discussion: data requirements are artifacts captured while defining application behavior with respect to gathering, storing, retrieving, or displaying information (data).