Insights from DZone's 2016 Guide to Big Data Processing. Read on for the key research findings and expert opinions in the report.
DZone is one of the web's largest communities and publishers of technical content (Big Data, DevOps, and more) for software professionals. Every quarter they publish a guide, and this time around they asked us at Datos IO to participate in their 2016 Guide to Big Data Processing. The guide is packed with interesting insights on all things Big Data, particularly how developers are addressing ever-growing data sets and the enterprise tools they use to do so.
For this report, DZone gathered the thoughts of over 1,500 developers to dig deeper into the resources needed to build systems for big and fast data, insights into the future of Big Data, and much more. Demographics are as follows:
- 83% of respondents’ companies use Java; 63% use JavaScript; 28% use Python; 27% use C#.
- 66% of respondents use Java as their primary programming language; 73% are developers or developer team leads; 6% are self-employed; 5% are C-level executives; 4% are business managers or analysts; 3% are in IT operations.
- 63% have been IT professionals for 10 or more years; 37% have been IT professionals for 15 or more years.
Before diving into the insights in this report, it's important to first define big data. Big data is a common term for large amounts of data; to qualify, data must be coming into the system at high velocity, with large variation, or in high volumes. The business case for adopting Big Data is ultimately to turn data into insight. Examples span all industries, whether combining buying habits with location data to push individually tailored offers in retail, or using predictive analytics to optimize plant maintenance in manufacturing.
The reasons to adopt new data technologies are more complex than whether data volume, velocity, or variety physically requires a specialized algorithm or tool. One interesting finding came when respondents were asked whether they were planning to adopt new "Big Data" technologies in the next six months:
- 37% of respondents reported that they are planning to do so.
- Among those respondents (426 out of 1,154), the results show that, in spite of the explosion of non-batch tools and frameworks, Hadoop is still the go-to Big Data technology for new adopters.
What's also worth noting here is that three basic NoSQL storage models are highlighted in this "top seven" list: document-oriented (MongoDB), column-family-oriented (Cassandra), and graph (Neo4j was the only graph DBMS mentioned more than once), with none clearly dominating.
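To make the distinction between these storage models concrete, here is a minimal sketch in plain Python (no database drivers involved; the customer, products, field names, and keys are invented purely for illustration) showing how the same record, a customer and two purchases, might be shaped as a document, as a wide column-family row, and as a graph of nodes and edges.

```python
# Conceptual sketch only: the same record in three NoSQL storage models.
# All entity names and fields are hypothetical.

# 1. Document model (e.g., MongoDB): one self-contained, nested document.
customer_document = {
    "_id": "cust-42",
    "name": "Alice",
    "city": "Portland",
    "orders": [
        {"sku": "BOOK-001", "qty": 1, "price": 18.50},
        {"sku": "LAMP-007", "qty": 2, "price": 24.00},
    ],
}

# 2. Column-family model (e.g., Cassandra): wide rows keyed by a partition key,
# with each purchase stored as a clustered row under that key.
customer_column_family = {
    "cust-42": {                                  # partition key
        ("2016-05-01", "BOOK-001"): {"qty": 1, "price": 18.50},
        ("2016-05-03", "LAMP-007"): {"qty": 2, "price": 24.00},
    }
}

# 3. Graph model (e.g., Neo4j): nodes plus explicit relationships (edges).
nodes = [
    {"id": "cust-42", "label": "Customer", "name": "Alice"},
    {"id": "BOOK-001", "label": "Product"},
    {"id": "LAMP-007", "label": "Product"},
]
edges = [
    {"from": "cust-42", "to": "BOOK-001", "type": "PURCHASED", "qty": 1},
    {"from": "cust-42", "to": "LAMP-007", "type": "PURCHASED", "qty": 2},
]

if __name__ == "__main__":
    print(customer_document["orders"][0]["sku"])                # document lookup
    print(list(customer_column_family["cust-42"])[0])           # wide-row lookup
    print([e["to"] for e in edges if e["from"] == "cust-42"])   # graph traversal
```

The same information takes three different shapes, which is the point: the storage model you choose drives how applications query the data, and, as discussed next, how backup and recovery tooling has to capture it.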
As enterprise adoption builds and these distributed databases gain more traction, it's important to understand that these databases, while very useful for processing large volumes of data at high velocity, can benefit from a backup and recovery solution purpose-built for the cloud era. That is, as organizations adopt next-generation non-relational databases, they also need an ecosystem of data management products to protect that data and extract value from it.
The reasons to protect data are as follows:
- First, to minimize application downtime in the event of data loss due to hardware failures or human error. Human errors, such as fat-finger mistakes, occur all the time. In today's "always-on" world, customers want instant access, and application downtime can be detrimental, leading to loss of customers, damage to the brand, and ultimately loss of revenue.
- Second, there are compliance requirements in certain verticals that require organizations to retain and be able to recover data over its lifetime.
The challenge is that the non-relational databases supporting these next-generation applications (analytics, IoT, SaaS, and more) lack enterprise-class data protection solutions. It's this gap that puts enterprises at risk of data loss and limits their adoption of this new infrastructure. Enterprises will not be able to onboard their mission-critical applications onto distributed databases unless this critical gap is addressed.
Link:
https://dzone.com/articles/dzone-how-big-is-big-data