Saturday, 27 January 2018

Big data

Big data refers to data sets that are so voluminous and complex that traditional data-processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and information privacy. There are three dimensions to big data, known as Volume, Variety and Velocity.
Lately, the term "big data" tends to refer to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set. "There is little doubt that the quantities of data now available are indeed large, but that's not the most relevant characteristic of this new data ecosystem." Analysis of data sets can find new correlations to "spot business trends, prevent diseases, combat crime and so on." Scientists, business executives, practitioners of medicine, advertising and governments alike regularly meet difficulties with large data sets in areas including Internet search, fintech, urban informatics, and business informatics. Scientists encounter limitations in e-Science work, including meteorology, genomics, connectomics, complex physics simulations, biology and environmental research.

Data sets grow rapidly, in part because they are increasingly gathered by cheap and numerous information-sensing Internet of Things devices such as mobile devices, aerial sensors (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers and wireless sensor networks. The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s; as of 2012, 2.5 exabytes (2.5×10^18 bytes) of data were generated every day. IDC predicts there will be 163 zettabytes of data by 2025. One question for large enterprises is determining who should own big-data initiatives that affect the entire organization.
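As a quick back-of-the-envelope check of the doubling claim, the cumulative growth implied by one doubling every 40 months can be computed directly (the 1986–2012 span below is an assumed illustration, not a figure from the text):

```python
# Illustrative arithmetic only: per-capita storage capacity doubling
# every 40 months, compounded over an assumed 26-year span (1986-2012).
months = 26 * 12            # 312 months in the span
doublings = months / 40     # number of doublings in that time
factor = 2 ** doublings     # cumulative growth factor

print(f"{doublings:.1f} doublings -> about {factor:.0f}x capacity")
```

Roughly 7.8 doublings, i.e. a growth factor of a few hundred over the period, which is consistent in spirit with the shift from gigabyte-scale to exabyte-scale daily data generation.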

1.1 Walking Into Big Data

Big Data is huge in volume, is captured at a fast rate, and may be structured, unstructured, or some amalgamation of the two. These factors make Big Data hard to capture, manage and mine using conventional or traditional methods.

1.2 Aim/Objective
Perform Association Rule Mining and FP-Growth on Big Data from the e-commerce market to find frequent patterns and association rules among the itemsets present in the database, using a reduced Apriori algorithm and a reduced FP-Growth algorithm on top of Mahout (an open-source Java library) built on the Hadoop MapReduce framework.
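To make the objective concrete, the following is a minimal sketch of Apriori-style frequent itemset mining and rule generation in plain Python, on a made-up toy basket dataset. It is an illustration of the technique only; the thesis itself targets Mahout's reduced Apriori/FP-Growth implementations on Hadoop, and the transactions, thresholds and item names below are invented:

```python
from itertools import combinations

# Toy e-commerce transactions (hypothetical data for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
min_support = 0.6     # fraction of transactions containing the itemset
min_confidence = 0.7  # support(X ∪ Y) / support(X) for a rule X -> Y

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Apriori: grow candidate itemsets level by level, pruning any candidate
# whose support falls below the threshold (no superset can be frequent).
items = sorted({i for t in transactions for i in t})
frequent = {frozenset([i]) for i in items if support({i}) >= min_support}
all_frequent = set(frequent)
k = 2
while frequent:
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
    frequent = {c for c in candidates if support(c) >= min_support}
    all_frequent |= frequent
    k += 1

# Association rules: split each frequent itemset into antecedent -> consequent
# and keep the splits whose confidence clears the threshold.
for itemset in all_frequent:
    if len(itemset) < 2:
        continue
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            conf = support(itemset) / support(antecedent)
            if conf >= min_confidence:
                print(set(antecedent), "->", set(itemset - antecedent),
                      f"(conf={conf:.2f})")
```

On this toy data the sketch finds, for example, that every basket containing beer also contains diapers, a rule with confidence 1.0. FP-Growth reaches the same frequent itemsets without candidate generation, by compressing the transactions into a prefix tree, which is why it scales better on large data.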
1.3 Motivation
Big Data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyse. This definition is deliberately subjective: it does not fix how large a dataset must be to count as big data, i.e. we cannot define big data as being larger than some fixed number of terabytes. We assume that as technology advances over time, the volume of data that qualifies as big data will also rise. The definition can also differ from sector to sector, depending on which software tools are commonly available and what dataset sizes are typical in a particular industry. According to studies, big data in many sectors today ranges from a few dozen terabytes to thousands of terabytes.
' The velocity, variety and volume of data grow day by day, which makes large amounts of data hard to manage.
' According to one study, 30 billion pieces of content are shared on Facebook every month.
Issues/Problems while analysing Big Data:
Volume:
' According to analysis, every day more than one billion shares are traded on the New York Stock Exchange.
' According to analysis, every day Facebook stores two billion comments and likes.
' According to analysis, every minute Foursquare handles more than two thousand check-ins.
' According to analysis, every minute TransUnion makes nearly 70,000 updates to credit files.
' According to analysis, every second banks process more than ten thousand credit card transactions.
Velocity:
We are producing data more rapidly than ever:
' Processes are more and more automated.
' People are more and more interacting online.
' Systems are more and more interconnected.
Variety:
We are producing a variety of data including:
' Social network connections
' Images
' Audio
' Video
' Log files
' Product rating comments
1.4 Background
Big data[5][6] is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
Gartner, and now much of the industry, continues to use this "3Vs" model for describing big data [7]. In 2012, Gartner updated its definition as follows: big data is high-volume, high-velocity and high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization [8]. Additionally, some organizations add a fourth V, "Veracity", to describe it.
Big data has evolved into a very important factor in the economic and technology fields; like other essential factors of production such as hard assets and human capital, much of present-day economic activity simply could not take place without it. Looking at the current position of sectors in the US economy, companies with roughly one thousand workers store a minimum of 200 TB of data on average (twice the size of the US retailer Wal-Mart's data warehouse in 1999), and in many sectors the mean is over one petabyte (PB) of stored data per organization. The growth of big data will continue to a high extent, driven by modern technologies and platforms, their analytical capabilities for handling large amounts of data, and their large number of upcoming users.

Utilization of Big Data Will Turn Out to Be a Key Basis of Competition and Growth for Individual Firms:
Usage of big data has become an important means for leading firms to improve their data handling. To take the example of a retail company: by embracing big data, the company can increase its operating margin by approximately 60%. Chief retailers such as the UK's TESCO and many others use big data to defend their market revenue share against local competitors.
The emergence of big data also has the capability to open new growth opportunities for companies that combine and analyse industry data. Even for companies sitting at the mid-point of large information flows, data about the objectives and demands of their users, buyers, suppliers, products and services can be easily captured and analysed using big data.
Deploying big data in a firm can facilitate healthier and more thorough analysis of data and its outcomes: lower product prices, higher quality, and a better match between the company's offering and the customer's needs. We can say that a step forward towards the acceptance of big data can improve consumer surplus and accelerate performance across all companies.

Figure 1.1: Types of data generated

Figure 1.2: Impact of Big Data
Significance of Big Data:
Government sector:
' The Obama administration announced a big-data research and development initiative to help handle several of the obstacles and problems the government faces nowadays. The initiative comprised 84 big-data programs across six different departments.
' Big data analysis played a big role in Obama's successful 2012 re-election campaign.

Private sector:
' eBay.com uses two data warehouses, of 7.5 petabytes and 40 petabytes, as well as a 40-petabyte Hadoop cluster for merchandising, recommendations and search.
' Every day, Amazon.com handles millions of back-end operations, as well as queries from more than half a million third-party sellers.
' Walmart processes more than one million customer transactions every hour, which are imported into databases and analysed.
' Facebook stores and processes 50 billion pictures from its user base.
' FICO's Falcon Credit Card Fraud Detection System handles and protects 2.1 billion active accounts worldwide.
' According to estimates, the volume of stored business and company data doubles every 1.2 years.
' Windermere Real Estate uses GPS signals from nearly 100 million drivers to help new home seekers estimate their typical driving times to and from work at various times of day.
