Characteristics of Big Data
Big data was originally associated with three key concepts, called the three V’s: Volume, Variety, and
Velocity. Later on, data scientists added further characteristics to help judge whether collected data is
relevant and useful for storage and analysis. Let us learn about these V’s.
1. Volume: This is the most commonly cited characteristic of big data. A big data environment contains
a very large amount of data because of the nature of what is being collected and stored. Clickstreams,
system logs, and stream-processing systems are among the sources that typically produce massive
volumes of data on an ongoing basis.
2. Variety: Various data types may need to be stored and managed together in big data systems. In
addition, big data applications often include multiple datasets that may not be integrated upfront.
For example, a big data analytics project may attempt to forecast sales of a product by correlating
data on past sales, returns, online reviews, and customer service calls. Big data also encompasses
a wide variety of data types, including the following (illustrated in the first sketch after this list):
• Structured data, such as transactions and financial records.
• Unstructured data, such as text, documents, and multimedia files.
• Semi-structured data, such as web server logs and streaming data from sensors.
3. Velocity: This refers to the speed at which data is generated and must be processed and analysed.
In many cases, sets of big data are updated on a real-time or near-real-time basis instead of the
daily, weekly, or monthly updates made in many traditional data warehouses (the second sketch
after this list shows this idea). Managing data
velocity is also important as big data analysis further expands into machine learning and artificial
intelligence (AI), where analytical processes automatically find patterns in data and use them to
generate insights.
4. Veracity: This refers to the degree of accuracy in data sets and how trustworthy they are. Raw
data collected from various sources can cause data quality issues that may be difficult to pinpoint.
If such issues aren’t fixed through data cleansing processes, bad data leads to analysis errors that
can undermine the value of business analytics initiatives (the third sketch after this list shows a
simple cleansing step). Data management and analytics teams also need to ensure that they have
enough accurate data available to produce valid results.
5. Value: Some data scientists and consultants add value to the list of big data’s characteristics. Not
all the data that’s collected has real business value or benefits. As a result, organisations need to
confirm that data relates to relevant business issues before it’s used in big data analytics projects.
6. Variability: This refers to the fact that the meaning and format of data collected from varied sources
can change with context, so that context needs to be understood before the data is added to the
data stores.
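
The three data types listed under Variety can be pictured with a short Python sketch. The records below
are made up for illustration; they only show how each type is handled differently.

import json

# Structured data: fixed fields, like a row in a sales table.
structured_row = {"order_id": 1001, "product": "Keyboard", "amount": 799.00}

# Semi-structured data: a web server log entry in JSON; fields exist,
# but the shape can vary from record to record.
semi_structured_log = json.loads(
    '{"timestamp": "2024-05-01T10:15:00", "page": "/home", "status": 200}'
)

# Unstructured data: free text from a customer review, with no fields at all.
unstructured_review = "The keyboard feels great, but delivery took too long."

print(structured_row["amount"])          # easy to query by field name
print(semi_structured_log["status"])     # queryable, but schema may change
print(len(unstructured_review.split()))  # needs text processing to analyse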
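
Velocity can be pictured with a second sketch in which events are handled one at a time as they arrive,
instead of being collected for a daily or weekly batch load. The clickstream generator and its field names
are invented for this example.

import random
import time

def click_stream():
    """Simulate a continuous stream of click events arriving over time."""
    pages = ["/home", "/products", "/cart", "/checkout"]
    while True:
        yield {"page": random.choice(pages), "time": time.time()}

page_counts = {}
for i, event in enumerate(click_stream()):
    # Update the running totals immediately, as each event arrives.
    page_counts[event["page"]] = page_counts.get(event["page"], 0) + 1
    if i >= 9:  # stop this demonstration after ten events
        break

print(page_counts)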
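
Veracity is usually handled through data cleansing. This third sketch, using made-up customer records,
drops rows with missing or impossible values before an average is computed.

raw_records = [
    {"customer": "Asha",  "age": 34,   "purchase": 1200.0},
    {"customer": "Ravi",  "age": None, "purchase": 560.0},   # missing age
    {"customer": "Meera", "age": -5,   "purchase": 300.0},   # impossible age
    {"customer": "John",  "age": 41,   "purchase": 980.0},
]

def is_valid(record):
    """Keep only records whose age is present and realistic."""
    return record["age"] is not None and 0 < record["age"] < 120

clean_records = [r for r in raw_records if is_valid(r)]

# The average purchase is computed on the cleansed data only.
average_purchase = sum(r["purchase"] for r in clean_records) / len(clean_records)
print(round(average_purchase, 2))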
How is Big Data Stored and Processed?
Big data is often stored in a data lake. While data warehouses are commonly built on relational databases
and contain structured data only, data lakes can support various data types and are typically based on
clusters, cloud object storage services, NoSQL databases, or other big data platforms.
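
The data lake idea, a single storage location that accepts many data types as they are, can be sketched
with ordinary files as shown below. The folder name data_lake and the files written into it are hypothetical;
real data lakes use cluster or cloud object storage rather than a local folder.

import csv
import json
from pathlib import Path

lake = Path("data_lake")
lake.mkdir(exist_ok=True)

# Structured data saved as a CSV table.
with open(lake / "sales.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["order_id", "amount"])
    writer.writerow([1001, 799.00])

# Semi-structured data saved as a JSON log entry.
(lake / "weblog.json").write_text(json.dumps({"page": "/home", "status": 200}))

# Unstructured data saved as plain text.
(lake / "review.txt").write_text("The keyboard feels great, but delivery took too long.")

# All three sit side by side; structure is applied only when the data is read.
print(sorted(p.name for p in lake.iterdir()))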