Page 107 - Code & Click


2. Variety: Various data types may need to be stored and managed together in big data systems.
In addition, big data applications often include multiple datasets that may not be integrated
upfront. Big data includes the following:
• Structured data, such as transactions and financial records.
• Unstructured data, such as text, documents, and multimedia files.
• Semi-structured data, such as web server logs and streaming data from sensors.
3. Velocity: This refers to the speed at which data is generated and must be processed and
analysed. In many cases, sets of big data are updated on a real-time or near-real-time basis
instead of the daily, weekly, or monthly updates made in many traditional data warehouses.
4. Veracity: This refers to the degree of accuracy in data sets and how trustworthy they are.
Raw data collected from various sources can cause data quality issues that may be difficult to
pinpoint. Data management and analytics teams also need to ensure that they have enough
accurate data available to produce valid results.
5. Value: Organisations need to confirm that data relates to relevant business issues before it’s
used in big data analytics projects.
6. Variability: This refers to how the meaning of data can change with its context. Data collected
from varied sources needs to be understood in that context before it is added to the data stores.
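The three kinds of data listed under Variety can be illustrated with a short Python sketch. All sample values below are made up for illustration, and the standard csv and json modules simply stand in for whatever tools a real big data system would use.

```python
import csv
import io
import json

# Structured: fixed columns, like a table of transactions (made-up rows).
structured = "id,amount\n1,250.00\n2,99.50\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: each record carries its own field names (made-up sensor reading).
semi_structured = '{"sensor": "t1", "reading": {"temp_c": 21.5}}'
record = json.loads(semi_structured)

# Unstructured: free text with no predefined fields (made-up review).
unstructured = "Delivery was late but the product works well."
words = unstructured.lower().split()

total_amount = sum(float(row["amount"]) for row in rows)
temp_c = record["reading"]["temp_c"]
word_count = len(words)

print(total_amount, temp_c, word_count)
```

Each kind of data needs a different way of reading it before the three can be managed together, which is the challenge Variety describes.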

How is Big Data Stored and Processed?
Big data is often stored in a data lake. While data warehouses are commonly built on relational
databases and contain structured data only, data lakes can support various data types and are typically
based on clusters, cloud object storage services, NoSQL databases, or other big data platforms.
Many big data environments combine multiple systems in a distributed architecture; for example, a
central data lake might be integrated with other platforms, including relational databases or a data
warehouse. The stored data may also be pre-processed using data mining tools and data preparation software.
The required computing power is often provided by clustered systems that distribute processing
workloads across hundreds or thousands of commodity servers. The cloud is a popular location for
big data systems. Organisations can deploy their own cloud-based systems or use managed big-data-
as-a-service offerings from cloud providers. They pay only for the storage and compute
time they use, and the cloud instances can be turned off until they’re needed again.
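The idea of distributing a processing workload can be sketched on a single computer. In this minimal Python example, threads stand in for the servers of a cluster (an assumption made only for illustration), and the function names and sample lines are hypothetical. The data is split into partitions, each worker counts words in its own partition, and the partial results are then merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(items, n):
    """Split items into n roughly equal partitions."""
    return [items[i::n] for i in range(n)]


def count_words(lines):
    """Work done by one worker on its own partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def distributed_word_count(lines, workers=3):
    """Count words by sharing the lines among several workers."""
    partitions = split_into_partitions(lines, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    # Merge the partial results from all workers into one total.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


sample_lines = ["big data", "data lake", "big cluster", "data"]
result = distributed_word_count(sample_lines)
print(result)
```

A real cluster would send each partition to a different machine, but the split, process, and merge steps follow the same pattern.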

How Does Big Data Analytics Work?
To get valid and relevant results from big data analytics applications, data scientists and other data
analysts must have a detailed understanding of the available data and a sense of what they’re
looking for in it. That makes data preparation, which includes profiling, cleansing, validation, and
transformation of data sets, a crucial first step in the analytics process.
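The four data preparation steps named above can be sketched in plain Python. The customer records, field names, and the rule that an age must lie between 0 and 120 are all assumptions chosen for illustration, and real projects would normally use dedicated data preparation software.

```python
# Made-up customer records, stored as text the way raw data often arrives.
raw_records = [
    {"customer": " Asha ", "age": "34", "spend": "120.50"},
    {"customer": "Ben", "age": "", "spend": "80"},
    {"customer": "Chen", "age": "-5", "spend": "45.25"},
    {"customer": "Asha", "age": "34", "spend": "120.50"},
]


def profile(records):
    """Profiling: count how many values are missing in each field."""
    missing = {}
    for rec in records:
        for field, value in rec.items():
            if value.strip() == "":
                missing[field] = missing.get(field, 0) + 1
    return missing


def cleanse(records):
    """Cleansing: trim extra spaces and drop exact duplicates."""
    seen, cleaned = set(), []
    for rec in records:
        trimmed = {k: v.strip() for k, v in rec.items()}
        key = tuple(sorted(trimmed.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(trimmed)
    return cleaned


def is_valid(rec):
    """Validation: age must be a whole number from 0 to 120."""
    return rec["age"].isdigit() and 0 <= int(rec["age"]) <= 120


def transform(rec):
    """Transformation: convert text fields to numeric types."""
    return {
        "customer": rec["customer"],
        "age": int(rec["age"]),
        "spend": float(rec["spend"]),
    }


missing_report = profile(raw_records)
prepared = [transform(r) for r in cleanse(raw_records) if is_valid(r)]
print(missing_report, prepared)
```

Only one of the four raw records survives all the steps, which shows why preparation has to come before any analysis.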
Using customer data as an example, the different branches of analytics that can be done with sets of
big data include the following:
• Comparative analysis: This examines customer behaviour metrics and real-time customer
engagement in order to compare a company’s products, services, and branding with those of
its competitors.
• Social media listening: This analyses what people are saying on social media about a business
or product, which can help identify potential problems and target audiences for marketing
campaigns.
