Page 107 - Code & Click - 8

              2.  Variety: Various data types may need to be stored and managed together in big data systems.
                  In addition, big data applications often include multiple datasets that may not be integrated
                  upfront. Big data includes the following:
                  • Structured data, such as transactions and financial records.
                  • Unstructured data, such as text, documents, and multimedia files.
                  • Semi-structured data, such as web server logs and streaming data from sensors.
              3.  Velocity: This refers to the speed at which data is generated and must be processed and
                  analysed. In many cases, sets of big data are updated on a real-time or near-real-time basis
                  instead of the daily, weekly, or monthly updates made in many traditional data warehouses.
              4.  Veracity: This refers to the degree of accuracy in data sets and how trustworthy they are.
                  Raw data collected from various sources can cause data quality issues that may be difficult to
                  pinpoint. Data management and analytics teams also need to ensure that they have enough
                  accurate data available to produce valid results.
              5.  Value: Organisations need to confirm that data relates to relevant business issues before it’s
                  used in big data analytics projects.
              6.  Variability: Data collected from varied sources can mean different things in different
                  contexts, so its context needs to be understood before it is added to the data stores.
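
The three kinds of data listed under Variety can be illustrated with a short Python sketch. The sample records and the describe() function are made up for this example; real systems use far more careful checks than this rough rule of thumb.

```python
import json

# Hypothetical sample records, one for each kind of data named under Variety.
structured = {"transaction_id": 101, "amount": 250.0}   # fixed, known fields
semi_structured = '{"sensor": "s1", "reading": {"temp": 21.5}}'  # JSON text
unstructured = "The delivery was late but the product is great."  # free text


def describe(record):
    """Return a rough label for a record (an illustrative rule, not a standard)."""
    if isinstance(record, dict):
        return "structured"
    if isinstance(record, str):
        try:
            parsed = json.loads(record)
        except json.JSONDecodeError:
            # Plain text that does not parse as JSON has no built-in structure.
            return "unstructured"
        # Text that parses into nested fields carries its own flexible structure.
        if isinstance(parsed, (dict, list)):
            return "semi-structured"
        return "unstructured"
    return "unknown"


for sample in (structured, semi_structured, unstructured):
    print(describe(sample))
```

A big data system has to store and manage all three kinds together, which is why one table with fixed columns is often not enough.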

            How is Big Data Stored and Processed?
            Big data is often stored in a data lake. While data warehouses are commonly built on relational
            databases and contain structured data only, data lakes can support various data types and are typically
            based on clusters, cloud object storage services, NoSQL databases, or other big data platforms.
            Many big data environments combine multiple systems in a distributed architecture; for example, a
            central data lake might be integrated with other platforms, including relational databases or a data
            warehouse. In some environments, the data is pre-processed using data mining tools and data
            preparation software before it is analysed.
            The required computing power is often provided by clustered systems that distribute processing
            workloads across hundreds or thousands of commodity servers. The cloud is a popular location for
            big data systems. Organisations can deploy their own cloud-based systems or use managed big-data-
            as-a-service offerings from cloud providers. The business only pays for the storage and compute
            time it uses, and the cloud instances can be turned off until they’re needed again.
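
The idea of spreading a workload across many servers can be sketched in Python. This is a simplified illustration that runs on one computer: each "chunk" stands in for the share of data one server would handle, and the function names and page-view data are invented for the example.

```python
from collections import Counter


def split_into_chunks(records, n_workers):
    """Divide the records into n_workers roughly equal slices."""
    return [records[i::n_workers] for i in range(n_workers)]


def count_chunk(chunk):
    """Work done by one (pretend) server: count the items in its own slice."""
    return Counter(chunk)


def merge(partial_counts):
    """Combine the partial results from every server into one total."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# Hypothetical page-view log to be counted.
page_views = ["home", "shop", "home", "cart", "shop", "home"]

chunks = split_into_chunks(page_views, 3)
totals = merge(count_chunk(chunk) for chunk in chunks)
print(totals)
```

On a real cluster, each chunk would be sent to a different machine and counted at the same time, so adding more servers lets the system handle more data.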

            How Does Big Data Analytics Work?
            To get valid and relevant results from big data analytics applications, data scientists and other data
            analysts must have a detailed understanding of the available data and a sense of what they’re
            looking for in it. That makes data preparation, which includes profiling, cleansing, validation, and
            transformation of data sets, a crucial first step in the analytics process.
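
The four data preparation steps named above (profiling, cleansing, validation, and transformation) can be sketched in Python as follows. The customer records, the function names, and the age limit of 0 to 120 are assumptions made for this example only.

```python
# Hypothetical raw customer records with typical quality problems.
raw_customers = [
    {"name": " Asha ", "age": "34", "city": "Delhi"},
    {"name": "Ravi", "age": "", "city": "Mumbai"},
    {"name": "Meera", "age": "-5", "city": "Pune"},
]


def profile(rows):
    """Profiling: count how many rows have an empty value in each field."""
    missing = {}
    for row in rows:
        for field, value in row.items():
            if value.strip() == "":
                missing[field] = missing.get(field, 0) + 1
    return missing


def cleanse(rows):
    """Cleansing: trim stray spaces and drop rows that have an empty field."""
    cleaned = []
    for row in rows:
        trimmed = {field: value.strip() for field, value in row.items()}
        if all(trimmed.values()):
            cleaned.append(trimmed)
    return cleaned


def validate(rows):
    """Validation: keep rows whose age is a whole number from 0 to 120."""
    return [r for r in rows if r["age"].isdigit() and 0 <= int(r["age"]) <= 120]


def transform(rows):
    """Transformation: convert the age from text to an integer."""
    return [{**r, "age": int(r["age"])} for r in rows]


print(profile(raw_customers))
prepared = transform(validate(cleanse(raw_customers)))
print(prepared)
```

Only the first record survives all the steps: the second has a missing age and the third has an age that is not valid.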
            Using customer data as an example, the different branches of analytics that can be done with sets of
            big data include the following:
               •  Comparative analysis: This examines customer behaviour metrics and real-time customer
                  engagement in order to compare a company’s products, services, and branding with those of
                  its competitors.
               •  Social media listening: This analyses what people are saying on social media about a business
                  or product, which can help identify potential problems and target audiences for marketing
                  campaigns.

