Page 161 - Computer - 8

            2.  Data Storage and Data Processing: Since data can have different formats and structures,
                companies need to consider different storage systems based on the type of data that needs
                to be captured. Data management teams help set standards around data storage and
                structure, which facilitate workflows around analytics, machine learning, and deep
                learning models. This stage includes cleaning data, removing duplicate data, transforming
                and combining the data using ETL (extract, transform, load) jobs or other data integration
                technologies. This data preparation is essential for promoting data quality before loading
                into a data warehouse, data lake, or other repository.

                [Figure: Data Science Lifecycle, shown as a cycle of four stages: Data Ingestion, Data
                Storage and Data Processing, Data Analysis, Communicate]

            3.  Data Analysis: Data scientists conduct an exploratory data analysis to examine biases, patterns,
                ranges, and distributions of values within the data. This exploration drives hypothesis
                generation for testing. It also allows analysts to determine the data’s relevance for
                use within modelling efforts for predictive analytics, machine learning, and/or deep learning.
                When a model proves accurate enough, organisations can rely on its insights for business
                decision-making and apply them at a greater scale.

            4.  Communicate: Finally, insights are presented as reports and other data visualisations that make
                the insights and their impact on business easier for business analysts and other decision-makers
                to understand. A data science programming language, such as R or Python, includes components
                for generating visualisations; alternatively, data scientists can use dedicated visualisation tools.
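The data preparation described in stage 2 can be illustrated with a small Python sketch. It is only one possible way to do it, not a standard method: the records, the field names (`order_id`, `customer`, `amount`, `city`) and the helper functions are all invented for this example, and an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3

# Hypothetical raw records from two sources (invented for illustration).
raw_orders = [
    {"order_id": "1", "customer": " Asha ", "amount": "250.0"},
    {"order_id": "2", "customer": "Ravi", "amount": "100.5"},
    {"order_id": "2", "customer": "Ravi", "amount": "100.5"},  # duplicate row
    {"order_id": "3", "customer": "Meena", "amount": ""},      # missing amount
]
customer_city = {"Asha": "Delhi", "Ravi": "Mumbai"}


def clean_and_transform(records):
    """Drop rows with a missing amount, trim names, convert text to numbers."""
    cleaned = []
    for rec in records:
        if not rec["amount"].strip():
            continue  # cleaning: skip incomplete rows
        cleaned.append({
            "order_id": int(rec["order_id"]),
            "customer": rec["customer"].strip(),
            "amount": float(rec["amount"]),
        })
    return cleaned


def deduplicate(records):
    """Keep only the first record seen for each order_id."""
    seen = set()
    unique = []
    for rec in records:
        if rec["order_id"] not in seen:
            seen.add(rec["order_id"])
            unique.append(rec)
    return unique


def combine(records, cities):
    """Attach the city from the second source; unknown customers get None."""
    return [{**rec, "city": cities.get(rec["customer"])} for rec in records]


def load(records, conn):
    """Load the prepared rows into a table in the repository."""
    conn.execute(
        "CREATE TABLE orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (:order_id, :customer, :amount, :city)",
        records,
    )
    conn.commit()


prepared = combine(deduplicate(clean_and_transform(raw_orders)), customer_city)
conn = sqlite3.connect(":memory:")  # stands in for a data warehouse
load(prepared, conn)
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print("Rows loaded:", row_count)
```

Each function handles one task (cleaning and transforming, removing duplicates, combining, loading), so a single step can be changed or checked without touching the others.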

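The exploratory data analysis in stage 3 can also be tried on a very small scale. The sketch below uses only Python's standard library to look at the range, the spread, the distribution, and a possible bias in a sample. The `ages` and `purchased` values are made up for illustration, and grouping ages into 10-year buckets is just one assumed way of viewing a distribution.

```python
import statistics
from collections import Counter

# Hypothetical sample invented for illustration: one age and one label per person.
ages = [18, 21, 21, 25, 30, 34, 34, 34, 41, 52]
purchased = ["no", "no", "yes", "no", "yes", "no", "no", "yes", "no", "no"]

# Range of values
age_min, age_max = min(ages), max(ages)

# Centre and spread
age_mean = statistics.mean(ages)
age_median = statistics.median(ages)
age_stdev = statistics.stdev(ages)

# Distribution: how many values fall in each 10-year bucket
buckets = Counter((age // 10) * 10 for age in ages)

# Possible bias: the share of each label in the sample
label_share = {
    label: count / len(purchased)
    for label, count in Counter(purchased).items()
}

print("Range:", age_min, "to", age_max)
print("Mean:", age_mean, "Median:", age_median, "Std dev:", round(age_stdev, 2))
for start in sorted(buckets):
    print(f"{start}-{start + 9}: {'#' * buckets[start]}")
print("Label share:", label_share)
```

If one label is much more common than the other, as it is in this sample, a model trained on the data may favour that label. This is the kind of bias an analyst would want to notice before modelling.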

          Role of Data Scientist
          Data science is considered a discipline, while data scientists are the practitioners within that field. Data
          scientists are not necessarily directly responsible for all the processes involved in the data science
          lifecycle. For example, data pipelines are typically handled by data engineers, but the data scientist may
          make recommendations about what sort of data is useful or required. While data scientists can build
          machine learning models, scaling these efforts requires additional software engineering skills,
          for example to optimise a program so that it runs more quickly. As a result, it’s common for a
          data scientist to partner with machine learning engineers to scale machine learning models.

          In short, a data scientist must be able to:
             •   Know enough about the business to ask pertinent questions and identify business pain points.
             •   Apply statistics and computer science, along with business acumen, to data analysis.
             •   Use a wide range of tools and techniques for preparing and extracting data, everything from
                databases and SQL to data mining and data integration methods.
             •   Extract insights from big data using predictive analytics and artificial intelligence, including
                machine learning models, natural language processing, and deep learning.
             •   Write programs that automate data processing and calculations.
             •   Tell and illustrate stories that clearly convey the meaning of results to decision-makers and
                stakeholders at every level of technical understanding.
             •   Explain how the results can be used to solve business problems.
             •   Collaborate with other data science team members, such as data and business analysts,
                IT architects, data engineers, and application developers.
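
Two of the skills above, extracting data with SQL and writing programs that automate calculations, can be combined in a short Python sketch. It is an illustration only: the `sales` table, its columns, its rows and the `sales_by_region` function are hypothetical, and an in-memory SQLite database is assumed in place of a real company database.

```python
import sqlite3


def sales_by_region(conn):
    """Return {region: total amount}; the totals are computed by the database."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


# Hypothetical table and rows, created here so that the example runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0), ("East", 50.0)],
)

totals = sales_by_region(conn)
top_region = max(totals, key=totals.get)  # region with the highest total
print(f"Top region: {top_region} ({totals[top_region]:.1f})")
```

Because the calculation sits inside a function, it can be run again whenever new rows are added, without repeating the work by hand.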

