Page 161 - Computer - 8
2. Data Storage and Data Processing: Since data can have different formats and structures,
companies need to consider different storage systems based on the type of data that needs to be
captured. Data management teams help set standards around data storage and structure, which
facilitate workflows around analytics, machine learning, and deep learning models. This stage
includes cleaning data, removing duplicate data, and transforming and combining the data using
ETL (extract, transform, load) jobs or other data integration technologies. This data preparation
is essential for promoting data quality before loading into a data warehouse, data lake, or other
repository.
[Figure: Data Science Lifecycle, showing the stages Data Ingestion, Data Storage and Data
Processing, Data Analysis, and Communicate]
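The preparation steps in this stage can be sketched as a very small ETL job. This is only an illustration under stated assumptions: the sample sales and customer records, the table name orders, and the function names extract, transform, load and run_etl are all made up for this example, and an in-memory SQLite database stands in for the data warehouse or other repository.

```python
import csv
import io
import sqlite3

# Hypothetical raw data; in practice it would come from files, apps or sensors.
SALES_CSV = """order_id,customer_id,amount
1,c1, 120.50
2,c2,80
2,c2,80
3,c3,
4,c1,45.25
"""

CUSTOMERS_CSV = """customer_id,city
c1,Delhi
c2,Mumbai
c3,Pune
"""


def extract(text):
    """Extract: parse CSV text into a list of dictionary rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(sales, customers):
    """Transform: clean, remove duplicates, convert types and combine."""
    city_by_customer = {c["customer_id"]: c["city"] for c in customers}
    seen_orders = set()
    cleaned = []
    for row in sales:
        amount = row["amount"].strip()
        if not amount:                       # cleaning: skip rows with no amount
            continue
        if row["order_id"] in seen_orders:   # removing duplicate orders
            continue
        seen_orders.add(row["order_id"])
        cleaned.append((
            int(row["order_id"]),            # transforming text into numbers
            row["customer_id"],
            city_by_customer.get(row["customer_id"], "unknown"),  # combining
            float(amount),
        ))
    return cleaned


def load(rows, conn):
    """Load: store the prepared rows in a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer_id TEXT, city TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run extract, transform and load, and return the database connection."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real repository
    rows = transform(extract(SALES_CSV), extract(CUSTOMERS_CSV))
    load(rows, conn)
    return conn
```

Calling run_etl() and then running a query such as SELECT * FROM orders on the returned connection shows only the cleaned, de-duplicated and combined records.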
3. Data Analysis: Data scientists conduct an exploratory data analysis to examine biases, patterns,
ranges, and distributions of values within the data. This data analytics exploration drives
hypothesis generation for testing. It also allows analysts to determine the data’s relevance for
use within modelling efforts for predictive analytics, machine learning, and/or deep learning.
If a model proves accurate enough, organisations can rely on its insights for business
decision-making and apply them on a larger scale.
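A first exploratory look at a single column of numbers might resemble the sketch below. It uses only Python's built-in statistics module. The sample data (hours) and the function names summarise, distribution and outliers are assumptions made for illustration, and the 1.5 x IQR rule used to flag unusual values is one common convention rather than the only choice.

```python
import statistics
from collections import Counter

# Hypothetical sample: daily study hours reported by ten students.
hours = [1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.0, 3.0, 4.0, 9.0]


def summarise(values):
    """Return the range, centre and spread of a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def distribution(values, bin_width=2.0):
    """Count how many values fall into each bin of the given width."""
    bins = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))


def outliers(values, k=1.5):
    """Flag values lying more than k * IQR outside the middle half of the data."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


print(summarise(hours))
print(distribution(hours))
print(outliers(hours))
```

A summary like this helps an analyst notice, for example, that one value is far from the rest and decide whether it is an error or a genuine pattern worth testing.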
4. Communicate: Finally, insights are presented as reports and other data visualisations that make
the insights and their impact on business easier for business analysts and other decision-makers
to understand. A data science programming language, such as R or Python, includes components
for generating visualisations; alternatively, data scientists can use dedicated visualisation tools.
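As a rough illustration of turning numbers into a picture, the sketch below draws a plain-text bar chart using only standard Python. The function name text_bar_chart and the orders-per-city figures are hypothetical, and the values are assumed to be non-negative. In practice a plotting library such as Matplotlib, or a dedicated visualisation tool, would normally be used instead.

```python
def text_bar_chart(data, width=30, symbol="#"):
    """Return a horizontal bar chart as text, one line per category."""
    if not data:
        return ""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # Scale each bar so that the largest value fills the full width.
        bar_len = round(value / largest * width) if largest else 0
        lines.append(f"{label.ljust(label_width)} | {symbol * bar_len} {value}")
    return "\n".join(lines)


# Hypothetical insight to communicate: number of orders per city.
orders_per_city = {"Delhi": 40, "Mumbai": 30, "Pune": 10}
print(text_bar_chart(orders_per_city, width=20))
```

Even a simple chart like this lets a decision-maker compare categories at a glance instead of reading a table of numbers.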
Role of Data Scientist
Data science is considered a discipline, while data scientists are the practitioners within that field. Data
scientists are not necessarily directly responsible for all the processes involved in the data science
lifecycle. For example, data pipelines are typically handled by data engineers, but the data scientist may
make recommendations about what sort of data is useful or required. While data scientists can build
machine learning models, scaling these efforts requires additional software engineering skills
to optimise programs so that they run more quickly. As a result, it’s common for a data scientist to partner with
machine learning engineers to scale machine learning models.
In short, a data scientist must be able to:
• Know enough about the business to ask pertinent questions and identify business pain points.
• Apply statistics and computer science, along with business acumen, to data analysis.
• Use a wide range of tools and techniques for preparing and extracting data, everything from
databases and SQL to data mining and data integration methods.
• Extract insights from big data using predictive analytics and artificial intelligence, including
machine learning models, natural language processing, and deep learning.
• Write programs that automate data processing and calculations.
• Tell and illustrate stories that clearly convey the meaning of results to decision-makers and
stakeholders at every level of technical understanding.
• Explain how the results can be used to solve business problems.
• Collaborate with other data science team members, such as data and business analysts,
IT architects, data engineers, and application developers.