
What does a data analyst do?

Data analysts, data scientists, data engineers... There are many professionals who work with data. In this post, we will understand what data analysts do and deliver, and how they work alongside other data professionals.

Estimated time to read: 6 minutes

Published on February 16, 2020 and updated on November 19, 2021.

I’m the kind of person who prefers being a generalist instead of a specialist, meaning that I know a little about many things instead of a lot about a couple of topics. This is exactly what I believe a data analyst is: a generalist, capable of navigating different data sources in and out of organizations to discover things and communicate them to others. What does that mean in practice, and why is data analysis important for organizations?

The meaning is the same, only with a different context

Analysis is nothing more than a study that tries to explain natural phenomena. I dare say we have been analysts since the day we were born, and that we get better the more we analyze, with tools and techniques designed for studying different phenomena. From chemical transformations to social dynamics, we are capable of examining and understanding – or inferring – the reason behind any given phenomenon.

Data analysis is a specialization of analysis in which a researcher examines something called data. Data is the smallest unit that forms knowledge; by itself, data has no meaning. Data analysts transform data into knowledge through learning processes, so in this context data is the raw material for generating knowledge. Data analysts deliver reports that inform business decisions, forecasting the return on investment over a period of time. In the public sector, data analysts study and communicate the usage of public resources with help from the press.

Data analysis in times of big data

What is definitely not missing on the internet are definitions of big data and its importance for organizations. Big volumes of data are created and stored all the time, thanks to the spread of technologies to collect, process and store data. This prosperous environment for studying data has increased the demand for data analysts. In other forms of analysis, professionals create a protocol to capture, process, store and extract information, and then go into the field to collect the data. A data analyst, instead, finds data that already exists inside organizations, but in many shapes, sizes and locations, usually not suited for use the way they are found. The protocol must therefore be designed not only to collect new data, but also to treat existing data. Data analysts must be able to handle inconsistent and missing data that, on many occasions, simply cannot be obtained again.

In the data science universe, the infrastructure data scientists work on needs to be so well planned in regards to hardware, software and processes that there is a professional dedicated to doing just that: the data engineer. Depending on the organization, a similar professional may work alongside data analysts, even with a broader scope, such as the development team. Smaller organizations rely on data analysts to also plan the infrastructure through which data will be handled. This infrastructure is called information flow architecture.

Information flow architecture

If you ask me what the difference is between a data analyst and a data scientist, I would say that data analysts do everything data scientists do, except training algorithms. For the most part, data scientists treat data before using it to train algorithms. This treatment can happen during collection and/or as a transformation that allows algorithms to understand the data. What changes between the work done by a data analyst and that done by a data scientist is the final objective: data analysts handle and study data to find patterns based on the past, while data scientists handle and study data to identify trends. For each objective, the same dataset can be treated completely differently, but in both approaches the data is treated and stored to be used later.

This is the main difference between these professions, but not the only one. Some places have data engineers, who configure the infrastructure of hardware, software and processes used by data scientists to handle data, while it is uncommon for this complement to exist in support of data analysts. It is also common for smaller companies not to have the same volume of data as big companies. Smaller companies, with smaller volumes of data, can take advantage of smaller infrastructures maintained by data analysts. This smaller infrastructure for small and medium companies, comprised of good hardware, software and processes for data analysis or data science, is what I’m calling information flow architecture. No matter the configuration, this infrastructure will have at least:

  • One or more data sources;
  • One or more data transformers;
  • One or more storage repositories;
  • One or more outputs for data consumption.

Sounds like ETL or ELT? The information flow architecture is the starting point for defining the ideal workflow. This basic infrastructure will be set up according to its goal, which will vary between data analysts and data scientists, since structures like data warehouses support different use cases than data lakes.
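To make those four components concrete, here is a minimal ETL sketch in Python, using only the standard library. Everything in it – the CSV source, the `region` and `amount` fields, the `sales.db` repository – is a hypothetical example, not a prescription:

```python
import csv
import sqlite3

# Extract: read raw rows from a hypothetical CSV data source.
def extract(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# Transform: treat inconsistent and missing data before storage.
def transform(rows):
    cleaned = []
    for row in rows:
        amount = row.get("amount", "").strip()
        if not amount:  # drop rows whose value cannot be recovered
            continue
        cleaned.append({"region": row["region"].title(),
                        "amount": float(amount)})
    return cleaned

# Load: store treated rows in a repository for later consumption.
def load(rows, db_path="sales.db"):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (:region, :amount)", rows)
    con.commit()
    con.close()
```

The same skeleton maps onto the list above: the CSV file is the data source, `transform` is the data transformer, SQLite is the storage repository, and whatever queries the `sales` table is the output for consumption.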


Process of collection, treatment, analysis and data extraction will vary according to the goal. Source: Xplenty.

Generating knowledge is the starting point

No matter if data analysts are solely focused on the analysis itself or if they also carry the responsibility of maintaining a proper infrastructure, the work is not done if it doesn’t generate knowledge for the organization, capable of driving business decisions. It is at this moment that the generalist profile of a data analyst is useful, since knowing business processes is fundamental for establishing a starting point for the analysis. The starting point is important to contextualize any findings from data. I believe data analysts aren’t different from Ant Man, a Marvel superhero. He can shrink indefinitely until he finds a completely different information universe, but if he shrinks too much he won’t be able to come back and share his findings with the team. The starting point is the set of configurations and coordinates that will drive the data analyst’s work – and the limit for Ant Man to shrink.


In Marvel Cinematic Universe, Ant Man explores the quantum realm after shrinking. Source: Express.co.uk.

The utility belt

Collecting and organizing datasets to extract and communicate knowledge are two very important skills for any data analyst. It happens that data in the workplace is rarely in the shape that’s needed, so it must be treated before it can be used. In a digital environment where technologies to collect data are limited, and privacy is a topic to be discussed, the challenges of working with data are even bigger. To deal with problems like these, the toolbox of a data analyst holds the same tools found in the toolboxes of mathematicians, statisticians, software developers and design specialists. Knowing descriptive and inferential statistics is important to explain volumes of data with different levels of quality and noise, ensuring the reliability necessary to support business decisions. Software development tools, on the other hand, are useful for creating automatic pipelines to collect, transform and store data. Lastly, tools used by experience designers help drive the work of data analysts so they are always in sync with the business areas that need the kind of knowledge data analysts can deliver.
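As a tiny illustration of the statistical side of that toolbox, Python’s standard library alone can summarize a noisy sample. The daily sales figures below are made up, with one deliberate outlier:

```python
import statistics

# Hypothetical daily sales figures, with noise and one outlier (900).
daily_sales = [120, 132, 125, 118, 900, 127, 130]

mean = statistics.mean(daily_sales)      # dragged upward by the outlier
median = statistics.median(daily_sales)  # robust to the outlier
stdev = statistics.stdev(daily_sales)    # spread of the sample

print(f"mean={mean:.1f} median={median} stdev={stdev:.1f}")
```

Here the mean (236.0) and the median (127) tell very different stories, which is exactly why a data analyst inspects more than one summary before letting a number support a business decision.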

In the end, it is about being a generalist in both business knowledge and tools, in order to be a specialist at generating and communicating knowledge for the company.

Let’s analyze data!