Data Architect: A Data Architect is the go-to person for data management, especially when an organization draws on many disparate data sources. With extensive knowledge of how databases work and of how the acquired data relates to the business’s operations, the Data Architect can, ideally, anticipate how changes will affect the company’s data use and adjust the data architecture to compensate.

• Responsibilities: Data warehousing, ETL, architecture development, modeling

• Languages: Hive, SQL, Pig, Spark, XML


Data Engineer: This role is closely related to the Data Architect. The Data Engineer also works on the management side of data, which leads some people to treat the titles as interchangeable. However, a Data Engineer, who usually has a strong background in software engineering, is the one who builds, tests and maintains the data architecture.

• Responsibilities: ETL, installing data warehousing solutions, data modeling, data architecture and development, database architecture testing

• Languages: R, Python, SAS, MATLAB, SQL, NoSQL, Pig, Hadoop, Java, C/C++, Ruby, Perl


Data Analyst: A Data Analyst interprets data to extract actionable insights for the company. With a strong background in statistics and the ability to convert data from a raw form to a different format (data munging), the Data Analyst collects and processes structured data and applies statistical algorithms to it.

• Responsibilities: Data collection and processing, programming, machine learning, data munging, data visualization, applying statistical analysis

• Languages: R, Python, SQL, NoSQL, HTML, JavaScript, C/C++
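
To make the data munging mentioned in the Data Analyst description more concrete, here is a minimal Python sketch that converts a raw CSV export into clean, typed records. The sample data, the column names (customer, region, amount) and the munge function are all hypothetical, chosen only for illustration; real munging work depends heavily on the source.

```python
import csv
import io
import json

# Hypothetical raw export: stray whitespace, mixed case, one missing value.
RAW = """customer, region ,amount
 alice ,EAST, 120.50
Bob,west,
carol , East ,75
"""


def munge(raw_text):
    """Convert raw CSV text into a list of clean, typed records."""
    reader = csv.reader(io.StringIO(raw_text))
    # Normalize header names so later lookups are predictable.
    header = [name.strip().lower() for name in next(reader)]
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        values = [cell.strip() for cell in row]
        record = dict(zip(header, values))
        record["customer"] = record["customer"].title()
        record["region"] = record["region"].lower()
        # Keep a missing amount as None instead of guessing a value.
        record["amount"] = float(record["amount"]) if record["amount"] else None
        records.append(record)
    return records


records = munge(RAW)
print(json.dumps(records, indent=2))
```

Missing values are kept as None here so that any later statistical step has to handle them explicitly; whether to drop, flag or impute them is a judgment call that depends on the analysis.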


Data Scientist: A Data Scientist’s mission is similar to that of a Data Analyst: find actionable insights that are key to a company’s growth and decision-making. However, a Data Scientist becomes necessary once a company’s data volume and velocity exceed the level at which more robust skills are required to sort through a rolling sea of unstructured data (big data), identify questions and pull out critical information. The Data Scientist then cleanses the data for proper analysis and creates new algorithms to run queries that relate data from disparate sources.

On top of these skills, a Data Scientist also needs strong storytelling and visualization skills to share insights with peers across the company.

• Responsibilities: Data cleansing and processing, predictive modeling, machine learning, identifying questions, running queries, applying statistical analysis, correlating disparate data, storytelling and visualization

• Languages: R, Python, SAS, Hive, MATLAB, SQL, Pig, Spark, Hadoop
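
As a rough illustration of running queries that relate data from disparate sources, the sketch below loads two small, made-up datasets into an in-memory SQLite database and joins them by customer. The table names (orders, tickets), their columns and the sample rows are assumptions for the example, not taken from any real system.

```python
import sqlite3

# Two hypothetical sources: an order system and a support-ticket system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    CREATE TABLE tickets (customer_id INTEGER, issue TEXT);
    INSERT INTO orders VALUES (1, 120.5), (2, 40.0), (1, 75.0);
    INSERT INTO tickets VALUES (1, 'late delivery'), (3, 'login problem');
""")

# Aggregate each source separately, then join the summaries. Joining the raw
# tables first would repeat order rows once per ticket and inflate the totals.
rows = conn.execute("""
    SELECT o.customer_id, o.total_spent, COALESCE(t.ticket_count, 0)
    FROM (SELECT customer_id, SUM(amount) AS total_spent
          FROM orders GROUP BY customer_id) AS o
    LEFT JOIN (SELECT customer_id, COUNT(*) AS ticket_count
               FROM tickets GROUP BY customer_id) AS t
      ON t.customer_id = o.customer_id
    ORDER BY o.customer_id
""").fetchall()

for customer_id, total_spent, ticket_count in rows:
    print(customer_id, total_spent, ticket_count)

conn.close()
```

The LEFT JOIN keeps customers who have orders but no tickets; a customer who appears only in the ticket source (customer 3 here) is left out, which may or may not be what a given analysis wants.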

Roles in Data: The breakdown
