5 Data Wrangling

5.1 Definition

Data wrangling is loosely defined as the process of converting or mapping data from one “raw” form into another format that is more convenient to consume. The work is largely manual, supported by semi-automated tools.

It typically follows three general steps: extracting the data in raw form from the data source, “wrangling” the raw data using algorithms (e.g. sorting) or parsing it into predefined data structures, and finally depositing the resulting content into a data sink for storage and future use.
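
The extract, wrangle, and deposit steps can be sketched in a few lines of Python. This is only an illustration using the standard library: the sample data, the function names (extract, wrangle, deposit), and the choice of CSV as the raw form and JSON as the sink format are all assumptions made for the example, not something the text prescribes.

```python
import csv
import io
import json

# Hypothetical raw input; in practice this would come from a file or an API.
RAW_CSV = "name,score\ncarol,72\nalice,91\nbob,85\n"


def extract(raw_text):
    """Read raw CSV text into a list of dicts (every value is still a string)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def wrangle(rows):
    """Parse scores into integers and sort the rows by score, highest first."""
    parsed = [{"name": r["name"], "score": int(r["score"])} for r in rows]
    return sorted(parsed, key=lambda r: r["score"], reverse=True)


def deposit(rows, sink):
    """Write the cleaned rows to a file-like data sink as JSON."""
    json.dump(rows, sink)


sink = io.StringIO()  # stands in for a file or database in this sketch
deposit(wrangle(extract(RAW_CSV)), sink)
result = json.loads(sink.getvalue())
```

Keeping each step in its own function means the source or the sink can be swapped (for example, a real file instead of the in-memory buffer) without touching the wrangling logic.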


5.2 Wrangling Challenges

Some of the challenges encountered in data wrangling are:

  • Importing files
  • Organizing data sets
  • Transforming data
  • Combining data sets
  • Dealing with various data types (e.g., dates)
  • Identifying errors
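
Several of these challenges tend to show up together. The sketch below, again using only the Python standard library, touches three of them: handling dates written in different formats, identifying rows with errors, and combining two data sets on a shared key. The records, the field names, and the two date formats are hypothetical and chosen only for illustration.

```python
from datetime import datetime

# Hypothetical data sets sharing an "id" key.
orders = [
    {"id": "1", "date": "2021-03-05"},
    {"id": "2", "date": "05/03/2021"},
    {"id": "3", "date": "not a date"},
]
customers = {"1": "alice", "2": "bob"}

# Assumed formats: ISO (year-month-day) and day/month/year.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Try each known format in turn; return None if none of them match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


clean, errors = [], []
for row in orders:
    parsed = parse_date(row["date"])
    if parsed is None:
        errors.append(row)  # keep the bad row for later inspection
    else:
        clean.append({
            "id": row["id"],
            "date": parsed,
            "customer": customers.get(row["id"]),  # combine the two data sets
        })
```

Collecting unparseable rows in a separate list, rather than dropping them silently, makes it easier to see how much of the data was affected and why.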