Effective Data Wrangling and Exploration with R
by FRU Kingsly
English | 2021 | ASIN: B08TW46GVF | 1206 Pages | EPUB | 4.95 MB
Data wrangling is one of the most important steps in data science and analytics; it is often claimed to take between 80% and 90% of an analyst's time. Data wrangling goes by many names, including data munging, data manipulation, data preparation and data transformation. Just as it has many names, it also has many definitions. Below we look at two of the most important ones:
Trifacta, a leading provider of data wrangling software of the same name, defines data wrangling as:
"Data wrangling is the process of cleaning, structuring and enriching raw data into a desired format for better decision making in less time".
Gartner, which uses the term data preparation, defines it as:
"Data preparation is an iterative-agile process for exploring, combining, cleaning, and transforming raw data into curated datasets for self-service data integration, data science, data discovery, and BI/analytics".
From the above, we can deduce that data wrangling is the process of converting raw data from one form into another that is appropriate for the task at hand. In analytics, we rarely receive data in the form and shape that our analysis requires. Most often, we need to transform, clean, enrich and explore the data before moving on to the analysis.
Data wrangling involves:
Importing and exporting data: to and from CSV, Excel, databases, etc.
Cleaning data: identifying and dealing with missing data, outliers, and duplicates
Manipulating text and categorical data
Manipulating dates
Encoding and enriching data
Manipulating columns and rows
Split-apply-combine data
Merging data
Reshaping data
Grouping and aggregating data
Exploring data
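Several of the steps listed above can be sketched in a few lines of base R. This is only a minimal illustration, not the book's own code: the `raw` and `managers` data frames, their column names and their values are all made up for the example, and filling missing sales with the median is just one of many possible cleaning choices.

```r
# Hypothetical raw data: every name and value here is invented for illustration.
raw <- data.frame(
  id     = c(1, 2, 2, 3, 4),
  region = c("north", "South ", "South ", "NORTH", "south"),
  date   = c("2021-01-05", "2021-01-06", "2021-01-06", "2021-02-10", "2021-02-11"),
  sales  = c(100, NA, NA, 250, 80),
  stringsAsFactors = FALSE
)

# Cleaning: drop exact duplicate rows, then fill missing sales with the median
# (an assumed strategy; dropping or modelling missing values are alternatives).
clean <- raw[!duplicated(raw), ]
clean$sales[is.na(clean$sales)] <- median(clean$sales, na.rm = TRUE)

# Text manipulation: trim stray whitespace and normalise case.
clean$region <- tolower(trimws(clean$region))

# Dates: parse ISO strings, then derive a year-month column (enrichment).
clean$date  <- as.Date(clean$date)
clean$month <- format(clean$date, "%Y-%m")

# Merging: attach a hypothetical lookup table of regional managers.
managers <- data.frame(
  region  = c("north", "south"),
  manager = c("Ada", "Grace"),
  stringsAsFactors = FALSE
)
merged <- merge(clean, managers, by = "region")

# Categorical data: store region as a factor.
merged$region <- factor(merged$region)

# Grouping and aggregating (split-apply-combine): total sales per region.
totals <- aggregate(sales ~ region, data = merged, FUN = sum)

# Reshaping: cross-tabulate sales into a wide region-by-month table.
wide <- xtabs(sales ~ region + month, data = merged)

print(totals)
print(wide)
```

Importing and exporting would typically bracket such a script, for example with `read.csv()` at the start and `write.csv()` at the end; they are left out here so the sketch runs without any files.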
Data exploration is, and should be, the initial step of any data analysis project. It is a miniature form of data analysis in which we use both descriptive statistics and data visualization techniques to better understand our dataset. In traditional analysis and research, we know exactly what we are after (that is, the hypothesis is known) before collecting data. In exploratory analysis, the process is reversed: we assume little or nothing about the outcome of the analysis and instead explore the data to arrive at a meaningful insight or hypothesis. Data exploration involves:
looking at the structure and size of the data
looking at the completeness and correctness of the data
looking at the possible relationships that may exist between data elements
As can be observed, the boundary between data exploration and data wrangling is blurred because both make use of data cleaning techniques to make sure that the data is correct and complete for data analysis.
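The three exploration activities listed above might look like this in base R. The sketch uses the built-in `airquality` dataset purely as a convenient example (it is an assumption of this illustration, not a dataset named in the book description), and the range check on `Month` is just one simple way to test correctness.

```r
data(airquality)  # built-in example dataset, chosen here only for illustration

# Structure and size: column types plus number of rows and columns.
str(airquality)
size <- dim(airquality)

# Completeness: count missing values in each column.
na_counts <- colSums(is.na(airquality))
print(na_counts)

# Correctness: descriptive statistics and a simple range check on Month.
print(summary(airquality))
months_valid <- all(airquality$Month %in% 5:9)

# Relationships: pairwise correlations on complete rows, plus a scatter plot.
cors <- cor(airquality, use = "complete.obs")
print(round(cors, 2))
plot(airquality$Temp, airquality$Ozone, xlab = "Temp", ylab = "Ozone")
```

Columns that show missing values in `na_counts` are exactly where exploration hands over to the cleaning techniques of data wrangling.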
This book is all about data wrangling and exploration as important steps leading up to data analysis.
How This Book Is Structured
The book is divided into seven parts:
Part 1: Programming with R (chapters 1 to 15)
Part 2: Import and export data (chapters 16 to 18)
Part 3: String and categorical data manipulation (chapters 19 to 21)
Part 4: Date manipulation (chapters 22 to 24)
Part 5: Data manipulation (chapters 25 to 28)
Part 6: Data cleaning (chapters 29 to 30)
Part 7: Data exploration (chapters 31 to 32)