TIG’s Exploratory Data Analysis

Exploratory data analysis (EDA) is the approach we use for the initial analysis of our clients' data sets, summarizing their main characteristics, often with visual tools. In Big Data and Advanced Analytics, EDA lets our Data Scientists and Analysts see what the data can tell us beyond formal modeling and hypothesis testing. A statistical model may be built along the way, but the primary purpose of EDA is exploration; predictive analysis is the next step in our analytic process, after EDA.
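
As an illustration of what "summarizing the main characteristics" of a data set can look like in practice, here is a minimal Python sketch using only the standard library. The function name `summarize` and the particular statistics chosen (a five-number summary plus the mean and sample standard deviation) are assumptions for illustration, not a description of TIG's own tooling.

```python
import statistics

def summarize(values):
    """Return basic summary statistics for a numeric sample (needs >= 2 values)."""
    data = sorted(values)
    # Quartile cut points; statistics.quantiles uses the 'exclusive' method by default.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "n": len(data),
        "min": data[0],
        "q1": q1,
        "median": statistics.median(data),
        "q3": q3,
        "max": data[-1],
        "mean": statistics.fmean(data),
        "stdev": statistics.stdev(data),  # sample standard deviation (n - 1 divisor)
    }
```

For example, summarize([2, 4, 4, 4, 5, 5, 7, 9]) reports a median of 4.5, quartiles of 4.0 and 6.5, and a mean of 5.0.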

TIG’s Exploratory Data Analysis (EDA) is a method for data analysis that employs a variety of techniques to:

1. Maximize insight into a data set
2. Uncover underlying structure
3. Extract important variables
4. Detect outliers and anomalies
5. Test underlying assumptions
6. Develop parsimonious models
7. Determine optimal factor settings
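
One common and simple way to detect outliers in a single numeric variable is Tukey's fences, built from the interquartile range (IQR). The sketch below assumes a one-dimensional sample and the conventional multiplier k = 1.5; the function name `iqr_outliers` and that default are illustrative assumptions, and real data sets often call for other methods.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # 'exclusive' method by default
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]
```

For the sample [10, 12, 11, 13, 12, 11, 95], the quartiles are 11 and 13, so the fences are 8 and 16 and only 95 is flagged. A flagged point is a candidate for investigation, not automatically an error.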

Focus

The results of EDA can determine how the rest of a data analysis should be carried out. EDA is not identical to statistical graphics, although the two terms are often used almost interchangeably. Statistical graphics is a collection of techniques, all graphically based and each focusing on one aspect of data characterization. EDA covers a broader setting: it is an approach to data analysis that postpones the usual assumptions about what kind of model the data follow in favor of a more direct approach, allowing the data itself to reveal its underlying structure and model. EDA is therefore not a mere collection of techniques; it is how we dissect a data set: what we look for, how we look, and how we interpret.
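
Postponing assumptions does not mean ignoring them; EDA checks them against the data. Many standard analyses, for instance, assume the observations are random (independent). A minimal sketch of one such check follows: the sample lag-1 autocorrelation, compared with the approximate ±2/√N band expected for random data. The function names are hypothetical, and the band is a rough large-sample rule of thumb rather than a formal test.

```python
import math
import statistics

def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation: how strongly each value tracks the previous one."""
    mean = statistics.fmean(values)
    dev = [v - mean for v in values]
    num = sum(a * b for a, b in zip(dev, dev[1:]))
    den = sum(d * d for d in dev)
    return num / den

def looks_random(values):
    """Rough randomness check: is |r1| inside the approximate +/- 2/sqrt(N) band?"""
    r1 = lag1_autocorrelation(values)
    return abs(r1) < 2 / math.sqrt(len(values)), r1
```

A steadily increasing series such as 1, 2, ..., 20 gives r1 = 0.85, far outside the band of about 0.45, so the randomness assumption would be questioned before any model is fitted.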

Techniques

Our EDA techniques are primarily graphical, complemented by quantitative methods. We rely heavily on graphical tools because the main role of EDA is to explore open-mindedly. Graphics, combined with the natural pattern-recognition capabilities we all possess, give our Data Scientists and Analysts unparalleled power to do so, enticing the data to reveal its structural secrets and keeping us ready to gain new, often unsuspected, insight into the data.
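
In practice graphics are produced with a plotting library, but the idea behind the most basic EDA graphic, the histogram, fits in a few lines. This standard-library sketch renders a text histogram with equal-width bins; the function name `text_histogram`, the bin count, and the bar width are illustrative assumptions.

```python
def text_histogram(values, bins=5, width=30):
    """Return a text histogram: one row per equal-width bin, scaled to `width`."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would map to index `bins`, so clamp it into the last bin.
        idx = min(int((v - lo) / span * bins), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    rows = []
    for i, c in enumerate(counts):
        left_edge = lo + i * span / bins
        bar = "#" * round(c / peak * width)
        rows.append(f"{left_edge:8.2f} | {bar} {c}")
    return "\n".join(rows)
```

Calling print(text_histogram([1, 2, 2, 3, 3, 3, 4, 4, 5])) shows bin counts of 1, 2, 3, 2 and 1, a symmetric, single-peaked shape visible at a glance.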

Primary and Secondary Goals

The primary goal of EDA is to maximize our Data Scientists' and Analysts' insight into a client's data set and its underlying structure, while providing all of the specific items an Analyst would want to extract from that data set, such as:

1. A good-fitting, parsimonious model
2. A list of outliers
3. A sense of robustness of conclusions
4. Estimates for parameters
5. Uncertainties for those estimates
6. A ranked list of important factors
7. Conclusions as to whether individual factors are statistically significant
8. Optimal settings
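
Two items in this list, parameter estimates with their uncertainties and a ranked list of important factors, can be illustrated with short calculations. The sketch below assumes (1) a simple random sample for estimating a mean, with an approximate 95% interval from the normal critical value 1.96 (a t-quantile is more appropriate for small samples), and (2) a two-level designed experiment with factor levels coded -1 and +1, where factors are ranked by the absolute size of their main effects. Function names, factor names, and data are hypothetical.

```python
import math
import statistics

def mean_with_uncertainty(values, z=1.96):
    """Estimate the mean, its standard error, and an approximate 95% interval."""
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))  # standard error of the mean
    return mean, se, (mean - z * se, mean + z * se)

def rank_main_effects(runs, response):
    """Rank two-level factors by |mean response at +1 minus mean response at -1|.

    runs: list of dicts mapping factor name -> -1 (low level) or +1 (high level)
    response: measured response for each run, in the same order as runs
    """
    effects = {}
    for factor in runs[0]:
        high = [y for run, y in zip(runs, response) if run[factor] == 1]
        low = [y for run, y in zip(runs, response) if run[factor] == -1]
        effects[factor] = statistics.fmean(high) - statistics.fmean(low)
    # Largest absolute effect first; the sign shows which level raises the response.
    return sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical 2x2 factorial experiment with factors "temp" and "time".
runs = [{"temp": -1, "time": -1}, {"temp": 1, "time": -1},
        {"temp": -1, "time": 1}, {"temp": 1, "time": 1}]
print(rank_main_effects(runs, [10, 20, 11, 21]))  # [('temp', 10.0), ('time', 1.0)]
```

Here temperature dominates, and its positive effect points to the high setting when a larger response is desired, which is the kind of evidence that feeds into choosing optimal settings.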


Discover how you can optimize now – connect today to set up a live demo.