A Visual Approach To Corporate eLearning: Exploratory Data Analysis In Online Training

Exploratory Data Analysis In Online Training

The corporate eLearning landscape is shaped by the demands of the labor market and by technological advancements. Inspired by MOOC (Massive Open Online Course) platforms, the ways employees learn today are influenced by a few significant trends: adaptive programs, collaborative learning, gamification, and responsive design. These approaches are based on data, and for them to function as planned the information needs to be transformed, patterns extracted, and clusters created. This is a perfect case for exploratory data analysis (EDA), a philosophy based mostly on graphical techniques for finding meaning in data before using it in a model. Plainly put, it is a way of looking at data to understand it before throwing it blindly into the black box of algorithms.


Exploratory data analysis is a set of tools and techniques that ensure the validity and relevance of results by trying to understand the data, looking at limit-cases, outliers, and anomalies, as well as ordinary situations. It relies predominantly on graphical representations, and this is a useful approach since people are primarily visual learners.

It should be the first and most natural step in data analysis: plot the data sets and take a good look at them to see whether any patterns arise or any clusters form naturally. As Valeryia Shchutskaya from Indata Labs suggests, it can act as a starting point for formulating questions and for approaching the exploration with an open mind.

5 Methods And Applications Of Exploratory Data Analysis In Online Training

We will go through the most common EDA methods mentioning some of the possible applications in eLearning, with potential drawbacks. This is by no means an exhaustive list, just a teaser of what could be achieved. It is important to remember that the preparation of the data is as important as applying methods. After the acquisition, be sure to clean the data and transform it into the appropriate form.

1. Univariate Visualization

You can start by analyzing each variable in your set on its own by plotting it on a graph. This representation can show you the distribution of the variable or the unfolding of a process over time. You may be interested in simple statistics like the average, the standard deviation, the minimum, the maximum, the skewness, and more. Common univariate graphs include line charts, bar charts, pie charts, and histograms.

In eLearning, a histogram could show you the range of learning times in minutes. If most subjects are near the mean, you likely have a good process with a well-adapted curriculum. A distribution skewed to the right (a long tail of high values) means that some employees take much longer than the rest to learn, so perhaps a personalized training path could help them. A distribution skewed to the left (a long tail of low values) can signal that the curriculum is too easy for some employees or that they abandon the training too quickly.

Another possible univariate graph could show the completed modules for each employee. In a personal dashboard, such a graph could be labeled with levels and achievements in a gamified approach.
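To make the learning-time example concrete, here is a minimal sketch of the numbers behind such a histogram. It uses only the Python standard library; the learning times, the 10-minute bin width, and the function names are assumptions made up for illustration, not data from a real platform.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical learning times in minutes, one value per employee.
learning_times = [32, 35, 38, 40, 41, 43, 44, 45, 47, 52, 58, 75, 96]

def skewness(values):
    """Population skewness: the mean cubed z-score (positive = long right tail)."""
    mu = mean(values)
    sigma = pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)

def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

summary = {
    "mean": mean(learning_times),
    "std_dev": pstdev(learning_times),
    "min": min(learning_times),
    "max": max(learning_times),
    "skewness": skewness(learning_times),
}
bins = histogram(learning_times, bin_width=10)

# Text rendering of the histogram, one row per 10-minute bin.
for lower_edge, count in bins.items():
    print(f"{lower_edge:3d}-{lower_edge + 9:3d} min | {'#' * count}")
print(summary)
```

In this made-up sample the two slowest learners stretch the distribution to the right, so the skewness comes out positive. These are the employees a personalized training path would target.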

2. Bivariate Visualization

The bivariate approach is the foundation of if-then analysis: it highlights the link between a presumed cause variable and an effect variable and shows whether the two are really correlated. Keep in mind that a correlation alone does not prove that one causes the other. The graphical representations can be anything from simple two-axis plots to heat maps.

In eLearning environments, these graphs can show the relationship between time spent on the platform and test results, and can help identify outliers such as star performers or severe under-performers. The plotted variables could be switched from a control panel and filtered by characteristics like age or education, so that the graphs act as the basis for new assumptions. An organization employing such methods can identify the exact areas where employees lack skills just by looking at test results.
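The relationship between time on the platform and test results can be sketched numerically as well. The example below is an illustration under assumptions: the records are invented, and using the distance from a least-squares trend line to spot the star performer and the under-performer is one simple choice among many.

```python
from statistics import mean

# Hypothetical records: (employee id, hours on platform, test score out of 100).
records = [
    ("E01", 2.0, 55), ("E02", 3.5, 62), ("E03", 4.0, 66), ("E04", 5.0, 70),
    ("E05", 6.5, 78), ("E06", 7.0, 80), ("E07", 8.0, 85), ("E08", 3.0, 97),
    ("E09", 9.0, 40),
]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals_from_trend(xs, ys):
    """Vertical distance of each point from the least-squares line."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

ids = [rec[0] for rec in records]
hours = [rec[1] for rec in records]
scores = [rec[2] for rec in records]

r_all = pearson(hours, scores)

# The points furthest above and below the trend line are the outlier candidates.
residuals = dict(zip(ids, residuals_from_trend(hours, scores)))
star_performer = max(residuals, key=residuals.get)
under_performer = min(residuals, key=residuals.get)

# Correlation for the remaining, more typical employees.
typical = [rec for rec in records if rec[0] not in (star_performer, under_performer)]
r_typical = pearson([rec[1] for rec in typical], [rec[2] for rec in typical])

print(f"correlation, all employees:     {r_all:.2f}")
print(f"correlation, typical employees: {r_typical:.2f}")
print(f"furthest above trend: {star_performer}, furthest below: {under_performer}")
```

Comparing the two coefficients shows how much a couple of unusual records can move the result, which is one reason to look at the plot before trusting a single number.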

3. Multivariate Visualization

Most results are not caused by a single influence but by several acting together. This is exactly what multivariate visualization tries to portray. 3D representations, bubble charts, and other innovative tools can be used to identify not only the variables that influence a phenomenon but also the magnitude of each influence.

For a corporate eLearning program, a useful chart could be a bubble chart with the modules on the horizontal axis, the scores on the vertical axis and the size of the bubble showing the number of employees in each situation. This could give an overview of strong and weak points.
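The data behind such a bubble chart is just a count per (module, score) combination. The sketch below assumes hypothetical module names and scores, and groups scores into 20-point bands so that the bubbles do not fragment into single employees; both choices are illustrative.

```python
from collections import Counter

# Hypothetical results: (module, test score out of 100), one row per employee.
results = [
    ("Safety", 85), ("Safety", 90), ("Safety", 88), ("Safety", 45),
    ("Compliance", 55), ("Compliance", 58), ("Compliance", 52), ("Compliance", 91),
    ("Sales", 72), ("Sales", 75), ("Sales", 95),
]

def score_band(score, width=20):
    """Lower edge of the score band; a perfect 100 falls into the top band."""
    return min(score // width * width, 100 - width)

# One bubble per (module, score band); the count becomes the bubble size.
bubbles = Counter((module, score_band(score)) for module, score in results)

# Each triple is (x = module, y = score band, size = number of employees).
for (module, band), size in sorted(bubbles.items()):
    print(f"{module:<10} {band:3d}-{band + 19:3d}  {'o' * size}  ({size})")
```

The resulting triples can be handed to any charting tool that draws a scatter plot with a size per point. In this invented sample, the large bubble in the low band for one module would mark it as a weak point.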

4. Dimensionality Reduction

When taking into consideration many initial variables that could influence the outcome, the analysis can be simplified by either selecting the most important ones or creating new variables by combining existing ones into indexes.

In a Learning Management System environment, this technique can be used to identify the factors that could affect the results, such as the time at which the content is accessed, the device on which it is viewed, and the type of content used. The results can point to a data-based formula for the best mix of learning materials and placement of pictures, and can inform a responsive platform design that boosts understanding.
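As a rough illustration of the first option, selecting the most important variables, the sketch below ranks a few candidate factors by the strength of their correlation with the test score and keeps the top two. This is plain feature selection by correlation, not a full technique such as principal component analysis. The session records and the field names (hour_of_day, is_mobile, video_share) are hypothetical.

```python
from statistics import mean

# Hypothetical session records; every field name here is an assumption.
sessions = [
    {"hour_of_day": 9,  "is_mobile": 0, "video_share": 0.1, "score": 50},
    {"hour_of_day": 14, "is_mobile": 1, "video_share": 0.2, "score": 57},
    {"hour_of_day": 20, "is_mobile": 0, "video_share": 0.3, "score": 61},
    {"hour_of_day": 20, "is_mobile": 1, "video_share": 0.4, "score": 69},
    {"hour_of_day": 14, "is_mobile": 0, "video_share": 0.5, "score": 73},
    {"hour_of_day": 9,  "is_mobile": 1, "video_share": 0.6, "score": 80},
]

def pearson(xs, ys):
    """Pearson correlation; assumes neither sequence is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def rank_features(rows, outcome):
    """Sort candidate variables by absolute correlation with the outcome."""
    target = [row[outcome] for row in rows]
    names = [name for name in rows[0] if name != outcome]
    strength = {
        name: abs(pearson([row[name] for row in rows], target))
        for name in names
    }
    return sorted(strength.items(), key=lambda item: item[1], reverse=True)

ranking = rank_features(sessions, "score")
selected = [name for name, _ in ranking[:2]]  # keep the two strongest factors

for name, strength in ranking:
    print(f"{name:<12} |r| = {strength:.2f}")
print("kept for further analysis:", selected)
```

A linear correlation only captures straight-line relationships, so a factor that ranks low here may still matter in a non-linear way. The ranking is a starting point for exploration, not a verdict.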

5. Clustering

Once a bivariate visualization is created, the data may show natural clusters (groups) of subjects. The most significant problem is separating the entities that lie at the border between groups. Sometimes this process is made easier by using fuzzy clusters, an approach in which each element can belong to several groups at once, each with its own degree of membership expressed as a percentage.

In eLearning, clusters can represent achievement levels. Passing from one cluster to the next can signal the need for a promotion or another form of recognition. Also, because the members of each group are similar to one another, the curriculum should be adapted to the capacities of each group.
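The grouping into achievement levels can be sketched with a small k-means routine. This is hard clustering, where each employee belongs to exactly one group, not the fuzzy variant mentioned above. The data points, the choice of three clusters, and the deterministic seeding by score are all assumptions made for the sake of a reproducible example.

```python
from math import dist

# Hypothetical employees as (hours on platform, test score out of 100).
employees = [
    (2.0, 40), (3.0, 45), (2.5, 42),     # look like beginners
    (6.0, 65), (7.0, 68), (6.5, 70),     # look like intermediate learners
    (10.0, 90), (11.0, 95), (10.5, 92),  # look like advanced learners
]

def k_means(points, k, max_iter=100):
    """Plain k-means; returns a cluster label per point and the centroids."""
    # Deterministic seeding (an assumption): evenly spaced points by score.
    ordered = sorted(points, key=lambda p: p[1])
    n = len(ordered)
    centroids = [ordered[i * (n - 1) // max(k - 1, 1)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids

labels, centroids = k_means(employees, k=3)
for point, label in zip(employees, labels):
    print(f"hours={point[0]:>4}, score={point[1]:>3} -> level {label}")
```

Because the score spans a much wider range than the hours, it dominates the distance in this sketch. With real data, putting the variables on a common scale before clustering is usually worth considering.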


Although more related to statistics than learning, the exploratory data analysis framework can be successfully used to design more engaging and efficient corporate training programs. Since it is based on data, it can serve as a better way to make decisions instead of just continuing with business as usual. Taking some time to look at data showing learning progress or obstacles is at the heart of improving the process by completing the feedback loop. The value of this approach is that it visually leverages the data you collect and gives actionable insights right away.