
This project used the R programming language to analyze 41 days of preseason training data from a professional soccer club. The dataset is structured as follows:

The data were collected with GPS units, which malfunctioned for certain players on specific days. The objective was to identify and fill in the missing values for two players on one particular training day. The missing values appeared in the dataset as follows:

There are several ways to approach this problem. The approach chosen was to fit linear regression models that predict the missing values from each player's previous performance. Simpler alternatives are to fill each gap with the mean, median, or mode of the observed values.
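
For comparison, the simpler alternatives can be sketched in a few lines of base R. The numbers below are made up for illustration and are not taken from the club's data, and `stat_mode` is a hypothetical helper, since base R has no built-in function for the statistical mode.

```r
# Made-up distances (metres) for one player; NA marks the session with no GPS reading.
distance <- c(5200, 6400, 6400, NA, 7200)

# Hypothetical helper: most frequent non-missing value (first one wins on ties).
stat_mode <- function(x) {
  x  <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Replace the missing entry with a single summary statistic of the observed values.
mean_fill   <- ifelse(is.na(distance), mean(distance, na.rm = TRUE), distance)
median_fill <- ifelse(is.na(distance), median(distance, na.rm = TRUE), distance)
mode_fill   <- ifelse(is.na(distance), stat_mode(distance), distance)

print(rbind(mean_fill, median_fill, mode_fill))
```

These fills ignore how long or how intense the missing session was, which is one reason a regression on previous performance may be preferable.
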
Loading data and observing values
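
A minimal sketch of this step, assuming the GPS export is a CSV with one row per player per day. The column names (`player`, `day`, `duration_min`, `total_distance_m`) and all values are invented for illustration; the real file and its columns are not shown here, so inline text stands in for the file.

```r
# Toy stand-in for the GPS export; names and values are assumptions, not the club's data.
csv_text <- "player,day,duration_min,total_distance_m
A,1,60,5200
A,2,75,6400
A,3,90,NA
B,1,60,4900
B,2,75,6100
B,3,90,NA
C,1,60,5000
C,2,75,6300
C,3,90,7400"

# With a real file this would be read.csv("<path to export>") instead of text = ...
gps <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(gps)              # column types and a preview of the values
summary(gps)          # ranges per column, including NA counts
colSums(is.na(gps))   # number of missing values in each column

# Rows with at least one missing value: which players and days are affected.
missing_rows <- gps[!complete.cases(gps), ]
print(missing_rows)
```
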

Subsetting the data for the players whose values we want to predict.
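
One way to do the subsetting in base R, sketched on a toy data frame. The column names and values are assumptions made for illustration only.

```r
# Toy data: three players, four sessions each; players A and B lack a reading on day 4.
gps <- data.frame(
  player           = rep(c("A", "B", "C"), each = 4),
  day              = rep(1:4, times = 3),
  duration_min     = rep(c(60, 75, 90, 80), times = 3),
  total_distance_m = c(5200, 6400, 7500, NA,
                       4900, 6100, 7300, NA,
                       5000, 6300, 7400, 6800),
  stringsAsFactors = FALSE
)

# Players with at least one missing distance value.
affected <- unique(gps$player[is.na(gps$total_distance_m)])

# All sessions for the first affected player.
player_a <- subset(gps, player == affected[1])

# Split into sessions usable for fitting and the session to be predicted.
train_a  <- player_a[!is.na(player_a$total_distance_m), ]
target_a <- player_a[is.na(player_a$total_distance_m), ]

print(train_a)
print(target_a)
```
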

Example of a model
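
A sketch of what such a model might look like for a single player. It assumes, purely for illustration, that session duration is the predictor and total distance is the missing metric; the predictors actually used are not stated, and the values below are invented.

```r
# Invented history for one player: six complete sessions.
train <- data.frame(
  duration_min     = c(45, 60, 75, 90, 60, 80),
  total_distance_m = c(3900, 5200, 6300, 7600, 5100, 6900)
)

# The session with the GPS failure: duration is known, distance is not.
target <- data.frame(duration_min = 70)

# Simple linear regression of distance on duration, fitted on the player's own history.
model <- lm(total_distance_m ~ duration_min, data = train)
summary(model)   # coefficients, R-squared, residual standard error

# Predicted distance for the missing session.
pred <- predict(model, newdata = target)
print(pred)
```

With only a handful of sessions per player, the fit is sensitive to individual observations, so it is worth checking the residuals before relying on the prediction.
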

After making predictions for a specific player, the predicted values are reintegrated into the dataset.
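
The write-back step could look roughly like this. The data frame, its column names, and the `imputed` flag column are assumptions introduced for illustration; the loop fits one model per affected player and fills only that player's missing rows.

```r
# Toy data: two players, five sessions each, with the last distance reading missing.
gps <- data.frame(
  player           = rep(c("A", "B"), each = 5),
  day              = rep(1:5, times = 2),
  duration_min     = rep(c(45, 60, 75, 90, 70), times = 2),
  total_distance_m = c(3900, 5200, 6300, 7600, NA,
                       3600, 4900, 6100, 7200, NA),
  stringsAsFactors = FALSE
)

# Flag the rows to be filled, so predicted values stay distinguishable from measured ones.
gps$imputed <- is.na(gps$total_distance_m)

for (p in unique(gps$player[gps$imputed])) {
  rows_p  <- gps$player == p
  train   <- gps[rows_p & !gps$imputed, ]   # this player's measured sessions
  missing <- rows_p & gps$imputed           # this player's sessions to fill

  model <- lm(total_distance_m ~ duration_min, data = train)

  # Write the predictions back into the original data frame, in place.
  gps$total_distance_m[missing] <- predict(model, newdata = gps[missing, ])
}

print(gps)
```

Keeping the flag column makes it possible to exclude or down-weight the filled values in later analyses.
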

How the data appeared following engineering and transformation:
