When we train a Machine Learning model with a dataset, it becomes ready for predictions. In our case, prediction means offering the best possible Label for a given free text. Since Machine Learning models use statistical methods, there is always a possibility of wrong predictions. Let's say we have a Machine Learning model for Sentiment analysis. If we pass "I'm so proud of the @NASA and @SpaceX team today, they were ready for launch." through this Sentiment model, we expect to get Positive as the predicted label. On the other hand, if this free text is very different from anything the model saw during training, the model may get it wrong and predict the label as Negative.
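To make the idea of prediction concrete, here is a minimal sketch using scikit-learn rather than Kimola Cognitive itself; the tiny training set below is made up purely for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up labeled examples, only to illustrate what "training" looks like.
texts = [
    "I love this, great job team",
    "What a fantastic launch, so proud",
    "This is terrible, I'm disappointed",
    "Worst experience ever, very sad",
]
labels = ["Positive", "Positive", "Negative", "Negative"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

tweet = "I'm so proud of the @NASA and @SpaceX team today, they were ready for launch."
# Ideally the model predicts "Positive", but as the text above notes,
# a statistical model can still get an unfamiliar text wrong.
print(model.predict([tweet])[0])
```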
This situation is accepted in data science as long as it's monitored. We monitor this possibility with the Accuracy Rate metric. The Accuracy Rate is the percentage of correct predictions for a given dataset. This means that when a Machine Learning model has an accuracy rate of 85%, we statistically expect 85 correct predictions out of every 100.
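The calculation itself is simple: count how many predictions match the true labels and divide by the total. The small lists below are made up to show the arithmetic.

```python
# Accuracy Rate = correct predictions / total predictions
true_labels      = ["Positive", "Negative", "Positive", "Negative", "Positive"]
predicted_labels = ["Positive", "Negative", "Negative", "Negative", "Positive"]

correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
accuracy = correct / len(true_labels)
print(f"Accuracy rate: {accuracy:.0%}")  # 4 correct out of 5 -> 80%
```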
When a dataset is uploaded to train a Machine Learning model, Kimola Cognitive splits it into a Training Set and a Test Set. Before training with the whole dataset, Kimola Cognitive first trains the model on the Training Set, which is the complete dataset minus the Test Set. This way, we have a trained model and a labeled Test Set, so we can make predictions on the Test Set and check whether each prediction is correct.
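The sketch below shows the general train/test split idea with scikit-learn's train_test_split; the 80/20 ratio and the placeholder data are assumptions for illustration, not necessarily the split Kimola Cognitive uses internally.

```python
from sklearn.model_selection import train_test_split

# Placeholder free texts and labels, made up for this example.
texts = [f"sample text {i}" for i in range(100)]
labels = ["Positive" if i % 2 == 0 else "Negative" for i in range(100)]

train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)
# The model is trained on train_texts/train_labels, then its predictions on
# test_texts are compared against test_labels to measure the Accuracy Rate.
print(len(train_texts), len(test_texts))  # 80 training samples, 20 test samples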
Keep in mind that Kimola Cognitive uses the Cross Validation technique to calculate the most reasonable Accuracy Rate. This method helps overcome possible overfitting and selection-bias problems.
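For readers curious what Cross Validation looks like in practice, here is a minimal k-fold sketch with scikit-learn: the dataset is split into several folds, each fold takes a turn as the Test Set, and the accuracy scores are averaged. The number of folds, the model type, and the made-up data here are all assumptions; Kimola Cognitive's exact internal setup may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Made-up labeled texts, only to make the example runnable.
texts = (
    [f"great launch, amazing team, so happy {i}" for i in range(20)]
    + [f"awful delay, very disappointing, so sad {i}" for i in range(20)]
)
labels = ["Positive"] * 20 + ["Negative"] * 20

model = make_pipeline(TfidfVectorizer(), LogisticRegression())

# 5-fold cross validation: train on 4 folds, test on the remaining fold,
# repeat 5 times, then average the accuracy scores.
scores = cross_val_score(model, texts, labels, cv=5, scoring="accuracy")
print(scores.mean())
```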
Tell us about your thoughts and experiences regarding the article.