What is a dataset?

2 mins read - Updated on Mar 06, 2024

A dataset is usually an Excel file in which each row holds one record. In our case, each record is most likely free-text consumer feedback about a brand, product, or service. Since Kimola Cognitive provides classification technology, a dataset must have at least two columns: a Text Column and a Label Column.
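As a rough illustration of the two-column requirement, the sketch below checks that every row of a small dataset has at least two filled cells. It assumes the Excel sheet has been exported to CSV; the header names `text` and `label`, the second sample row, and the helper `has_minimum_columns` are made up for this example and are not part of Kimola Cognitive.

```python
import csv
import io

# Hypothetical sample data; the header names "text" and "label" are assumptions.
SAMPLE_CSV = """text,label
I liked these shoes!!!,Excitement
The sole came off after a week.,Complaint
"""

def has_minimum_columns(csv_text, minimum=2):
    """Return True if every non-empty row has at least `minimum` filled cells."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        # An empty file is not a usable dataset.
        return False
    return all(
        len([cell for cell in row if cell.strip()]) >= minimum
        for row in rows
    )

print(has_minimum_columns(SAMPLE_CSV))
```

A check like this can catch rows where the label was left blank before the file is uploaded.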

Text Column

Text Column is the term for the column that holds the text to be analyzed and classified. Suppose we have a dataset full of consumer feedback about a shoe brand. Feedback such as "I liked these shoes!!!" belongs in the Text Column. If the content to be classified has more than one element, such as Title and Content, you may consider combining them into a single Text Column.
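Combining Title and Content into one Text Column could look like the following minimal sketch. The field names `title` and `content`, the separator, the sample rows, and the helper `build_text_column` are assumptions chosen for illustration; adapt them to your own file.

```python
def build_text_column(rows, fields=("title", "content"), separator=". "):
    """Join the given fields of each row into one text value, skipping empty parts."""
    combined = []
    for row in rows:
        parts = [(row.get(field) or "").strip() for field in fields]
        combined.append(separator.join(part for part in parts if part))
    return combined

# Hypothetical rows; only the first feedback sentence comes from the article.
reviews = [
    {"title": "Great fit", "content": "I liked these shoes!!!"},
    {"title": "", "content": "Too narrow for me"},
]

texts = build_text_column(reviews)
print(texts)
```

Skipping empty parts avoids a stray separator at the start of the text when a review has no title.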

Label Column

Label Column is the term for the column that holds the tag for the text in the Text Column. This tag determines how we want to classify the text based on its content. For example, if the Text Column holds consumer feedback saying "I liked these shoes!!!", the Label Column may hold "Excitement". In some cases, the free text in the Text Column needs to be classified with more than one label. We can then either add another Label Column or create a new dataset file (or a new sheet in Excel) for the additional classification aspect.
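The second option, one dataset per classification aspect, can be sketched as below. The column names `emotion` and `topic`, every label other than "Excitement", and the helper `split_by_label_column` are hypothetical; the sketch only shows how rows with several Label Columns might be split into separate two-column datasets.

```python
# Hypothetical rows with two Label Columns ("emotion" and "topic").
rows = [
    {"text": "I liked these shoes!!!", "emotion": "Excitement", "topic": "Product"},
    {"text": "Delivery took three weeks.", "emotion": "Frustration", "topic": "Shipping"},
]

def split_by_label_column(rows, text_field, label_fields):
    """Build one (text, label) dataset per Label Column."""
    datasets = {}
    for field in label_fields:
        # Each dataset keeps the same text but only one classification aspect.
        datasets[field] = [(row[text_field], row[field]) for row in rows]
    return datasets

datasets = split_by_label_column(rows, "text", ["emotion", "topic"])
for name, pairs in datasets.items():
    print(name, pairs)
```

Each resulting list has exactly one Text Column and one Label Column, so it could be saved as its own file or Excel sheet.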

Download Sample Datasets from Kimola on GitHub

To enhance your data analysis journey, we've published cleaned sample datasets for different businesses. Visit our GitHub profile to download them.