A dataset is usually an Excel file in which each row holds one data item. In our case, each item is most likely a piece of consumer feedback about a brand, product, or service, written as free text. Since Kimola Cognitive provides classification technology, a dataset must have at least two columns: one for the text and one for its label.
The Text Column is the column that holds the text to be analyzed and classified. Suppose we have a dataset full of consumer feedback about a shoe brand. A piece of feedback such as "I liked these shoes!!!" belongs in the Text Column. If the content to be classified has more than one part, such as a Title and a Content field, consider combining the two into a single Text Column.
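As a rough illustration, here is a minimal sketch of merging a Title column and a Content column into one Text Column before upload. It assumes the Excel sheet has been exported to CSV and that the headers are literally Title, Content, and Label. Those header names, the function name combine_title_and_content, and the single-space separator are hypothetical choices for this example, not part of Kimola Cognitive.

```python
import csv
import io


def combine_title_and_content(rows, title_col="Title", content_col="Content",
                              text_col="Text", sep=" "):
    """Merge two text fields into a single Text column, keeping all other columns.

    Assumption: column names are hypothetical; adjust them to match your sheet.
    """
    combined = []
    for row in rows:
        title = (row.get(title_col) or "").strip()
        content = (row.get(content_col) or "").strip()
        # Join only the non-empty parts so a missing title leaves no stray separator.
        text = sep.join(part for part in (title, content) if part)
        new_row = {k: v for k, v in row.items() if k not in (title_col, content_col)}
        new_row[text_col] = text
        combined.append(new_row)
    return combined


if __name__ == "__main__":
    # Hypothetical sample data standing in for an exported sheet.
    sample = io.StringIO(
        "Title,Content,Label\n"
        "Great fit,I liked these shoes!!!,Excitement\n"
    )
    rows = list(csv.DictReader(sample))
    print(combine_title_and_content(rows))
```

Joining with a plain space keeps the merged text readable; if your titles already end with punctuation, a space is usually enough to separate the two parts.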
The Label Column is the column that holds the tag for each entry in the Text Column. This tag defines how we want the text to be classified based on its content. For example, if the Text Column contains the feedback "I liked these shoes!!!", the Label Column might say "Excitement". Sometimes the free text in the Text Column needs to be classified along more than one aspect. In that case, we can either add another Label Column or create a new dataset file (for example, a new sheet in the Excel file) for the new classification aspect.
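If a sheet already has several Label Columns and you prefer one dataset per classification aspect, a small script can split it. The sketch below is one possible approach, assuming a CSV export with a Text column plus hypothetical label columns named Emotion and Topic; the function names split_by_label_column and to_csv are illustrative only and do not belong to any Kimola tool.

```python
import csv
import io


def split_by_label_column(rows, text_col, label_cols):
    """Return {label_col: [(text, label), ...]}: one two-column dataset per aspect.

    Assumption: label column names are hypothetical examples.
    """
    datasets = {col: [] for col in label_cols}
    for row in rows:
        text = (row.get(text_col) or "").strip()
        if not text:
            continue  # a row without text cannot be classified
        for col in label_cols:
            label = (row.get(col) or "").strip()
            if label:  # skip rows that were not tagged for this aspect
                datasets[col].append((text, label))
    return datasets


def to_csv(pairs, text_header="Text", label_header="Label"):
    """Serialize (text, label) pairs as a two-column CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([text_header, label_header])
    writer.writerows(pairs)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample data with two classification aspects.
    sample = io.StringIO(
        "Text,Emotion,Topic\n"
        "I liked these shoes!!!,Excitement,Product\n"
        "Delivery took two weeks,Frustration,Shipping\n"
    )
    rows = list(csv.DictReader(sample))
    datasets = split_by_label_column(rows, "Text", ["Emotion", "Topic"])
    for aspect, pairs in datasets.items():
        print(aspect)
        print(to_csv(pairs))
```

Each resulting dataset keeps the two-column minimum: the same text paired with the label for a single aspect. Rows left untagged for an aspect are dropped from that aspect's dataset only.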
To support your data analysis journey, we have published cleaned sample datasets for different businesses. You can download them from our GitHub profile.