Data normalization occurs after unstructured data is ingested. Stratifyd cleans non-textual information, such as HTML artifacts and stray whitespace, out of the text found in documents. During tokenization, Stratifyd extracts individual words from each document. Next, Stratifyd performs lemmatization, a process that normalizes verb tenses by reducing verbs to their dictionary forms, known as lemmas.
Spam detection also occurs at this stage. Stratifyd detects foreign languages and offers optional translation into your preferred language.
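The cleaning, tokenization, and lemmatization steps above can be sketched in plain Python. This is only an illustration of the general technique, not Stratifyd's actual implementation; the regular expressions and the tiny lemma table are stand-ins for the product's internal models.

```python
import html
import re

# Toy lemma table standing in for a real lemmatizer (illustration only).
LEMMAS = {"running": "run", "ran": "run", "was": "be", "is": "be"}

def clean(text: str) -> str:
    """Strip HTML artifacts and collapse stray whitespace."""
    text = html.unescape(text)                # decode entities like &nbsp;
    text = re.sub(r"<[^>]+>", " ", text)      # drop HTML tags
    return re.sub(r"\s+", " ", text).strip()  # normalize whitespace

def tokenize(text: str) -> list[str]:
    """Extract individual words from the cleaned text."""
    return re.findall(r"[a-zA-Z']+", text.lower())

def lemmatize(tokens: list[str]) -> list[str]:
    """Reduce each token to its dictionary form (lemma)."""
    return [LEMMAS.get(tok, tok) for tok in tokens]

doc = "<p>The   agent was  running &nbsp; reports.</p>"
print(lemmatize(tokenize(clean(doc))))
# → ['the', 'agent', 'be', 'run', 'reports']
```

In practice a production pipeline would use a trained lemmatizer rather than a lookup table, since lemmas depend on a word's part of speech, but the three-stage flow is the same.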