Data Normalization

Knowledge Article

Data normalization occurs after unstructured data is ingested. Stratifyd cleans out non-textual information such as HTML artifacts and random whitespace from the text found in documents. During tokenization, Stratifyd extracts individual words from documents. Next, Stratifyd performs lemmatization; a process that normalizes verb tenses by reducing verbs to their dictionary forms known as lemmas.

Spam detection also occurs at this time. Stratifyd will detect foreign languages and offers optional translation to your preferred language.

Ready for a Demo?

Contact us to learn more

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

619 S Cedar St., Suite J
Charlotte, NC 28202
hello@stratifyd.com