What are Stop Words?
Stop words are words that carry little or no significant meaning in search queries and in natural language processing (NLP) text analytics. They are usually filtered out because they add noise rather than useful information. The most common words in a language, such as “the” and “be”, are often stop words, but there is no standard that stop word lists must adhere to. In fact, Stratifyd does not use stop word lists by default, in order to support phrase searching.
Any list of words can be chosen as stop words, depending on the purpose of your analysis. Stratifyd’s use of pointwise mutual information (PMI) limits the effect of typical would-be stop words, since their PMI scores are very low. Nevertheless, you have the option to upload a list of stop words to train the system for specific analyses.
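To illustrate why very common words tend to receive low PMI scores, here is a minimal sketch that scores adjacent word pairs in a toy sentence. It is not Stratifyd's implementation: the function name `bigram_pmi`, the sample text, and the choice of adjacent pairs with simple frequency estimates are all assumptions made for illustration. The score used is the textbook definition, PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ).

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI} for adjacent word pairs (illustrative only)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi          # how often the pair occurs together
        p_x = unigrams[w1] / n_uni   # how often each word occurs at all
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical sample text, not taken from any real dataset.
text = (
    "the credit card fee is high the app is slow "
    "the credit card was declined the agent was helpful"
)
scores = bigram_pmi(text.split())

# A frequent word such as "the" appears next to many different words,
# so pairs that contain it score lower than a tight phrase.
print(scores[("credit", "card")] > scores[("the", "credit")])
```

In this toy text, "credit card" outscores "the credit" because "the" occurs in many other contexts, which inflates the denominator. With so little text the gap is modest, and the exact scores depend on how the probabilities are estimated.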
Uploading a Stop Word List
- After logging in, click the advanced tab on the main Stratifyd screen.
- Select stopwords.
- Click an existing list to edit it, or click the + icon in the bottom right corner to create a new one.
- Give your stop word list a title.
You can type a list of stop words, separated by commas, in the prompt’s text box, or you can upload a comma-separated file containing your desired stop words.
- Click save.
- You can share your stop word lists with team members by clicking the blue sharing icon on the lexicon’s widget.
Stop word lists also have version control, so you can test which versions work best for your analyses.
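If you prefer to upload a file rather than type the list, a comma-separated file can be prepared with a short script. The sketch below is one possible approach under stated assumptions: the helper name `write_stop_word_file`, the lowercasing and de-duplication steps, and the single-line layout are illustrative choices, and the exact file layout Stratifyd expects is not specified here beyond "comma-separated".

```python
from pathlib import Path


def write_stop_word_file(words, path):
    """Write words as one comma-separated line (hypothetical helper).

    Words are trimmed, lowercased, and de-duplicated, keeping first-seen order.
    Returns the cleaned list that was written.
    """
    seen = set()
    cleaned = []
    for word in words:
        w = word.strip().lower()
        if w and w not in seen:  # skip blanks and repeats
            seen.add(w)
            cleaned.append(w)
    Path(path).write_text(",".join(cleaned), encoding="utf-8")
    return cleaned
```

For example, calling `write_stop_word_file(["The", "the", "be"], "stop_words.csv")` writes a file containing `the,be`. Entries that themselves contain commas would need different handling.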
How to Apply a Stop Word List
- To apply a stop word list to a dataset, open the data manager menu in the dashboard.
- Click Reprocess to bring up the data connector window.
- Select Advanced Options.
- Select the Stopwords option.
- Choose your stopword list and version.
- Click apply.
How to Tune Your Analysis
You can tune your analysis by adding stopwords directly from your dashboard widgets.
- Click the Tune Analysis icon in the data manager panel.
- Click on N-grams displayed in your widgets to add them as stopwords to the dataset.
We recommend using the bigram list in the Semantic Topics widget for tuning analyses. Simply go through the top terms in your list and strike out any bigrams that appear to be junk or unhelpful to your analysis.
- Reprocess the dashboard by clicking Submit when you are finished tuning datasets.
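To get a feel for this kind of bigram review outside the dashboard, the following sketch counts the most frequent adjacent word pairs in a few sample texts and shows how the list changes once some pairs are excluded. It is only an approximation of the idea: the function name `top_bigrams`, the simple tokenizer, the sample texts, and ranking by raw frequency are assumptions, and the Semantic Topics widget may rank its terms differently.

```python
import re
from collections import Counter


def top_bigrams(documents, n=10, stop_bigrams=()):
    """Return the n most frequent adjacent word pairs, skipping excluded pairs."""
    excluded = {tuple(b.lower().split()) for b in stop_bigrams}
    counts = Counter()
    for text in documents:
        # Simple tokenizer: lowercase runs of letters, digits, and apostrophes.
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        for pair in zip(tokens, tokens[1:]):
            if pair not in excluded:
                counts[pair] += 1
    return counts.most_common(n)


# Hypothetical sample texts for illustration.
docs = [
    "Thank you for calling",
    "thank you for your help",
    "billing error on my account",
]

before = top_bigrams(docs, n=20)
# Treat a generic pair as junk and count again without it.
after = top_bigrams(docs, n=20, stop_bigrams=["thank you"])
print(before)
print(after)
```

Pairs such as "thank you" are frequent but say little about the topic, which is the kind of bigram you would typically strike out when tuning.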