I hope you are enjoying the “Advanced Analytics Introduction” blog post series; the previous segment (Step One) provides some helpful background. In that installment, I gave an overview of advanced analytics, data science, and text analytics concepts. In this blog post, I review more detailed definitions of text analytics and text mining concepts to provide more context on this rapidly evolving market.
In his book “Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications”, John Elder, Ph.D., characterized the text analytics concept best when he stated the following:
- “Text Mining and Text Analytics are broad umbrella terms describing a range of technologies for analyzing and processing semi-structured and unstructured text data”
- “The unifying theme behind each of these technologies is the need to turn text into numbers so powerful algorithms can be applied to large document databases”
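To make the idea of "turning text into numbers" concrete, here is a minimal sketch of one common approach, a bag-of-words count matrix. It is only an illustration: the function names (`tokenize`, `bag_of_words`) and the two sample sentences are my own assumptions, not something taken from Dr. Elder's book.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(documents):
    """Return (vocabulary, matrix); each matrix row holds the word counts for one document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary is every distinct word seen in any document, in sorted order.
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix

# Hypothetical sample documents, for illustration only.
documents = [
    "Text mining turns text into numbers.",
    "Algorithms work on numbers.",
]
vocabulary, matrix = bag_of_words(documents)

print(vocabulary)
for row in matrix:
    print(row)
```

Each document becomes a row of counts over a shared vocabulary, which is the kind of numeric representation that standard algorithms can then work with.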
The diagrams below also come from the same publication by Dr. Elder. In this first diagram, the text mining field is separated into seven “practice areas.” These practice areas are further superimposed over a Venn diagram depicting the interrelationship between various academic disciplines.
The diagram below takes a closer look at the seven practice areas of text mining and superimposes specific text mining tasks. I will discuss text mining tasks that we see within the “Natural Language Processing” oval.
In his book “Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining”, Professor ChengXiang Zhai of the University of Illinois at Urbana-Champaign identifies two main goals of text analytics:
- Turn text into high-quality information by converting text data into more concise information, enriching this content, and minimizing the human effort needed to digest it. Text mining activities we would perform at this level include:
  - Identifying language
  - Sentence boundary detection
  - Part of speech tagging
  - Phrase “chunking”
  - Word stemming and lemmatizing
- Turn text analysis into actionable content: use the trends and correlations discovered through word, topic, and opinion association, mining, and analysis to make better decisions
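Two of the first-level activities listed above, sentence boundary detection and word stemming, can be sketched in a few lines. This is a deliberately simplified illustration under my own assumptions: the sentence splitter only looks for terminal punctuation followed by whitespace, and `crude_stem` is a toy suffix stripper, not the Porter algorithm or a true lemmatizer. Real projects would normally rely on an established NLP library instead.

```python
import re

def split_sentences(text):
    """Naive sentence boundary detection: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def crude_stem(word):
    """Strip a few common English suffixes (toy rules, not a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Return one list of lowercased, stemmed tokens per detected sentence."""
    return [
        [crude_stem(tok) for tok in re.findall(r"[a-z]+", sentence.lower())]
        for sentence in split_sentences(text)
    ]

# Hypothetical sample text, for illustration only.
sample = "Analysts reviewed the documents. Searching text reveals trends!"
sentences = split_sentences(sample)
tokens = preprocess(sample)

print(sentences)
print(tokens)
```

Abbreviations such as "Dr." would trip up this splitter, which is one reason production tools use trained models or much larger rule sets for these tasks.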
Zhai also points out that text mining is related to the field of “information retrieval” or “IR.” He mentions that IR is very useful for text mining because text retrieval can act as a “preprocessor” for text mining and can help turn “big text data” into a smaller, more relevant data population.
He also notes that text retrieval is needed for knowledge “provenance” (validity plus origin) because once we find patterns in text data, we generally must verify them by looking at the original text. In addition, IR storage can be accomplished using a blend of traditional database, cloud storage, and document management techniques.
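As a rough sketch of retrieval acting as a “preprocessor,” the snippet below ranks a tiny corpus by how many query terms each document contains and keeps only the best matches. Returning document IDs alongside the scores is one simple way to support provenance, since any pattern found later can be traced back to the original text. The corpus, the IDs, and the `retrieve` function are hypothetical, and term overlap is a stand-in for the more sophisticated ranking functions real IR systems use.

```python
import re

def tokenize(text):
    """Return the set of distinct lowercase word tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(corpus, query, top_k=2):
    """Rank documents by the number of distinct query terms they contain.

    Returns up to top_k (doc_id, score) pairs with a nonzero score,
    highest score first; ties are broken by document ID.
    """
    query_terms = tokenize(query)
    scored = [(doc_id, len(query_terms & tokenize(text))) for doc_id, text in corpus.items()]
    hits = [(doc_id, score) for doc_id, score in scored if score > 0]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))[:top_k]

# Hypothetical corpus keyed by document ID, for illustration only.
corpus = {
    "doc-1": "Quarterly revenue grew in the retail segment.",
    "doc-2": "Customers complain about late delivery and damaged packaging.",
    "doc-3": "Late delivery complaints rose after the holiday season.",
}

hits = retrieve(corpus, "late delivery complaints")
# Provenance: use the returned IDs to pull up the original text for verification.
originals = {doc_id: corpus[doc_id] for doc_id, _ in hits}

print(hits)
print(originals)
```

Only the retrieved subset would then be passed on to the heavier text mining steps, which is the "smaller, more relevant data population" idea in miniature.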
Additional blog posts on text and advanced analytics concepts to follow; please contact email@example.com if you have any questions or need further help!