Corpus analysis techniques are well established in the Humanities and are becoming increasingly popular in the Social Sciences as well, but despite the constant evolution of these techniques, integrating quantitative and qualitative approaches to text remains an unfulfilled promise. The webinar Quanlify with Ease: Combining quantitative and qualitative corpus analysis, organised by SSHOC and CLARIN ERIC on 16 April 2020, offered a solution to the challenges inherent in validating studies on large-scale corpora.
Speaker Andreas Blätte, professor of Public Policy and Regional Politics at the University of Duisburg-Essen and a member of the CLARIN-D Working Group for Content Analysis in the Social Sciences, discussed the theoretical necessity and practical implementation of “quanlification”, a methodological approach for which he coined the name and which he first described in his masterclass at the CLARIN Annual Conference 2019. Professor Blätte also explained the reasons behind the development of the tooling that supports it.
At first glance, it might not be clear why the integration of quantitative and qualitative approaches is necessary. The central assumption is that the validity of research results obtained from large-scale corpora depends on researchers' ability to combine the quantitative and qualitative analysis of textual data.
Drawing on the ideas of distant reading and of processing texts as data, Professor Blätte proposed that the findings of corpus analysis rest on both the text itself and its numerical representation. While this apparent juxtaposition suggests a methodological divide between qualitative and quantitative approaches, each with separate means of validation, the fact remains that every qualitative finding requires quantitative support in order not to be dismissed as merely anecdotal. Equally, every quantitative analysis relies on qualitative confirmation to avoid misleading interpretations of patterns.
As such, the necessity of combining qualitative and quantitative approaches to text is conceptually undisputed. Given that distant reading and close reading should be blended in order to validate research findings, both perspectives should always be applied in tandem. This raises the practical question of how to implement a research process that allows an easy interplay between quantitative and qualitative methods. Existing software does not yet fully support this requirement: tools exist, but setting up a truly quanlitative project remains expensive and difficult to implement.
Certain design decisions are a prerequisite for creating tools that facilitate a quanlitative research method. One such tool, the polmineR analysis environment, written in the statistical programming language R, enables the validation of results by displaying their full context. This allows findings to be contextualised via different modules, which were presented during the webinar. Such validation is applicable to results that are either based on numerical approaches (e.g. co-occurrences and topic models) or represent only part of the original text (e.g. concordances and subcorpora).
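The back-and-forth between a numerical result and the underlying text can be sketched with polmineR itself. This is a minimal illustration, not code shown in the webinar: `GERMAPARLMINI` is the sample corpus shipped with the package, and exact arguments may differ across polmineR versions.

```r
# Sketch of the quanlification loop with polmineR (assumes the sample
# corpus GERMAPARLMINI that ships with the package is available).
library(polmineR)
use("polmineR")  # activate the corpora bundled with the package

# Quantitative step: statistically salient co-occurrences of a query term.
cooc <- cooccurrences("GERMAPARLMINI", query = "Integration")

# Qualitative step: read the full-text concordances (keyword-in-context)
# for the same query, to check the numerical pattern against the text.
k <- kwic("GERMAPARLMINI", query = "Integration")
k  # rendered interactively, so each hit can be inspected in context
```

The point of the sketch is the pairing: the co-occurrence table alone would invite over-interpretation, while the concordances alone would remain anecdotal.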
To convey the semantic sense of the numerical results, a variety of visualisation options are offered. Examples include the annotation of sentiment weights in a keyword-in-context representation, and the visualisation of co-occurrences in a three-dimensional co-occurrence graph enriched with the underlying concordances.
The polmineR package also includes features to annotate any kind of table. These can be used to facilitate intersubjectivity and also to generate training data for machine learning approaches.
During the webinar, Professor Blätte showcased early implementations of the approach, which make it possible to apply quanlitative methods to textual data at every stage of a research project.
The vision of quanlification is to offer, at minimal cost, a flexible set of tools that enable the implementation of all kinds of workflows combining distant and close reading methods. A particularly important goal is therefore to extend quanlitative capabilities to every element of the polmineR package, so as to develop a comprehensive yet modular toolset for quanlification.
Further documentation and training materials for the polmineR package are currently being developed. The use of polmineR can also be extended via other R packages such as htmlwidgets, as well as via flexdashboards and R Markdown.
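As a hypothetical illustration of that extension path, a polmineR concordance can be embedded in a flexdashboard document, where it is rendered as an interactive HTML widget. The document skeleton below is a sketch under those assumptions, not material from the webinar:

````markdown
---
title: "Quanlification dashboard (sketch)"
output: flexdashboard::flex_dashboard
---

```{r}
library(polmineR)
use("polmineR")  # sample corpora shipped with the package
kwic("GERMAPARLMINI", query = "Integration")  # shown as an htmlwidget
```
````

Rendering such a file with rmarkdown produces a self-contained HTML dashboard in which concordances can be browsed alongside quantitative output.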
Although the software will continue to evolve, it is clear that the solutions presented in the webinar already make the task of quanlification very manageable.