Content Analysis
STEVEN E. STEMLER

Abstract
In the era of “big data,” the methodological technique of content analysis can be the
most powerful tool in the researcher’s kit. Content analysis is versatile enough to
apply to textual, visual, and audio data. Given the massive explosion in permanent,
archived linguistic, photographic, video, and audio data arising from the proliferation of technology, the technique of content analysis appears to be on the verge of a
renaissance. In this essay, I discuss cutting-edge examples of how content analysis
is being applied or might be applied to the study of areas as diverse as education,
criminology, and social intelligence.

INTRODUCTION
In the past 20 years, technology has profoundly changed the way people
communicate. The widespread proliferation of email, the web, digital photography, social media, YouTube, text messaging, and cellular phones has
yielded unprecedented amounts of permanent, archived data on individuals.
As a result, analysts have dubbed this the era of “big data.” Both private corporations and public governmental entities are actively attempting to mine
this data to discover patterns of individual and group behavior. However, in
order to fully leverage the power of big data, the appropriate methods for
data analysis must be used. Consequently, the methodological technique of
content analysis appears to be on the verge of a renaissance. Content analysis can be used with a wide variety of data sources, including textual data,
visual stimuli (e.g., photographs/videos), and audio data. In addition, the
technique is highly flexible in that it can be either empirically or theoretically
driven. In this essay, I discuss modern examples of content analysis studies that draw on each of the aforementioned sources of data and highlight
emerging trends in this area.

Emerging Trends in the Social and Behavioral Sciences. Edited by Robert Scott and Stephen Kosslyn.
© 2015 John Wiley & Sons, Inc. ISBN 978-1-118-90077-2.


CONTENT ANALYSIS OF TEXTUAL DATA
By far the most frequently used data source for content analysis is written text (Krippendorff, 2012). Perhaps one of the most prominent areas
where text-based content analysis is being used is within the realm of
automated essay scoring in education (Shermis & Burstein, 2013). The
various approaches to content analysis in this domain range in complexity
from simple keyword scoring, in which participants are given credit for
including certain keywords in their essay, to more advanced approaches
that use Bayesian probabilities to determine the likelihood that high-scoring
essays would use a particular set of words in a particular order (Landauer
& Dumais, 1997). However, what most of these programs have in common
is that they are empirically driven rather than theoretically driven.
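The keyword-scoring end of this spectrum is straightforward to implement. A minimal sketch follows; the rubric words, point values, and sample essay are invented for illustration:

```python
import re

def keyword_score(essay, rubric):
    """Credit an essay once for each rubric keyword it contains."""
    words = set(re.findall(r"[a-z']+", essay.lower()))
    return sum(points for keyword, points in rubric.items() if keyword in words)

# Hypothetical rubric for a biology prompt: keyword -> points awarded.
rubric = {"photosynthesis": 2, "chlorophyll": 2, "sunlight": 1}
essay = "Plants use sunlight and chlorophyll to drive photosynthesis."
```

Here `keyword_score(essay, rubric)` awards all five possible points. Note that such an approach credits vocabulary regardless of whether the essay uses it sensibly, which is precisely the kind of weakness critics of automated scoring have exploited.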
EMPIRICALLY DRIVEN CONTENT ANALYSIS MODELS
Despite the fact that scholars have been experimenting with different
approaches to automatically analyzing the content of educational essays
for quite some time (Page, 1966; Shermis & Burstein, 2002), efforts to score
essays in a large-scale, high-stakes context have had limited success to
date. Indeed, the College Board, maker of the SAT, has just announced
that it will be rolling back the required writing section that was introduced
as part of the SAT in 2007. Their 7-year experiment in automated content
analysis of student essays was plagued by technical problems. For example,
Les Perelman of MIT conducted investigations exposing several of these
flaws (Weiss, 2014). He replicated the common finding in the literature that
length of essay tends to be positively correlated with essay score (Page,
1966, 1994), but he also found some idiosyncrasies associated with the ETS
automated scoring algorithm. For example, his research found that essays
using so-called fancy words, such as “myriad,” were rated more highly,
even if the words themselves had no relation to the content of the essay.
Furthermore, students tended to increase their scores by including quotations, even when the quotations had nothing to do with the topic.
Interestingly, the automated content analysis of student essays need not
be so rudimentary. One of the most impressive approaches to automatically
content analyzing large bodies of text that I have encountered is Latent
Semantic Analysis (Landauer & Dumais, 1997; Landauer, Foltz, & Laham,
This technique uses singular value decomposition to estimate the likelihood that
a quality essay would contain words in a particular context. The downside
of the technique is that the algorithm requires a large body of data on which
to be “trained.” That is to say, there needs to be a predetermined corpus
of high-quality as well as marginally acceptable answers with which to
train the program initially. Nevertheless, the technique shows considerable


promise and represents a major advance over more simplistic scoring
techniques. Given the promise of these more advanced techniques, it is
somewhat surprising that ETS was so attached to their flawed e-Rater
program, which operates using a far more rudimentary algorithm (Burstein,
2003).
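A toy version of the Latent Semantic Analysis pipeline conveys the idea: build a term-by-document matrix from a training corpus, factor it with singular value decomposition, and compare new essays to reference essays in the reduced semantic space. The three-document corpus below is invented; a real system would train on thousands of scored essays:

```python
import numpy as np

# Toy training corpus; a real system would train on thousands of scored essays.
docs = [
    "the cell membrane controls what enters the cell",
    "membranes regulate transport into the cell",
    "my dog likes to chase the ball",
]
vocab = sorted({w for d in docs for w in d.split()})

# Term-by-document count matrix.
A = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

# Singular value decomposition yields the latent semantic space.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # number of latent dimensions to keep

def embed(text):
    """Project a text into the k-dimensional latent space."""
    counts = np.array([text.split().count(w) for w in vocab], dtype=float)
    return (U[:, :k].T @ counts) / s[:k]

def similarity(a, b):
    """Cosine similarity between two texts in the latent space."""
    ea, eb = embed(a), embed(b)
    return float(ea @ eb / (np.linalg.norm(ea) * np.linalg.norm(eb)))
```

A new essay about membranes scores as more similar to the on-topic training documents than to the off-topic one, even without exact word overlap being required dimension by dimension.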
The ability to automatically and accurately content analyze large bodies
of textual responses will ultimately determine the success or failure of the
latest trend in higher education—massive open online courses (MOOCs).
Although MOOCs have many promising elements, not the least of which
is the capacity to provide instruction to hundreds of thousands of students
simultaneously, what will truly determine whether this technology is here
to stay or whether it becomes just another educational fad is whether the
content providers can effectively solve the problem of rapidly and automatically content analyzing textual responses to written prompts. It is worth
noting, however, that even an automated approach does not completely eliminate the need for human raters. Someone still has to make judgments about
the quality of responses in order to train any program on what patterns to
look for, and this process is, in effect, a content analysis. Once that has been accomplished, however, preliminary studies have demonstrated that various automated programs can be trained to achieve very high estimates of interrater reliability with human raters (Shermis & Burstein, 2003).
EMERGENT CODING AND GROUNDED THEORY APPROACHES
TO ANALYSIS
A second approach to content analysis that is somewhere between a purely
empirically derived model and a purely theoretical one is a model known
as emergent coding. This approach is derived from the qualitative research
concept of grounded theory (Glaser & Strauss, 1967). Specifically, one may
approach an analysis without a particular theory in the first place, but then
use the data under investigation to develop a theory. This theory is then
applied to the subsequent data. One example of such an approach comes from
my work with Damian Bebell in which we have analyzed the mission statements of a wide variety of schools (Stemler & Bebell, 2012; Stemler, 2012). To
briefly summarize our approach, which has been reported in greater detail
elsewhere (Stemler, 2001), we began the process by identifying a set of school
mission statements. Each of us then independently read and generated coding categories for each of the themes we encountered. We met to review the
themes, revised them, and then recoded the data until we reached a consensus. We created a coding rubric and recruited new, independent raters to code
a new set of mission statements according to our scheme. The independent
raters reached high levels of agreement, indicating that our coding scheme


was reliably detecting the derived coding categories. From this process, we
developed a theoretical model about the various purposes of schools and we
have subsequently analyzed thousands of school mission statements using
this framework.
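The agreement check at the heart of this procedure is typically quantified with a chance-corrected statistic such as Cohen's kappa. A minimal sketch, with hypothetical category codes for ten mission statements:

```python
from collections import Counter

def cohens_kappa(codes1, codes2):
    """Chance-corrected agreement between two raters' category codes."""
    n = len(codes1)
    observed = sum(a == b for a, b in zip(codes1, codes2)) / n
    c1, c2 = Counter(codes1), Counter(codes2)
    # Agreement expected by chance, given each rater's category frequencies.
    expected = sum(c1[cat] * c2[cat] for cat in c1) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two raters to ten mission statements.
rater1 = ["cognitive", "civic", "cognitive", "emotional", "civic",
          "cognitive", "civic", "cognitive", "emotional", "cognitive"]
rater2 = ["cognitive", "civic", "cognitive", "emotional", "cognitive",
          "cognitive", "civic", "cognitive", "emotional", "cognitive"]
```

With nine of ten statements coded identically, `cohens_kappa(rater1, rater2)` is roughly 0.83, high enough to treat the coding scheme as reliably applied by independent raters.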
Using content analysis, we are able to detect the extent to which changes
in educational policies or events in popular culture have impacted the
mission statements of schools. For example, in one study (Bebell & Stemler,
2004) we randomly sampled a set of high schools in Massachusetts before
the implementation of high-stakes graduation requirements and analyzed
their mission statements. We then did a follow-up analysis of the mission
statements of these same high schools 5 years later, after the implementation of high-stakes graduation requirements and found that schools that
changed their mission statement tended to make more references to the
cognitive purposes of schooling and had reduced or eliminated their references to broader themes associated with physical development, citizenship,
and social-emotional development. In another study (Stemler, Bebell, &
Sonnabend, 2011), we found that the majority (62%) of a random sample
of high schools in Colorado stated that providing a safe environment for
children was one of their primary purposes. The presence of this theme
was far more pervasive in Colorado schools than in schools in any of the other nine states in our sample, where it appeared in only 29% of all
school mission statements. This result was almost certainly influenced by
the Columbine school shootings. We expect that a comparison of school
mission statements throughout the state of Connecticut collected before
the December 2012 massacre at Sandy Hook Elementary School would
show systematic differences compared to the mission statements of the
same schools collected after the incident. Specifically, we would predict
a statistically significant increase in the emphasis on safe environment in
these schools. Our approach to content analyzing school mission statements
allows for the quantitative evaluation of such a hypothesis.
THEORETICALLY DRIVEN CONTENT ANALYSIS MODELS
A third area where text-based content analysis methods have been widely
used is in the area of law enforcement. In the mid-2000s, the Federal Bureau
of Investigation (FBI) assembled a team of content analysts to evaluate
the authenticity of particular counterterrorism documents associated with
Al-Qaeda in order to (i) determine whether new documents that had
emerged were authored by either Osama bin Laden or his second-in-command at the time, Ayman al-Zawahiri, and (ii) determine whether any theoretically driven content analysis models could successfully predict future
terrorist activity. The team included several content analysts, each with their


own theoretically driven approach. The results were published as part of a
special issue of the journal Dynamics of Asymmetric Conflict in 2011 and were
also recently compiled into a book edited by Allison Smith (2013). Dechesne
(2013) provides an excellent review of the book in which he notes that the
various authors approach the analyses of the same corpus of text using
different theories. For example, Winter (2011) uses McClelland’s theory of
needs (power, achievement, and affiliation) as a lens by which to analyze
the data. By contrast, Pennebaker (2011) is focused not on the substance of
the content but rather on the grammatical style. Specifically, Pennebaker
used an algorithm he codeveloped called Linguistic Inquiry and Word Count (LIWC) that can be used to determine the degree to which selected texts
use positive or negative emotion words, self-references, causal words, and
70 other dimensions. Other authors used other theories, sometimes relying
on the same computer program to analyze the same corpus of data using a
different theoretical framework. Dechesne notes that, “Across authors and
methods, terrorist rhetoric is found to be of lesser complexity, to come with
greater emphasis on affiliation, to stress issues of control and power, while
remarkably, violent and non-violent organizations do not differ in their
hostility against their adversaries, only in the methods they use to target
them.” (n.p.).
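The LIWC counting idea can be sketched in a drastically scaled-down form: map each word of a text to dictionary categories and report the percentage of words falling in each. The category word lists here are tiny invented stand-ins; the real LIWC dictionaries contain thousands of entries across roughly 70 dimensions:

```python
import re

# Tiny invented stand-ins; the real LIWC dictionaries are far larger.
CATEGORIES = {
    "positive_emotion": {"happy", "good", "hope"},
    "negative_emotion": {"hate", "fear", "enemy"},
    "self_reference": {"i", "me", "my", "we", "our"},
}

def liwc_style_profile(text):
    """Percentage of all words that fall into each category."""
    words = re.findall(r"[a-z']+", text.lower())
    return {cat: 100 * sum(w in vocab for w in words) / len(words)
            for cat, vocab in CATEGORIES.items()}
```

Because the profile is computed over function and emotion words rather than topical content, two texts on entirely different subjects can still be compared on grammatical style, which is the point of Pennebaker's approach.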
A subsequent series of studies by the FBI has also employed content analysis. In one study (Adams & Harpster, 2008), the speech patterns of a
sample of 911 homicide callers were systematically content analyzed. Specifically, the study used one hundred 911 homicide calls in which 50 of the
callers were adjudicated to have been innocent and 50 guilty. The results were
striking. Two-thirds of the innocent callers asked the dispatcher for help and
focused on getting help to the victim quickly, whereas only one-third of guilty
callers did. Nearly half of the callers included extraneous information in their
calls, but of those who did include extraneous information, 96% were guilty
of the offense and only 4% were innocent. Furthermore, the guilty callers
tended to request help for themselves rather than for their victims.
In a more recent study, Woodworth et al. (2012) content analyzed the linguistic patterns of known psychopaths and contrasted them with the linguistic patterns of individuals who were not classified as psychopaths. They
found that psychopaths tended to use more self-referent words, made more
references to basic needs (food, shelter), used the past tense more frequently,
and used a greater number of function words (e.g., “to” and “from”; “a” and
the’’).
Another hot area in which content analysis is being used, particularly with
regard to social media, is in attempting to link the content of Facebook status updates to dimensions of personality. One recent study used the results
of a content analysis of status updates as correlates of personality factors,


with the findings that narcissists tend to post more self-promotional content
and deeper self-disclosure information (Winter et al., 2014). Meanwhile, users
with high needs for affiliation tended to disclose more personal information
as well. A second study by Garcia and Sikstrom (in press) successfully used
Latent Semantic Analysis to link the content of status updates to the Dark
Triad of personality (i.e., psychopathy, narcissism, and Machiavellianism).
Each of these studies shows the potential power of linguistic content analysis
for both descriptive and predictive purposes.
FUTURE DIRECTIONS IN THE CONTENT ANALYSIS
OF TEXTUAL DATA
While the range of vocabulary used varies tremendously across individuals,
substantial work in the area of cognitive linguistics has demonstrated that
the language we use betrays more about us than we would like to believe.
The metaphors that we use to describe the world frame the way we think
and communicate (Lakoff & Johnson, 1980). Automated content analytic
programs could conceivably categorize people by the speech patterns associated with variables such as education level, geographic location, age, gender,
ethnicity, religious affiliation, cultural values, and so on. Furthermore, one can readily conceive of an algorithm that uses latent class
analysis to identify categories of individuals based on the types of words
they use in different contexts. From there, algorithms may be developed that
examine whether changes in tone, verb tense, usage of adjectives, or particular metaphors predict particular behaviors. Developing algorithms to predict
the likelihood that an individual may commit an act of violence could be
immensely useful and represents one future direction for the field. The
major challenge is that such an approach requires a large number of data
points on which to validate the model. However, this is what the big data
movement is all about. Facebook posts and Twitter feeds are two readily
accessible, pervasive, and relatively permanent archives that are ripe for this
type of analysis.
A second interesting direction for text-based linguistic content analysis
comes from the world of artificial intelligence (AI). The AI community
is engaged in text-based content analysis as well in its efforts to create
realistic “bots.” There are annual competitions in which programmers
attempt to develop AI bots that can pass as human (the so-called Turing
Test). One of the best recent examples is “Evie” based on “Cleverbot”
(http://www.existor.com/ai-overview). Some of the more recent iterations
of these bots are programmed to adaptively learn correct answers to questions based on feedback from hundreds of thousands of Internet users.
The procedure is both quite simple and clever. The bot begins with a basic


repertoire of inquiries built into the program (e.g., “How are you?”). On the
basis of the responses it receives from actual individuals each time it asks this question, the bot adaptively catalogs the most and least frequent responses and assigns each a probability of being invoked the next time a new user asks the same question. Thus, the bot
learns the appropriate way to respond to each question by reflecting the
response it has received. From there, the algorithm develops a likelihood of
what is a good/correct answer.
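The cataloging procedure described above amounts to keeping a frequency table of replies per question and sampling from it in proportion to how often humans have given each reply. A minimal sketch; the class name and example exchanges are invented:

```python
import random
from collections import Counter, defaultdict

class EchoLearnerBot:
    """Learns replies by cataloging what humans answer to each question."""

    def __init__(self):
        self.catalog = defaultdict(Counter)

    def observe(self, question, human_reply):
        # Record every human reply received to this question.
        self.catalog[question][human_reply] += 1

    def reply(self, question):
        # Sample a reply in proportion to how often humans have given it.
        counts = self.catalog.get(question)
        if not counts:
            return "Tell me more."
        replies, weights = zip(*counts.items())
        return random.choices(replies, weights=weights)[0]

bot = EchoLearnerBot()
for answer in ["Fine, thanks.", "Fine, thanks.", "Not bad."]:
    bot.observe("How are you?", answer)
```

After observing three human answers, the bot replies to “How are you?” with one of the cataloged responses, favoring the most frequent one; questions it has never seen fall back to a stock prompt.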
One interesting implication of this work is that one could conceive of personalized bots (e.g., Apple iPhone’s Siri) developed for individuals using
this same learning algorithm. Each bot could invoke an automated content
analytic program that can detect deviations from “normal” speech patterns
of its primary user with regard to the use of emotionally charged words,
sentence/grammatical construction, length of entry, and so on and could conceivably begin to identify emotion within the individual user. Related to this,
AI bots are currently being used within the world of online therapy. Content
analytic algorithms could use information provided by the user to generate
AI intervention that would guide the user to a different set of emotions (e.g.,
emotional intelligence via AI). For example, if the user/client reports feeling
nervous about an impending visit to the new girlfriend’s house to meet her
parents, the bot could generate certain sets of questions that guide the individual into a less anxious state. What is more, the bot would be able to use
content analysis to determine whether the bot’s interventions were working as intended (e.g., making the client more relaxed). The possibilities for
text-based content analysis are staggering. I expect that the era of big data
will yield rapid advances in the analysis of linguistic data.
CONTENT ANALYSIS OF VISUAL DATA
As exciting as the future looks for text-based content analyses, I believe that
the future promise of the technique lies with visually based data. Indeed, it
is in that context that we may truly see the power of the methodology.
Some very interesting work in this area comes from Sarah Carney’s (2013)
content analysis of cartoon depictions of criminals. She and her team have
analyzed thousands of episodes of children’s cartoons across several decades
and have found some fascinating results. For example, they find that criminals are typically depicted as being incapable of change. Once someone is
bad, the person is bad forever. Efforts at redemption and change tend to be
presented in a comedic light rather than as a real possibility for the character. Carney has argued that such framing has important implications for
future voters’ attitudes toward topics such as parole. From a visual perspective, she and her team have found that the physical nature of criminals has


remained relatively static over time. Criminals are typically male and large, with exaggerated facial features (eyebrows, chins, facial hair, noses, scars) and/or bodies that are oddly shaped or disproportionate. They tend to
speak with foreign accents. And perhaps most disturbingly, her work has
found that scientists are typically portrayed as villains. The trope of the mad
scientist is regularly invoked and the messaging is that science and scientists
are not trustworthy. Such visual analyses may present some clues as to how
and when stereotypes are formed.
Within the field of personality theory, recent research has focused on content analyzing the presentations of self on social media, such as Facebook. A
common critique of the literature in personality is that the typical self-report
personality questionnaire is highly susceptible to faking. Thus, over time,
alternative indicators of personality have been sought in an effort to circumvent this problem. Historically speaking, concerns about faking led to the
development of projective personality measures such as the Rorschach and
the Thematic Apperception Test; however, the scoring of those instruments
has been criticized on psychometric grounds. Recently, however, a new trend
has emerged that involves content analyzing the data posted to social media
websites. In particular, this data can take the form of text, but also of visual
information such as pictures and videos. Although most research associated
with social media has focused on the content analysis of linguistic content
of Facebook status updates and Twitter feeds (Chew & Eysenbach, 2010),
researchers could conceivably content analyze the pictures that individuals
post onto their site to examine features that correlated highly with traditional
measures of personality or other characteristics. Thus, visual content analysis of new media, such as digital photography, YouTube videos, and the
arrangement of personal websites, has the potential to advance our theoretical understanding and empirical assessment of personality in a way that
overcomes some of the limitations of typical self-report indicators. Interestingly, most personality studies that draw on social network data are still
focusing on text-based linguistic analyses rather than capitalizing on the rich
set of visual stimuli available for analysis. A shift in focus from linguistic to
visually based content analysis seems to be one potential emerging trend.
FUTURE DIRECTIONS IN THE CONTENT ANALYSIS OF VISUAL DATA
In the current era, survey methodology is a ubiquitous approach to studying
human subjects. Surveys are given to assess personalities, intelligence, happiness, learning styles, and so on. It is entirely conceivable, however, that rather
than filling out a questionnaire on a dating website, for example, it may be
possible to instead post a set of photographs that one thinks best represents
one’s personality. In that way, an algorithm could be used to detect the subtle


features that a person may not even be aware of. A person may choose to submit a photograph (or a set of photographs) that show them interacting as part
of a group, alone in portrait mode, or as part of some activity. Such a choice
would reveal something about personality in and of itself. Then, within these
categories, one may be able to match up certain qualities (e.g., personal interactions). One can imagine a program element that detects and estimates ages
of each person involved, gender of those involved, and so on. Then, on the
basis of such information, the algorithm attempts to match the person with
another who has a similar profile. Ideally, one could submit multiple pictures
(e.g., via Facebook) and the program could detect what types of pictures are
usually submitted (e.g., drinking, hiking, posing with grandparents, attending a child’s birthday party, etc.) and make a match to someone who posts
pictures with similar qualities.
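Assuming an upstream classifier has already tagged each user's photos with categories, the matching step itself could be as simple as comparing tag sets. The user names and tags below are hypothetical:

```python
# Hypothetical photo-category tags per user, as produced by an upstream classifier.
profiles = {
    "alice": {"hiking", "group_photo", "dog"},
    "bob": {"hiking", "solo_portrait", "dog"},
    "cara": {"nightlife", "group_photo"},
}

def jaccard(a, b):
    """Overlap between two tag sets: 0 (disjoint) to 1 (identical)."""
    return len(a & b) / len(a | b)

def best_match(name):
    """Pair a user with whoever posts the most similar mix of photos."""
    others = [u for u in profiles if u != name]
    return max(others, key=lambda u: jaccard(profiles[name], profiles[u]))
```

In this toy data, `best_match("alice")` returns `"bob"`, whose photo mix (hiking, dogs) overlaps most with hers; richer versions would weight categories or add detected attributes such as estimated age or group size.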
Another promising direction for visually based content analysis is that
the technique may become useful in reviving the search for social intelligence. Social intelligence fell out of favor as a field of study in the late 1990s
(Kihlstrom & Cantor, 2000), mainly due to technical constraints. The pioneering theories of J.P. Guilford and his group (O’Sullivan, Guilford, & deMille,
1965) in this area are truly outstanding. I believe that the tremendous access
to new media, particularly digital photographs and videos, will prove
extremely fruitful in the advancement of theories of social intelligence and
cultural competence. At the most basic level, updates can be made to prior
research (Archer, 1980). With video data comes the opportunity to analyze
interpersonal interactions. This can take the form of eye contact, social
distance, and so on to discern patterns of behaviors that people currently do
not even see. Nearly everyone has a camera built into their phone these days
and the proliferation of videos posted to social media sites is astounding.
Content analyses of this footage could contribute a tremendous amount to
our understanding of the dynamics of interpersonal interactions that occur
in a native context (i.e., outside of the research laboratory).
CONTENT ANALYSIS OF AUDIO DATA
A third medium that can be content analyzed is audio data. Perhaps one of
the most interesting examples of this in recent times is the musical application, Pandora. The concept behind the app is that there is an algorithm that
attempts to match a user’s musical preferences by learning what the user
“likes” and “does not like.” The result is an adaptive algorithm that is suited
to the user’s particular taste. Behind the scenes, the musical quality of each
song is what is subject to the content analysis. Each song in the database
must first be categorized and rated according to the content analytic coding
rubric. This rubric presumably classifies songs according to timing, melody,


harmony, genre (e.g., acoustic, big band), and so on. Once the scoring rubric
is developed, it is a fairly easy matter to write an algorithm to detect a match.
However, the success of the algorithm, from a user perspective, depends on
the extent to which the relevant dimensions associated with musical preference are correctly categorized and coded. Just as textual data can be coded
for a variety of different elements (e.g., content, grammar) so too can audio
information. My opinion is that Pandora is a reasonably good starting point;
however, I believe the algorithm needs refinement and/or that a competitor could easily come up with a different content analytic rubric that would
show superior market value.
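The matching logic described here can be sketched by treating each song's rubric codes as a feature vector, averaging the vectors of liked songs into a taste profile, and recommending the most similar unheard song. The dimensions, song names, and ratings below are invented; Pandora's actual rubric codes far more attributes:

```python
import math

# Hypothetical rubric: each song coded 0-1 on a few musical dimensions
# [acoustic, electronic, vocal-centric]; real rubrics code far more attributes.
SONGS = {
    "song_a": [0.9, 0.1, 0.8],
    "song_b": [0.8, 0.2, 0.9],
    "song_c": [0.1, 0.9, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    def norm(w):
        return math.sqrt(sum(x * x for x in w))
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

def recommend(liked):
    """Average liked songs into a taste profile; pick the nearest unheard song."""
    vectors = [SONGS[s] for s in liked]
    taste = [sum(col) / len(vectors) for col in zip(*vectors)]
    unheard = [s for s in SONGS if s not in liked]
    return max(unheard, key=lambda s: cosine(SONGS[s], taste))
```

A listener who likes the acoustic `song_a` is steered to the similar `song_b` rather than the electronic `song_c`; as the essay notes, the recommendation is only as good as the rubric's choice and coding of dimensions.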
Another recent study that is taking audio data as a source and subjecting it
to content analysis comes from an undergraduate thesis at Wesleyan that I am
reading, which aims to examine the speech patterns of criminals as portrayed
by the media. The concept behind this project is that it attempts to content
analyze the pitch, tone, cadence, and so on of speech patterns of individuals
identified as “criminals” within the context of popular media and compare
their patterns to the speech patterns of “heroes.”
Similar types of audio analyses could easily be conducted for presidential
speeches. While there has been past research analyzing the linguistic content
of presidential State of the Union addresses (e.g., Lim, 2008), none of these
analyses that I have encountered have systematically analyzed the particular speech patterns of the speakers with regard to the wide variety of audio
information on which they could be classified. I see this as another emerging
area for the field.
EMERGING TRENDS AND FUTURE DIRECTIONS
IN CONTENT ANALYSIS
Analysts in the era of big data can make tremendous advances to our theoretical understanding of a vast array of topics by embracing the techniques of
content analysis. There are myriad research questions on a dizzying array of
topics that can be investigated using this technique. For example, do transformational leaders speak differently, or use different audio and visual patterns, than mediocre/nontransformational leaders do? The
relevant units of analysis may be technical data (e.g., grammatical usage of
active/passive voice) or they could be more substantive (e.g., appeal to affiliation, power, achievement) or any of a variety of theoretical possibilities (e.g.,
appeal to different levels of moral development). As a second example, in the
United States, we seem locked in perpetual conversations surrounding gun
violence prevention and counterterrorism. The methodology of content analysis has a large role to play in advancing our understanding of how to prevent


violence. Thanks to the proliferation of big data, algorithms that content analyze textual, visual, and audio data may ultimately be able to help predict,
and therefore prevent, such incidents in the future.
However, with this nearly unlimited source of newfound data comes a second important element: the need for a guiding theory. In order to find some
relationships, we typically have to have some idea of what it is we are looking for, and these ideas generally come from strong theories. The versatility
of the method of content analysis to handle textual, visual, and auditory data
makes it extremely powerful. The technique can use theory to analyze data,
but it can also use data to help generate theory. The flexibility of the technique, coupled with the massive amounts of newly generated archival data
resulting from advanced technology, suggests that we will soon be reading
many more studies that rely on content analysis.
REFERENCES
Adams, S. H., & Harpster, T. (2008). 911 Homicide calls and statement analysis: Is the
caller the killer? FBI Law Enforcement Bulletin, 77(6), 22–31.
Archer, D. (1980). How to expand your Social Intelligence Quotient. New York, NY: M.
Evans and Company, Inc.
Bebell, D., & Stemler, S. E. (2004, April). Reassessing the objectives of educational accountability in Massachusetts: The mismatch between Massachusetts and the MCAS. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Burstein, J. C. (2003). The E-rater scoring engine: Automated essay scoring with natural language processing. In M. D. Shermis & J. C. Burstein (Eds.), Automated essay
scoring: A cross-disciplinary perspective (pp. 113–122). Mahwah, NJ: Lawrence Erlbaum Associates.
Carney, S. (2013). Representing crime and criminals in children’s television. Invited talk
presented to the faculty of Psychology at Connecticut College.
Chew, C., & Eysenbach, G. (2010). Pandemics in the age of Twitter: Content analysis
of Tweets during the 2009 H1N1 outbreak. PLoS One, 5(11), 1–13.
Dechesne, M. (2013). Review of A.G. Smith (ed.). The relationship between rhetoric
and terrorist violence. Perspectives on Terrorism, 7(5). Retrieved from http://www.
terrorismanalysts.com/pt/index.php/pot/article/view/298/html.
Garcia, D., & Sikstrom, S. (in press). The dark side of Facebook: Semantic representations of status updates predict the Dark Triad of personality. Personality and
Individual Differences, doi:10.1016/j.paid.2013.10.001
Glaser, B., & Strauss, A. (1967). The discovery of grounded theory: Strategies for qualitative
research. Chicago, IL: Aldine.
Kihlstrom, J. F., & Cantor, N. (2000). Social intelligence. In R. J. Sternberg (Ed.), Handbook of intelligence (2nd ed., pp. 359–379). Cambridge, England: Cambridge University Press.


Krippendorff, K. (2012). Content Analysis: An introduction to its methodology (3rd ed.).
Thousand Oaks, CA: Sage.
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago, IL: University of
Chicago Press.
Landauer, T. K., & Dumais, S. (1997). A solution to Plato’s problem: The Latent
Semantic Analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211–240.
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to Latent
Semantic Analysis. Discourse Processes, 25, 259–284. Retrieved from http://lsa.
colorado.edu/papers/dp1.LSAintro.pdf.
Lim, E. (2008). The anti-intellectual presidency. New York, NY: Oxford University Press.
O’Sullivan, M., Guilford, J.P., & deMille, R. (1965). The measurement of social intelligence. University of Southern California Psychological Laboratory: Reports from
the Laboratory.
Page, E. B. (1966). The imminence of grading essays by computer. Phi Delta Kappan,
48, 238–243.
Page, E. B. (1994). Computer grading of student prose, using modern concepts and software. Journal of Experimental Education, 62, 127–142.
Pennebaker, J. W. (2011). Using computer analyses to identify language style and
aggressive intent: The secret life of function words. Dynamics of Asymmetric Conflict, 4(2), 92–102.
Shermis, M. D., & Burstein, J. (Eds.) (2002). Automated essay scoring. New York, NY:
Taylor and Francis.
Shermis, M. D., & Burstein, J. (Eds.) (2013). Handbook of automated essay evaluation:
Current applications and new directions. New York, NY: Taylor and Francis.
Smith, A. G. (Ed.) (2013). The relationship between rhetoric and terrorist violence. London,
England: Routledge.
Stemler, S. E. (2001). An overview of content analysis. Practical Assessment, Research,
and Evaluation, 7(17). Retrieved from http://pareonline.net/getvn.asp?v=7&
n=17.
Stemler, S. E. (2012). What should university admissions tests predict? Educational
Psychologist, 47(1), 5–17.
Stemler, S. E., & Bebell, D. (2012). The school mission statement: Values, goals, and identities in American Education. New York, NY: Taylor and Francis.
Stemler, S. E., Bebell, D., & Sonnabend, L. (2011). Using school mission statements
for reflection and research. Educational Administration Quarterly, 47(2), 383–420.
Weiss, J. (March 14, 2014). The man who killed the SAT. The Boston Globe.
Accessed on October 14, 2014 via: http://www.ohio.com/editorial/joanna-weiss-a-nerd-killed-the-sat-essay-1.473561.
Winter, D. G. (2011). Scoring motive imagery in documents from four Middle East
opposition groups. Dynamics of Asymmetric Conflict, 4(2), 144–154.
Winter, S., Neubaum, G., Eimler, S.C., Gordon, V., Theil, J., Herrmann, J., … Kramer,
N.C. (2014). Another brick in the Facebook wall: How personality traits relate to
the content of status updates. Computers in Human Behavior, 34, 194–202.

Content Analysis


Woodworth, M., Hancock, J., Porter, S., Hare, R., Logan, M., O’Toole, M., &
Smith, S. (July, 2012). The language of psychopaths. FBI Law Enforcement
Bulletin. Retrieved from http://www.fbi.gov/stats-services/publications/law-enforcement-bulletin/july-2012/the-language-of-psychopaths.

FURTHER READING
Haney, W., Russell, M., Gulek, C., & Fierros, E. (1998). Drawing on education: Using
student drawings to promote middle school improvement. Schools in the Middle,
7(3), 38–43.
Holsti, O. R. (1969). Content analysis for the social sciences and humanities. Reading, MA:
Addison-Wesley.
Krippendorff, K., & Bock, M. (2008). The content analysis reader. Thousand Oaks, CA:
Sage.
Lakoff, G. (1996). Moral politics: What conservatives know that liberals don’t. Chicago, IL:
University of Chicago Press.
Lakoff, G. (2004). Don't think of an elephant! Know your values and frame the debate. White River Junction, VT: Chelsea Green Publishing.
Landauer, T. K., Laham, D., & Foltz, P. W. (2003). Automated scoring and annotation
of essays with the Intelligent Essay Assessor. In M. D. Shermis & J. C. Burstein
(Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 87–112). Mahwah, NJ: Lawrence Erlbaum Associates.
Pennebaker, J. W., & Chung, C. K. (2008). Computerized text analysis of Al-Qaeda
transcripts. In K. Krippendorff (Ed.), The content analysis reader (pp. 453–466).
Thousand Oaks, CA: Sage.
Weber, R. P. (1990). Basic content analysis (2nd ed). Newbury Park, CA: Sage.
Winter, D. (2005). Things I’ve learned about personality from studying political leaders at a distance. Journal of Personality, 73(3), 557–584.

STEVEN E. STEMLER SHORT BIOGRAPHY
Steven E. Stemler is an Associate Professor of Psychology at Wesleyan
University. He received his doctorate in Educational Research, Measurement, and Evaluation from Boston College, where he worked at the
Center for the Study of Testing, Evaluation, and Educational Policy and
the TIMSS International Study Center. Before joining the faculty at Wesleyan, Steve spent 4 years at Yale University where he was an Associate
Research Scientist in the Department of Psychology. His area of expertise
is in the assessment of noncognitive factors, with a special emphasis on
the domains of social intelligence, creativity, intercultural literacy, and
ethical reasoning (see: http://sstemler.web.wesleyan.edu/stemlerlab and http://www.purposeofschool.com).


RELATED ESSAYS
To Flop Is Human: Inventing Better Scientific Approaches to Anticipating
Failure (Methods), Robert Boruch and Alan Ruby
Ambulatory Assessment: Methods for Studying Everyday Life (Methods),
Tamlin S. Conner and Matthias R. Mehl
Models of Nonlinear Growth (Methods), Patrick Coulombe and James P.
Selig
Quantile Regression Methods (Methods), Bernd Fitzenberger and Ralf
Andreas Wilke
Ethnography in the Digital Age (Methods), Alan Howard and Alexander
Mawyer
Participant Observation (Methods), Danny Jorgensen
Structural Equation Modeling and Latent Variable Approaches (Methods),
Alex Liu
Data Mining (Methods), Gregg R. Murray and Anthony Scime
Digital Methods for Web Research (Methods), Richard Rogers

Content Analysis
STEVEN E. STEMLER

Abstract
In the era of “big data,” the methodological technique of content analysis can be the
most powerful tool in the researcher’s kit. Content analysis is versatile enough to
apply to textual, visual, and audio data. Given the massive explosion in permanent,
archived linguistic, photographic, video, and audio data arising from the proliferation of technology, the technique of content analysis appears to be on the verge of a
renaissance. In this essay, I discuss cutting-edge examples of how content analysis
is being applied or might be applied to the study of areas as diverse as education,
criminology, and social intelligence.

INTRODUCTION
In the past 20 years, technology has profoundly changed the way people
communicate. The widespread proliferation of email, the web, digital photography, social media, YouTube, text messaging, and cellular phones has
yielded unprecedented amounts of permanent, archived data on individuals.
As a result, analysts have dubbed this the era of “big data.” Both private corporations and public governmental entities are actively attempting to mine
this data to discover patterns of individual and group behavior. However, in
order to fully leverage the power of big data, the appropriate methods for
data analysis must be used. Consequently, the methodological technique of
content analysis appears to be on the verge of a renaissance. Content analysis can be used with a wide variety of data sources, including textual data,
visual stimuli (e.g., photographs/videos), and audio data. In addition, the
technique is highly flexible in that it can be either empirically or theoretically
driven. In this essay, I discuss modern examples of content analysis studies that draw on each of the aforementioned sources of data and highlight
emerging trends in this area.

Emerging Trends in the Social and Behavioral Sciences. Edited by Robert Scott and Stephen Kosslyn.
© 2015 John Wiley & Sons, Inc. ISBN 978-1-118-90077-2.


CONTENT ANALYSIS OF TEXTUAL DATA
By far the most frequently used data source for content analysis is written text (Krippendorff, 2012). Perhaps one of the most prominent areas
where text-based content analysis is being used is within the realm of
automated essay scoring in education (Shermis & Burstein, 2013). The
various approaches to content analysis in this domain range in complexity
from simple keyword scoring, in which participants are given credit for
including certain keywords in their essay, to more advanced approaches
that use latent semantic models to estimate the likelihood that high-scoring essays would use a particular set of words in a particular context (Landauer & Dumais, 1997). However, what most of these programs have in common
is that they are empirically driven rather than theoretically driven.
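The simplest end of this continuum is easy to sketch. A minimal keyword scorer, with a hypothetical rubric and essay invented purely for illustration, might look like this:

```python
# Minimal keyword scoring: one point per rubric keyword found in an essay.
# The rubric and essay below are hypothetical, for illustration only.
import re

def keyword_score(essay: str, keywords: set) -> int:
    """Count how many rubric keywords appear at least once in the essay."""
    tokens = set(re.findall(r"[a-z']+", essay.lower()))
    return sum(1 for kw in keywords if kw in tokens)

rubric = {"photosynthesis", "chlorophyll", "sunlight", "glucose"}
essay = "Plants use sunlight and chlorophyll to make glucose."
print(keyword_score(essay, rubric))  # 3 of the 4 rubric keywords appear
```

The obvious weakness, as the more advanced approaches discussed here address, is that such a scorer is blind to word order, context, and meaning.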
EMPIRICALLY DRIVEN CONTENT ANALYSIS MODELS
Despite the fact that scholars have been experimenting with different
approaches to automatically analyzing the content of educational essays
for quite some time (Page, 1966; Shermis & Burstein, 2002), efforts to score
essays in a large-scale, high-stakes context have had limited success to
date. Indeed, the College Board, maker of the SAT, has just announced that it will be rolling back the required writing section that was introduced as part of the SAT in 2007. Its 7-year experiment in automated content analysis of student essays was plagued by technical problems. For example,
Les Perelman of MIT conducted investigations exposing several of these
flaws (Weiss, 2014). He replicated the common finding in the literature that
length of essay tends to be positively correlated with essay score (Page,
1966, 1994), but he also found some idiosyncrasies associated with the ETS
automated scoring algorithm. For instance, his research found that essays using so-called fancy words, such as "myriad," were rated more highly, even if the words themselves had no relation to the content of the essay. Furthermore, students tended to increase their scores by inserting quotations, even when the quotations had nothing to do with the topic.
Interestingly, the automated content analysis of student essays need not
be so rudimentary. One of the most impressive approaches to automatically
content analyzing large bodies of text that I have encountered is Latent
Semantic Analysis (Landauer & Dumais, 1997; Landauer, Foltz, & Laham,
1998). This technique uses singular value decomposition to estimate the likelihood that a quality essay would contain particular words in a particular context. The downside
of the technique is that the algorithm requires a large body of data on which
to be “trained.” That is to say, there needs to be a predetermined corpus
of high-quality as well as marginally acceptable answers with which to
train the program initially. Nevertheless, the technique shows considerable promise and represents a major advance over more simplistic scoring
techniques. Given the promise of these more advanced techniques, it is
somewhat surprising that ETS was so attached to its flawed e-Rater
program, which operates using a far more rudimentary algorithm (Burstein,
2003).
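The core of the LSA machinery can be sketched in a few lines of NumPy. This is a toy sketch under stated assumptions: the three-document "training corpus" and the probe essays are hypothetical, and real systems train on thousands of pre-scored essays.

```python
# Toy sketch of the Latent Semantic Analysis idea using plain NumPy.
# The three-document "corpus" is hypothetical and far too small to be
# meaningful; it only illustrates the mechanics.
import numpy as np

corpus = [
    "photosynthesis converts sunlight into glucose",    # high-quality answer
    "chlorophyll absorbs sunlight for photosynthesis",  # high-quality answer
    "my dog likes to run in the park",                  # off-topic answer
]
vocab = sorted({w for doc in corpus for w in doc.split()})
index = {w: i for i, w in enumerate(vocab)}

def vectorize(text):
    """Raw term-frequency vector over the training vocabulary."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1.0
    return v

# Term-document matrix, reduced by truncated SVD to a low-rank semantic space.
A = np.column_stack([vectorize(d) for d in corpus])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # number of latent dimensions to keep

def project(v):
    return U[:, :k].T @ v

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

reference = project(vectorize(corpus[0]))  # a known high-quality answer
on_topic = cosine(project(vectorize("sunlight drives photosynthesis")), reference)
off_topic = cosine(project(vectorize("the dog ran in the park")), reference)
print(on_topic > off_topic)  # True: the on-topic essay is closer in latent space
```

The key property on display is that the on-topic probe scores well even though it shares only some words with the reference answer: similarity is measured in the reduced latent space, not by exact word overlap.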
The ability to automatically and accurately content analyze large bodies
of textual responses will ultimately determine the success or failure of the
latest trend in higher education—massive open online courses (MOOCs).
Although MOOCs have many promising elements, not the least of which
is the capacity to provide instruction to hundreds of thousands of students
simultaneously, what will truly determine whether this technology is here
to stay or whether it becomes just another educational fad is whether the
content providers can effectively solve the problem of rapidly and automatically content analyzing textual responses to written prompts. It is worth
noting, however, that even an automated approach does not completely eliminate the need for human raters. Someone still has to make judgments about
the quality of responses in order to train any program on what patterns to
look for and this process is, in effect, a content analysis. Once that has been
accomplished, however, preliminary studies have demonstrated that various automated programs can achieve very high interrater reliability with human raters (Shermis & Burstein, 2002).
EMERGENT CODING AND GROUNDED THEORY APPROACHES
TO ANALYSIS
A second approach to content analysis that is somewhere between a purely
empirically derived model and a purely theoretical one is a model known
as emergent coding. This approach is derived from the qualitative research
concept of grounded theory (Glaser & Strauss, 1967). Specifically, one may
approach an analysis without a particular theory in the first place, but then
use the data under investigation to develop a theory. This theory is then
applied to the subsequent data. One example of such an approach comes from
my work with Damian Bebell in which we have analyzed the mission statements of a wide variety of schools (Stemler & Bebell, 2012; Stemler, 2012). To
briefly summarize our approach, which has been reported in greater detail
elsewhere (Stemler, 2001), we began the process by identifying a set of school
mission statements. Each of us then independently read and generated coding categories for each of the themes we encountered. We met to review the
themes, revised them, and then recoded the data until we reached a consensus. We created a coding rubric and recruited new, independent raters to code
a new set of mission statements according to our scheme. The independent
raters reached high levels of agreement, indicating that our coding scheme was reliably detecting the derived coding categories. From this process, we
developed a theoretical model about the various purposes of schools and we
have subsequently analyzed thousands of school mission statements using
this framework.
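The agreement check at the heart of this process is commonly quantified with a chance-corrected statistic such as Cohen's kappa. A minimal sketch, with hypothetical codes invented purely to illustrate the computation:

```python
# Cohen's kappa for two raters assigning one code per mission statement.
# The codes below are hypothetical, invented purely to illustrate the math.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' categorical codes."""
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

a = ["cognitive", "safety", "civic", "cognitive", "safety", "cognitive"]
b = ["cognitive", "safety", "civic", "cognitive", "civic", "cognitive"]
print(round(cohens_kappa(a, b), 2))  # 0.74: substantial agreement beyond chance
```

Unlike raw percent agreement, kappa discounts the matches two raters would produce by chance alone, which matters when one or two codes dominate the data.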
Using content analysis, we are able to detect the extent to which changes
in educational policies or events in popular culture have impacted the
mission statements of schools. For example, in one study (Bebell & Stemler,
2004) we randomly sampled a set of high schools in Massachusetts before
the implementation of high-stakes graduation requirements and analyzed
their mission statements. We then did a follow-up analysis of the mission
statements of these same high schools 5 years later, after the implementation of high-stakes graduation requirements, and found that schools that changed their mission statement tended to make more references to the cognitive purposes of schooling and had reduced or eliminated their references to broader themes associated with physical development, citizenship,
and social-emotional development. In another study (Stemler, Bebell, &
Sonnabend, 2011), we found that the majority (62%) of a random sample
of high schools in Colorado stated that providing a safe environment for
children was one of their primary purposes. The presence of this theme
was far more pervasive in schools in Colorado than for schools in any of
the other nine states in our sample where it showed up in only 29% of all
school mission statements. This result was almost certainly influenced by
the Columbine school shootings. We expect that a comparison of school
mission statements throughout the state of Connecticut collected before
the December 2012 massacre at Sandy Hook Elementary School would
show systematic differences compared to the mission statements of the
same schools collected after the incident. Specifically, we would predict
a statistically significant increase in the emphasis on safe environment in
these schools. Our approach to content analyzing school mission statements
allows for the quantitative evaluation of such a hypothesis.
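Such a hypothesis could be evaluated with, for example, a simple two-proportion z-test. The counts below are hypothetical; the proportions merely echo the 29% versus 62% contrast reported above:

```python
# Two-proportion z-test sketch for comparing how often a theme appears in
# mission statements before vs. after an event. The counts are hypothetical.
import math

def two_proportion_z(hits1, n1, hits2, n2):
    """z statistic for the difference between two independent proportions."""
    p1, p2 = hits1 / n1, hits2 / n2
    pooled = (hits1 + hits2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

# e.g., "safe environment" coded in 29 of 100 statements before the event
# and in 62 of 100 statements after it
z = two_proportion_z(29, 100, 62, 100)
print(round(z, 2))  # 4.69: far beyond the 1.96 cutoff at alpha = .05
```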
THEORETICALLY DRIVEN CONTENT ANALYSIS MODELS
A third area where text-based content analysis methods have been widely
used is in the area of law enforcement. In the mid-2000s, the Federal Bureau
of Investigation (FBI) assembled a team of content analysts to evaluate
the authenticity of particular counterterrorism documents associated with
Al-Qaeda in order to (i) determine whether new documents that had
emerged were authored by either Osama bin Laden or his second-in-command at the time, Ayman al-Zawahiri, and (ii) determine whether any theoretically driven content analysis models could successfully predict future
terrorist activity. The team included several content analysts, each with their own theoretically driven approach. The results were published as part of a
special issue of the journal Dynamics of Asymmetric Conflict in 2011 and were
also recently compiled into a book edited by Allison Smith (2013). Dechesne
(2013) provides an excellent review of the book in which he notes that the
various authors approach the analyses of the same corpus of text using
different theories. For example, Winter (2011) uses McClelland’s theory of
needs (power, achievement, and affiliation) as a lens by which to analyze
the data. By contrast, Pennebaker (2011) is focused not on the substance of
the content but rather on the grammatical style. Specifically, Pennebaker
used an algorithm he codeveloped called Linguistic Inquiry and Word Count (LIWC) that can be used to determine the degree to which selected texts
use positive or negative emotion words, self-references, causal words, and
70 other dimensions. Other authors used other theories, sometimes relying
on the same computer program to analyze the same corpus of data using a
different theoretical framework. Dechesne notes that, “Across authors and
methods, terrorist rhetoric is found to be of lesser complexity, to come with
greater emphasis on affiliation, to stress issues of control and power, while
remarkably, violent and non-violent organizations do not differ in their
hostility against their adversaries, only in the methods they use to target
them.” (n.p.).
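The LIWC approach of scoring texts by the percentage of words that fall into predefined categories can be mimicked in a few lines. The actual LIWC dictionaries are proprietary and far larger; the four-category lexicon here is a hypothetical stand-in:

```python
# LIWC-style profiling: score a text as the percentage of its words that
# fall into each category. This tiny lexicon is a hypothetical stand-in
# for the real (proprietary, much larger) LIWC dictionaries.
import re

LEXICON = {
    "posemo": {"happy", "good", "hope", "love"},   # positive emotion
    "negemo": {"angry", "fear", "hate", "sad"},    # negative emotion
    "self":   {"i", "me", "my", "mine"},           # self-references
    "causal": {"because", "therefore", "hence"},   # causal words
}

def category_profile(text):
    """Return each category's share of the total word count, in percent."""
    words = re.findall(r"[a-z']+", text.lower())
    total = max(len(words), 1)
    return {cat: 100 * sum(w in vocab for w in words) / total
            for cat, vocab in LEXICON.items()}

profile = category_profile("I hope this works because I love my research.")
print(round(profile["self"], 1), round(profile["posemo"], 1))  # 33.3 22.2
```

Note that a profile like this says nothing about *what* the text argues, only about its style, which is precisely Pennebaker's point about function words.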
A subsequent series of studies by the FBI has also employed content analysis. In one study (Adams & Harpster, 2008), the speech patterns of a
sample of 911 homicide callers were systematically content analyzed. Specifically, the study used one hundred 911 homicide calls in which 50 of the
callers were adjudicated to have been innocent and 50 guilty. The results were
striking. Two-thirds of the innocent callers asked the dispatcher for help and
focused on getting help to the victim quickly, whereas only one-third of guilty
callers did. Nearly half of the callers included extraneous information in their
calls, but of those who did include extraneous information, 96% were guilty
of the offense and only 4% were innocent. Furthermore, the guilty callers
tended to request help for themselves rather than for their victims.
In a more recent study, Woodworth et al. (2012) content analyzed the linguistic patterns of known psychopaths and contrasted them with the linguistic patterns of individuals who were not classified as psychopaths. They
found that psychopaths tended to use more self-referent words, made more
references to basic needs (food, shelter), used the past tense more frequently,
and used a greater number of function words (e.g., “to” and “from”; “a” and
"the").
Another hot area in which content analysis is being used, particularly with
regard to social media, is in attempting to link the content of Facebook status updates to dimensions of personality. One recent study used the results
of a content analysis of status updates as correlates of personality factors, with the findings that narcissists tend to post more self-promotional content
and deeper self-disclosure information (Winter et al., 2014). Meanwhile, users
with high needs for affiliation tended to disclose more personal information
as well. A second study by Garcia and Sikstrom (in press) successfully used
Latent Semantic Analysis to link the content of status updates to the Dark
Triad of personality (i.e., psychopathy, narcissism, and Machiavellianism).
Each of these studies shows the potential power of linguistic content analysis
for both descriptive and predictive purposes.
FUTURE DIRECTIONS IN THE CONTENT ANALYSIS
OF TEXTUAL DATA
While the range of vocabulary used varies tremendously across individuals,
substantial work in the area of cognitive linguistics has demonstrated that
the language we use betrays more about us than we would like to believe.
The metaphors that we use to describe the world frame the way we think
and communicate (Lakoff & Johnson, 1980). Automated content analytic
programs could conceivably categorize people by the speech patterns associated with variables such as education level, geographic location, age, gender,
ethnicity, religious affiliation, cultural values, and so on. Furthermore, it
would not be difficult to conceive of an algorithm that uses latent class
analysis to identify categories of individuals based on the types of words
they use in different contexts. From there, algorithms may be developed that
examine whether changes in tone, verb tense, usage of adjectives, or particular metaphors predict particular behaviors. Developing algorithms to predict
the likelihood that an individual may commit an act of violence could be
immensely useful and represents one future direction for the field. The
major challenge is that such an approach requires a large number of data
points on which to validate the model. However, this is what the big data
movement is all about. Facebook posts and Twitter feeds are two readily
accessible, pervasive, and relatively permanent archives that are ripe for this
type of analysis.
A second interesting direction for text-based linguistic content analysis
comes from the world of artificial intelligence (AI). The AI community
is engaged in text-based content analysis as well in its efforts to create
realistic “bots.” There are annual competitions in which programmers
attempt to develop AI bots that can pass as human (the so-called Turing
Test). One of the best recent examples is “Evie” based on “Cleverbot”
(http://www.existor.com/ai-overview). Some of the more recent iterations
of these bots are programmed to adaptively learn correct answers to questions based on feedback from hundreds of thousands of Internet users.
The procedure is both quite simple and clever. The bot begins with a basic repertoire of inquiries built into the program (e.g., "How are you?"). On the
basis of the responses received from an actual individual every time the bot
asks this question, the bot adaptively catalogs the most and least frequent
responses and associates a probability of then invoking such a response for
itself the next time a new user asks the bot the same question. Thus, the bot
learns the appropriate way to respond to each question by reflecting the
response it has received. From there, the algorithm develops a likelihood of
what is a good/correct answer.
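The learning loop described above might be sketched as follows; the stock question and the human replies are invented for illustration:

```python
# Sketch of the adaptive frequency-learning loop described above: the bot
# tallies the replies real users give to each of its stock questions, then
# answers that question itself by sampling replies in proportion to their
# observed frequency. The question and replies are invented examples.
import random
from collections import Counter, defaultdict

class FrequencyBot:
    def __init__(self):
        self.replies = defaultdict(Counter)  # question -> tally of human replies

    def observe(self, question, human_reply):
        """Record how a real person answered one of the bot's questions."""
        self.replies[question][human_reply] += 1

    def answer(self, question):
        """Respond by sampling past human replies, weighted by frequency."""
        tally = self.replies[question]
        if not tally:
            return "I don't know yet."
        choices, weights = zip(*tally.items())
        return random.choices(choices, weights=weights)[0]

bot = FrequencyBot()
for reply in ["Fine, thanks", "Fine, thanks", "Not bad"]:
    bot.observe("How are you?", reply)
print(bot.answer("How are you?"))  # most often "Fine, thanks"
```

Scaled up to hundreds of thousands of users and questions, this is essentially the mirroring strategy the essay describes: the bot's conversational competence is a statistical reflection of the humans it has talked to.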
One interesting implication of this work is that one could conceive of personalized bots (e.g., Apple iPhone’s Siri) developed for individuals using
this same learning algorithm. Each bot could invoke an automated content
analytic program that can detect deviations from “normal” speech patterns
of its primary user with regard to the use of emotionally charged words,
sentence/grammatical construction, length of entry, and so on, and could conceivably begin to identify emotion within the individual user. Related to this,
AI bots are currently being used within the world of online therapy. Content
analytic algorithms could use information provided by the user to generate
an AI intervention that would guide the user to a different set of emotions (e.g.,
emotional intelligence via AI). For example, if the user/client reports feeling
nervous about an impending visit to the new girlfriend’s house to meet her
parents, the bot could generate certain sets of questions that guide the individual into a less anxious state. What is more, the bot would be able to use
content analysis to determine whether the bot’s interventions were working as intended (e.g., making the client more relaxed). The possibilities for
text-based content analysis are staggering. I expect that the era of big data
will yield rapid advances in the analysis of linguistic data.
CONTENT ANALYSIS OF VISUAL DATA
As exciting as the future looks for text-based content analyses, I believe that
the future promise of the technique lies with visually based data. Indeed, it
is in that context that we may truly see the power of the methodology.
Some very interesting work in this area comes from Sarah Carney’s (2013)
content analysis of cartoon depictions of criminals. She and her team have
analyzed thousands of episodes of children’s cartoons across several decades
and have found some fascinating results. For example, they find that criminals are typically depicted as being incapable of change. Once someone is
bad, the person is bad forever. Efforts at redemption and change tend to be
presented in a comedic light rather than as a real possibility for the character. Carney has argued that such framing has important implications for
future voters' attitudes toward topics such as parole. From a visual perspective, she and her team have found that the physical depiction of criminals has remained relatively static over time. Criminals are typically male and large, with exaggerated facial features (eyebrows, chins, facial hair, noses, scars) and/or oddly shaped, disproportionate bodies. They tend to
speak with foreign accents. And perhaps most disturbingly, her work has
found that scientists are typically portrayed as villains. The trope of the mad
scientist is regularly invoked and the messaging is that science and scientists
are not trustworthy. Such visual analyses may present some clues as to how
and when stereotypes are formed.
Within the field of personality theory, recent research has focused on content analyzing the presentations of self on social media, such as Facebook. A
common critique of the literature in personality is that the typical self-report
personality questionnaire is highly susceptible to faking. Thus, over time,
alternative indicators of personality have been sought in an effort to circumvent this problem. Historically speaking, concerns about faking led to the
development of projective personality measures such as the Rorschach and
the Thematic Apperception Test; however, the scoring of those instruments
has been criticized on psychometric grounds. Recently, however, a new trend
has emerged that involves content analyzing the data posted to social media
websites. In particular, this data can take the form of text, but also of visual
information such as pictures and videos. Although most research associated
with social media has focused on the content analysis of linguistic content
of Facebook status updates and Twitter feeds (Chew & Eysenbach, 2010),
researchers could conceivably content analyze the pictures that individuals
post onto their site to examine features that correlate highly with traditional
measures of personality or other characteristics. Thus, visual content analysis of new media, such as digital photography, YouTube videos, and the
arrangement of personal websites, has the potential to advance our theoretical understanding and empirical assessment of personality in a way that
overcomes some of the limitations of typical self-report indicators. Interestingly, most personality studies that draw on social network data are still
focusing on text-based linguistic analyses rather than capitalizing on the rich
set of visual stimuli available for analysis. A shift in focus from linguistic to
visually based content analysis seems to be one potential emerging trend.
FUTURE DIRECTIONS IN THE CONTENT ANALYSIS OF VISUAL DATA
In the current era, survey methodology is a ubiquitous approach to studying
human subjects. Surveys are given to assess personality, intelligence, happiness, learning styles, and so on. It is entirely conceivable, however, that rather than filling out a questionnaire on a dating website, for example, one could instead post a set of photographs that one thinks best represents one's personality. In that way, an algorithm could be used to detect the subtle features that a person may not even be aware of. A person may choose to submit a photograph (or a set of photographs) that shows them interacting as part
of a group, alone in portrait mode, or as part of some activity. Such a choice
would reveal something about personality in and of itself. Then, within these
categories, one may be able to match up certain qualities (e.g., personal interactions). One can imagine a program element that detects and estimates the age and gender of each person involved, and so on. Then, on the
basis of such information, the algorithm attempts to match the person with
another who has a similar profile. Ideally, one could submit multiple pictures
(e.g., via Facebook) and the program could detect what types of pictures are
usually submitted (e.g., drinking, hiking, posing with grandparents, attending a child’s birthday party, etc.) and make a match to someone who posts
pictures with similar qualities.
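One minimal way to implement such matching is to represent each user as a vector of photo-tag counts (the tags would come from some image classifier, which is assumed here) and compare users by cosine similarity. The profiles below are hypothetical:

```python
# Matching users by the kinds of photos they post. Each user is reduced to
# a vector of photo-tag counts and compared by cosine similarity. The tags
# would come from an image classifier; the profiles here are hypothetical.
import math

def cosine(u, v):
    """Cosine similarity between two sparse tag-count profiles."""
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

alice = {"hiking": 5, "group": 3, "family": 2}
bob   = {"hiking": 4, "group": 2, "family": 1}
carol = {"drinking": 6, "party": 4}

# Alice's posting profile resembles Bob's far more than Carol's
print(cosine(alice, bob) > cosine(alice, carol))  # True
```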
Another promising direction for visually based content analysis is that
the technique may become useful in reviving the search for social intelligence. Social intelligence fell out of favor as a field of study in the late 1990s
(Kihlstrom & Cantor, 2000), mainly due to technical constraints. The pioneering theories of J.P. Guilford and his group (O’Sullivan, Guilford, & deMille,
1965) in this area are truly outstanding. I believe that the tremendous access
to new media, particularly digital photographs and videos, will prove
extremely fruitful in the advancement of theories of social intelligence and
cultural competence. At the most basic level, updates can be made to prior
research (Archer, 1980). With video data comes the opportunity to analyze
interpersonal interactions. Analyses of eye contact, social distance, and so on can discern patterns of behavior that people currently do not even see. Nearly everyone has a camera built into their phone these days
and the proliferation of videos posted to social media sites is astounding.
Content analyses of this footage could contribute a tremendous amount to
our understanding of the dynamics of interpersonal interactions that occur
in a native context (i.e., outside of the research laboratory).
CONTENT ANALYSIS OF AUDIO DATA
A third medium that can be content analyzed is audio data. Perhaps one of
the most interesting examples of this in recent times is the music application Pandora. The concept behind the app is an algorithm that
attempts to match a user’s musical preferences by learning what the user
“likes” and “does not like.” The result is an adaptive algorithm that is suited
to the user's particular taste. Behind the scenes, it is the musical content of each song that is subjected to the content analysis. Each song in the database
must first be categorized and rated according to the content analytic coding
rubric. This rubric presumably classifies songs according to timing, melody, harmony, genre (e.g., acoustic, big band), and so on. Once the scoring rubric
is developed, it is a fairly easy matter to write an algorithm to detect a match.
However, the success of the algorithm, from a user perspective, depends on
the extent to which the relevant dimensions associated with musical preference are correctly categorized and coded. Just as textual data can be coded
for a variety of different elements (e.g., content, grammar) so too can audio
information. My opinion is that Pandora is a reasonably good starting point;
however, I believe the algorithm needs refinement and/or that a competitor could easily come up with a different content analytic rubric that would
show superior market value.
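A minimal sketch of this kind of rubric-based matching, assuming a hypothetical three-dimension rubric and hand-coded ratings per song:

```python
# Sketch of rubric-based song matching: each song is hand-coded on rubric
# dimensions, the user's liked songs form a preference centroid, and the
# recommendation is the candidate nearest that centroid. The dimensions
# and ratings here are hypothetical.
import math

SONGS = {                      # (tempo, acousticness, vocal-ness), each 0-1
    "song_a": (0.9, 0.1, 0.8),
    "song_b": (0.8, 0.2, 0.9),
    "song_c": (0.2, 0.9, 0.1),
}

def recommend(liked, candidates):
    """Pick the candidate closest to the centroid of the liked songs."""
    dims = len(next(iter(SONGS.values())))
    centroid = [sum(SONGS[s][i] for s in liked) / len(liked)
                for i in range(dims)]
    return min(candidates, key=lambda name: math.dist(centroid, SONGS[name]))

print(recommend(liked=["song_a"], candidates=["song_b", "song_c"]))  # song_b
```

As the essay notes, the hard part is not this matching step but choosing and coding the rubric dimensions in the first place; a competitor with a better rubric could run the identical algorithm and produce noticeably better matches.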
Another recent study that subjects audio data to content analysis comes from an undergraduate thesis at Wesleyan that I am reading, which examines the speech patterns of criminals as portrayed by the media. The project attempts to content
analyze the pitch, tone, cadence, and so on of speech patterns of individuals
identified as “criminals” within the context of popular media and compare
their patterns to the speech patterns of “heroes.”
Similar audio analyses could easily be conducted for presidential speeches.
Although past research has analyzed the linguistic content of presidential
State of the Union addresses (e.g., Lim, 2008), no analysis that I have
encountered has systematically examined the speakers' speech patterns with
regard to the wide variety of audio information on which they could be
classified. I see this as another emerging area for the field.
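As an illustration of what such audio coding might involve, the sketch below extracts two of the features mentioned above, pitch and a crude cadence measure, from a raw waveform. The autocorrelation approach and the thresholds are my assumptions; real studies would more likely use a dedicated phonetics toolkit such as Praat.

```python
# Illustrative audio feature extraction: pitch via autocorrelation and a
# crude cadence proxy (fraction of energetic 20 ms frames). Assumptions only.

import math

def estimate_pitch(samples, sample_rate, fmin=60, fmax=400):
    """Estimate fundamental frequency (Hz) by picking the lag that maximizes autocorrelation."""
    best_lag, best_corr = 0, 0.0
    for lag in range(int(sample_rate / fmax), int(sample_rate / fmin)):
        corr = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0

def speech_rate(samples, sample_rate, frame=0.02, threshold=0.1):
    """Cadence proxy: fraction of 20 ms frames whose mean energy exceeds a threshold."""
    n = int(frame * sample_rate)
    frames = [samples[i:i + n] for i in range(0, len(samples) - n, n)]
    loud = sum(1 for f in frames if sum(s * s for s in f) / n > threshold)
    return loud / len(frames)

# Synthetic 150 Hz tone standing in for a low-pitched voice.
rate = 8000
tone = [math.sin(2 * math.pi * 150 * t / rate) for t in range(2000)]
print(round(estimate_pitch(tone, rate)))  # 151 (close to the true 150 Hz)
```

Features like these, computed per speaker, could then feed the same kind of coding rubric used for textual content.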
EMERGING TRENDS AND FUTURE DIRECTIONS
IN CONTENT ANALYSIS
Analysts in the era of big data can make tremendous advances in our theoretical understanding of a vast array of topics by embracing the techniques of
content analysis. Myriad research questions on a dizzying array of topics
can be investigated with this technique. For example, do transformational
leaders speak differently, or use different audio and visual patterns, than
mediocre or nontransformational leaders do? The relevant units of analysis
may be technical (e.g., grammatical use of active vs. passive voice), more
substantive (e.g., appeals to affiliation, power, achievement), or any of a
variety of theoretical possibilities (e.g., appeals to different levels of moral
development). As a second example, in the
United States, we seem locked in perpetual conversations surrounding gun
violence prevention and counterterrorism. The methodology of content analysis has a large role to play in advancing our understanding of how to prevent
violence. Thanks to the proliferation of big data, algorithms that content analyze textual, visual, and audio data may ultimately help predict,
and therefore prevent, such incidents in the future.
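As a sketch of how a "technical" unit such as active versus passive voice might be coded automatically, the following uses a naive be-verb-plus-participle heuristic. The regular expression and the per-sentence rate are illustrative assumptions; a production study would replace them with a syntactic parser.

```python
# Minimal rule-based coder for one technical unit of analysis:
# passive-voice constructions, flagged by a be-verb followed by a
# word ending in -ed/-en. Over- and under-counts by design; sketch only.

import re

PASSIVE = re.compile(r"\b(?:is|are|was|were|been|being|be)\s+\w+(?:ed|en)\b",
                     re.IGNORECASE)

def passive_rate(text):
    """Passive constructions per sentence (rough proxy for passive-voice usage)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    hits = sum(len(PASSIVE.findall(s)) for s in sentences)
    return hits / len(sentences)

active  = "We will change the country. We choose this path."
passive = "The bill was passed. The law was written by committee."
print(passive_rate(active), passive_rate(passive))  # 0.0 1.0
```

Scores like these, computed across a corpus of leaders' speeches, would give the kind of quantitative coding a comparative study needs.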
However, with this nearly unlimited source of newfound data comes a second
important element: the need for a guiding theory. In order to find relationships, we typically must have some idea of what we are looking for, and such
ideas generally come from strong theories. The versatility of content analysis
in handling textual, visual, and auditory data makes it extremely powerful.
The technique can use theory to analyze data, but it can also use data to help
generate theory. The flexibility of the technique, coupled with the massive
amounts of newly generated archival data resulting from advanced technology, suggests that we will soon be reading many more studies that rely on
content analysis.
REFERENCES
Adams, S. H., & Harpster, T. (2008). 911 Homicide calls and statement analysis: Is the
caller the killer? FBI Law Enforcement Bulletin, 77(6), 22–31.
Archer, D. (1980). How to expand your Social Intelligence Quotient. New York, NY: M.
Evans and Company, Inc.
Bebell, D., & Stemler, S. E. (2004, April). Reassessing the objectives of educational accountability in Massachusetts: The mismatch between Massachusetts and the MCAS. Paper
presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Burstein, J. C. (2003). The E-rater scoring engine: Automated essay scoring with natural language processing. In M. D. Shermis & J. C. Burstein (Eds.), Automated essay
scoring: A cross-disciplinary perspective (pp. 113–122). Mahwah, NJ: Lawrence Erlbaum Associates.
Carney, S. (2013). Representing crime and criminals in children’s television. Invited talk
presented to the faculty of Psychology at Connecticut College.
Chew, C., & Eysenbach, G. (2010). Pandemics in the age of Twitter: Content analysis
of Tweets during the 2009 H1N1 outbreak. PLoS One, 5(11), 1–13.
Dechesne, M. (2013). Review of A.G. Smith (ed.). The relationship between rhetoric
and terrorist violence. Perspectives on Terrorism, 7(5). Retrieved from http://www.
terrorismanalysts.com/pt/index.php/pot/article/view/298/html.
Garcia, D., & Sikstrom, S. (in press). The dark side of Facebook: Semantic representations of status updates predict the Dark Triad of personality. Personality and
Individual Differences, doi:10.1016/j.paid.2013.10.001
Glaser, B., & Strauss, A. (1967). The discovery of grounded theory: Strategies for qualitative
research. Chicago, IL: Aldine.
Kihlstrom, J. F., & Cantor, N. (2000). Social intelligence. In R. J. Sternberg (Ed.), Handbook of intelligence (2nd ed., pp. 359–379). Cambridge, England: Cambridge University Press.


Krippendorff, K. (2012). Content analysis: An introduction to its methodology (3rd ed.).
Thousand Oaks, CA: Sage.
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago, IL: University of
Chicago Press.
Landauer, T. K., & Dumais, S. (1997). A solution to Plato’s problem: The Latent
Semantic Analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211–240.
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to Latent
Semantic Analysis. Discourse Processes, 25, 259–284. Retrieved from http://lsa.
colorado.edu/papers/dp1.LSAintro.pdf.
Lim, E. (2008). The anti-intellectual presidency. New York, NY: Oxford University Press.
O’Sullivan, M., Guilford, J.P., & deMille, R. (1965). The measurement of social intelligence. University of Southern California Psychological Laboratory: Reports from
the Laboratory.
Page, E. B. (1966). The imminence of grading essays by computer. Phi Delta Kappan,
48, 238–243.
Page, E. B. (1994). Computer grading of student prose, using modern concepts, and
software. Journal of Experimental Education, 62, 127–142.
Pennebaker, J. W. (2011). Using computer analyses to identify language style and
aggressive intent: The secret life of function words. Dynamics of Asymmetric Conflict, 4(2), 92–102.
Shermis, M. D., & Burstein, J. (Eds.) (2002). Automated essay scoring. New York, NY:
Taylor and Francis.
Shermis, M. D., & Burstein, J. (Eds.) (2013). Handbook of automated essay evaluation:
Current applications and new directions. New York, NY: Taylor and Francis.
Smith, A. G. (Ed.) (2013). The relationship between rhetoric and terrorist violence. London,
England: Routledge.
Stemler, S. E. (2001). An overview of content analysis. Practical Assessment, Research,
and Evaluation, 7(17). Retrieved from http://pareonline.net/getvn.asp?v=7&
n=17.
Stemler, S. E. (2012). What should university admissions test predict? Educational
Psychologist, 47(1), 5–17.
Stemler, S. E., & Bebell, D. (2012). The school mission statement: Values, goals, and identities in American Education. New York, NY: Taylor and Francis.
Stemler, S. E., Bebell, D., & Sonnabend, L. (2011). Using school mission statements
for reflection and research. Educational Administration Quarterly, 47(2), 383–420.
Weiss, J. (2014, March 14). The man who killed the SAT. The Boston Globe.
Retrieved October 14, 2014, from http://www.ohio.com/editorial/joanna-weissa-nerd-killed-the-sat-essay-1.473561.
Winter, D. G. (2011). Scoring motive imagery in documents from four Middle East
opposition groups. Dynamics of Asymmetric Conflict, 4(2), 144–154.
Winter, S., Neubaum, G., Eimler, S.C., Gordon, V., Theil, J., Herrmann, J., … Kramer,
N.C. (2014). Another brick in the Facebook wall: How personality traits relate to
the content of status updates. Computers in Human Behavior, 34, 194–202.


Woodworth, M., Hancock, J., Porter, S., Hare, R., Logan, M., O’Toole, M., &
Smith, S. (July, 2012). The language of psychopaths. FBI Law Enforcement
Bulletin. Retrieved from http://www.fbi.gov/stats-services/publications/law-enforcement-bulletin/july-2012/the-language-of-psychopaths.

FURTHER READING
Haney, W., Russell, M., Gulek, C., & Fierros, E. (1998). Drawing on education: Using
student drawings to promote middle school improvement. Schools in the Middle,
7(3), 38–43.
Holsti, O. R. (1969). Content analysis for the social sciences and humanities. Reading, MA:
Addison-Wesley.
Krippendorff, K., & Bock, M. (2008). The content analysis reader. Thousand Oaks, CA:
Sage.
Lakoff, G. (1996). Moral politics: What conservatives know that liberals don’t. Chicago, IL:
University of Chicago Press.
Lakoff, G. (2004). Don't think of an elephant! Know your values and frame the debate. White
River Junction, VT: Chelsea Green Publishing.
Landauer, T. K., Laham, D., & Foltz, P. W. (2003). Automated scoring and annotation
of essays with the Intelligent Essay Assessor. In M. D. Shermis & J. C. Burstein
(Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 87–112). Mahwah, NJ: Lawrence Erlbaum Associates.
Pennebaker, J. W., & Chung, C. K. (2008). Computerized text analysis of Al-Qaeda
transcripts. In K. Krippendorff (Ed.), The content analysis reader (pp. 453–466).
Thousand Oaks, CA: Sage.
Weber, R. P. (1990). Basic content analysis (2nd ed). Newbury Park, CA: Sage.
Winter, D. (2005). Things I’ve learned about personality from studying political leaders at a distance. Journal of Personality, 73(3), 557–584.

STEVEN E. STEMLER SHORT BIOGRAPHY
Steven E. Stemler is an Associate Professor of Psychology at Wesleyan
University. He received his doctorate in Educational Research, Measurement, and Evaluation from Boston College, where he worked at the
Center for the Study of Testing, Evaluation, and Educational Policy and
the TIMSS International Study Center. Before joining the faculty at Wesleyan, Steve spent 4 years at Yale University where he was an Associate
Research Scientist in the Department of Psychology. His area of expertise
is in the assessment of noncognitive factors, with a special emphasis on
the domains of social intelligence, creativity, intercultural literacy, and
ethical reasoning (see http://sstemler.web.wesleyan.edu/stemlerlab and
http://www.purposeofschool.com).


RELATED ESSAYS
To Flop Is Human: Inventing Better Scientific Approaches to Anticipating
Failure (Methods), Robert Boruch and Alan Ruby
Ambulatory Assessment: Methods for Studying Everyday Life (Methods),
Tamlin S. Conner and Matthias R. Mehl
Models of Nonlinear Growth (Methods), Patrick Coulombe and James P.
Selig
Quantile Regression Methods (Methods), Bernd Fitzenberger and Ralf
Andreas Wilke
Ethnography in the Digital Age (Methods), Alan Howard and Alexander
Mawyer
Participant Observation (Methods), Danny Jorgensen
Structural Equation Modeling and Latent Variable Approaches (Methods),
Alex Liu
Data Mining (Methods), Gregg R. Murray and Anthony Scime
Digital Methods for Web Research (Methods), Richard Rogers