What makes a vlog a vlog?

Marissa K. Wood

This paper analyzes the “vlog” as a speech event. Using Hymes’s SPEAKING model, contextual and interactional parameters are described in order to identify vlogging as a recognizable genre. Four previous studies, as well as three videos from each of seven prominent YouTube influencers, were selected to contribute to a qualitative analysis.

The vlog as a new form of interaction

In human history, we have never existed in as many spheres of life as we do now. Modern technology and the internet have granted us affordances that were beyond imagination just a generation ago. In its relatively short existence, the internet has already undergone a shift to being an “interpersonal resource rather than solely an informational network” (Zappavigna, 2012). With these affordances, many realms of our social life exist in and even shift completely to the online world: dating, collaborating with colleagues, participating in activism, and even teaching and learning. By and through the World Wide Web, people are connecting with each other in a way that bypasses time, space, and traditional social conventions. Van Dijk (1991) describes this mode of prioritization and organization of society, shaped by social and media networks, as a network society.

This paradigm shift of interaction and communication calls for investigation. Not only are people connecting differently, but new genres have also been created in order to keep up with this ‘mass communication’ infrastructure. Participants in online culture encounter many “new” genres: Facebook posts, Twitter threads, online forums, instant messaging, etc. What makes these interesting is that participants can distinguish one event from another. For example, if we come across the words “like, comment, and share”, our membership in the Web 2.0 community helps us understand that the subject in question is related to a Facebook post. If we hear the words “hashtag, retweet, and at-mention”, we know that these are concepts associated with the micro-blogging site Twitter. In linguistics, a genre is a recognizable communicative occurrence; it is evident that 21st-century individuals who spend much of their life in the online sphere can and do recognize these digital genres.

Undoubtedly, context influences discourse and consequently dictates the conventions and constraints of a genre; therefore, a good starting point to assess any (new) genre would be to dissect and examine its contextual features. In other words, if one wants to examine aspects of a recognized speech event, it is imperative to look at the contextual elements that comprise it. With this in mind, researchers of new media can direct their focus to consider online genres within the digital atmosphere in which they flourish.

One new genre that has proliferated in recent years, thanks to the internet and technological advancements, is the video blog, or “vlog” for short. The already-blended word “blog” (from “web log”) spawned the new term, which was created to reference the now-common act of regularly posting content with a video component. This is not a phenomenon isolated to super-savvy web geeks; vlogs today are extremely popular. Discussing YouTube, the “largest online video sharing site”, Werner (2012) explains:

“Back in 2007, when Jean Burgess and Joshua Green performed a comprehensive analysis of the site’s most popular content (tracking what videos were most viewed, favorited, discussed, and responded to) they found that the most popular videos, according to these metrics, were not clips created by or cribbed from old media, as they originally expected. Rather, the most popular content on YouTube was generated by the site’s users. Perhaps surprisingly, most magnetic of all were these dashed-off, rambling video speeches, vlogs” (pp. 8-9)

The present study investigates the personal vlog as a genre — or speech event — that deserves precise examination. Through analysis of its contextual, interactional parameters, I hope to shed light on the various elements that constitute a vlog as well as gain a better understanding of the social functions of the genre.

A vlog is a speech event; a speech event is a genre

It is necessary to begin with a definition of the object in question. Despite slightly different characterizations, people generally have an idea about what a vlog is; they have expectations regarding the phenomenon, and usually recognize it as a familiar occurrence in the online world. Essentially, a vlog is “a spoken, asynchronous form of computer-mediated communication” (Frobenius, 2014, p. 59) with a visual element: a headshot of the speaker. Cambridge Dictionary Online defines a vlog as “a video blog” that is a “record of your thoughts, opinions, experiences that you film and publish on the internet.” It is important to note that “several types of vlog are available on the web including instructional videos, travel updates, and personal commentaries” (Christensson, 2011). Vlogs for this study are limited to the individually hosted, user-generated videos from YouTube, where a vlogger speaks to viewers in a casual, conversational way.

An appropriate origin for this qualitative research is within the theory of genre analysis. Genre analysis studies the social functions of different kinds of texts. A text in this sense means any chunk of meaningful discourse, including both written and spoken content (cf. Halliday, 1978 in Jones, 2012). Considering genre in a linguistic, discursive sense, Bhatia (1993) says the following:

“A genre is a recognizable communicative event characterized by a set of communicative purposes identified and mutually understood by members of the community in which it occurs. Most often it is highly structured and conventionalized with constraints on allowable contributions in terms of their intent, positioning, form and functional value.” (p. 13)

With this definition, two assumptions can be made: (1) a genre is essentially a speech event and (2) genres are highly influenced by the social sphere in which they exist. Albeit enormous, the online world does indeed make up a community, one which branches into many diverse sub-communities. While annihilating the parameters of time and geographic space, the discourse that flourishes within this online sphere does in fact adhere to generic norms.

Another point of departure for this study is from the ethnography of communication, whose main unit of analysis is a speech event. Jones (2012) describes a speech event as:

“a communicative activity that has a clear beginning and a clear ending and in which people’s shared understandings of the relevance of various contextual features remain fairly constant throughout the event [...] Speech events occur within broader speech situations and are made up of smaller speech acts” (p. 64).

Dell Hymes, a linguistic anthropologist who worked vigorously in this field, was interested in “the ways using and understanding language are related to wider social and cultural knowledge” (Jones, 2012, p. 24). He argued that in order to participate effectively in a speech event, it is crucial to have communicative competence, i.e., mastery of the “rules, norms, and conventions regarding what to say to whom, when, where and how” (p. 24). Significantly, not only can expert members be identified by their display of adept communicative competence, but these exemplar members can also (re)shape, bend, and blend the norms which dictate a communicative event, i.e., the genre. Although Hymes was reluctant to consider genre and speech event as the same analytical unit, Swales (1990), who literally wrote the book on genre analysis, begs to differ. He adopted the alternate idea that it is speech situations and genres that should be kept separate, since situations have much more variation within them and can include various speech events.

From the definitions above, we can see that the two concepts (genre and speech event) are inextricably linked. They both deal with participants of a particular community using their cultural knowledge to recognize a communicative phenomenon by its structure and purpose. The recognition of said speech event propels participants to behave in and contribute to the event in certain ways. Henceforth, the two terms will be considered synonymous in this study.

As mentioned in the introduction, context is an important aspect that influences a speech event. Additionally, an individual’s communicative competence undoubtedly plays a role in shaping his or her discourse. With these two things in mind, Hymes created one of the most comprehensive models to assess context related to a speech event: the SPEAKING model.

Jones (2012) remarks that the SPEAKING model makes up “a set of guidelines an analyst can use in attempting to find out what aspects of context are important and relevant from the point of view of participants” (p. 65). Therefore, not all components will carry the same weight to the various participants at a given time. Additionally, not all elements of the SPEAKING model will be equally influential in the speech event at hand. Nonetheless, identifying the possibilities of each of these contextual elements can help shed light on the conventions and components of a genre.

Figure 1 - Components of Hymes’ SPEAKING model*

The components of this model cannot stand alone; they rely on and influence the others, regardless of their importance to the participants. Assessing the “linkages” between the components could serve as an additional analysis, surely sharpening the understanding of the genre as a whole. These will be addressed briefly in the discussion section.

For this study, some of the more complex SPEAKING components will be informed by other linguistic theories. For the analysis of participants, one can draw on Goffman’s (1979) participation framework and designate the various speaker and hearer roles that could be at play within the speech event. Considering the ends of a speech event, I take a pragmatic approach and consider the vlogs and videos that make up a channel to be logical speech occurrences. Uncovering the implicatures employed during the speech event may also make the motive or goals clearer. Act sequence is informed by Swales’ (1990) suggestions for analyzing moves and steps typical of a genre. Although key is highly dependent on other elements, the notions of footing and footing changes will be discussed (cf. Goffman, 1979).

Studies about the vlogging phenomenon

“Vlogging” and its dynamic, interactive, online-culture essence have attracted researchers from social, communicative and linguistic perspectives. Four previous studies will be mentioned in order to collect preliminary information about the nature of vlogs and gain insight into the contextual elements that they encompass.

Werner’s (2012) dissertation on the rhetorical genres of vlogs provides insight into the evolution of the genre, noting that “three affordances of vlogs differentiate them from speech genres they remediate: vlogging’s reach, replayability, and modularity” (p. iii). He identifies four types of vlogging (the “confession”, “reaction”, “rant”, and “witness” videos) which he uses as the frame of his work, aiming to uncover how vlogs have been cultivated and used by hosts and viewers. Additionally, he mentions that “genres are powerful instruments that allow rhetors to carry out established ‘social actions’” (p.3), supporting one of the aims of this study.

Frobenius (2014) investigated “audience design in monologues” and focused on the strategies that vlog hosts (i.e., “vloggers”) use in order to involve their audience. She argues that vlogs are indeed monologues and uses Goffman’s (1979) notion of “imagined recipients” (as with TV broadcast and radio talk) to highlight the notion that vloggers interact with a non-present, future audience. Goffman’s (1979) participation framework is also used to discuss the various roles speakers and hearers can have in interaction, as well as audience design theories by Bell (1984) and Clark and Carlson (1982) in order to determine how role assignment takes place within the video blog medium. The study is a qualitative analysis, showing examples of five devices (proposed by Clark and Carlson, 1982) used to designate hearer roles in vlogs: physical arrangement, conversational history, gestures, manner of speaking, and linguistic content. Some of the strategies used to assign participant statuses “resemble those used in face-to-face interaction; [while] some are very specific to the genre vlog as they are adapted to the technical restrictions” (p. 70). By employing these strategies (such as using various terms of address, questions, and directed language; employing specific gaze and gesture movements within a fixed frame; embedding footings to maintain roles; and changing prosody or tenses), the vlog host interacts, albeit asynchronously, with viewers. Frobenius stressed the difficulty of observing the co-participation and negotiation aspect of role assignment; therefore, she also analyzed the written comments below the video blogs in her data set to assess role uptake by the viewers. She found that viewers did indeed “comprehend the shift in participation status” and responded to the request for “interaction through comments” (pp. 61-69).

Attempting to focus her investigation on one distinct subgenre of vlog, Riboni (2017) conducted a linguistic analysis of “the YouTube makeup tutorial”. Like Frobenius, Riboni states that “the vlogging component confers makeup tutorials the liveliness, immediacy and conversationality typical of face-to-face communication [...] YouTube vlog’s continuous address to the viewers inherently invites their feedback” (p.190). She contextualizes her research with a description of the “guru” phenomenon that exists on YouTube: “‘Gurus’ are content creators who are particularly authoritative in a specific field, have a considerable follower base thanks to their expertise and are often paid by brands in order to promote their products” (p. 189). By collecting her data from “guru” channels, Riboni aimed to uncover the communication skills employed and thus, the successful use of the makeup video genre. Her qualitative analysis focuses on “generic, rhetorical, and linguistic practices” and also shows how “gurus discursively construct their identity as well as represent the idea of beauty and makeup” (p.189).

Riboni uses Fairclough’s (1992) three-dimensional model for the examination of communicative events, primarily because it takes into account textual, discourse, and social aspects, which are applicable to the study of genre realization. She identifies the rhetorical structure (i.e., the “moves”) of the makeup tutorial: greeting/welcoming, summary of video content, makeup application and advice (i.e., the main move), and the leave-taking section, usually including a “call to action”. This structure is quite fixed, which highlights one way to recognize the genre. The linguistic features that Riboni found in her 2017 study include formulaic expressions, engagement markers, and the combined use of different personal pronouns, modes, and text types. These language strategies “aim to build rapport with the audience” and set the gurus apart from other makeup channels. Finally, the study discusses how the makeup gurus construct their identity and connect with their audience: they mention their flaws and are self-ironic, and they also hybridize their discourse with that of personal development so that they present themselves “not just as makeup gurus but as life gurus, providing general suggestions and setting the example for their acolytes to follow” (p. 200). Interestingly, through critical discourse analysis, the author was able to reveal that the presiding ideology constructed through language in the vlogs is that makeup is necessary. This exposes an interdiscursive nature in makeup tutorials (i.e., coupled with advertising discourse). The author argues that “beauty and cosmetic discourse situates itself at a crossroads of important, dominant discourses, such as consumerism, commodification of body image and identity” (p. 200) and that it is therefore important to investigate the genre.

Taking a look at the interactional component of vlogging, Sanchez-Cortes, Kumano, Otsuka, and Gatica-Perez (2015) examined verbal and nonverbal cues of vloggers to recognize mood impressions. They argue that although vlogs have not been studied much in this sense, identifying mood in text blogs or tweets has gained a lot of attention. For example, “many works have analyzed moods associated with daily life, political opinions, and population habit [...] suggesting that written forms are reliable means to transmit mood” (p. 2). Their quantitative research aimed to study mood inference within the multimodal domain that vlogs afford.

These studies are important contributions to the research of vlogs as a genre: Werner (2012) provides a detailed discussion of four rhetorical vlog genres; Frobenius (2014) explains participant roles and their assignment; Riboni (2017) highlights structural and strategic elements in a specific subgenre; and Sanchez-Cortes et al. (2015) examine mood in vlogs. However, a full contextual analysis, which could prove useful in determining vlogs as a “recognizable communicative event” (Bhatia, 1993), is still lacking. In what follows, I apply the SPEAKING model within a genre analysis to answer the research question: What are the contextual, interactional parameters that are characteristic of a vlog?

Data selection and method

To analyze the “vlog” as a genre, various samples of the speech event in question were observed. I follow Riboni’s (2017) suggestion that “those videos which receive the most hits and enjoy the widest circulation are likely to be the most representative of the genre while, at the same time, proving more likely to affect it” (p. 191). Therefore, I have selected seven YouTube channels from the most popular individual vloggers on the platform.

Figure 2 - YouTube channels selected for data collection

Three “typical” videos (those which are a regular, recurring type on the channel) from each vlogger were selected for analysis. A total of 21 videos, comprising 3 hours, 24 minutes and 5 seconds of vlog footage, were analyzed. The average video length was 9:43; the maximum video length was 14:14, and the minimum video length was 4:28.
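
As a quick arithmetic check, the reported average can be recovered from the total running time and the number of videos stated above. The short Python sketch below is purely illustrative and is not part of the study's method; the function name `mean_duration` is a hypothetical helper introduced here.

```python
def mean_duration(total_seconds: int, count: int) -> str:
    """Return the mean duration as M:SS, rounded to the nearest second."""
    mean_seconds = round(total_seconds / count)
    minutes, seconds = divmod(mean_seconds, 60)
    return f"{minutes}:{seconds:02d}"


# Total footage reported in the text: 3 hours, 24 minutes and 5 seconds.
total = 3 * 3600 + 24 * 60 + 5   # 12245 seconds
videos = 21                      # three videos from each of seven channels

print(mean_duration(total, videos))  # 9:43
```

The result agrees with the average of 9:43 reported for the data set.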

To analyze the videos, I watched them several times and took notes on each SPEAKING component, considering the aforementioned theories. I cross-referenced these notes to compile lists of similarities and differences among each vlogger’s videos and then with those of the other channels. From the lists of similarities, in addition to recurring notions in the literature, I was able to detect patterns within the medium, thereby identifying some conventions of the genre.

Next, I present the qualitative analysis of the SPEAKING model applied to the vlog genre. Exemplary contextual features realized by the vlogs are illustrated to support the analysis.

The setting of a vlog

The first important concept to address about setting, considering the vlog as a speech event, is that it annihilates limitations of time and geographical space. A vlogger could have filmed and posted a video in 2008 in Brazil, and the viewer could be watching it today in Germany. Yet, the speech event is still able to occur. For a moment, there is a ‘virtual chamber’ where the speech event is taking place, connecting two (or more) people in different times and perhaps very different places. A full conceptualization of this element would deem it necessary to consider both vlogger setting and viewer setting. However, it is impossible to account for all the locations and conditions from which vlogs are viewed; thus, we can only imagine this variable as an infinite possibility.

The vlogger setting is much more feasible to examine, and generalizations are supported by both the literature and the data collected for this study. Werner (2012) states that “vlogs nearly always have intimate, domestic settings. Basements, bathrooms, living rooms, and bedrooms provide the (poorly lit and out-of-focus) backdrop for most vlogs” (p. 6). All but one of the channels analyzed in this sample were filmed in a bedroom or living room area. PewDiePie’s videos were recorded in a small studio/office, which is perhaps the result of his YouTube success; his earlier videos were indeed recorded in a bedroom. Sanchez-Cortes et al. (2015) reach a similar conclusion: “the typical vlog is recorded indoors with a commercial webcam, lasts about 3 min, and features the head and shoulders of the vlogger” (p. 7). Although the average video length in this data set was considerably higher (9:43), the “talking-head” format with a static camera shot is commonly described across the literature. All the videos in the data did indeed include many segments (if not entire videos) in the talking-head format. NigaHiga had a few segments embedded into his videos where he performed skits with his full body (and other participants) in view. These segments were only a few seconds in length, and served to break up the monologue and add a comedy element. The discourse occurring in these atypical segments, however, was not directed to the audience, but rather acted more as an embedded clip. The video quickly returns to the close-up head shot when he is engaging in conversation with viewers.

Figure 3 — NigaHiga typical ‘talking-head’ setting vs. embedded video skit

It is common for vloggers to edit their videos before posting, which affects the setting (and also the key, which will be mentioned below). As a result of editing, some segments of the vlog include computer-generated backgrounds, split screens (of another video or still image and the vlogger’s window), and the addition of written text and graphics. The inclusion of written text is the most popular “enhancement” for vloggers in this data set, probably because it is the simplest to execute. This addition influences the setting because it alters the face-to-face communication feel and guides viewers’ focus to particular ideas. Examples of these digital alterations to the setting can be seen in Figures 6-9.

Lastly, the physical arrangement is an important factor of setting. All seven vloggers recorded videos in what appeared to be their home; therefore, many personal items were strewn about in the background. Because vloggers have full production power, it is safe to assume that the types of objects shown and their placement are strategic. Yuya’s physical surroundings (and her appearance) were the most decorative. This is perhaps due to the type of channel she runs, which sometimes posts videos of arts, crafts and DIY projects, but mostly hair and makeup tutorials. The clean and neatly adorned room (and host) is a “pretty” visual that helps maintain her image and identity as a trendy, fashionable girl.

Figure 4 — Yuya’s highly decorated background enhances her setting

White walls with posters were a coincidental constant in the data. Also, 100% of the videos recorded in a bedroom had made beds. So, although the essence of a video blog is authentic and rather informal, vloggers perhaps do feel pressure to portray themselves as “tidy” individuals.

It is important to remember that the physical settings for the viewers can be quite diverse. We do know, however, that they are looking at the vlogger who is situated in a video box within the layout of YouTube. Frobenius (2014) mentions vloggers’ “ability to imagine the viewer’s perspective as recipient of a vlog embedded in a website with a certain design” and says that hosts use gestures such as pointing down to the “comments box” or the “subscribe” button (p. 65). This is an interesting phenomenon, unique to vlogs, which makes participants’ physical settings converge to seem like one, unified speech event.

The participants in the vlog

Considering the contextual element participants, we must recognize that individuals bear different degrees of relevance in a given speech event. The vlog as a speech event is usually imagined within the dyadic speaker-hearer model: one vlogger, and one viewer. However, because of its wide range of possibility, the viewer setting allows for unratified hearer roles such as bystander, overhearer, and eavesdropper (Goffman, 1979). The vlogger setting is more controlled, and typically only involves the vlogger him/herself. From the data, during conversational segments (i.e., not skits or embedded video), only 2 of 21 videos included one other participant recognized in the vlogger’s physical setting; furthermore, the extra participants were only present for a few minutes of the video.

Jones (2012) says, “besides identifying the relevant participants, the different identities, roles, and rights different participants have are also important” (p. 66). Establishing the participants’ statuses is generally the vlogger’s responsibility, since he or she has the entirety of the floor in the monologue. Frobenius (2014) highlights that this task is achieved by multiple strategic communicative techniques, and it is common for the vlogger to acknowledge the viewers’ ranks (both ‘old’ and ‘new’ viewers), which establishes a community spirit. “In” and “out” members are also established through discourse based on this created community. While assessing the various YouTube channels for this study, I had to familiarize myself with the discourse community at hand. There are inside jokes, references to past videos, and even lingo that an “outsider” would not quite understand without prolonged engagement with the channels.

The vlogger carries the weight of sole speaker and assumes all three roles of Goffman’s (1979) production format: animator, author, and principal. The vlogger is the animator because they are the one speaking; they are portrayed as the author because they are using their own words and sharing their own thoughts; and they have principal rights due to their high subscriber count, success, and YouTube star status. Modesty aside, some of the vloggers proclaimed themselves experts and expressed authority on a variety of topics. (Whether these proclamations are sarcastic or not is an issue of key, discussed below.)

Excerpt of host self-appraisal:

“Jenna’s drunk brain is really creative and awesome and can do things just as good as your art teacher and you can learn online for free from me! Because I’m a fucking expert at everything.”  - JennaMarbles

However, it is debatable how expert these individuals should be considered. As Riboni (2017) discussed, makeup gurus tended to share life advice and personal development strategies, despite being ordinary girls who are simply interested in makeup. Nothing about their real-world, prescribed identity gives them authority to advise on a person’s life or wellbeing.

Ends of vlogging

The ends, or goals, of a vlog vary depending on the type of vlog and the topic in question: lots of vlogs intend to entertain or make viewers laugh, some are meant to educate about a particular thing, others show how to do something, and a few just give a peek into someone else’s life. Essentially, the purpose of a vlog is for the vlogger to share information. YouTube’s “about page” claims: “We believe that everyone deserves to have a voice, and that the world is a better place when we listen, share and build community through our stories.” Four “freedoms” are highlighted to support this sentiment: expression, information, opportunity and belongingness. Although this description speaks about the platform as a whole, and not the vlog genre specifically, the essential purpose of making videos seeps through. Videos are meant to “connect, inform, and inspire others across the globe” (Riboni, 2017, p. 189).

All of the prominent channels analyzed are considered entertainment vlogs, and all vloggers but Yuya incorporate significant comedic elements. PewDiePie and elrubisOMG feature many segments with video games where they review, advise, or demonstrate gameplay. NigaHiga and HolaSoyGerman regularly incorporate skits; however, Higa embeds videos with other participants (as in Figure 3), while German impersonates others to incorporate various characters. Because he embodies the characters himself, the vlog perhaps remains more intimate, limiting the participants to just him and the viewers.

Figure 5 — HolaSoyGerman incorporates character impersonation for comedic element

The majority of vloggers’ videos analyzed in this dataset aim to make the viewers laugh, thereby giving some insight into the conventions of popular vlogs, i.e., they have a comedy purpose.

Act sequence of vlogging

Act sequence is another contextual element that varies depending on the type of vlog and the topic in question. However, similar first and last moves (recurring openings and closings) are typical. Of the seven channels analyzed, three had their own intro video segment, three had recurring greetings to open the videos, and all of them welcomed the viewer in some informal way. After introductions, vloggers attend to the business at hand (i.e., the main move), addressing a topic that is usually alluded to in the title of the video. Within the main move, vloggers do indeed employ “virtual branding” and include linguistic components such as formulaic expressions and engagement markers (Riboni, 2017). This virtual branding is also witnessed in the recurring greetings (see Excerpts 2-4). Vloggers do this to make their channel more distinct and to connect with the audience. Yet the fact that most of them do this adds to the notion that incorporating this element makes a video recognizable as a vlog.

Excerpts of recurring greeting in all three videos:

(2) “Hola guapuras! Como estan el dia de hoy?” – Yuya
Translation: Hi pretties! How are you guys today?

(3) “Muy buenas criaturitas de señor!!!” –elrubisOMG
Translation: Good day, little creatures of the lord!

(4) “Hey guys!”- NigaHiga

Each vlog analyzed characteristically has its own ‘move structure’ with typical ‘steps’ (Jones, 2012); through adherence to their own established structure, vloggers “fulfil the communicative purpose of the genre” (p. 9). In other words, the speech event (the vlog) can be considered effective if the vlogger incorporates the expected steps and chooses an order that is anticipated by the viewers. For example, it is more predictable for the phrase “Don’t forget to like this video and subscribe!” to be situated at the end of the vlog, rather than the beginning. This is what Riboni (2017) named a “call to action”, and it was a step supported in the data set.


Key of vlogging

Key also varies greatly depending on topic, which is influenced by the goal. The four genres of vlogs that Werner (2012) examines (the confession, reaction, witness, and rant videos) each present a different overall mood, which is influenced by what the vlogger is trying to accomplish with their monologue. Sanchez-Cortes et al. (2015) highlight that the video format allows vloggers to use both verbal and nonverbal cues in order to evoke different moods. Generally speaking, the overall tone of a vlog is that of an informal conversation with a friend, a sentiment that is supported by the parallels to face-to-face communication in the literature (Werner, 2012; Frobenius, 2014; Sanchez-Cortes et al., 2015; Riboni, 2017). As mentioned in the ends section, the majority of the vlogs in this study have a comedy purpose; therefore, they have a humorous, light feeling. Because of this, some of their statements (like Excerpt 1) can be considered sarcastic.

Also noteworthy are the personality and the mood of the vlogger him/herself. The prosody (loudness, speed, rhythm) and style (register) which a speaker employs can alter the tone of the conversation. In natural speech, interlocutors are accustomed to adapting to the speaker’s voice, speed, mannerisms, etc. However, the vlog creates a scenario where the speech event is altered and edited afterwards, thereby taking the natural stylistic and prosodic variation further away from face-to-face communication. All videos analyzed in this study incorporated “video cutting”, which is shifting from one shot to another and sometimes changing perspective. This common video-editing technique speeds up the transitions between sentences and allows for less “downtime”. Essentially, video cutting advances the entire segment and alters the tone of the video to feel faster, more intense, and perhaps more upbeat.

Another way that key is affected in vlogs is through complex footing changes employed (and/or edited in) by the vlogger. Hosts not only change their prosodic features to embed other characters and moods, but the author role also gives them full “production” power. When they edit the video afterwards and include various sound clips or special effects, the mood of the overall speech event becomes extremely dynamic. For example, HolaSoyGerman often refers back to earlier videos and mentions topics he has previously discussed, sometimes even showing an “old” video clip. When he embeds the clip, he changes his footing from present-day host to past host. Additionally, he renders the “old” video in black and white to accentuate this change in footing.

Vlogging Instrumentalities

Vlogs are delivered through digital video, while the messages are transmitted through various means. Predominantly, the vlogger is speaking to the audience; therefore, oral speech is the main mode of communication. However, written text, still images and/or graphics, and video clips are also incorporated. The supplementary visuals (like those in Figures 6 and 7) are edited in afterwards, creating multimodality that places emphasis on a particular concept. These visuals may also act as a transitional element, influencing the overall “flow” or feel of the presentation. The inclusion of these elements gives viewers a “break” from the vlogger’s verbal mode of address and guides them to focus on a certain concept.

Figure 6 – Supplementary written text (PewDiePie)

Figure 7 – Supplementary image of Emma Watson edited in

A split-screen setting can also be the vehicle of communication in the speech event. Figure 8 shows how a split screen can act as an instrument to incorporate multiple mediums of communication. The host is still in the shot, communicating verbally with viewers, yet the majority of the screen shows an image of survey results, which is the topic in question.

Figure 8 – A split screen setting employs multiple instrumentalities

Norms & Genres of the vlog

Norms are decided depending on what subgenre the vlog channel exists in. However, if we consider the vlog as a speech event, vloggers are expected to stay in the frame of the camera (i.e., talking-head format) and be the sole host of the channel. They use one language to address viewers and open and close the video appropriately with conventional greetings. Vloggers commonly edit the video afterwards for fluency and consistency, as well as to keep it entertaining and short in length. No videos in this study surpassed 15 minutes. In the speech event, viewers can only participate after the fact by contributing to the comments section and “liking” the video. It is normal for the host to ask viewers to do this. While some ‘norms’ are generally accepted, each channel has its own quirks and traces of personality. In this way, the vlogger and his or her followers do indeed make up a discourse community whereby they use the vlogs to get something done: whether it is to have a laugh, chat about old videos, or complete a task.

The last component of the SPEAKING model, genre, is quite paradoxical to employ in this study. Werner (2012) describes different genres based on their rhetoric. However, regarding a vlog as a speech event, it is appropriate to characterize the vlog as a “genre of computer-mediated communication” (Frobenius, 2014, p. 59). Going further still, it is appropriate to distinguish the vlog subgenres because they add information to the context as a whole. In this study, the data mostly consisted of comedic, entertainment vlogs. Yuya’s channel included many instructional, how-to videos and segments about beauty and lifestyle. PewDiePie and elrubisOMG demonstrated various video games and incorporated lots of “internet topics” such as memes and gifs. HolaSoyGerman and NigaHiga were essentially comedy vlogs and included various skits and impersonations. However, all of them undeniably have a diaristic, conversational undertone, thereby categorizing them as traditional vlogs.


The analysis shows the SPEAKING model applied to the “vlog” as both a speech event and a genre. Briefly touched upon in the sections above was the influence of the components on one another, i.e., important “linkages”. Perhaps the most important contextual factor is subgenre. This element is linked with nearly every other component of the SPEAKING model. The subgenre, or type of video, dictates the ends and key and influences the act sequence (namely the content of the main move). It also impacts the setting, the participants, and the norms which govern them. For example, PewDiePie and elrubisOMG are channels that incorporate a lot of video game content, so their subgenre could be video-game vlog. This affects the setting, which features split-screen segments (as seen below in Figure 9), and also the participants: the opponent the host is playing the game with becomes a ratified participant in the interaction. The act sequence features the game demonstration as the ‘main move’ and assumes the step order of the video game. The ends become sharing information and perhaps tips about a certain video game.

Figure 9 – PewDiePie split screen to demonstrate video game

Other important concepts to address are the acts of bending and blending genres. As Bhatia (1993) notes, constraints of a genre “are often exploited by expert members of the discourse community to achieve private intentions within the framework of the socially organized purpose(s)” (p. 13). The operation of creating and identifying subgenres of vlogs essentially blends the genre with something else. The “dashed-off, rambling video speeches” (Werner, 2012) are peppered with comedy bits, skits, instructional segments, life advice, and demonstrations of video games. These dynamic components are embedded in the talking-head, confessional vlogs, which are typically understood as the individual’s recorded diary. Bending the vlog genre is perhaps more difficult to do, since the “constraints” are not well defined, nor do they act as hard-and-fast rules. However, the impersonations that HolaSoyGerman (and others) incorporate can be considered a bend to the traditional vlogger-viewer paradigm. They not only change the role assignment but also add more characters, thus complicating the participants’ statuses. Although not examined in the dataset, it is worthwhile to note that vloggers do include dynamic settings in some of their videos; there are videos posted on the channels that are filmed in places other than the vloggers’ “usual” recording space. Whether it is a log of their vacation or a highlight of a special activity, these “field trips” bend the genre because they take viewers outside of the traditional, domestic setting of the vlog.

Lastly, because this study focused on context and not the discourse itself, many auxiliary elements of vlogging were not investigated. Noteworthy for further research is the YouTube comments section. This tool on the platform could give insights into interactional contextual elements, such as how viewers weigh in on their assigned roles (Frobenius, 2014), and could also help determine whether the goal was achieved. Examining this component would also inform analysts about the discourse community created by a particular vlog channel and the goals it sets as a group.

Vlogging as a speech event

A vlog is characterized as “a video blog: a record of your thoughts, opinions, or experiences that you film and publish on the internet” (Cambridge Dictionary Online). By looking at the vlog as a speech event, I examined the various contextual and interactional parameters that influence discourse, understanding, and overall meaning within a vlog. Patterns and similarities were detected to give some insight into the genre of vlogging. Nevertheless, the vlogs studied here, which included those with the top subscriber counts on the entire YouTube platform, indicate generic hybridization. The content and essence of the vlog channels analyzed were varied, with many different aims; they all seemed to overlap with other genres, such as how-to, instructional, or educational videos, but chiefly comedy and entertainment videos. The types of videos posted were not necessarily channel dependent (e.g., a typically comedic vlog channel could also post a serious “diary” video to give an update about the vlogger’s life).

Genres, especially on the internet, are ethereal in nature, and unfortunately the limitations of this study did not allow for a deeper analysis. Although superficial generalizations have been made about vlogs among prominent vloggers, there are many conditional factors and exceptions to these ‘rules’. Hybridization of these “elite” vlog channels could indicate that vlogs are in a transformative stage, branching into different subgenres. These vlog subgenres are trying, perhaps in vain, to find their temporary niche in the online sphere of 21st century communication and interaction.


Bhatia, V.K. (1993). Analyzing genre: Language use in professional settings. London: Longman.

Christensson, P. (2011, August 31). Vlog Definition. Retrieved from https://techterms.com

Frobenius, M. (2014). Audience design in monologues: How vloggers involve their viewers. Journal of Pragmatics, 72, 59-72.

Goffman, E. (1981). Forms of Talk. Philadelphia: University of Pennsylvania Press.

Herring, S. C. (2009). Web content analysis: Expanding the paradigm. In International handbook of Internet research (pp. 233-249). Springer Netherlands.

Hymes, D. (1974). Foundations in sociolinguistics: An ethnographic approach. Philadelphia: University of Pennsylvania Press.

Jones, R.H. (2012). Discourse Analysis: A resource book for students. London: Routledge.

McAlone, N. (2017,  March 7). These are the 18 most popular YouTube stars in the world — and some are making millions. Retrieved from https://www.businessinsider.nl

Riboni, G. (2017). The YouTube makeup tutorial video. A preliminary linguistic analysis of the language of “makeup gurus”. Lingue e Linguaggi, 21, 189-205.

Sanchez-Cortes, D., Kumano, S., Otsuka, K., & Gatica-Perez, D. (2015). In the mood for Vlog: Multimodal inference in conversational social video. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(2), 9.

Swales, J.M. (1990). Genre Analysis: English in academic and research settings. Cambridge: Cambridge University Press.

Vlog. (n.d.). In Cambridge Advanced Learner’s Dictionary & Thesaurus.

Werner, E. A. (2012). Rants, reactions, and other rhetorics: Genres of the YouTube vlog (Doctoral dissertation, The University of North Carolina at Chapel Hill).

Zappavigna, M. (2012). Discourse of Twitter and social media: How we use language to create affiliation on the Web. New York: Continuum.