So far, I have experimented with standardized tasks and datasets that were provided and easily accessible. In the real world, though, NLP practitioners often have to solve a problem from scratch. This includes gathering and cleaning data, choosing a model, iterating on the model, and possibly going back to change the data.
For this project, I therefore built my own system end to end. Starter code was provided, but I went well beyond it and tried many things of my own. The full process required the following steps:
- Understand the task specification
- Collect raw data
- Annotate training and test data for development
- Train and test models using this data
- “Deploy” the system
- Write a report/article
Task Specification
What is framing?
Framing is the act of selecting and amplifying some aspects of a perceived reality in a communicating text. The text could be a tweet, a news article, or any similar medium. Examples [1]:
- When some news media emphasize the mental illness of gun shooters over other aspects of gun violence in covering the issue, this is framing.
- When you choose to purchase a yogurt product that is advertised as “90 percent less fat” rather than one saying “10 percent fat,” the framing effect is at work.
Why does it matter?
Quoting from one of the seminal studies of the problem:
In a polarized media environment, partisan media outlets intentionally frame news stories in a way to advance certain political agendas. Even when journalists make their best efforts to pursue objectivity, media framing often favors one side over another in political disputes, thus always resulting in some degree of bias. Hence, a news framing analysis is helpful because it not only tells us whether a news article is left- or right-leaning (or positive or negative), but also reveals how the article is structured to promote a certain side of the political spectrum.
In communication research, manual identification of media frames is a challenging task due to the large amount of media data in this news-saturated environment. More importantly, there is a high level of complexity in framing analysis that often requires a careful investigation of nuances in news coverage, which is time-consuming.
Liu et al. (2019)
Hence, an NLP solution that automates media framing identification would immensely help social scientists and other analysts.
The Task
Identify the framing of a given paragraph/sentence in several languages. Effectively, this is a straightforward text classification problem in a multilingual setting.
Input
A text file with one paragraph/sentence per line. An example of the input looks like this:
Some economists say that immigrants, legal and illegal, produce a net economic gain, while others say that they create a net loss.
Output
The model outputs a .tsv file with one line per input: the sentence, a tab, and then the corresponding label.
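The input and output formats described above can be sketched in a few lines of Python. This is only an illustration of the file layout, not the code used in the project: the function names `read_sentences`, `format_prediction`, and `write_predictions` are hypothetical, and the label in the usage example is a placeholder picked by hand rather than a model prediction.

```python
import io


def read_sentences(stream):
    """Yield one paragraph/sentence per line, skipping blank lines."""
    for line in stream:
        text = line.strip()
        if text:
            yield text


def format_prediction(sentence, label):
    """Return one output row: the sentence, a tab, then the label.

    Tabs inside the sentence are replaced by spaces so that every row
    contains exactly one tab separator (an assumption of this sketch).
    """
    return f"{sentence.replace(chr(9), ' ')}\t{label}"


def write_predictions(pairs, stream):
    """Write (sentence, label) pairs to a stream, one row per line."""
    for sentence, label in pairs:
        stream.write(format_prediction(sentence, label) + "\n")


# Usage with in-memory streams standing in for the input and output files.
source = io.StringIO(
    "Some economists say that immigrants, legal and illegal, produce a net "
    "economic gain, while others say that they create a net loss.\n"
)
sentences = list(read_sentences(source))
placeholder_labels = ["Economy"] * len(sentences)  # hand-picked, not predicted
target = io.StringIO()
write_predictions(zip(sentences, placeholder_labels), target)
```

In a real run, the two `io.StringIO` objects would be replaced by files opened with `encoding="utf-8"`, which matters for the non-English inputs.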
The Annotation Standard
Social scientists have created a widely accepted list of 15 cross-cutting framing dimensions, such as economics, morality, and politics. These were originally developed by Boydstun et al. (2014) and are termed the “Policy Frames Codebook”. The paper describing the codebook is publicly available. The framing dimensions are also listed in the table below.
Frame | Description |
Economy | costs, benefits, or other financial implications |
Capacity and Resources | availability of physical, human or financial resources, and capacity of current systems |
Morality | religious or ethical implications |
Fairness and Equality | balance or distribution of rights, responsibilities, and resources |
Legality, Constitutionality, Jurisdiction | rights, freedoms, and authority of individuals, corporations, and government |
Policy Prescription and Evaluation | discussion of specific policies aimed at addressing problems |
Crime and Punishment | effectiveness and implications of laws and their enforcement |
Security and Defence | threats to welfare of the individual, community, or nation |
Health and Safety | health care, sanitation, public safety |
Quality of Life | threats and opportunities for the individual’s wealth, happiness, and well-being |
Cultural Identity | traditions, customs, or values of a social group in relation to a policy issue |
Public Sentiment | attitudes and opinions of the general public, including polling and demographics |
Political | considerations related to politics and politicians, including lobbying, elections, and attempts to sway voters |
External Regulation and Reputation | international reputation or foreign policy of the US |
Other | any coherent group of frames not covered by the above categories |
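For a classifier, the table above amounts to a fixed label set. A minimal sketch of how that label set might be represented and checked in Python follows. The names `FRAMES` and `validate_label` are hypothetical, and the label strings are simply the frame names from the table; the actual system may encode its labels differently.

```python
# The 15 framing dimensions from the table, used as class labels.
FRAMES = (
    "Economy",
    "Capacity and Resources",
    "Morality",
    "Fairness and Equality",
    "Legality, Constitutionality, Jurisdiction",
    "Policy Prescription and Evaluation",
    "Crime and Punishment",
    "Security and Defence",
    "Health and Safety",
    "Quality of Life",
    "Cultural Identity",
    "Public Sentiment",
    "Political",
    "External Regulation and Reputation",
    "Other",
)

# Integer ids are convenient for most training libraries.
LABEL_TO_ID = {name: index for index, name in enumerate(FRAMES)}


def validate_label(label):
    """Return the label unchanged, or raise if it is not a known frame."""
    if label not in LABEL_TO_ID:
        raise ValueError(f"unknown frame label: {label!r}")
    return label
```

Validating annotations against a single list like this catches spelling variants early, before they silently become extra classes during training.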
Domains
I applied the system to news data related to two social issues:
- immigration
- same-sex marriage
Languages
I had a few sentences in English already. However, considering the importance of multilingualism, the system was designed to provide labels for sentences in the following languages:
- English
- Mandarin (Chinese)
- Hindi
- Telugu
- Bengali
- Greek
It also had to label sentences in two surprise languages that it had never seen before:
- Russian
- Turkish
References
[1] Taken from http://www.openframing.org/home.html