The test list methodology
Censorship findings are only as interesting as the sites and services that you test.
We encourage you to suggest sites and services to test for censorship.
Please read the documentation below to learn how to contribute to community resources for censorship measurement research.
What are test lists?
Test lists are machine-readable CSV files that include URLs that are tested for
censorship.
Censorship measurement projects like OONI rely on a global community of
volunteers who run censorship detection tests from local vantage points. In
light of bandwidth constraints, testing most websites available on the internet
is not practical (nor possible in many cases). Instead, our measurements focus
on a sample of websites provided in “test lists”: machine-readable CSV files
with a set of curated, interesting domains. There are two types of test lists:
- Global test list: Includes a wide range of internationally relevant websites (e.g. facebook.com), most of which are in English.
- Country-specific test lists: Include websites that are only relevant to a specific country (e.g. Brazilian media websites), many of which are in local languages.
To maximize the breadth of coverage while reducing research bias, test list URLs
are broken down into 30 diverse categories:
- ALDR (Alcohol & Drugs): “Sites devoted to the use, paraphernalia, and sale of drugs and alcohol irrespective of the local legality.”
- REL (Religion): “Sites devoted to discussion of religious issues, both supportive and critical, as well as discussion of minority religious groups.”
- PORN (Pornography): Hard-core and soft-core pornography.
- PROV (Provocative Attire): “Websites which show provocative attire and portray women in a sexual manner, wearing minimal clothing.”
- POLR (Political Criticism): “Content that offers critical political viewpoints. Includes critical authors and bloggers, as well as oppositional political organizations. Includes pro-democracy content, anti-corruption content, as well as content calling for changes in leadership, governance issues, legal reform, etc.”
- HUMR (Human Rights Issues): Sites dedicated to discussing human rights issues in various forms. Includes women’s rights and rights of minority ethnic groups.
- ENV (Environment): “Pollution, international environmental treaties, deforestation, environmental justice, disasters, etc.”
- MILX (Terrorism and Militants): “Sites promoting terrorism, violent militant or separatist movements.”
- HATE (Hate Speech): “Content that disparages particular groups or persons based on race, sex, sexuality or other characteristics”
- NEWS (News Media): “This category includes major news outlets (BBC, CNN, etc.) as well as regional news outlets and independent media.”
- XED (Sex Education): “Includes contraception, abstinence, STDs, healthy sexuality, teen pregnancy, rape prevention, abortion, sexual rights, and sexual health services.”
- PUBH (Public Health): “HIV, SARS, bird flu, centers for disease control, World Health Organization, etc”
- GMB (Gambling): “Online gambling sites. Includes casino games, sports betting, etc.”
- ANON (Anonymization and circumvention tools): “Sites that provide tools used for anonymization, circumvention, proxy-services and encryption.”
- DATE (Online Dating): “Online dating services which can be used to meet people, post profiles, chat, etc”
- GRP (Social Networking): Social networking tools and platforms.
- LGBT (LGBT): A range of gay-lesbian-bisexual-transgender queer issues. (Excluding pornography)
- FILE (File-sharing): “Sites and tools used to share files, including cloud-based file storage, torrents and P2P file-sharing tools.”
- HACK (Hacking Tools): “Sites dedicated to computer security, including news and tools. Includes malicious and non-malicious content.”
- COMT (Communication Tools): “Sites and tools for individual and group communications. Includes webmail, VoIP, instant messaging, chat and mobile messaging applications.”
- MMED (Media sharing): “Video, audio or photo sharing platforms.”
- HOST (Hosting and Blogging Platforms): “Web hosting services, blogging and other online publishing platforms.”
- SRCH (Search Engines): Search engines and portals.
- GAME (Gaming): “Online games and gaming platforms, excluding gambling sites.”
- CULTR (Culture): “Content relating to entertainment, history, literature, music, film, books, satire and humour”
- ECON (Economics): “General economic development and poverty related topics, agencies and funding opportunities”
- GOVT (Government): “Government-run websites, including military sites.”
- COMM (E-commerce): “Websites of commercial services and products.”
- CTRL (Control content): “Benign or innocuous content used as a control.”
- IGO (Intergovernmental Organizations): “Websites of intergovernmental organizations such as the United Nations.”
These categories range from news media, culture, and human rights issues to more provocative or objectionable content, such as pornography. The latter is included because such sites are more likely to be blocked, enabling the detection of censorship techniques adopted by ISPs.
Creating test lists requires local knowledge: an understanding of which sites are commonly accessed and which are more likely to be blocked in light of a country’s social and political environment. The Citizen Lab (which manages the test list project) has therefore made the lists publicly available on GitHub, encouraging community contributions.
What aren’t test lists?
1. A list of thousands of sites scraped from Alexa
Creating (or contributing to existing) test lists is not a question of scraping
“the top 1,000 sites” from Alexa. Rather, it requires research, an understanding
of a country’s social and political environment, and how that may motivate
information controls.
2. Blocklists
Some governments occasionally publish official blocklists (or such lists get leaked), which contain the websites that are legally prohibited in a country. Internet Service Providers (ISPs) are then ordered to block access to all websites included in such blocklists, which commonly contain hundreds (or thousands) of URLs hosting content that is illegal in that country (such as gambling, file sharing, or adult content).
Test lists, on the other hand, are not meant to be limited to blocked websites. Rather, they serve the purpose of monitoring how policies change over time: which sites are most likely to be blocked or unblocked. While test lists may include some websites that are known to be blocked (and that is useful for detecting the censorship techniques adopted by ISPs), most sites are not censored locally at the time they are added to test lists. The aim of the test list methodology is not only to identify censorship, but also to confirm the accessibility of sites. Unlike blocklists (which can include thousands of URLs), each test list is usually limited to up to 1,000 sites (due to the aforementioned bandwidth constraints).
Why contribute to test lists?
1. Censorship findings are only as interesting as the sites you test
When measuring censorship through the use of software like OONI Probe, censorship findings are only as interesting as the sites that are tested. If bbc.com, for example, is blocked in China, OONI Probe is only likely to detect that blocking if bbc.com is included in the Chinese test list to begin with.
It’s therefore important to ensure that test lists are representative of many
types of online content and reflect the country’s social, economic, and
political environment.
2. You can provide local insight
Examining internet censorship in a country requires local knowledge, an understanding of which sites and services are:
- commonly accessed;
- more likely to be blocked;
- interesting to test in light of a country’s social and political environment.
To ensure that test lists include a variety of different types of URLs that are updated on an ongoing basis, we encourage community contributions from around the world.
3. Potential risks
When running OONI Probe, you will connect to and download data from the websites
included in the global test list and in the test list which is specific to the
country that you are running OONI Probe from.
We therefore encourage you to review all of the URLs included in these lists carefully, prior to running OONI Probe, as connecting to some of these websites might be legally questionable (or illegal) in some jurisdictions.
If you are uncertain of the potential implications of connecting to and
downloading data from the websites listed in the test lists, you can pass your
own test list with the following type of command line option:
ooniprobe <test-name> -f <your-test-list>
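For instance, assuming (hypothetically) that web_connectivity is the name of the test you want to run and that your own list is saved as mylist.txt, the invocation would look like:
ooniprobe web_connectivity -f mylist.txt
Substitute the test name and file path that apply to your own setup; the exact test names depend on your version of ooniprobe.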
Contributing to test lists
You can contribute to test lists in two ways:
- By reviewing (and updating) existing country-specific test lists;
- By creating new test lists for countries that don’t have one yet.
Both approaches are described further below, and both require a bit of research. We provide some recommended research practices for compiling (or contributing to existing) test lists in the following section.
Test list research
Background research
Understanding information controls in a country requires an understanding of the
country itself. Some background research on the country in question is therefore
essential to identifying websites that are worth testing for censorship.
In-depth, PhD-style research is not required. In fact, many online resources with country profiles that you can refer to already exist, such as The World Factbook, the OpenNet Initiative, and Freedom House, among others. That said, your background research shouldn’t be limited to such resources. Rather, they can serve as a starting point for identifying sites to add to test lists (in which case, you can even refer to a country’s Wikipedia page).
Knowing that a country has many ethnic minorities, for example, is a starting
point for subsequently exploring which sites represent the voices of those
groups. Due to their sensitive nature, such sites might be more likely to get
blocked (now or in the future), and so it might make sense to add them to your
test list. By reading news websites from that country, you may come across the names of political activists; it may then be worth exploring whether those activists have websites of their own and adding them to your test list.
By researching the main economic, political, and social issues of a country, you
can search for a variety of different types of sites that address them and
present different opinions. Those are the types of sites that are worth adding
to test lists to monitor their accessibility over time. The process of
identifying sites to add to your test list can also be guided by the 30 categories of the test list methodology.
Drawing inspiration from 30 categories
The Citizen Lab’s test list methodology relies on 30 diverse categories for URLs. These categories serve two main purposes. First, the more diverse the testing sample, the more likely researchers are to identify different forms of internet censorship. Second, by categorizing URLs, researchers can more easily characterize internet censorship depending on which types of content are blocked. In Iran, for example, internet censorship appears to be pervasive in both breadth and scale, since many different types of websites have been found to be blocked.
When working on a test list, you can refer to the 30 categories and search for
local websites that fall under each one. Ideally, a test list includes multiple
URLs for each of the 30 categories, though we recognize that this is not always
possible.
Research on previous cases of reported censorship
Has censorship been reported in the country that you’re compiling a test list
for? If so, which websites were reportedly blocked?
As part of your research for identifying sites to add to a test list, it’s important to explore whether censorship events have previously been reported in the country. Those sites might still be blocked, even if their ban has officially been lifted. We, for example, found Vimeo and Reddit to be blocked in Indonesia, even though their ban had been lifted more than two years earlier.
Furthermore, certain sites might only be blocked in certain networks, rather
than on a nationwide level. By adding sites that have reportedly been blocked to
your test list, OONI Probe users can collect network measurement data examining
the accessibility of those sites over time (and may even be able to corroborate
media reports).
Previous censorship cases can also help with identifying:
- The types of information that the country’s government censors (for example, if political opposition sites were blocked in the past, it might be worth adding them to test lists to examine whether they’re blocked in the present or future);
- The motivations behind censorship (for example, if a government has previously blocked sites for political reasons, it may be worth searching for other sites that could trigger politically-motivated censorship).
To identify censorship cases, you can start off by searching for relevant media
articles (where you’re likely to find the most recent cases). In addition to
international news websites, it’s important to search for censorship reports
through local media outlets as well. You can then refer to a variety of research
reports published by a number of digital rights organizations, including (but
not limited to) Citizen Lab, Freedom on the Net (Freedom House), OpenNet Initiative, Reporters Without Borders, and ARTICLE 19.
Given that economic, social, and political systems change over time (and the
motivations of governments change along with them), it’s important to update
test lists on an ongoing basis through the above recommended practices.
Reviewing test lists
All test lists that OONI Probe is designed to test for censorship are hosted in the Citizen Lab’s test-lists repository on GitHub.
To review country-specific test lists, please follow the steps below:
Step 1. Find the csv file which is specific to the country that you want
to run OONI Probe from (based on that country’s code)
here.
If you don’t find a csv file for your country, that’s probably because it
doesn’t exist yet. In this case, please refer to the next section on “Creating new test lists”.
Step 2. Add new URLs to the csv file under the “url” column.
Some criteria for adding new URLs can include the following:
- The URLs cover topics of socio-political interest within the country;
- The URLs are likely to be blocked because they include sensitive content (for example, they touch upon sensitive issues or express political criticism);
- The URLs have been blocked in the past;
- You have faced difficulty connecting to those URLs.
For further criteria, please view the URL categories
here.
Please try to add URLs which fall under as many (if not all) of these categories as possible.
Step 3. Every time you add a URL, please add the following in the csv file for each new URL (a sample entry is sketched after this list):
- Category code: Add the code of the category that each URL falls under. This can be added under the “category_code” column of the csv file. The category codes can be found here.
- Category description: Add the description of the category that each URL falls under. This can be added under the “category_description” column of the csv file. The category descriptions can be found here.
- Date: Add the date on which you added each URL. This can be added under the “date_added” column of the csv file.
- Contributor (optional): Add the name of the organization that you are affiliated with as a contributor to the test list. This can be added under the “source” column of the csv file.
- Notes (optional): Add notes for each URL under the “notes” column of the csv file. This column, in particular, can be useful for describing the type of URL added, particularly since the standardized categories are quite broad. For example, you can write “Site of the political opposition, reported to be blocked during 2016 elections” in the “notes” column to provide more meaning than the standardized POLR category (and to provide context that may be useful to researchers).
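Putting the above together, a single (hypothetical) entry would look something like the following line. The URL and date are illustrative, the empty field corresponds to the optional “source” column, and the notes value is quoted because it contains a comma:
http://www.example.com/,POLR,Political Criticism,2017-04-12,,"Site of the political opposition, reported to be blocked during 2016 elections"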
Step 4. Change the category codes and descriptions for URLs (included under the “category_code” and “category_description” columns of the csv file) only if you think that those URLs have been allocated to the wrong category codes and descriptions. In this case, please replace the category codes and descriptions with ones (from the recommended categories) that you think are more suitable. We would also appreciate a comment on GitHub or via email explaining the proposed changes.
Step 5. Once you have reviewed a test list based on the above, please submit
your changes to us. If you’re a GitHub user, you can do so through a pull
request. If you’re not a GitHub user, please send us a spreadsheet (in the same format as the GitHub csv files) by dropping us an email at
contact@openobservatory.org (PGP Key Fingerprint: 4C15 DDA9 96C6 C0CF 48BD
3309 6B29 43F0 0CB1 77B7).
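If you are unfamiliar with the pull request workflow, the following is a minimal sketch (it assumes that you have git installed, that you have forked the citizenlab/test-lists repository on GitHub, and that country lists live under a lists/ directory; the branch name and country code are illustrative):
git clone https://github.com/<your-username>/test-lists.git
cd test-lists
git checkout -b update-br-list
# edit lists/br.csv in a text editor or spreadsheet application
git add lists/br.csv
git commit -m "Update Brazilian test list"
git push origin update-br-list
Then open a pull request against citizenlab/test-lists through the GitHub web interface.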
Creating new test lists
If you can’t find a test list specific to your country
here, then it
probably does not exist yet. Please help us create a test list for your country
through the steps below:
Step 1. Create a csv file and name it after the ISO 3166-1 alpha-2 (two-letter) country code of the country that URLs are being added for. You can find a reference for the international standard for country codes here. For example, a csv file created for Andorra would be named ad.csv.
Step 2. Include the following columns in the newly created csv file (the resulting header row is sketched after this list):
url
category_code
category_description
date_added
source
notes
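The header (first) row of your csv file would then simply list these columns, separated by commas:
url,category_code,category_description,date_added,source,notes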
Step 3. Add URLs under the “url” column of the csv file.
Some criteria for adding new URLs can include the following:
- The URLs cover topics of socio-political interest within the country;
- The URLs are likely to be blocked because they include sensitive content (for example, they touch upon sensitive issues or express political criticism);
- The URLs have been blocked in the past;
- You have faced difficulty connecting to those URLs.
For further criteria, please view URL categories
here.
Step 4. Every time you add a URL, please add the following in the csv file for
each new URL:
- Category code: Add the code of the category that each URL falls under. This can be added under the “category_code” column of the csv file. The category codes can be found here.
- Category description: Add the description of the category that each URL falls under. This can be added under the “category_description” column of the csv file. The category descriptions can be found here.
- Date: Add the date on which you added each URL. This can be added under the “date_added” column of the csv file.
- Contributor (optional): Add the name of the organization that you are affiliated with as a contributor to the test list. This can be added under the “source” column of the csv file.
- Notes (optional): Add notes for each URL under the “notes” column of the csv file. This column, in particular, can be useful for describing the type of URL added, particularly since the standardized categories are quite broad. For example, you can write “Site of the political opposition, reported to be blocked during 2016 elections” in the “notes” column to provide more meaning than the standardized POLR category (and to provide context that may be useful to researchers).
Step 5. Once you have created a new test list based on the above, please
submit your csv file to us. If you’re a GitHub user, you can do so through a
pull request. If you’re not a GitHub user, please send us your csv file by
dropping us an email at contact@openobservatory.org (PGP Key Fingerprint: 4C15
DDA9 96C6 C0CF 48BD 3309 6B29 43F0 0CB1 77B7).
Important tips
Always include the full URL, including the HTTP or HTTPS prefix, exactly as it appears when you type it into a browser. If you include example.com in a test list, OONI Probe won’t be able to test it. Rather, it should be included as http://www.example.com, if that is what it looks like in a browser.
Always use the format described in the sections above. The test lists are meant to be machine-readable, and OONI Probe will not parse test lists that don’t strictly follow the prescribed format.
Please use the categories provided here and refrain from adding your own categories. The categories may not be perfect, and we welcome your suggestions for additional/alternative categories. But if you don’t use the prescribed category codes, OONI Probe will not be able to test those URLs, since test lists are meant to be machine-readable.
Please do not scrape and add “the top 1,000 Alexa sites”. Community contributions are more useful when they include URLs that (a) fall under these 30 diverse categories and (b) reflect local insight. Given that many OONI Probe users around the world have bandwidth constraints, we favour quality over quantity in terms of what is tested.
Thanks for contributing!