WikiCred, Wikimedia, and Iffy

Add Wikipedia article links to sites in the Iffy Index.
Explore Wikimedia data as an indicator of site reliability.

Fake news fails fact checks

Iffy.news is an Index of Unreliable Sources. It’s for mis/disinfo researchers who need a fake-news list that is current, well-documented, and independently verifiable.

If we’re going to call other sites untrustworthy, we can’t just say “Trust us” as the reason why. So each Iffy site links to the failed fact-checks that make that site unreliable.

Bullshit meter (parody of audio volume meter)

Bias is not a factor. Only bullshit is. Failed fact-checks, by IFCN-verified fact-checkers, are what get a site listed as Iffy news.

The index lists all publishers that Media Bias/Fact Check gave a Factual Reporting rating of Low or Very Low. Each listed site links to the MBFC page that documents its rating and to fact-checks of its articles. (That’s another Iffy.news tool: Fact-check Search, which is a custom Google search of about 30 fact-check sites.)

The index factors in other trust-ratings too. For instance, if MBFC calls a site Questionable but NewsGuard rates it Green, that site’s taken off the list. (The changelog keeps track of removals and adds. The public spreadsheet has additional site data.)
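As a rough illustration, the listing rule described above might be sketched like this. The function name and the plain-string rating values are assumptions made for the example; the actual index weighs more trust ratings than the two modeled here.

```python
from typing import Optional

# MBFC Factual Reporting ratings that put a publisher on the index (per the text above).
LISTED_MBFC_RATINGS = {"low", "very low"}


def is_iffy(mbfc_factual: str, newsguard: Optional[str] = None) -> bool:
    """Hypothetical helper: decide whether a site stays on the index.

    Assumes ratings arrive as plain strings. Other trust ratings that the
    real index considers are not modeled here.
    """
    if mbfc_factual.strip().lower() not in LISTED_MBFC_RATINGS:
        return False
    # A Green rating from NewsGuard overrides the MBFC rating and removes the site.
    if newsguard is not None and newsguard.strip().lower() == "green":
        return False
    return True
```

For example, `is_iffy("Very Low")` keeps a site listed, while `is_iffy("Low", "Green")` takes it off.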

Iffy.news is used in university research guides and in mis/disinfo research (including my own). And we’re always looking for other datasets that can make ours stronger. That’s where WikiCred came in.

1. Iffy adds Wikipedia links

The question was: Can Wikipedia data be used as an indicator of a site’s credibility? Spoiler: No, it cannot. Not in any machine-harvestable, structured-data way. But Wikipedia is an excellent source of additional information about sites.

So that was part one of this WikiCred project: The Iffy Index now has links to either a site’s Wikipedia article, if it has one, or Wikipedia’s List of fake news websites, if it’s listed there. More than 125 of the 400 Iffy sites have a Wikipedia link, including about half of the top 100 most-visited Iffy sites (sorted by Site Rank).

2. Wikipedia not a credibility indicator

The problem with using Wikipedia as structured data is that its data isn’t. Wikidata is well-structured, but its data for the same entity often differ from Wikipedia’s. And neither’s taxonomy is helpful in identifying credibility on a large scale. A few examples:

In Wikipedia, Infowars’ “Type of Site:” is “Fake News.” But there’s no mention of “fake” in its Wikidata.
The type of site for Gateway Pundit is “Political blog.” Its infobox mentions nothing about unreliability. But its Wikidata lists it as an Instance of “fake news website” (the only entity with that designation: 1 SPARQL query result).
The Natural News type of site is “Fake news blog.” But, again, Wikidata makes no mention of “fake.”

Wikipedia articles have data for a site that Wikidata doesn’t have, and vice-versa. And Wikipedia’s List of fake news websites is a hodgepodge of name and URL syntaxes and a mix of defunct and active sites.
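A check like the “Instance of” lookup mentioned above could be sketched in Python roughly as follows. This is not the query used for the project: the helper names are hypothetical, and the class is looked up by its English label rather than by a Q-ID, which is an assumption made to avoid hard-coding an identifier. The endpoint is the public Wikidata Query Service.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def instance_of_query(class_label: str, lang: str = "en") -> str:
    """Build a SPARQL query for items that are an instance of (P31) a class with this label."""
    # Escape backslashes and quotes so the label is a valid SPARQL string literal.
    safe_label = class_label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f'  ?class rdfs:label "{safe_label}"@{lang} .\n'
        "  ?item wdt:P31 ?class .\n"
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}" . }}\n'
        "}"
    )


def run_query(query: str) -> list:
    """Send the query to the Wikidata Query Service and return the result bindings."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # The User-Agent string is a placeholder; identify your own tool here.
    request = urllib.request.Request(url, headers={"User-Agent": "iffy-example/0.1 (illustrative sketch)"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(instance_of_query("fake news website"))` would return whatever items carry that designation at the time. The count can change as Wikidata is edited.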

A consistent taxonomy across Wikimedia projects would help make the data a useful tool for indicating source reliability, as would making the domain name a unique identifier for a news site. With that in mind, the other part of this WikiCred project was to learn the taxonomy terms associated with all types of online news sources.
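To illustrate the idea of a domain name as a unique identifier, here is a minimal sketch that reduces differently written URLs to one comparable key. The function name is hypothetical, and the normalization (lowercase hostname, no leading “www.”) is an assumption. It does not resolve subdomains to a registrable domain, which would need a public-suffix list.

```python
from urllib.parse import urlsplit


def domain_key(url_or_domain: str) -> str:
    """Reduce a URL or bare domain to a lowercase hostname without a leading 'www.'."""
    text = url_or_domain.strip()
    # urlsplit only recognizes the hostname when a scheme separator is present.
    if "://" not in text:
        text = "http://" + text
    host = urlsplit(text).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host
```

With this, `https://www.Example.com/news?id=1` and `example.com/` both map to `example.com`, so entries from different lists can be matched on the same key.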

3. Iffy makes Wikidata SPARQL

The result is a dataset of all news outlets with (English) Wikipedia articles (built off News on Wiki’s SPARQL query for U.S. newspapers). This required combining separate queries, because no single class (and no single Infobox template) encompasses all news types: print, online, podcast, and broadcast.
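Combining the results of separate queries might look something like the sketch below. It assumes each query returns rows as dictionaries with an `item` key holding the Wikidata entity ID; that row shape and the function name are illustrative, not taken from the project’s code.

```python
def merge_results(*result_sets):
    """Combine rows from several queries, keeping one row per Wikidata item."""
    merged = {}
    for rows in result_sets:
        for row in rows:
            # The first row seen for an item is kept as the base record.
            existing = merged.setdefault(row["item"], dict(row))
            # Later rows for the same item only fill in fields that are still missing.
            for key, value in row.items():
                existing.setdefault(key, value)
    return list(merged.values())
```

An outlet that appears in both a newspaper query and a website query then ends up as a single row carrying the fields from both.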

The dataset lists 9.6K news sources, 9K of which are newspapers. The methodology and SPARQL queries are explained on Iffy’s Wikimedia Newspapers page. More for entertainment than information, there’s also a map of the world’s newspapers, with tooltip links to their Wikipedia and Wikidata entries.

Newspapers of the world: Map of Wikidata query

My other datasets have lots of data not in Wikipedia or Wikidata for many news sites, like Alexa Site Rank, verified URL (HTTP status 200), circulation, and owner. I’d be happy to upload those data to their infoboxes, but I’ve yet to discover an easy way to do that.

Along the way, I also saved several queries that aren’t useful as credibility indicators but may prove useful later, for example, all Wikidata entities with a RationalWiki ID. And I should mention the Sourceror project (another WikiCred-funded project), which provides API access to Wikipedia’s Reliable Sources table. That list, however, categorizes news sites mainly by their fitness as a Wikipedia source, not by whether the site is a reliable news source for the general public (e.g., Wikipedia’s own page “Wikipedia is not a reliable source”).

If anyone wants to talk about structuring data and standardizing taxonomies for news-publication data in Wikimedia projects, contact me and count me in.