Social Science One: Request for Proposals for Fast Access to CrowdTangle and Ad Library Data

A year ago, the founders of Social Science One, Gary King and Nathaniel Persily, developed a new model of industry-academic partnership and began a collaboration with Facebook: an unprecedented effort to enlist the world’s research community to study social media’s effect on elections and democracy. Today, they are announcing their approval of the first group of over sixty researchers from around the world to use Facebook data for this purpose in a safe, secure, and privacy-protected manner.

Gary King and Nathaniel Persily believe that, by providing fuel in the form of data access, they can help the scholarly community solve some of the major issues in social media that affect elections and democracy across the world.

How Far Have They Come?

King and Persily, along with their partners at the Social Science Research Council, issued a Request for Proposals in July 2018 for a URL-level database. The database was to contain information about aggregate exposure to URLs across a large number of population subclasses, together with characteristics of each URL (e.g., whether it was fact-checked and what the fact-check revealed). Because some subclasses in the resulting dataset would contain sparse aggregations, there was concern that a researcher with bad intentions could, in theory, use the dataset to discover what a particular individual had once seen on Facebook. The dataset was therefore deemed insufficiently protective without further modifications to the plans.
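The risk posed by sparse aggregations can be illustrated with a toy table. The sketch below is only an illustration of the concern, not the safeguard the project adopted; the function name `sparse_cells`, the threshold of 10, and all counts are hypothetical.

```python
def sparse_cells(exposure_counts, min_size):
    """Return the non-empty cells whose count falls below min_size.

    exposure_counts maps a (url, subclass) pair to the number of people
    in that subclass who were exposed to the URL. A very small count
    means the aggregate describes only a handful of people, so anyone
    who knows who belongs to the subclass may infer what they saw.
    """
    return {
        cell: count
        for cell, count in exposure_counts.items()
        if 0 < count < min_size
    }


if __name__ == "__main__":
    # Hypothetical aggregate exposure table (all values invented).
    table = {
        ("example.com/story", "age 18-24, large city"): 48210,
        ("example.com/story", "age 65+, small town"): 2,
        ("example.com/other", "age 25-34, large city"): 9130,
    }
    for cell, count in sparse_cells(table, min_size=10).items():
        print(f"risky cell {cell}: only {count} people")
```

A cell covering two people is technically an aggregate, yet it says almost as much about those two people as an individual-level record would.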

Over the last six months, Facebook has built a research tool that data grantees can log into to query Facebook data for insights. Since July 2018, King and Persily have been helping Facebook deploy cutting-edge “differential privacy” systems. By adding specialized types of noise to the data or analysis methods, these systems prevent researchers from re-identifying individuals without obscuring research findings about societal patterns, provided researchers perform appropriate analyses. ‘We are still working with the Facebook team to finalize and test the research tool, validate the datasets within the tool, and ensure that the differential privacy algorithms are implemented in ways that provide both utility to researchers and privacy guarantees for the data’, they point out.
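To make the idea of "adding noise" concrete, the sketch below applies the Laplace mechanism, a standard textbook differential privacy technique, to a simple count. It is not a description of Facebook's actual system, which the article does not detail; the function names, the privacy parameter `epsilon = 0.5`, and the example counts are all assumptions chosen for illustration.

```python
import random


def noisy_count(true_count, epsilon, rng):
    """Return true_count plus Laplace noise with scale 1 / epsilon.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so the Laplace scale is sensitivity / epsilon.
    Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon
    # The difference of two independent exponential draws with mean
    # `scale` follows a Laplace distribution centred at 0.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


def mean_abs_error(true_count, epsilon, trials, rng):
    """Average absolute noise over repeated releases of the same count."""
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    epsilon = 0.5           # assumed privacy parameter
    for label, count in [("sparse subclass", 3), ("large subclass", 250000)]:
        released = noisy_count(count, epsilon, rng)
        print(f"{label}: true={count}, released={released:.1f}")
```

With these assumed settings the noise averages about 2 in absolute value regardless of the true count. That swamps a count of 3, protecting the individuals behind it, while barely changing a count of 250,000, which is why patterns at the societal level can survive.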

Where Are They Now?

King and Persily announced in April 2019 that access to some data would be granted immediately, with other datasets released in stages as testing shows that they are both useful for scholarly research and compliant with appropriate privacy and legal standards. They also confirmed the approval of the first group of over sixty researchers to use Facebook data to study social media’s effect on elections and democracy.

In addition, in May 2019, they shared two new requests for proposals: an RFP for the CrowdTangle API and an RFP for the Ad Library API.

1.    The CrowdTangle API includes a subset of public pages on Facebook and Instagram. The data being made available includes, from Facebook, 6.9 billion page posts, 1.2 billion group posts, and 11.2 million verified profile posts, as well as 1.6 billion Instagram posts. More information about the dataset can be found in the CrowdTangle codebook.

2.    The Ad Library API includes data on political advertising. The Ad Library is new and under active development, and King and Persily are hopeful that Facebook will continue to improve its capabilities so that researchers can better analyze campaign spending, ad targeting, and ad content. It includes information about $540 million spent on 3.25 million ads in the US since May 2018; it also includes somewhat smaller numbers of ads in India, the UK, Ukraine, and Brazil. More information about this dataset, including a list of known issues, can be found in the Ad Library codebook.

Researchers whose proposals to Social Science One for these RFPs are successful will receive data access, along with training and an online community where they can ask questions. Researchers seeking funding may apply through a separate process.

What Does the Future Hold?

King and Persily are also working towards releasing the URL dataset to researchers in several stages, each of which is contingent on the successful completion of reviews to ensure that the system is privacy-preserving and useful to researchers.
