We selected traits and attributes that reveal how accurate and potentially intrusive such a predictive analysis can be, including “sexual orientation,” “ethnic origin,” “political views,” “religion,” “personality,” “intelligence,” “satisfaction with life” (SWL), substance use (“alcohol,” “drugs,” “cigarettes”), “whether an individual’s parents stayed together until the individual was 21 y old,” and basic demographic attributes such as “age,” “gender,” “relationship status,” and “size and density of the friendship network.” Five Factor Model (9) personality scores (n = 54,373) were established using the International Personality Item Pool (IPIP) questionnaire with 20 items (25). Intelligence (n = 1,350) was measured using Raven’s Standard Progressive Matrices (SPM) (26), and SWL (n = 2,340) was measured using the SWL Scale (27). Age (n = 52,700; average, µ = 25.6; SD = 10), gender (n = 57,505; 62% female), relationship status (“single”/“in relationship”; n = 46,027; 49% single), political views (“Liberal”/“Conservative”; n = 9,752; 65% Liberal), religion (“Muslim”/“Christian”; n = 18,833; 90% Christian), and the Facebook social network information were obtained from users’ Facebook profiles. The design of the study is presented in Fig.
The model correctly discriminates between homosexual and heterosexual men in 88% of cases, between African Americans and Caucasian Americans in 95% of cases, and between Democrat and Republican in 85% of cases. For the personality trait “Openness,” prediction accuracy is close to the test–retest accuracy of a standard personality test. We give examples of associations between attributes and Likes and discuss implications for online personalization and privacy. This study demonstrates the degree to which relatively basic digital records of human behavior can be used to automatically and accurately estimate a wide range of personal attributes that people would typically assume to be private. The study is based on Facebook Likes, a mechanism used by Facebook users to express their positive association with (or “Like”) online content, such as photos, friends’ status updates, and Facebook pages of products, sports, musicians, books, restaurants, or popular Web sites. Likes represent a very generic class of digital records, similar to Web search queries, Web browsing histories, and credit card purchases. For example, observing users’ Likes related to music provides similar information to observing records of songs listened to online, songs and artists searched for using a Web search engine, or subscriptions to related Twitter channels. In contrast to these other sources of information, Facebook Likes are unusual in that they are currently publicly available by default. However, those other digital records are still available to numerous parties (e.g., governments, developers of Web browsers, search engines, or Facebook applications), and, hence, similar predictions are unlikely to be limited to the Facebook environment.
We show that easily accessible digital records of behavior, Facebook Likes, can be used to automatically and accurately predict a range of highly sensitive personal attributes, including sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender. The analysis presented is based on a dataset of over 58,000 volunteers who provided their Facebook Likes, detailed demographic profiles, and the results of several psychometric tests. The proposed model uses dimensionality reduction for preprocessing the Likes data, which are then entered into logistic/linear regression to predict individual psychodemographic profiles from Likes.
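
The two-stage model described here (dimensionality reduction of the Likes data, followed by regression) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration rather than a detail taken from the text: the use of a truncated singular value decomposition as the reduction step, the number of components `k`, the plain gradient-descent fit, the synthetic user-by-Like matrix, and the function names `fit_svd_logistic` and `predict_proba`. Only the dichotomous (logistic) case is shown; for a numeric attribute, the second stage would be a linear regression on the same component scores.

```python
import numpy as np


def fit_svd_logistic(likes, labels, k=5, lr=0.5, epochs=500):
    """Reduce a user-by-Like matrix with truncated SVD, then fit logistic regression.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    # Center each Like column using the training data.
    col_means = likes.mean(axis=0)
    centered = likes - col_means
    # Thin SVD; the rows of vt are component directions in Like space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]
    # Project users onto the top-k components and standardize the scores.
    scores = centered @ components.T
    scale = scores.std(axis=0) + 1e-12
    z = scores / scale
    # Logistic regression fitted by batch gradient descent on the log-loss.
    w = np.zeros(k)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(z @ w + b)))
        err = p - labels
        w -= lr * (z.T @ err) / len(labels)
        b -= lr * err.mean()
    return col_means, components, scale, w, b


def predict_proba(likes, model):
    """Apply the stored centering, projection, and regression weights to new users."""
    col_means, components, scale, w, b = model
    z = ((likes - col_means) @ components.T) / scale
    return 1.0 / (1.0 + np.exp(-(z @ w + b)))


# Synthetic data (an assumption): 400 users, 60 Likes, one binary attribute.
rng = np.random.default_rng(0)
n_users, n_likes = 400, 60
labels = rng.integers(0, 2, size=n_users).astype(float)
like_prob = np.full((n_users, n_likes), 0.1)
like_prob[labels == 1, :10] = 0.6  # the first 10 Likes are associated with the attribute
likes = (rng.random((n_users, n_likes)) < like_prob).astype(float)

# Fit on the first 300 users, evaluate on the remaining 100.
model = fit_svd_logistic(likes[:300], labels[:300])
probs = predict_proba(likes[300:], model)
accuracy = float(((probs > 0.5) == (labels[300:] == 1)).mean())
print(f"held-out accuracy: {accuracy:.2f}")
```

The centering vector, component directions, and scaling are estimated on the training users only and then reused for held-out users, so the evaluation does not leak information from the test set into the reduction step.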