Sunday, January 8, 2017

Big Data: Its Role in Increasing Inequality

We are living in the age of big data.  Everything we do, everywhere we go, every website we visit, every book we read, every comment we make on social media, every email we write, every call we place is being examined by people who wish to make a profit from information about us.  Everything known about you is fed into analysis routines that try to characterize you and predict how you are likely to behave as a consumer, as a voter, as an employee, as a member of an insurance plan, even as a lover.  This data is used in ways we have no control over, and it supports conclusions about us that may be quite wrong yet still determine whether or not we merit a job, a loan, or even a prison sentence.

Sue Halpern provided valuable insight into this issue in an article that appeared in The New York Review of Books: “They Have, Right Now, Another You.”  She is particularly interested in how accurate the profiles produced by big data techniques actually are.

She tells us that Facebook accumulates 98 data points that it uses to characterize an individual.  Some of these are self-reported by the individual of interest, while most are extracted by other means.  For example, if you provide Facebook with a photo of yourself, its facial recognition software is good enough to pick you out of other people’s photographs.  It can clearly mine information from posts by you and by those it associates with you, but since Facebook wishes to make money by selling access to you to vendors, it needs to learn more than you are likely to be willing to share.

“Facebook also follows users across the Internet, disregarding their ‘do not track’ settings as it stalks them. It knows every time a user visits a website that has a Facebook ‘like’ button, for example, which most websites do.”

“The company also buys personal information from some of the five thousand data brokers worldwide, who collect information from store loyalty cards, warranties, pharmacy records, pay stubs, and some of the ten million public data sets available for harvest. Municipalities also sell data—voter registrations and motor vehicle information, for example, and death notices, foreclosure declarations, and business registrations, to name a few. In theory, all these data points are being collected by Facebook in order to tailor ads to sell us stuff we want, but in fact they are being sold by Facebook to advertisers for the simple reason that the company can make a lot of money doing so.”

Halpern managed to delve into what Facebook assumed it knew about her and discovered that its information was often comically wrong.  This is what she learned:

“That I am interested in the categories of ‘farm, money, the Republican Party, happiness, gummy candy, and flight attendants’ based on what Facebook says I do on Facebook itself. Based on ads Facebook believes I’ve looked at somewhere—anywhere—in my Internet travels, I’m also interested in magnetic resonance imaging, The Cave of Forgotten Dreams, and thriller movies. Facebook also believes I have liked Facebook pages devoted to Tyrannosaurus rex, Puffy AmiYumi, cookie dough, and a wrestler named the Edge.”

“But I did not like any of those pages, as a quick scan of my ‘liked’ pages would show. Until I did this research, I had never heard of the Edge or the Japanese duo Puffy AmiYumi, and as someone with celiac disease, I am constitutionally unable to like cookie dough.”

If there is one thing Facebook should know about an individual, it is the list of pages she has actively liked.  Halpern then asks a troubling question: Is Facebook really this inaccurate, or has it decided she is more valuable as an asset if she is presented as a more marketable consumer?

“But maybe I am more valuable to Facebook if I am presented as someone who likes Puffy AmiYumi, with its tens of thousands of fans, rather than a local band called Dugway, which has less than a thousand. But I will never know, since the composition of Facebook’s algorithms, like Google’s and other tech companies’, is a closely guarded secret.”

Halpern also presents results from an encounter with a group of researchers at the Psychometrics Centre at Cambridge University.  This outfit attempts to derive a personality profile based on a person’s Facebook information.

“[They] developed what they call a ‘predictor engine,’ fueled by algorithms using a subset of a person’s Facebook ‘likes’ that ‘can forecast a range of variables that includes happiness, intelligence, political orientation and more, as well as generate a big five personality profile.’ (The big five are extroversion, agreeableness, openness, conscientiousness, and neuroticism, and are used by, among others, employers to assess job applicants. The acronym for these is OCEAN.) According to the Cambridge researchers, ‘we always think beyond the mere clicks or Likes of an individual to consider the subtle attributes that really drive their behavior.’ The researchers sell their services to businesses with the promise of enabling ‘instant psychological assessment of your users based on their online behavior, so you can offer real-time feedback and recommendations that set your brand apart’.”
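
Predictor engines of this kind are, at bottom, statistical models that map “like” indicators to trait scores.  The Python sketch below is only an illustration of that general idea; the pages, weights, and scoring scheme are invented here, since the Centre’s actual model, like Facebook’s, is a closely guarded secret.

    # A toy illustration (not the Psychometrics Centre's actual model) of how
    # a "predictor engine" might map a person's likes to big-five (OCEAN)
    # scores.  The pages and weights below are invented for demonstration.
    WEIGHTS = {
        "openness":          {"NYRB": 0.6, "Gizmodo": 0.2, "cookie dough": -0.1},
        "conscientiousness": {"NYRB": 0.1, "Gizmodo": 0.3, "cookie dough": -0.2},
        "extroversion":      {"NYRB": -0.2, "Gizmodo": -0.1, "cookie dough": 0.4},
        "agreeableness":     {"NYRB": 0.2, "Gizmodo": 0.0, "cookie dough": 0.3},
        "neuroticism":       {"NYRB": 0.1, "Gizmodo": 0.2, "cookie dough": 0.1},
    }

    def score_profile(likes):
        """Sum the weight of every liked page, producing one total per trait."""
        return {trait: sum(page_weights.get(page, 0.0) for page in likes)
                for trait, page_weights in WEIGHTS.items()}

    # A person who has liked two of the known pages:
    print(score_profile({"NYRB", "Gizmodo"}))

Notice that a single stray or mistaken “like” shifts every trait total at once, which is one reason errors of the kind Halpern describes next propagate so easily.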

Again, Halpern was presented with results that were bizarrely inaccurate.

“So here’s what their prediction engine came up with for me: that I am probably male, though ‘liking’ The New York Review of Books page makes me more ‘feminine’; that I am slightly more conservative than liberal—and this despite my stated affection for Bernie Sanders on Facebook; that I am much more contemplative than engaged with the outside world—and this though I have ‘liked’ a number of political and activist groups; and that, apparently, I am more relaxed and laid back than 62 percent of the population. (Questionable.)”

“Here’s what else I found out about myself. Not only am I male, but ‘six out of ten men with [my] likes are gay,’ which gives me ‘around an average probability’ of being not just male, but a gay male. The likes that make me appear ‘less gay’ are the product testing magazine Consumer Reports, the tech blog Gizmodo, and another website called Lifehacker. The ones that make me appear ‘more gay’ are The New York Times and the environmental group 350.org. Meanwhile, the likes that make me ‘appear less interested in politics’ are The New York Times and 350.org.”


“And there’s more. According to the algorithm of the Psychometrics Centre, ‘Your likes suggest you are single and not in a relationship.’ Why? Because I’ve liked the page for 350.org, an organization founded by the man with whom I’ve been in a relationship for thirty years!”

These results can be amusing, but one must not forget that these types of data interpretations are being used to make decisions that shape people’s lives.  Halpern wrote her article partly as a review of the book Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O’Neil, which she correctly describes as “insightful and disturbing.”

O’Neil has coined the phrase Weapons of Math Destruction (WMDs) to describe those algorithms that are either poorly constructed or misused in such a way that they are capable of causing extreme harm.  Algorithms and big data collection allow millions of people to be quickly evaluated, correctly or not, potentially spreading pain and suffering nationwide.

One of the presumed advantages of using mathematical algorithms to characterize individuals is that the particular biases of an individual are eliminated from the process.  However, the creation of the algorithm inevitably involves the biases of the creators.  While the errors in judgment made by an individual can be recognized because they are there for others to see, the false assumptions built into an algorithm are often hidden from view, protected as “proprietary information.”

If one wishes to evaluate applicants for a job, one might want to assess an individual’s honesty, conscientiousness, reliability, and creativity.  These are all attributes that are nearly impossible to quantify.  But algorithms can only deal with things that are quantifiable; therefore, they must select proxies for those attributes, proxies that can be converted to numbers but may or may not be directly relevant.  Credit scores are a popular proxy used in evaluations, but a low credit score can belong to a person with a poor sense of responsibility as well as to a highly responsible person who has just had a run of bad luck.  A personal evaluation could distinguish between the two, but there is no way for the algorithm to know the difference.  An employer who uses a mathematical routine that eliminates persons with low credit scores will likely be satisfied with the process and never realize that he is throwing away a number of potentially excellent employees.  There is no feedback available to evaluate this process.  As long as the employees who are hired allow the employer to continue to make money, he will be happy.  Profit is the only scorecard, and a crude one at best.
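
In code, such a screen amounts to nothing more than a threshold test.  The sketch below uses a hypothetical cutoff and made-up applicant records, but it shows why there is no feedback: rejected applicants simply vanish from the pool, so the employer never learns how they would have performed.

    # Hypothetical screening of job applicants by a credit-score proxy.
    # The cutoff and the applicant records are invented for illustration.
    CREDIT_SCORE_CUTOFF = 620

    applicants = [
        {"name": "A", "credit_score": 710, "would_be_good_hire": True},
        {"name": "B", "credit_score": 580, "would_be_good_hire": True},   # bad luck, good worker
        {"name": "C", "credit_score": 590, "would_be_good_hire": False},
        {"name": "D", "credit_score": 680, "would_be_good_hire": False},
    ]

    hired = [a for a in applicants if a["credit_score"] >= CREDIT_SCORE_CUTOFF]
    rejected = [a for a in applicants if a["credit_score"] < CREDIT_SCORE_CUTOFF]

    # The employer only ever observes the people who were hired...
    print("hired:", [a["name"] for a in hired])

    # ...so the excellent candidates screened out below are never counted.
    missed = [a["name"] for a in rejected if a["would_be_good_hire"]]
    print("good candidates thrown away, never measured:", missed)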

Another common technique used by algorithms is to judge people by whom they digitally resemble.  An algorithm can assemble a collection of characteristics of a desirable person and a collection of attributes of an undesirable person.  An individual is then assigned to whichever type he or she most closely resembles.  This approach essentially eliminates any attempt to evaluate people by their individual characteristics and reduces them to their presumed look-alikes.  There are plenty of biases that can be built into this approach.  While bias based on race is technically illegal, there are many ways in which a person living in a poor neighborhood with a high crime rate, where interactions with police are frequent, can be deemed undesirable even with an unblemished record.
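
One simple way such look-alike scoring can be implemented, sketched below purely for illustration (the features, stored profiles, and labels are invented, and real systems use far more variables), is a nearest-neighbor comparison: an applicant simply inherits the label of whichever stored profile he or she most resembles.

    import math

    # Toy "look-alike" classifier.  Each stored profile is a feature vector:
    # (neighborhood crime rate, rate of police contact nearby, credit score / 1000).
    # The vectors and the desirable/undesirable labels are invented.
    profiles = [
        ((0.10, 0.05, 0.72), "desirable"),
        ((0.80, 0.60, 0.55), "undesirable"),
        ((0.70, 0.55, 0.60), "undesirable"),
    ]

    def classify(person):
        """Assign the label of the closest stored profile (Euclidean distance)."""
        nearest = min(profiles, key=lambda p: math.dist(person, p[0]))
        return nearest[1]

    # Someone with an unblemished personal record who happens to live in a
    # high-crime, heavily policed neighborhood is judged by resemblance alone.
    print(classify((0.75, 0.50, 0.65)))   # -> "undesirable"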

O’Neil provides numerous examples of how this mass production approach tends to be biased against poor people.  Consider the case of the person who has been rejected for a job because a high medical bill or unemployment caused him to fall behind on payments and lowered his credit score.

“….the belief that bad credit correlates with bad job performance leaves those with low scores less likely to find work.  Joblessness pushes them toward poverty, which further worsens their scores, making it even harder for them to land a job.  It’s a downward spiral.  And employers never learn how many good employees they’ve missed out on by focusing on credit scores.  In WMDs, many poisonous assumptions are camouflaged by math and go largely untested and unquestioned.”

“This underscores another common feature of WMDs.  They tend to punish the poor.  This is, in part, because they are engineered to evaluate large numbers of people.  They specialize in bulk, and they’re cheap.  That’s part of their appeal.  The wealthy, by contrast, often benefit from personal input.  A white-shoe law firm or an exclusive prep school will lean far more on recommendations and face-to-face interviews than will a fast-food chain or a cash-strapped urban school district.  The privileged, we’ll see time and again, are processed more by people, the masses by machines.”
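
The downward spiral in the first passage above is a feedback loop, and a few lines of code make its dynamics concrete.  The numbers below are invented; only the direction of the effects comes from O’Neil’s description.

    # Toy simulation of the credit-score feedback loop O'Neil describes.
    # All figures are made up; only the direction of each effect matters.
    score = 600        # starting credit score, just below the hiring cutoff
    CUTOFF = 620

    for year in range(5):
        employed = score >= CUTOFF          # the screen rejects low scores
        score += 15 if employed else -40    # joblessness means missed payments
        print(f"year {year}: employed={employed}, credit score={score}")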

These algorithms that have become so prevalent are usually formulated with the best of intentions.  In many cases they will work quite well, on average.  But what about the cases in which they make a wrong decision?  In those cases the victims have no recourse.  Often neither the user of the tool nor the victim has any idea why the victim was deselected.  And a person deselected in one area could find himself or herself deselected in other kinds of applications as well.

“….scale is what turns WMDs from local nuisances to tsunami forces, ones that define and delimit our lives.  As we’ll see, the developing WMDs in human resources, health, and banking, just to name a few, are quickly establishing broad norms that exert upon us something very close to the power of law.  If a bank’s model of a high risk borrower, for example, is applied to you, the world will treat you as just that, a deadbeat—even if you’re horribly misunderstood.  And when that model scales, as the credit model has, it affects your whole life—whether you can get an apartment or a job or a car to get from one to another.”


Inequality is also reinforced in more subtle ways.  Big data can be used to pinpoint vulnerable people and take advantage of that vulnerability.

“We are ranked, categorized, and scored in hundreds of models, on the basis of our revealed preferences and patterns.  This establishes a powerful basis for legitimate ad campaigns, but it also fuels their predatory cousins: ads that pinpoint people in great need and sell them false or overpriced promises.  They find inequality and feast on it.  The result is that they perpetuate our existing social stratification, with all of its injustices.”

It has been demonstrated that degrees from for-profit colleges are of little value to students.  They are much more expensive than equivalent education from a community college and less highly valued by employers—in fact, little better than a high school education.  These schools make nearly all their money from government-guaranteed loans.  Whether students succeed or fail has little to do with their business plan.

“In education, they promise what’s usually a false road to prosperity, while also calculating how to maximize the dollars they draw from each prospect.  Their operations cause immense and nefarious feedback loops and leave their customers buried under mountains of debt.”

“Vatterott College, a career-training institute, is a particularly nasty example.  A 2012 Senate committee report on for-profit colleges described Vatterott’s recruiting manual, which sounds diabolical.  It directs recruiters to target ‘Welfare Mom w/Kids.  Pregnant Ladies.  Recent Divorce.  Low Self-Esteem. Low Income Jobs.  Experienced a Recent Death.  Physically/Mentally Abused.  Recent Incarceration.  Drug Rehabilitation.  Dead-End Jobs—No Future.’”

The word is gradually getting out about these organizations, and a few of them have been forced to close.

If we are concerned about the uses to which our personal data is being put, what can we do about it?  One possibility suggested by O’Neil is to adopt legislation similar to what is already in place in Europe.

“If we want to bring out the big guns, we might consider moving toward the European model, which stipulates that any data collected must be approved by the user as an opt-in.  It also prohibits the reuse of data for other purposes.  The opt-in condition is all too often bypassed by having a user click on an inscrutable legal box.  But the ‘not reusable’ clause is very strong: it makes it illegal to sell user data.  This keeps it from the data brokers whose dossiers feed toxic e-scores and microtargeting campaigns.”

O’Neil points out that many WMDs could be turned into beneficial tools.  The knowledge that allows vulnerable people to be identified and targeted with harmful advertising could instead be used to identify people who are in need of social assistance.  One might even consider providing that social assistance.  What a concept!

