We are living in the age of big data. Everything we do, everywhere we go, every
website we visit, every book we read, every comment we make on social media,
every email we write, every call we place is being examined by people who wish
to profit from information about us. Everything known about you is fed into
analysis routines that try to characterize you and predict how you are likely
to behave as a consumer, a voter, an employee, a member of an insurance plan,
even a lover. This data is being used in ways we cannot control, and the
conclusions drawn about us may be quite wrong yet still determine whether or
not we are judged worthy of a job, a loan, or even a prison sentence.
Sue Halpern provided valuable insight into this issue in an article that
appeared in The New York Review of Books, “They Have, Right Now, Another You.”
The accuracy of the profiles obtained from big-data techniques is of great
interest to her. She tells us that Facebook accumulates 98 data points that it
uses to characterize an individual. Some of these are self-reported by the
individual in question, while most are extracted by other means. For example,
if you provide Facebook with a photo of yourself, its facial recognition
software is good enough to pick you out of other people’s photographs. Facebook
can clearly mine information from your posts and from those of the people it
associates with you, but since it wishes to make money by selling you to
vendors, it needs to learn more than you are likely to be willing to share.
“Facebook also follows users
across the Internet, disregarding their ‘do not track’ settings as it stalks
them. It knows every time a user visits a website that has a Facebook ‘like’
button, for example, which most websites do.”
“The company also buys personal
information from some of the five thousand data brokers worldwide, who collect
information from store loyalty cards, warranties, pharmacy records, pay stubs,
and some of the ten million public data sets available for harvest.
Municipalities also sell data—voter registrations and motor vehicle
information, for example, and death notices, foreclosure declarations, and
business registrations, to name a few. In theory, all these data points are
being collected by Facebook in order to tailor ads to sell us stuff we want,
but in fact they are being sold by Facebook to advertisers for the simple
reason that the company can make a lot of money doing so.”
Halpern managed to delve into the knowledge Facebook assumes it has about her
and discovered that its information was often comically wrong. This is what she
learned:
“That I am interested in the categories of ‘farm, money, the Republican
Party, happiness, gummy candy, and flight attendants’ based on what Facebook
says I do on Facebook itself. Based on ads Facebook believes I’ve looked at
somewhere—anywhere—in my Internet travels, I’m also interested in magnetic
resonance imaging, The Cave of Forgotten Dreams, and thriller movies.
Facebook also believes I have liked Facebook pages devoted to Tyrannosaurus
rex, Puffy AmiYumi, cookie dough, and a wrestler named the Edge.”
“But I did not like any of those pages, as a quick scan of my ‘liked’ pages
would show. Until I did this research, I had never heard of the Edge or the
Japanese duo Puffy AmiYumi, and as someone with celiac disease, I am
constitutionally unable to like cookie dough.”
If there is one thing Facebook should know about an individual, it is the list
of pages she has actively liked. Halpern then asks a troubling question: Is
Facebook really this inaccurate, or has it decided that she is a more valuable
asset if she is presented as a more marketable consumer?
“But maybe I am more valuable to
Facebook if I am presented as someone who likes Puffy AmiYumi, with its tens of
thousands of fans, rather than a local band called Dugway, which has less than
a thousand. But I will never know, since the composition of Facebook’s
algorithms, like Google’s and other tech companies’, is a closely guarded
secret.”
Halpern also presents the results of an encounter with a group of researchers
at the Psychometrics Centre at Cambridge University, an outfit that attempts to
derive a personality profile from a person’s Facebook information.
“ [they] developed what they call a ‘predictor
engine,’ fueled by algorithms using a subset of a person’s Facebook ‘likes’
that ‘can forecast a range of variables that includes happiness, intelligence,
political orientation and more, as well as generate a big five personality
profile.’ (The big five are extroversion, agreeableness, openness,
conscientiousness, and neuroticism, and are used by, among others, employers to
assess job applicants. The acronym for these is OCEAN.) According to the
Cambridge researchers, ‘we always think beyond the mere clicks or Likes of an
individual to consider the subtle attributes that really drive their behavior.’
The researchers sell their services to businesses with the promise of enabling ‘instant
psychological assessment of your users based on their online behavior, so you
can offer real-time feedback and recommendations that set your brand apart’.”
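Halpern does not explain the mechanics of the predictor engine, and the Centre’s model is proprietary, but the general idea of scoring personality traits from page likes can be sketched in a few lines of Python. The pages, weights, and trait scores below are invented purely for illustration; a real system would fit its weights to data from millions of users.

# A minimal, hypothetical sketch of a likes-based trait predictor.
# The page names and weights are invented for illustration and have no
# relation to the Psychometrics Centre's actual model or data.

PAGES = ["The New York Review of Books", "350.org", "Gizmodo",
         "Consumer Reports", "Lifehacker"]

# Invented weights: one row of coefficients per Big Five trait,
# aligned column-for-column with PAGES.
WEIGHTS = {
    "openness":          [ 0.6,  0.4,  0.2,  0.1,  0.2],
    "conscientiousness": [ 0.1,  0.0,  0.3,  0.5,  0.4],
    "extroversion":      [-0.2,  0.3,  0.0, -0.1,  0.0],
    "agreeableness":     [ 0.2,  0.3,  0.0,  0.1,  0.1],
    "neuroticism":       [ 0.1, -0.1,  0.2,  0.0,  0.0],
}

def predict_traits(user_likes):
    """Score each trait as a weighted sum over the pages the user has liked."""
    liked = [1.0 if page in user_likes else 0.0 for page in PAGES]
    return {trait: sum(w * x for w, x in zip(row, liked))
            for trait, row in WEIGHTS.items()}

print(predict_traits({"The New York Review of Books", "350.org"}))

Even this toy version makes the central weakness visible: every prediction is only as good as the weights someone chose to build into it.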
Again, Halpern was presented with results that were bizarrely inaccurate.
“So here’s what their prediction engine came up with for me: that I am
probably male, though ‘liking’ The New York Review of Books page makes
me more ‘feminine’; that I am slightly more conservative than liberal—and this
despite my stated affection for Bernie Sanders on Facebook; that I am much more
contemplative than engaged with the outside world—and this though I have ‘liked’
a number of political and activist groups; and that, apparently, I am more
relaxed and laid back than 62 percent of the population. (Questionable.)”
“Here’s what else I found out about myself. Not only am I male, but ‘six
out of ten men with [my] likes are gay,’ which gives me ‘around an average
probability’ of being not just male, but a gay male. The likes that make me
appear ‘less gay’ are the product testing magazine Consumer Reports, the
tech blog Gizmodo, and another website called Lifehacker. The ones
that make me appear ‘more gay’ are The New York Times and the
environmental group 350.org. Meanwhile, the likes that make me ‘appear less
interested in politics’ are The New York Times and 350.org.”
“And there’s more. According to the algorithm of the Psychometrics Centre, ‘Your
likes suggest you are single and not in a relationship.’ Why? Because I’ve
liked the page for 350.org, an organization founded by the man with whom I’ve
been in a relationship for thirty years!”
These results can be amusing, but one must not forget that these kinds of data
interpretations are being used to make decisions about people’s lives. Halpern
wrote her article partly as a review of the book Weapons of Math Destruction:
How Big Data Increases Inequality and Threatens Democracy by Cathy O’Neil,
which she correctly describes as “insightful and disturbing.”
O’Neil coined the phrase Weapons of Math Destruction (WMDs) to describe
algorithms that are either so poorly constructed or so badly misused that they
are capable of causing extreme harm. Algorithms and big-data collection allow
millions of people to be evaluated quickly, correctly or not, potentially
spreading pain and suffering nationwide.
One of the presumed advantages of using mathematical algorithms to characterize
individuals is that the biases of individual evaluators are eliminated from the
process. However, the creation of an algorithm inevitably embeds the biases of
its creators. While the errors in judgment made by an individual can be
recognized because they are out in the open for others to see, the false
assumptions built into an algorithm are often hidden from view, protected as
“proprietary information.”
If one wishes to evaluate applicants for a job, one might wish to assess an
individual’s honesty, conscientiousness, reliability, and creativity. These are
all attributes that are nearly impossible to quantify. But algorithms can only
deal with things that are quantifiable; therefore, they must select proxies for
those attributes, proxies that can be converted to numbers but may or may not
be directly relevant.
Credit scores are a popular proxy used in such evaluations, but a low credit
score can belong to a person with a poor sense of responsibility as well as to
a highly responsible person who has simply had a run of bad luck. A personal
evaluation could tell those cases apart, but there is no way for the algorithm
to know the difference. An employer who uses a mathematical routine that
eliminates applicants with low credit scores will likely be satisfied with the
process and never realize that he is throwing away a number of potentially
excellent employees. No feedback is possible to evaluate the process: as long
as the employees he does hire allow him to keep making money, he will be happy.
Profit is the only scorecard, and it is a crude one at best.
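The problem can be made concrete with a small, purely hypothetical Python sketch of such a screen; the cutoff and the applicant records are invented. The point is structural: the rejected pool is never looked at again, so the quality of the filter is never tested.

# Hypothetical proxy-based screening: applicants below a credit-score cutoff
# are dropped before a human ever sees them. Threshold and data are invented.

CREDIT_CUTOFF = 620  # arbitrary illustrative threshold

applicants = [
    {"name": "A", "credit_score": 710},
    {"name": "B", "credit_score": 580},  # responsible person after a run of bad luck
    {"name": "C", "credit_score": 650},
]

shortlist = [a for a in applicants if a["credit_score"] >= CREDIT_CUTOFF]
rejected = [a for a in applicants if a["credit_score"] < CREDIT_CUTOFF]

# The employer only ever evaluates the shortlist; nothing in the process
# measures how many excellent employees ended up in `rejected`.
print([a["name"] for a in shortlist])  # ['A', 'C']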
Another common technique used by algorithms is to judge people by whom they
digitally resemble. An algorithm can assemble a collection of characteristics
of a desirable person and a collection of attributes of an undesirable person;
an individual is then assumed to be similar to whichever type of person he most
resembles. This approach essentially abandons any attempt to evaluate people by
their individual characteristics and treats them as equivalent to their
presumed look-alikes. Plenty of biases can be built into this approach. While
bias based on race is technically illegal, there are many ways in which a
person living in a poor neighborhood with a high crime rate, where interactions
with police are frequent, can be deemed undesirable even with an unblemished
record.
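A hypothetical sketch shows how such look-alike scoring can go wrong; the features, weights, and profiles below are all invented. Because a geographic feature like a ZIP code stands in for many correlated attributes, it is given more weight than the individual's own record, and it decides the outcome.

# Hypothetical look-alike scoring: an individual is assigned to whichever
# prototype ("desirable" or "undesirable") he or she resembles more.
# Features, weights, and profiles are invented for illustration.

WEIGHTS = {"zip_code": 3, "has_degree": 1, "arrest_record": 1}

def similarity(person, prototype):
    """Weighted count of attributes the person shares with the prototype."""
    return sum(WEIGHTS[k] for k in WEIGHTS if person.get(k) == prototype.get(k))

desirable = {"zip_code": "10021", "has_degree": True, "arrest_record": False}
undesirable = {"zip_code": "10456", "has_degree": False, "arrest_record": True}

applicant = {"zip_code": "10456", "has_degree": True, "arrest_record": False}

label = ("desirable"
         if similarity(applicant, desirable) >= similarity(applicant, undesirable)
         else "undesirable")
print(label)  # "undesirable": the ZIP-code match outweighs the clean record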
O’Neil provides numerous examples of how this mass-production approach tends to
be biased against poor people. Consider the person who has been rejected for a
job because a high medical bill or a spell of unemployment caused him to fall
behind on payments and lowered his credit score.
“….the belief that bad credit correlates
with bad job performance leaves those with low scores less likely to find
work. Joblessness pushes them toward
poverty, which further worsens their scores, making it even harder for them to
land a job. It’s a downward spiral. And employers never learn how many good
employees they’ve missed out on by focusing on credit scores. In WMDs, many poisonous assumptions are
camouflaged by math and go largely untested and unquestioned.”
“This underscores another common
feature of WMDs. They tend to punish the
poor. This is, in part, because they are
engineered to evaluate large numbers of people.
They specialize in bulk, and they’re cheap. That’s part of their appeal. The wealthy, by contrast, often benefit from
personal input. A white-shoe law firm or
an exclusive prep school will lean far more on recommendations and face-to-face
interviews than will a fast-food chain or a cash-strapped urban school
district. The privileged, we’ll see time
and again, are processed more by people, the masses by machines.”
These algorithms, which have become so prevalent, are usually formulated with
the best of intentions. In many cases they will work quite well, on average.
But what about the cases in which they make a wrong decision? Those victims
have no recourse. Often neither the user of the tool nor the victim has any
idea why a person was deselected. And a person deselected in one area may find
himself deselected in other kinds of applications as well.
“….scale is what turns WMDs from
local nuisances to tsunami forces, ones that define and delimit our lives. As we’ll see, the developing WMDs in human
resources, health, and banking, just to name a few, are quickly establishing
broad norms that exert upon us something very close to the power of law. If a bank’s model of a high risk borrower,
for example, is applied to you, the world will treat you as just that, a
deadbeat—even if you’re horribly misunderstood.
And when that model scales, as the credit model has, it affects your
whole life—whether you can get an apartment or a job or a car to get from one
to another.”
Inequality is also reinforced in more subtle ways. Big data can be used to
pinpoint vulnerable people and take advantage of that vulnerability.
“We are ranked, categorized, and
scored in hundreds of models, on the basis of our revealed preferences and
patterns. This establishes a powerful
basis for legitimate ad campaigns, but it also fuels their predatory cousins:
ads that pinpoint people in great need and sell them false or overpriced
promises. They find inequality and feast
on it. The result is that they
perpetuate our existing social stratification, with all of its injustices.”
It has been demonstrated that degrees from for-profit colleges are of little
value to students. They are much more expensive than an equivalent education
from a community college and are valued less by employers; in fact, they are
regarded as little better than a high school education. These schools make
nearly all their money from government-guaranteed loans, so whether their
students succeed or fail has little bearing on their business plan.
“In education, they promise what’s
usually a false road to prosperity, while also calculating how to maximize the
dollars they draw from each prospect. Their
operations cause immense and nefarious feedback loops and leave their customers
buried under mountains of debt.”
“Vatterott College, a
career-training institute, is a particularly nasty example. A 2012 Senate committee report on for-profit
colleges described Vatterott’s recruiting manual, which sounds diabolical. It directs recruiters to target ‘Welfare Mom
w/Kids. Pregnant Ladies. Recent Divorce. Low Self-Esteem. Low Income Jobs. Experienced a Recent Death. Physically/Mentally Abused. Recent Incarceration. Drug Rehabilitation. Dead-End Jobs—No Future.’”
The word is gradually getting out about these organizations, and a few of them
have been forced to close.
If one is concerned about the uses to which one’s personal data is being put,
what can one do about it? One possibility suggested by O’Neil is to enact
legislation similar to that already in place in Europe.
“If we want to bring out the big
guns, we might consider moving toward the European model, which stipulates that
any data collected must be approved by the user as an opt-in. It also prohibits the reuse of data for other
purposes. The opt-in condition is all
too often bypassed by having a user click on an inscrutable legal box. But the ‘not reusable’ clause is very strong:
it makes it illegal to sell user data.
This keeps it from the data brokers whose dossiers feed toxic e-scores
and microtargeting campaigns.”
O’Neil points out that many WMDs could be turned into beneficial tools. The
knowledge that allows vulnerable people to be identified and targeted with
harmful advertising could instead be used to identify people who are in need of
social assistance. One might even consider providing that assistance. What a
concept!