Monday, 2 November 2015

{Racist} Ghosts in the Machine?

Big data is a perfect representation of the difficulty in governing innovation. A complex web of technologies creates a seemingly endless chain of questions that require regulatory attention, if not answers. How should personal data be collected? What should be collected? How much consent should be required for the collection? How much of that consent should be based on knowing the end-use of the data? Do the companies who collect the data even understand how it is used? Do the people who wrote the algorithms which analyze it even know?


This last question has become particularly interesting and difficult to answer as machine learning’s ability to process big data removes the requirement of explicitly programming the decision of what to do with the information. Rather, only an objective function (which, in the case of the private sector, is typically profit) and a method programmed to optimize that function are required. This is referred to as a "black box": things go into it, things come out of it, but the transmutation itself is inscrutable.
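As an aside, here is a minimal sketch of what "only an objective and an optimizer" looks like in practice. The data is synthetic and the feature names in the comments are invented; the point is simply that the programmer never writes the decision rule.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in data; the feature names below are invented for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))   # e.g. income, postal-code density, tenure, ...
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(size=1000) > 0).astype(int)  # past approve/deny outcomes

# Nowhere do we write a rule for approving or denying anyone. We only supply an
# objective (the classifier's internal log-loss) and an optimizer (gradient boosting).
model = GradientBoostingClassifier()
model.fit(X, y)

# The learned "decision logic" is an ensemble of a hundred small trees:
# inputs go in, predictions come out, and the mapping is hard to inspect directly.
print(model.predict(X[:5]))

Even in this toy version, the only human-authored choices are the data fed in and the quantity being optimized; everything the model "decides" is a by-product of that optimization.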

Facebook made a few headlines when it patented an algorithm that could allow lenders, when looking at someone’s credit score, to also look at the scores of the people in their friend network on the platform, and to bolster (or lower) the applicant's score accordingly. This was not even particularly new technology: Facebook filed for the patent in 2012, but it went unnoticed until this year. Similarly, the Chicago Police Department created a list of roughly 400 people considered at high risk of committing a violent crime, generated by algorithms based on a Yale sociologist’s work. ProPublica found that the Princeton Review's geographically variable pricing for SAT tutoring packages resulted in higher prices being quoted in areas with large Asian populations, based on the geographic variables in its pricing model. There are many other examples of how big data inadvertently discriminates on the basis of gender, race, or socio-economic background. Some of the ways in which this manifests itself initially appear somewhat benign, such as only 11% of the Google Image search results for "CEO" being pictures of women, which nonetheless paints an unintentionally misleading portrait of the position when you consider that 27% of CEOs in the US are women.

Governments appear to be aware of the need for some oversight, particularly since many of the affected groups enjoy legal protection in these situations. The Executive Office of the President of the United States put out a document entitled “Big Data: Seizing Opportunities, Preserving Values” in May of 2014. In it, the advisory committee tasked with examining the effects of big data on the American way of life wrote: “The increasing use of algorithms to make eligibility decisions must be carefully monitored for potential discriminatory outcomes for disadvantaged groups, even absent discriminatory intent” (pg 47). If we agree that algorithms which discriminate need some form of oversight, the next question, naturally, is how?

A group of computer scientists* focused on discriminatory algorithms have proposed one method to address them in the US. Their paper examines the algorithms through a legal framework using a theory of US anti-discrimination law called disparate impact. To summarize a highly complex piece of legal theory in the briefest way possible: disparate impact is used to guard against unintended discrimination against a protected class, such as race (protected classes are defined by statutes). Disparate impact occurs when a policy causes a protected class to experience an adverse effect; again, this is not an intentional effort to harm a protected class but an indirect outcome of some policy or, in the age of big data, an algorithm. (Additionally, in order for it to actually be illegal, the discrimination caused by disparate impact must not be a provable, necessary requirement in the context in which it occurs.) ProPublica also has a good explanation of the legalities of disparate impact in the context of new technology.

The authors put forth a mathematical way to determine how well an algorithm can predict one of these protected classes (a “protected attribute”) based on the data it uses (the other “attributes”). They then introduce methods of transforming datasets so that the algorithm in question cannot predict the protected attribute, while maintaining the other data necessary for the algorithm to function to an acceptable degree of accuracy. These methods are then put into practice on real-life datasets from actual disparate impact cases. This has the potential to be a particularly effective approach because neither the test to determine whether an algorithm has the potential to cause disparate impact, nor the remedy to prevent it from doing so, relies on access to the algorithm itself (which tends to be proprietary and therefore difficult to obtain). Instead, both focus on the dataset used by the algorithm. The authors note that the paper explores only numerical attributes, and that other types of attributes (e.g. categorical) may prove more of a challenge.
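To give a rough sense of the mechanics, here is a minimal sketch in the spirit of that approach, not the authors' exact procedure: it checks how well the other attributes predict a protected attribute, then applies a crude rank-based "repair" to one numerical column so the groups become indistinguishable on it. The column names and thresholds are invented for illustration.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def predictability(df, protected_col, feature_cols):
    # If a classifier can predict the protected attribute from the other
    # attributes well above chance, the dataset carries disparate-impact risk.
    X, y = df[feature_cols].values, df[protected_col].values
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy").mean()

def repair_column(df, protected_col, col):
    # Map each group's values onto the pooled quantiles, so the column's
    # distribution no longer differs by group while rank order within each
    # group is preserved (a crude stand-in for the paper's repair).
    grid = np.linspace(0, 1, 101)
    pooled = df[col].quantile(grid).values
    repaired = df[col].astype(float).copy()
    for _, idx in df.groupby(protected_col).groups.items():
        ranks = df.loc[idx, col].rank(pct=True)
        repaired.loc[idx] = np.interp(ranks, grid, pooled)
    return repaired

# Hypothetical usage on an invented dataset:
# print(predictability(df, "race", ["income", "zip_density"]))   # ~0.5 is reassuring
# df["income"] = repair_column(df, "race", "income")

The appeal of working at the dataset level is visible even in this toy version: nothing above ever needs to see the lender's or employer's actual model.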

Discrimination by credit lenders, law enforcement and employers is not new, but big data has, intentionally or not, enabled new ways to obfuscate it, hidden in rows of code, and new rows of code may be the only way to catch it.

*The group has also compiled a list of further reading on the topic of algorithmic discrimination here:

Technology in the Driver's Seat

Wearable technology, which enables the tracking and logging of human functions from heart rate to distance run, has a vast array of implementations and implications. Certainly much has been made about the rise of self-tracking and the quantification and gamification of everyday activities and vitals towards some goal: better sleep, losing weight, an improved sex life, etc. No longer constrained by the high costs of data storage, the applications for the quantification and collection of human behaviour and biological functions seem endless. One of the most obvious and practical applications of wearable technology is in the workplace as an employee monitoring tool. This usually falls under two main streams of justification: health and safety (both the employee’s and others’), and productivity. It has been employed in the financial services sector, by healthcare providers, and in grocery warehouses, to name a few.

One example among many is the use of monitoring technology to assess the fatigue of drivers and heavy machinery operators in industries such as mining. While actual statistics on the percentage of industrial accidents caused by fatigue are difficult to estimate, the perception of fatigue as a critical factor in workplace accidents has been found to be quite strong among workers. Avoiding these types of accidents is ostensibly the objective of fatigue management systems, a mix of hardware and software designed to 1) predict the onset of fatigue before it occurs, 2) determine when fatigue has set in, and 3) be able to do something about it. In their paper The challenges and opportunities of technological approaches to fatigue management, Balkin et al position technology as an objective way of determining and even predicting driver fatigue:
With such technologies, current workplace rules and regulations designed to afford workers ample time for sleep and recovery could instead be rewritten to require that operators maintain adequate levels of objectively monitored alertness/performance on the job—a change in emphasis that would more directly address the issues of ultimate concern. (pg 2)
A fatigue management system could combine a fitness-for-duty screening before a driver begins their shift with a personalized predictive model which utilizes real-time driver measures from an online monitoring system, as well as environmental factors (e.g. time of day) and specifics about the job itself. Given the physiological elements involved in how an individual functions under fatigue, the current technology is far from perfect. System developers are thus continuously trying to strike the best balance between false positives and failures to identify fatigue. Each element of the system has reliability issues as well as implementation concerns. For instance, stimulants could be used to pass a fit-for-duty test, electroencephalography (EEG, which measures brain wave activity) requires electrodes stuck to one’s head, and less intrusive systems that monitor eye movement (oculomotor monitoring) tend to be less reliable. And while some of these detection systems are more accurate than others, they tend not to facilitate actual intervention in the case of fatigue (what an intervention should look like is another topic of much discussion).
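As a rough illustration of how such a system might fuse these inputs into a single risk score, here is a hedged sketch. The signal names, weights, and thresholds are invented; real systems are considerably more sophisticated.

from dataclasses import dataclass

@dataclass
class DriverState:
    fit_for_duty: bool      # result of the pre-shift screening
    hours_on_shift: float   # job-specific factor
    hour_of_day: int        # environmental factor (circadian low roughly 2-6 am)
    perclos: float          # share of time the eyes are mostly closed, from eye tracking

def fatigue_risk(state: DriverState) -> float:
    # Combine pre-shift, contextual, and real-time signals into a 0-1 risk score.
    risk = 0.0 if state.fit_for_duty else 0.3
    risk += min(state.hours_on_shift / 12.0, 1.0) * 0.2   # longer shift, higher risk
    risk += 0.2 if 2 <= state.hour_of_day <= 6 else 0.0   # circadian low point
    risk += min(state.perclos / 0.15, 1.0) * 0.3          # drowsiness proxy
    return min(risk, 1.0)

# Hypothetical usage: trigger an intervention above some tuned threshold.
if fatigue_risk(DriverState(True, 9.5, 3, 0.12)) > 0.5:
    print("alert dispatcher / prompt rest break")

Even this toy version makes the trade-off above visible: lower the alert threshold and false positives climb; raise it and genuinely fatigued drivers slip through.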

In a 2008 report on operator fatigue detection technologies by the heavy equipment company Caterpillar, 22 technologies were assessed on a number of metrics; only 3 of the highest rated were actually commercially available at the time, however. Both fatigue research specialists and representatives from a large mining company were asked to assess the importance of each of the metrics, and the differences between what each group prioritized are a good argument for multidisciplinary assessment of the technology. Whereas the research specialists placed importance on how the fatigue detection technology worked, the mining experts cared mainly about what the technology was measuring and about operator acceptance, including how intrusive actual implementation would be, how easy the technology would be to manipulate, and how accepting miners and unions would be.
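A toy illustration of why those different weightings matter: the same technologies, scored on the same metrics, can rank differently depending on whose priorities set the weights. All of the names, scores, and weights below are invented, not taken from the Caterpillar report.

import numpy as np

techs = ["EEG cap", "eye tracker", "steering sensor"]   # invented examples
# Invented scores out of 10 (rows = technologies; columns = scientific validity,
# what is measured, operator acceptance, tamper resistance).
scores = np.array([
    [9, 6, 4, 5],   # valid but intrusive
    [6, 8, 8, 6],   # less valid, better accepted
    [5, 7, 9, 4],   # unobtrusive, easier to fool
])

# Invented weightings loosely reflecting the two groups' priorities described above.
weights = {
    "researchers":    np.array([0.5, 0.2, 0.2, 0.1]),
    "mining company": np.array([0.0, 0.4, 0.4, 0.2]),
}

for group, w in weights.items():
    best = techs[int(np.argmax(scores @ w))]
    print(f"{group}: top-ranked technology is the {best}")

With these made-up numbers, the researchers' weights favour the EEG cap while the mining company's favour the eye tracker, which is exactly the kind of divergence a multidisciplinary assessment is meant to surface.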

Of course, even if we agreed that heart rate measurements were objective, and that the technology used to capture them accurately was available, the act of implementing that technology is far from objective. While initially employed as a safety mechanism, what guarantee is there against scope-creep—that is, using the technology to monitor and provide feedback on productivity? How then does the constant monitoring of vital signs and eye blinks actually affect the worker, their relationship with themselves, and their job?

Moore and Robinson provide some insight into this question by looking at "the quantified self" (a term originally coined in response to the increase in tracking technologies in our lives) in the context of the workplace, a feature which they ascribe to "neoliberalism as an affective regime exposing a risk of assumed subordination of bodies to technologies" (pg 3). The authors posit that a sense of disposability permeates the modern workplace, true for those in both white-collar and blue-collar jobs. Marx’s theory of worker alienation resulting from assembly line production starts to look quite quaint compared to the monitoring and reduction of employee performance down to measures of basic biology.

Another critical perspective is that these monitoring technologies take the place of trust, which can be seen as stemming from respect. Monitoring is an admission of a lack of trust, and so respect is lost by extension. This then results in a continuous feedback loop, where further monitoring becomes necessary to make up for the loss of trust, resulting in even less trust, and so on. This could also result in workers feeling less of a need for self-trust, or, as Balkin et al put it: "Over-reliance would in such cases reflect an inflated trust in the reliability of the system relative to its actual reliability" (pg 570). On the other hand, a certain level of trust is needed for the system to be effective, and technology with a high rate of false positives could lead to workers simply ignoring its feedback (putting aside the fact that a fatigued driver may be more likely to overestimate their abilities and disagree with the system’s assessment anyway). Others are optimistic that buy-in for monitoring devices might be greater among younger generations, who are already used to carrying around portable technologies. Although the inertia behind the development of these technologies makes their implementation in almost every facet of life seem inevitable, it is worth examining how our current world has shaped our acceptance of them, and how they have shaped our world.

Monday, 9 December 2013

Update: Elsevier's Knowledge Lockdown

In Open Access, Accountability, and For-Profit Publishing I wrote about the publisher Elsevier and its continual efforts to restrict access to published research by locking it behind expensive subscriptions, without any explanation as to why access was so prohibitively expensive in the first place (besides making $$$). I mentioned that authors who publish in Elsevier's publications can choose to make their articles openly accessible to everyone, but that
Elsevier's revenue model currently dissuades researchers from sharing by charging authors who wish to make their work open access $3,000 per article (the actual amount varies depending on the journal - it's £400 per page in The Lancet and $5,000 for Cell Press titles).
However, it's been common practice among researchers to disregard this technicality and publish their papers on their own websites as well - free of charge. And Elsevier's legal team must be extra restless right now, because they served up a bunch of take-down notices, nicely summed up over at Sauropod Vertebra Picture of the Week:
Preventing people from making their own work available would be insane, and the publisher that did it would be committing a PR gaffe of huge proportions. Enter Elsevier, stage left. Bioinformatician Guy Leonard is just one of several people to have mentioned on Twitter this morning that Academia.edu took down their papers in response to a notice from Elsevier. Academia.edu explained its actions with the following notification, laying the responsibility squarely and fairly at the feet of Elsevier:

Hi Guy
Unfortunately, we had to remove your paper, Resolving the question of trypanosome monophyly: a comparative genomics approach using whole genome data sets with low taxon sampling, due to a take-down notice from Elsevier. Academia.edu is committed to enabling the transition to a world where there is open access to academic literature. Elsevier takes a different view, and is currently upping the ante in its opposition to academics sharing their own papers online.

Over the last year, more than 13,000 professors have signed a petition voicing displeasure at Elsevier’s business practices at thecostofknowledge.com. If you have any comments or thoughts, we would be glad to hear them.
The Academia.edu Team

So, there. The battle lines were contractually drawn a long time ago, and big publishing is simply entrenching itself further.

Big thanks to Mike Taylor and SV-POW for bringing this to our attention.

Wednesday, 4 September 2013

Optimism Without Reserves

Peak Oil Is Dead, Long Live Peak Oil

Less than a month ago, geologist Euan Mearns wrote a piece called "Three Nails in the Coffin of Peak Oil". The article was posted on The Oil Drum, a website that has been a source of peak oil information and debate for nearly a decade, but is now being mothballed and indefinitely put to rest.

This is emblematic of the state of peak oil today. The idea that a peak in oil production rates is imminent seems to appear much less in the media now than during the oil price spike of 2005-2008, and the graph below suggests that public interest in it is waning as well.

When peak oil does appear in the media these days, it is often dismissed or outright ridiculed. A widely cited report from Harvard University put the ivory seal of academia on peak oil's death sentence. The June 2012 report, which can be read in full here, contends that global oil production will rise until 2020 at rates that have not been seen since the 1980s. Most of this growth is attributed to an increase in shale/tight oil production, especially in North America, where Montana and North Dakota could become "a big Persian Gulf producing country within the United States".

Taking a slightly different angle, The Economist recently argued that the world will soon figure out how to reduce its oil dependency through a mix of fuel efficiency improvements and switching to newly abundant natural gas, meaning a peak in demand, rather than supply, is expected. There have even been calls to shut down the U.S.'s Strategic Petroleum Reserve, a massive oil storage facility put in place after the 1973-1974 embargo, implying that the kind of oil worries that started in the '70s have been put to rest once and for all.

This lack of concern with oil supply constraints is generally regarded as good news, although taken from another perspective, the idea that we might manage to keep up (or even increase) the rate at which we're burning oil for decades to come is perhaps not particularly appealing. Some of those who reject the peak oil hypothesis aren't thrilled about it either. One of the most prominent names to have switched sides in the debate, journalist George Monbiot, wrote a piece shortly after the Harvard report came out called "We were wrong on peak oil. There's enough to fry us all". However, while the more severe effects of climate change are still decades away according to most predictions, a dip in the supply of oil could have immediate and far-reaching repercussions: about 95% of global transportation energy is petroleum-based, meaning pretty much everything you eat or use has been moved around using oil at some point in its life. Personally, with my parents living more than 7000 km away from me, any decrease in the availability of affordable transportation would have a significant impact on my life. As a species with a relatively short lifespan, it's not hard to guess how far on the horizon our priorities are going to lie.

Of course, all this pre-supposes that the combination of science and speculation behind this production/demand optimism is accurate. Look for another post coming down the pipeline which critically examines what, exactly, is lending credence to these reports.

Thursday, 2 May 2013

Experimental Lakes Area Bailed Out, Still Living Paycheque to Paycheque

Temporary Respite from the War on Science

The Experimental Lakes Area (ELA) is a series of 58 lakes and a research facility in northwestern Ontario where scientists study the effects of pollutants and other stressors in a naturally-occurring ecosystem. In other words, it's literally a massive wet lab. The world-class testing facility, unique to Canada, allows scientists to introduce chemicals into entire lakes and monitor the effects in a natural ecosystem, rather than in a smaller artificial lab setting. Some of the globally recognized work to come out of the ELA includes the investigation of algae blooms, which led to the banning of high-phosphate laundry detergents and limits on phosphorus use at sewage treatment plants in the 1970s.
Last summer, however, the federal government announced it would no longer be funding the facility, effective March 31, 2013.

The reasons for the decision were never clearly articulated, with the usual fall guy of budget cuts being touted, alongside claims that the ELA, which was under the operational umbrella of the Department of Fisheries and Oceans, no longer fell under the department's mandate. Since then there has been rampant speculation about the fate of the ELA, and at one point it was reportedly being sold to the International Institute for Sustainable Development.

It would cost the federal government $2 million a year to continue funding the ELA, with an additional $600,000 in operating costs, but as the MP for the area, Greg Rickford, concedes, this really isn't about the money:
“We do intend to withdraw our role in the ELA. The motivation for that is because the federal government must have flexibility to move certain types of research, including some that has gone at the ELA, to other parts of the country where there is the potential for monitoring new environmental factors that are more proximal to resource development in Western Canada. That research is continuing, it’s just moving to other areas of the country where it’s required.”
As it turns out, this bullshit response can be refuted by the very history of the Experimental Lakes Area itself. In the 1970s, the acid rain research conducted at the ELA was initially funded by the Government of Alberta's Oil Sands Research Program, with the intent of investigating the long-term impacts of developing the oil sands. They were able to secure money for the first three years of the research from the program because, lo and behold, "their freshwater ecosystems had a lot of the same species that we did at ELA and their most sensitive lakes were like ours" (from a great interview with the renowned scientist David Schindler, former ELA director).

It seemed that efforts to save the area had failed, when last week the Government of Ontario announced that it would work with the Government of Manitoba, the federal government, and "other partners" to keep the ELA operational for the rest of 2013. It's still unclear what the long-term plans are, but Ontario seems willing to cover operating costs, at least for now. It's a temporary victory, but a relief for the scientists who currently have federal grant money to study the effects of nanosilver on lakes - $800,000 to be exact, which perhaps the federal government would have been fine writing off as a cost of war.

Conservatives Vote "No" to Science

The mounting frustration regarding the federal government's handling of the ELA, and of science in general, reached its apex this March with the following motion put forward in Parliament, which is now in the midst of federal budget discussions. From Vote No. 631 on March 20, 2013:
That, in the opinion of the House: (a) public science, basic research and the free and open exchange of scientific information are essential to evidence-based policy-making; (b) federal government scientists must be enabled to discuss openly their findings with their colleagues and the public; and (c) the federal government should maintain support for its basic scientific capacity across Canada, including immediately extending funding, until a new operator is found, to the world-renowned Experimental Lakes Area Research Facility to pursue its unique research program.
The motion, sponsored by NDP MP Kennedy Stewart, is a particularly loaded piece of political savvy. By tacking on the call for the government to continue its support for the ELA, Stewart guaranteed its defeat at the hands of the 157 Conservatives in Parliament. In short, the Government of Canada voted that it is against science, a very catchy soundbite which fits perfectly into 140 characters and makes for a great screen cap. Of course, this is accomplished through a bit of circular logic not uncommon in political rhetoric. But in light of the policies pushed through by this government, the vote is hardly an inaccurate representation of its prevailing attitudes. The Unmuzzled Science has a brief takedown of how the Conservatives have opposed each point in the motion.

Not Scientists, Just Doctors of Spin

The advocacy group Democracy Watch released a report in January of this year, alleging that the Government of Canada has been systematically limiting access to government information, and specifically to federal scientists from the departments of Environment, Fisheries and Oceans, and Natural Resources. Spin and the restriction of information are, of course, not unique to this government, but the unabashed, unapologetic way in which the Harper Government goes about it is downright insulting. Democracy Watch has subsequently filed a complaint, along with the University of Victoria's Environmental Law Centre, with the Information Commissioner of Canada, asking for an investigation into the government's obstruction of information. Until then, here's hoping that public backlash will continue to stay the execution of further campaigns against science.

Tuesday, 23 April 2013

Betrayal in the Banana Republic

Or, The Short Version of Why Open Access Matters

In Open Access, Accountability, and For-Profit Publishing, I wrote about how the lack of model transparency and access to data prevented researchers from replicating others' results, leading to an over-reliance on the peer-review process to vouch for a piece of research's validity. I also mentioned how peer-reviewers are not required to audit the veracity of a model or verify its results, and that this lack of oversight could result in inaccurate information influencing policymakers.

Now, researchers Thomas Herndon, Michael Ash, and Robert Pollin have called into question the conclusions of the economics paper "Growth in a Time of Debt" for exactly these reasons. Written in 2010 by Carmen Reinhart (University of Maryland) and Kenneth Rogoff (Harvard), the paper has played a not-insignificant role in the public discourse on economic policy during the current recession. As Mike Konczal of Next New Deal summarizes:
This has been one of the most cited stats in the public debate during the Great Recession. Paul Ryan's Path to Prosperity budget states their study "found conclusive empirical evidence that [debt] exceeding 90 percent of the economy has a significant negative effect on economic growth." The Washington Post editorial board takes it as an economic consensus view, stating that "debt-to-GDP could keep rising — and stick dangerously near the 90 percent mark that economists regard as a threat to sustainable economic growth."
Their findings also provided the basis of Reinhart's testimony before the Senate Budget Committee on Feb. 9, 2010, and have been used to support the argument for austerity in Europe and the United States.

Initially it would seem that access to data isn't an issue, as Reinhart and Rogoff provide the historical data they used, along with their sources, on their website. But publicly available data is meaningless without context, and the authors don't provide any clarity on which data series and methodology they used. This is why access to the model is just as important for repeatability and accountability as access to the data. (There is some discussion on whether or not "Growth in a Time of Debt" was ever actually peer-reviewed, but as noted above, there's a good chance it would have passed review without anyone ever having to look at the model and datasets anyway.)

From Herndon, Ash and Pollin (pg 5):
We were unable to replicate the RR results from the publicly available country spreadsheet data although our initial results from the publicly available data closely resemble the results we ultimately present as correct. Reinhart and Rogoff kindly provided us with the working spreadsheet from the RR analysis. With the working spreadsheet, we were able to approximate closely the published RR results. While using RR's working spreadsheet, we identified coding errors, selective exclusion of available data, and unconventional weighting of summary statistics.
Reinhart and Rogoff's work presented a seemingly straightforward, data-driven analysis drawing a causal link from a country's public debt to its GDP growth, which is great talking-head bait and support for austerity economics. But, as succinctly put by Herndon, Ash and Pollin, "A necessary condition for a stylized fact is accuracy".
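For anyone wondering what "unconventional weighting of summary statistics" can do in practice, here is a toy illustration with invented numbers (not the RR data): averaging growth per country first and then across countries gives one answer, while averaging over all country-year observations gives another, because a country contributing a single year of data ends up with the same weight as one contributing nineteen.

import pandas as pd

# Invented numbers for illustration only; these are not from the RR spreadsheet.
toy = pd.DataFrame({
    "country": ["A"] * 19 + ["B"],   # A contributes 19 high-debt years, B only 1
    "growth":  [2.0] * 19 + [-8.0],
})

per_observation = toy["growth"].mean()                         # (19*2.0 - 8.0) / 20 = 1.5
per_country = toy.groupby("country")["growth"].mean().mean()   # (2.0 + -8.0) / 2 = -3.0

print(per_observation, per_country)

Neither weighting is inherently wrong, but the choice can swing the headline number from positive to negative, which is exactly why the methodology needs to be visible.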

Update: Now with video! 
Reinhart/Rogoff debacle reaches fever pitch with Stephen Colbert going to town.

Friday, 22 March 2013

Greenwashing Blitz: Métro

Bref Vert, or Briefly Green

Cut down a bunch of trees, mill them into paper, paint it turquoise, put a bizarre picture of a monkey doing monkey business on the cover, say a few words about how cute animals are awesome and will therefore save the planet, and there, you've got yourself a "Green Special". Never mind that one of the ways these animals are supposed to save the planet is through ecotourism, which involves rich people spending half a year's worth of SUV emissions on a flight to some tropical paradise to swim with the turtles or whatever. Either the people editing this daily pastime for the short-of-attention are completely unaware of what the global environmental challenges are, or they have decided to make a mockery of them by praising the virtues of "panda diplomacy", an actual political term which has less to do with saving the planet and more to do with sending furry animals abroad in exchange for natural resource access.

For some reason, I feel like the former is the explanation, and the latter is the outcome.