Friday, 29 December 2017

Assorted Thoughts #1: Applying Averages

This series covers some of the key books, studies, papers and events that I’ve read or followed over the last year. I’ve decided to try to write these posts for two reasons. First, as I finish books, I want a record of some of the key insights from them, both because it’s useful to refer to and because it might be interesting for other people to read. Second, it provides a nice little glimpse into the topics that keep me occupied in a particular year and allows me to track how my thoughts on them develop. In an attempt to publish more and shorter posts, each topic will be a separate post.

#ApplyingAverages Long Live Cohen’s D

This topic, more than any other, has occupied my mind over the course of 2017. It’s a huge shame that it’s not a topic of public discussion and I think it’s hugely important to have mainstream voices take it on. The issue flared up during the public fallout from the “Google Memo” – but it was an issue I was thinking about in the context of immigration last year. I now feel comfortable that my thoughts on this have fully formed, and I set them out in detail here.

Consider the following findings:

An analysis of the available FBI data by Vox's Dara Lind found that US police kill black people at disproportionate rates: Black people accounted for 31 percent of police killing victims in 2012, even though they made up just 13 percent of the US population (Vox, 2017).

--

As shown [reproductive women in the US who had] ever [had an] abortion, sterilization, and methods of contraception increase the likelihood of divorce compared to ever married women who have never used these methods of family planning from one to two times the risk of divorce, having an abortion in the past twelve months did not meet statistical significance (Fehring (2015))

--

The physiological differences between the sexes disadvantage women in strength-based and aerobic fitness tests by 20 to 40%; so for the same output women have to work harder than men. Despite the differences, there will be some women, amongst the physical elite who will achieve the entry tests for GCC roles. But these women will be more susceptible to acute short term injury than men: in the Army’s current predominantly single sex initial military training, women have a twofold higher risk of musculoskeletal (MSK) injury. The roles that require individuals to carry weight for prolonged periods are likely to be the most damaging... On recent operations women experienced a 15 to 20% higher rate of Disease Non Battle Injury (DNBI) ("Women in Ground Close Combat Review Paper", Ministry of Defence Review, December 2014).

--

Young black people are nine times more likely to be locked up in England and Wales than their white peers, according to Ministry of Justice analysis picked up by Lammy. The BAME proportion of youth prisoners rose from 25% in 2006 to 41% last year (The Guardian on the Lammy Review)

--

66% of global terrorism [in 2014 was] attributable to just four groups: Islamic State (Isis) in Iraq and Syria, Boko Haram in Nigeria, the Taliban in Afghanistan and al-Qaida (The Guardian on the Global Terrorism Index 2014)

--

Rates of estimated diagnoses of HIV infection, rates of people living with HIV, and rates of P&S syphilis were higher among MSM than among other men or women.. The rate ratios indicate disparities between MSM and other men and MSM and women. Comparing MSM to other men, the estimated rate of diagnoses of HIV infection in 2008 was 59 to 75 times as high, the estimated rate of MSM living with a diagnosis of HIV infection was 38 to 48 times as high (Table ​55), and the P&S syphilis diagnosis rate was 63 to 79 times as high (Purcell et al (2012))

--

84% of ‘grooming gang’ offenders were (South) Asian, although they make up only 7% of the total UK population, and the majority of these offenders are of Pakistani origin with Muslim heritage (Quilliam (2017)).

The point here is not whether any of these findings are true, but how we should act if they were. Could it be said that, for example, on the basis that women on average “experienced a 15 to 20% higher rate of Disease Non Battle Injury (DNBI)”, they should not be allowed to serve in direct combat roles? Should the different infection rates among homosexuals vs heterosexuals mean they shouldn’t be able to donate blood, or, as Heritage stated back in 1994, that the “risk of AIDS is itself sufficient reason to deny gays the privilege of serving in the U.S. military”? Or that, given we see average differences between races in the crime stats, the criminal justice system is prejudiced? Or that, given that Muslim groups are responsible for a majority of global terrorism, we should have a ban on Muslims entering the country?

You can clearly see why I think this area requires mainstream voices actively talking about the issue. These are the implications that reactionaries and nativists come to. Without an intellectually (and, more particularly, statistically) informed response, a raft of illiberal policies, or policies that cast doubt on our institutions, can be justified. And without actually engaging with the statistics or the arguments, we are potentially left vulnerable to believing dead dogmas, or worse, having our institutions affected.

The answer to the crude nativist applying averages, I believe, is that we have to acknowledge average differences and emphasise that we are acknowledging the average not the whole. Sam wrote about how and why it was important to do this:

Think about the claim that “Women generally have a stronger interest in people rather than things, relative to men.”… For a given man and a given woman who seem similar in other respects, this claim sounds like it’s saying that each man will be more “interested in things” than each woman.

But it doesn’t! This is what my friend “the Anonymous Mugwump” refers to as a difficulty in “applying averages”. We tend to take claims about groups as claims about each individual within those groups. That means that claims about men or women being more inclined towards one thing than another sound both obviously false and easily rejected, and very insulting.

For every one hundred people, there might be ten men and nine women who are equally “interested in things”. This would mean that on average men are substantially more likely to be “interested in things” than women, but that as individuals none of those women are any less interested than those men and that there are millions of women who are just as “interested in things” as any man is. Applying the average to the individuals would be a very silly (if easily made) mistake.

Sam goes on to say:

Group differences, in other words, tell us next to nothing about the traits of a given man or woman within those groups — there are lots of neurotic men and lots of genius women. Very simple information about any individual is going to tell you much much more than whatever the distribution of attributes is for a group they happen to be part of.
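The point about overlapping distributions can be made concrete with Cohen's d, the standardised difference between two group means. What follows is only a minimal sketch: it assumes both groups are normally distributed with equal spread, the function names are my own, and the scores are hypothetical numbers chosen for illustration rather than figures from any study cited here.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cohens_d(mean_a: float, mean_b: float, sd_a: float, sd_b: float) -> float:
    """Difference in means divided by the pooled SD (equal group sizes assumed)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled_sd


def overlap_coefficient(d: float) -> float:
    """Shared area under two equal-variance normal curves whose means are d SDs apart."""
    return 2.0 * normal_cdf(-abs(d) / 2.0)


def probability_of_superiority(d: float) -> float:
    """Chance that a random member of group A scores above a random member of group B."""
    return normal_cdf(d / math.sqrt(2.0))


# Hypothetical trait scores: group A averages 105, group B averages 100, both with SD 10.
d = cohens_d(mean_a=105.0, mean_b=100.0, sd_a=10.0, sd_b=10.0)
print(f"Cohen's d: {d:.2f}")
print(f"Overlap between the two distributions: {overlap_coefficient(d):.1%}")
print(f"P(random A > random B): {probability_of_superiority(d):.1%}")
```

Under these assumptions a "medium" difference of half a standard deviation still leaves roughly four fifths of the two distributions overlapping, and a randomly chosen member of the higher-scoring group beats a randomly chosen member of the other group only about 64% of the time. That is the sense in which the group average says little about any given individual.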

I want to elaborate on Sam’s points here. Ben gave a very good talk at the Adam Smith Institute’s Forum in 2016 about how granular information can help us deal with statistical discrimination. For example, people may use average differences in conviction rates between races as information for hiring decisions when they don’t have further information. The way to deal with this is to provide information about the individual: did they actually go to prison? Do they have the qualifications? When granular, individuated information is used, it makes using crude averages redundant. Take the example that Ben always uses from Airbnb reviews, as per Cui et al (2017):

We find that when guest accounts have no review, the average acceptance rate of White guests is 48%, and the average acceptance rate of African American guests is 29%. In other words, guests with White names are accepted 19 percentage points more often than those with African American names (p-value = 0.0002)… When there is one positive review, the acceptance rate of White guests is 56%, and the acceptance rate of African American guests is 58%. Note that, irrespective of guests' race, the acceptance rate rises when a guest's quality is validated by a positive review. In this case, the acceptance rates between White and African American guests are statistically indistinguishable (p-value = 0.8774). That is, discrimination is eliminated in the presence of one positive review.

And it’s not just that new information can be identified and utilised; technology can also develop so that we are able to obtain it. The UK government justified its ban on gay men donating blood on the basis of the above-mentioned discrepancy between straight and gay men. This policy was rightly reversed following a determination that “new testing systems were accurate and donors were good at complying with the rules.” Technology and free markets, operating not with "optimal" information but with more and more of it, are key to fighting the use of crude averages.

There is a further reason why applying averages in a crude way is bad: it encourages intellectual laziness and denies the complexity of the world around us. It purports to end the debate with the use of average statistics, but issues and policy conclusions are almost always more complex. It is simple to say, for example, that the disparity in who is subject to Stop and Search powers shows that the police are racist. But what if there are other considerations which do not rely on maligning an entire police force? And, for your information, there is suggestive evidence this is true, as per Miller (2000):

The research shows resident population measures are very different from populations actually available to be stopped and searched. Specifically, the research suggests that available populations tend to include larger proportions of people from minority ethnic backgrounds than resident populations. Furthermore, when statistics on stops and searches are compared with available populations, they do not show any general pattern of bias against those from minority ethnic backgrounds, although there are some specific exceptions… This suggest that stops and searches are generally targeted at areas where there are crime problems.

The same could be said about police murdering African Americans. It’s so easy to refer to average differences between shootings of White and Black Americans and conclude that the police in the U.S. must be racist and gunning down black people – but what if it is something else? I think there is tentative evidence to suggest that it’s related to violence used in particular areas rather than racist motivations (Klinger et al (2015)), that the rates do not appear to be disproportionate to the levels of average crime (MacDonald (2016/7)), and, from experiments, that police officers are more delayed in using force against Black people because of a fear of social reprisal (James et al (2016)).

These studies are not conclusive and there are conflicting ones, but the point is that the debate cannot end with the use of crude averages. Applying averages in such a ham-fisted way (e.g. ban Muslims because of average differences in committing terror) stops us from properly considering controls and mitigation measures. I want to explore two issues in a bit more detail to tease out some nuances.

First, consider Donald Trump’s “Muslim ban” on seven Muslim-majority nations. The argument is easy to understand: Muslims are committing terrorism, so let’s stop Muslim immigration. To avoid being intellectually barren, we need to provide a response that does not deny the (global) disproportionality of self-described Muslims being involved in terror activities. Here are some arguments – not fully made – that I think weigh in favour of not having a blanket ban:

1. Refugees are already extensively vetted, so there’s no reason to ban a whole group of people in the first place (see a form of this argument from Natasha Hall in the Washington Post).

Response: Yes, but what about the descendants of those refugees or migrants who come here and then commit an act of terror?

2. Terrorism is incredibly rare and, in particular, the terrorism which comes from refugees is extremely rare, as per Nowrasteh (2016):

Including those murdered in the terrorist attacks of September 11, 2001 (9/11), the chance of an American perishing in a terrorist attack on U.S. soil that was committed by a foreigner over the 41-year period studied here is 1 in 3.6 million per year. The hazard posed by foreigners who entered on different visa categories varies considerably. For instance, the chance of an American being murdered in a terrorist attack caused by a refugee is 1 in 3.64 billion per year while the chance of being murdered in an attack committed by an illegal immigrant is an astronomical 1 in 10.9 billion per year.

Response: even though it’s rare, terrorism poses a unique quasi-existential threat to Western civilisation; see this post of mine for an elaboration of this argument.

Response to the response: yes, but even considering that threat, the utility gained from providing new homes to refugees or immigrants outweighs this downside. Evidence points:

(a) There are major economic benefits to migrants who come here, accompanied by marginal or neutral effects on the economic outcomes of natives (see The Empirics of The Places We Go).

(b) The number of Muslims who are terrorists globally represents 0.0066% of the Muslim population (source), suggesting that the number of people who can have utility gains is high. To be more particular, there is a Muslim population in the UK of approximately 2.7 million. According to MI5, approximately 13 attacks were stopped between 2013 and 2017. There are, further, 20,000 people who are considered to be at risk of being involved in terrorist activity, a small minority (under 1%) of that 2.7 million figure.
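The proportions in point (b) are easier to judge once they are written out. Here is a rough sketch using only the figures quoted in that point. It follows the text in treating the 20,000 figure as a share of the 2.7 million population, which is a simplifying assumption, and the variable names are mine.

```python
# Figures as quoted in point (b); nothing here is new data.
uk_muslim_population = 2_700_000
people_considered_at_risk = 20_000   # assumption: treated as a subset of the population above
global_terrorist_share_pct = 0.0066  # percent of the global Muslim population

# Share of the UK figure, expressed as a percentage and as "1 in N".
uk_share = people_considered_at_risk / uk_muslim_population
one_in_n = uk_muslim_population / people_considered_at_risk

# The global percentage converted to a count per 100,000 people.
global_per_100k = global_terrorist_share_pct / 100 * 100_000

print(f"UK share considered at risk: {uk_share:.2%} (about 1 in {one_in_n:.0f})")
print(f"UK share not considered at risk: {1 - uk_share:.2%}")
print(f"Global figure: about {global_per_100k:.1f} per 100,000")
```

On these numbers, a blanket measure aimed at the whole group would fall on roughly 135 people for every one person the authorities are actually concerned about.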

3. It is better, if there are terrorists and criminals amongst migrants, that they be in a country with high state capacity and the rule of law, where they are likely to be caught, than in their home countries, where they are likely to carry out crimes with impunity. This has an empirical basis: a one standard deviation increase in migration is associated with a one-third standard deviation decrease in civil conflict in the origin state (see Preotu (2016)).

Response: But you’re increasing the crime rate that natives will be subject to! Further, or in the alternative, the purpose of the nation state is to protect its own populace before others!

Response to the response: I’m not sure that the nationalist philosophies justify anything more than deriving some sense of enrichment from your own culture. The interest that we derive from our own cultures seems incapable to me of justifying any civil or political obligations on others. And even if it did, I think it would be one part of the equation in coming to a policy with all of the trade-offs including, as described above, the massive utility gains for the migrants themselves.

In any event, the data from Germany suggest that we are not seeing levels of crime that outweigh the utility given to migrants themselves. In particular, Gehrsitz and Ungerer (2017) find “at best muted increases in criminal activity”: a whole one standard deviation increase in migrant flow is associated with about 95 additional crimes per 100,000 people. It’s also worth noting that this aligns with official data:

… the statistics compiled by the authorities also show that the probability is no higher among refugees than in the domestic population. According to police crime statistics, the number of criminal acts increased by about 4 percent in 2015 over the previous year. The increase was mainly attributable to a rise in asylum- and visa-related offences. If these offences are factored out of the equation, the number of criminal acts remained virtually constant, even though the number of people in the country had increased by hundreds of thousands (Der Spiegel (2017)).

I’m hoping the above back and forth – which I hope others and I will expand on in the future – shows why the use of crude statistics is particularly unthinking. The second example is the decision to allow women into front line combat roles in the UK. If you look at the one statistic quoted above in isolation, you see that there may be a case for not allowing women into direct combat roles. But the assessment is a perfect example of how policy considerations are more complex than members of the alt-right who latch onto such statistics would have it:

The review studied 21 factors that contribute to CE [combat effectiveness], of which physiology and team cohesion are the most relevant; these were considered under separate workstrands. The review assessed that one of the factors will be improved by the inclusion of women, seven are neutral or multi directional, eleven are likely to have a negative impact on CE and in two the impact was unknown.

The review goes on to note that with mitigation measures, these 11 negative factors drop to 3. The three outstanding negative factors are difficult to mitigate and will be kept under review. If we followed a crude way of applying averages, we would not have tried to identify these mitigation measures in the form of minor changes, increased resilience training and so on.

To summarise: we shouldn’t use the average as the whole, but nor should we deny the facts about average differences. We shouldn’t apply averages in such a crude way because, firstly, it is not as illuminating as noting the degree of overlap between groups and, secondly, it leads to lazy thinking on both the left and right, ignoring that there are complex considerations and controls that weigh against their conclusions.

But I want to be clear about something I am not saying. Am I saying that average differences should never be used? No. I am saying that where we have individualised, granular information or we have the capability of obtaining it, we shouldn’t use average differences for the purposes of making policies that affect individuals. In addition, where using an average difference is completely out of line with the proportions involved, we should err on the side of not using the average: for example, the Green Team may be 50% more likely to kill, but if that amounts to a 2% chance that any member of the Green Team may kill, we cannot use a sledgehammer to crack the nut that is the killers of the Green Team.
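The Green Team example turns on the gap between relative and absolute risk, which a short calculation makes visible. This is an illustrative sketch only: it reads "50% more likely" as a relative increase over some comparison group and "2%" as the Green Team's own rate, and the function and key names are hypothetical.

```python
def blanket_policy_summary(group_rate: float, relative_increase: float) -> dict:
    """Compare a group's absolute risk with the comparison group's implied risk.

    group_rate: absolute rate in the group of interest (e.g. 0.02 for 2%).
    relative_increase: how much higher that rate is than the comparison
        group's, as a fraction (e.g. 0.50 for "50% more likely").
    """
    baseline_rate = group_rate / (1.0 + relative_increase)
    return {
        "baseline_rate": baseline_rate,
        "absolute_gap": group_rate - baseline_rate,
        # Share of the group that a blanket policy would hit despite doing nothing wrong.
        "share_wrongly_targeted": 1.0 - group_rate,
    }


summary = blanket_policy_summary(group_rate=0.02, relative_increase=0.50)
print(f"Implied comparison-group rate: {summary['baseline_rate']:.2%}")
print(f"Absolute gap between groups: {summary['absolute_gap']:.2%}")
print(f"Green Team members who are not killers: {summary['share_wrongly_targeted']:.0%}")
```

Read this way, a headline "50% more likely" corresponds to an absolute gap of well under one percentage point, while a policy aimed at the whole Green Team would land on the 98% of members who never kill anyone.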

And this brings me to the final point: why should we try to avoid using averages, aside from the fact that they may not illuminate as much as statistics about variations within groups and that they make people unthinking and lazy? I think one potential argument is that it reinforces a world view not based on treating people as individuals. Individualism matters: it is associated with good outcomes like a nation’s polity score (Gorodnichenko and Roland (2015)), which in turn is related to economic outcomes (Kyriacou (2015)).

Methodological individualism is important because it tells us more about the world. What we mean by methodological individualism is that the unit of social and moral life for causal and moral purposes is the individual. An ideology or policy preference based on using averages as a unit is at odds with the correct way of scientific and historical analysis à la Karl Popper. He alludes to as much in The Poverty of Historicism: “Unable to ascertain what is in the minds of so many individuals, he [the historicist] must try to simplify his problems by eliminating individual differences” (p.90).

When we, as a matter of policy, start treating averages as the basis for policy, rather than individuals (even when that is informed by an understanding of group differences), we end up undermining the moral and other status of individuals. I consider it to be no surprise that hard right individuals, nativists and extreme leftists utilise collectives and averages at the expense of individuals – it is the core of their ideology. It explains why so much of what they do is at odds with liberal principles of due process and the rule of law. 

I appreciate that some of the above appears to be abstract and may, to some, appear to be obvious. But I am always struck by how selective people’s approach to applying averages really is. I wouldn’t try to make a claim about how widespread such inconsistencies are – there is no data – but I would ask people to apply, consistently, the need for controls, the need to appreciate overlapping bell curves, the need to assess trade-offs, and the need to avoid reinforcing a norm contrary to a fundamental Western norm: individualism. The next post is about the need to reinforce norms during the Trump administration.