Wednesday, December 29, 2010

Big Five Personality Traits, Outliers and Per Capita Income in the U.S.

I went back and played with the data from that survey of Big Five personality traits by state (the OCEAN traits). The NYT has a neat gadget for it, and the data below comes from there. Per capita income data comes from the Census. Automatically you start thinking about correlations between the Big Five and other things like income.


- Big Five traits and per capita income.
- Interestingly the strongest correlation at the state level, both in terms of goodness of fit and slope (strength of influence) in a linear model, is an inverse one with conscientiousness. Want to make money? Be lazy! Who knows the causal relationship here if any, so I won't hand-wave. With an R^2 of 0.2646, the regression says the state's per capita income rises $236.87 for every rank lower you go on the conscientiousness scale. You also get money for being disagreeable ($221.87 for every rank the state drops, R^2 = 0.2321), and you also get money for being open-minded ($182, R^2 = 0.1566). This is probably largely the Northeast talking, although it might be interesting to look at the correlation between education and openness.
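For anyone who wants to redo this kind of fit, here is a minimal sketch of a one-variable least-squares regression that returns the slope (dollars per rank) and R^2. The ranks and incomes are invented toy numbers, not the survey or Census data, and the rank direction (1 = most conscientious) is my assumption for illustration.

```python
# Toy data only: these ranks and incomes are invented for illustration,
# not the survey or Census figures used in the post.
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x, plus R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a single predictor, R^2 is the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Assumed convention: rank 1 = most conscientious, larger rank = less so.
ranks = [1, 2, 3, 4, 5]
income = [30000, 30500, 30200, 31500, 31300]  # hypothetical dollars

slope, intercept, r_squared = fit_line(ranks, income)
print(f"slope = {slope:.2f} dollars per rank, R^2 = {r_squared:.4f}")
```

A positive slope under this convention means income rises as a state ranks lower on conscientiousness, which is the direction described above. It says nothing about causation.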

- Combinations of Big Five traits by state. If you do scatterplots with each of the Big Five characteristics against the others, several states appear repeatedly as outliers at the "corners" of the scatter - DC is one and Alaska is another. The whole reason that the study which initially generated this data is interesting is because it shows that there actually is a geographical difference in personality types (whether that's because of memes or genes is another question entirely). But it stands to reason that if gene and/or culture flow can explain this, then we should see outliers at the geographical extremities. It also stands to reason that the outliers on the coasts won't necessarily be outliers in the same way. While the map of openness does bear a similarity to the Red/Blue presidential voting distribution, there is no other obvious correlation between other characteristics or states.

So I looked at the coastal states (those with a saltwater port) against the non-coastal states, excluding Hawaii and Alaska. If geographic extremes correlate with personality type "outliers", then the saltwater states (which are farther from each other) should also be more dissimilar to each other than the interior states. By my count there are 26 interior and 23 saltwater states, so all other things being equal, the saltwater states should look more similar to each other, since there are fewer of them.

Indeed, the standard deviations for three of the Big Five traits are smaller for the interior. They're the same for the other two. As far as the averages, the interior states are more extraverted, agreeable and conscientious than the coastal states. (Those first two are consistently a shock to me.) The noncoastal states are on average less neurotic and less open to new experiences and ideas.

- Combinations of Big Five traits by region. If you do scatterplots with combinations of Big Five scores by region, then the Northeast (DC to Maine) is frequently an outlier (conscientiousness vs neuroticism, conscientiousness vs openness, conscientiousness vs extraversion; more on this in a bit). The contiguous Pacific states are a major outlier on agreeableness vs openness; otherwise they land near the other states in and west of the Rockies. The Frontier Strip lands with the Midwest and the Bad Stripe tends to sort with the South, except that it's much less open than other Southern states.

- Per capita income distribution and average, coastal vs. non-coastal. It's also worth pointing out that the per capita incomes of non-coastal states were a) more similar to each other than the coastal states' were and b) on average lower, by $5,128. This is not surprising, as land-locked countries also have lower per capita incomes than those with coastline. 2/3 of all Alpha and Beta cities are on saltwater (or a river delta leading into saltwater), so, par for the course.
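The coastal vs. interior comparisons above come down to comparing a mean and a spread for two groups. A minimal sketch follows, with made-up scores; whether the original numbers used the sample or the population standard deviation isn't stated, so the choice of `stdev` (sample) here is an assumption.

```python
from statistics import mean, stdev

# Hypothetical trait scores for a handful of states in each group;
# the values are placeholders, not the survey data.
scores = {
    "coastal": [40.0, 55.0, 70.0, 35.0],
    "interior": [46.0, 48.0, 44.0, 46.0],
}

# Mean and sample standard deviation per group.
summary = {
    group: {"mean": mean(values), "sd": stdev(values)}
    for group, values in scores.items()
}

# Difference in group averages (coastal minus interior).
gap = summary["coastal"]["mean"] - summary["interior"]["mean"]

for group, stats in summary.items():
    print(f"{group}: mean = {stats['mean']:.1f}, sd = {stats['sd']:.2f}")
print(f"coastal minus interior = {gap:.1f}")
```

The same pattern works for per capita income: swap the trait scores for incomes and read the gap in dollars.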

The Dialect of the Bad Stripe

The Bad Stripe as I've marked it out before (most recent here) is in many ways a boundary or transition zone between North and South, for example in social networks and religion. It turns out that it's a separate dialect zone too. The map can't be embedded well so click through to this dialect map of North American English, and you'll see that the Bad Stripe largely overlaps with the non-Texas part of the Inland South zone.

In many other systems (ecology and social networks) being at a phase transition is good, e.g. tidepools, savannas near jungles, being the only person who speaks both languages of two adjacent and relatively wealthy populations, etc. If the repeatedly observed "boundariness" of the Bad Stripe is not a coincidence or a historical accident, it could be that either the principle is reversed here, or that some aspects of the Bad Stripe are caused by the other negative conditions that previously obtained.

Monday, December 27, 2010

Punished for Transparency

"The Wikileaks method punishes a nation -- or any human undertaking -- that falls short of absolute, total transparency, which is all human undertakings, but perversely rewards an absolute lack of transparency. Thus an iron-shut government doesn't have leaks to the site, but a mostly-open government does."

- Jaron Lanier, The Hazards of Nerd Supremacy in The Atlantic


Amen to that. Where are the leaks from China? From North Korea? Hint: nowhere, and not coming anytime soon either. Without even accusing Assange of a deliberate focus on one or the other imperfect liberal democracy, it's easy to see how they might draw relative benefit from this and future leaks.

Sunday, December 26, 2010

Why Is the Frontier Strip Where It Is

Summary: Climate continues to have a profound impact on the distribution and economic activities of humans. There is a rapid population density decrease in the Frontier Strip states. Moving westward, there is a sharp increase in population when the Pacific coast is reached. This is usually attributed to difficulty of farming in the country's interior as the 100 W meridian is approached. I show here that this is not consistent with current agricultural productiveness of Frontier Strip and Pacific states on a per person or per area basis. I also show that in the Frontier Strip, temperature, precipitation and latitude are poor predictors of agricultural output but strong predictors of population density. Population density and agricultural output do not predict each other.

Since the 100 W theory of population density drop-off appears falsified, other explanations must be sought. The appearance of easier transportation during the settlement of the Frontier Strip, as well as the Depression, are explored and discarded. Further research with better agricultural output data and higher resolution climate data may support the hypothesis that investments in agricultural capital came too early in the Frontier Strip to benefit from irrigation technology, and that modern transportation and U.S. population preferences of climate as well as the coastal location of large population centers with service economies combine to keep the Frontier Strip the low-population boundary of the U.S. interior.



PART 1. I've written enough about the Bad Stripe recently so I thought I'd move to another grouping of U.S. states: the Frontier Strip. The Frontier Strip is the north-south line of states including the Dakotas, Nebraska, Kansas, Oklahoma and Texas, identified retrospectively in the 1890 census as having been the frontier in 1880. I've been interested in the idea ever since I visited Scottsbluff, Nebraska on a road trip two years ago. Like many places in the High Plains, Scottsbluff is proud of its pioneer heritage, although you have to ask how proud you want to be of being famous for being on the way to somewhere else. Inspection of a population density map of the U.S. shows that there is a dramatic drop-off as you move west across the Frontier Strip states, which never "recovers" until you hit the coastal cities of Washington, Oregon and California. The precipitation map is underneath it for comparison.


The geographers' conventional wisdom on this phenomenon involves the hundredth meridian west, which I've drawn in as a black stripe. It does seem to track just to the west of the density drop-off. The argument is that precipitation drops off to the west of this area, so agriculture becomes less productive. In earlier times economies were more strongly dependent on agriculture; consequently, large cities would not tend to develop in these dry regions and, even if they did, could not be supplied with food. Therefore further settlement coming from the East passed these areas and continued to the (contiguous) Pacific coast states. This argument also assumes that the potential agricultural productivity of an area has been realized by settlement and farming, i.e. it is at steady-state, and North Dakota's population isn't about to explode due to settlement by new farmers.

I've always found this argument dubious. First of all, most of these assumptions are not addressed explicitly in discussions of the 100 W boundary. But the most direct thing to do would be to look at agricultural output and see how it stacks up by state. Because states differ in population and size, we should look at output per person and per area. This information comes from the 2010 Census. There is not an obvious unit of "agricultural output" (if we include tobacco, none of the Frontier Strip or Pacific states will do very well) so I used combined receipts for the four principal products that the Census tracks by state (beef cattle, dairy products, corn and broiler chickens). Yes, wheat is an obvious omission but I couldn't find data for it by state, so it's possible these numbers are off by virtue of neglecting important crops.

So, if we have cause to doubt the traditional 100 W explanation, the Frontier Strip states should do the same or better than the Pacific states in agricultural output, meaning that it's not such a farm productivity death sentence to be near 100 W. Here's how the Frontier states and Pacific states stack up when ordered by agricultural value produced per-person.

State           Ag $/person
Nebraska        6,040
South Dakota    3,954
Kansas          2,846
North Dakota    2,091
Oklahoma          890
Oregon            590
Texas             451
Washington        286
Arizona           233
California        229


(I include Arizona because I'll use it later as an example.)

Before we get excited and declare 100 W theory dead, it should be noted that the above table probably isn't that interesting. We're looking at 2010 data, and even if California really is more productive than the Frontier states (as the traditional theory predicts), it might be masked in this table: states with large populations are likely to have higher proportions of people in other sectors, i.e. doing something other than agriculture. What really counts is the potential productivity of the land itself (owing to climate); that's the crux of the argument. So here's how they stack up when we look at agricultural value produced per land area:

State           Ag $/Area
Nebraska        55,405
Kansas          38,319
California      21,154
Oklahoma        18,765
Texas           16,698
South Dakota    16,364
Washington      11,128
Oregon           9,080
North Dakota     7,598
Arizona          5,059
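The two rankings above are the same receipts divided by two different denominators. Here's a rough sketch of that normalization with three invented states; the receipts, populations and areas are placeholders, and the area unit (square miles) is an assumption since the tables don't state one.

```python
from dataclasses import dataclass


@dataclass
class State:
    name: str
    receipts: float    # hypothetical combined receipts, in dollars
    population: float  # hypothetical number of residents
    area: float        # hypothetical land area (assumed square miles)


# Invented examples: "B" is populous, "C" is large and sparsely farmed.
states = [
    State("A", 9_000_000.0, 3_000.0, 300.0),
    State("B", 8_000_000.0, 40_000.0, 200.0),
    State("C", 1_000_000.0, 1_000.0, 500.0),
]

# Rank by dollars per person, then by dollars per unit of land area.
by_person = [s.name for s in
             sorted(states, key=lambda s: s.receipts / s.population, reverse=True)]
by_area = [s.name for s in
           sorted(states, key=lambda s: s.receipts / s.area, reverse=True)]

print("per person:", by_person)
print("per area:  ", by_area)
```

The populous state "B" lands last per person but first per area, which is the masking effect described for California: a big non-farm population drags down the per-person figure without saying anything about the land.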


This doesn't look good for the 100 W theory. Yes, California is productive per land area, but still, if you're in Nebraska or Kansas, why go west? You're better off in the High Plains! There are possible explanations for this, besides the limited four-product index I'm using: a) The farming business has changed a lot over the past century and a half, and what used to be family-owned farms are now gigantic industrial facilities. So what makes Nebraska so productive now might not matter to people economically today (if it's all going to one or two corporations) or it might be different from conditions a century ago. b) You could also make the argument that dollars produced per area is a weak proxy indicator, because the land isn't being used for valuable crops, because the value of crops has changed over time, or (least likely) the areas haven't reached steady state and are still being developed by settlers agriculturally.


Additional point: agricultural productivity cannot depend entirely on precipitation, i.e. on proximity to 100 W. Length of growing season has something to do with it as well, which as you might guess has a strong north-south trend. This map shows number of days below freezing:


That map is a little small so, if we can assume 1911 was typical, here's a larger map showing basically the same thing, the day of the last killing frost:



As it turns out, there is a north-south population density trend in the Frontier Strip. Looking further north to Canada, any semblance of a 100 W-population boundary disappears. In fact the population density along the country's southern border increases beyond 100 W, after having dropped off in Ontario well to the east. Again, black line is 100 W.


You might make several arguments to explain this: 1) The 100 W boundary is in fact country-specific, due to different development policies, and/or 2) that as latitude increases, the strength of the association of precipitation becomes relatively weaker and latitude becomes stronger. (Who cares about number of days below freezing when you go from 30 to 50? Precipitation makes a bigger difference. But when you go from 120 to 140, latitude makes a bigger difference.) 3) The western part of Ontario isn't yet at steady state and its population will grow in the future when settlers arrive to realize its agricultural potential, and the precipitation-driven east-west density gradient will assert itself.

Rather than speculate, I did linear regressions on the Frontier Strip and Frontier plus Pacific and Arizona, using latitude, rainfall (not including snow), days below freezing, average annual temperature, lowest average monthly low, highest average monthly high, and population density, and their correlation to agricultural dollars per area. Rather than take up space with a bunch of ugly scatter plots I made a table of R^2 values. This is exploratory so there's no fancy Bonferroni corrections going on, but in any event none are strong correlations.

Predictor          All      Frontier
Latitude           0.0077   0.0039
Rainfall           0.0205   0.1513
Frz Days           0.0986   0.0014
Ave Temp           0.0216   0.0003
Lowest Mnth Lo     0.1123   0.0004
Highest Mnth Hi    0.0006   0
Pop Dens           0.0358   0.0125
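A table like the one above can be built by running the same one-predictor fit once per column. The sketch below uses invented numbers for five states (not my data), and also shows what a Bonferroni threshold would look like if you did want the correction: with m tests, each p-value is compared to alpha / m. Computing the p-values themselves is left out.

```python
def r_squared(x, y):
    """Squared Pearson correlation, equal to R^2 for a one-predictor linear fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


# Hypothetical values for five states; placeholders only.
ag_per_area = [10.0, 20.0, 30.0, 40.0, 50.0]
predictors = {
    "latitude": [30.0, 35.0, 40.0, 45.0, 50.0],
    "rainfall": [20.0, 10.0, 30.0, 10.0, 20.0],
}

# One R^2 per predictor against the same outcome.
table = {name: r_squared(values, ag_per_area)
         for name, values in predictors.items()}

# Bonferroni threshold for seven predictors times two groups of states.
alpha = 0.05
tests = 7 * 2
per_test_alpha = alpha / tests

for name, value in table.items():
    print(f"{name}: R^2 = {value:.4f}")
print(f"per-test alpha = {per_test_alpha:.5f}")
```

The toy predictors are rigged so that one is perfectly linear in the outcome and the other is uncorrelated with it, which makes it easy to check the function by eye.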


Since agricultural output is just a proxy for the original question we're concerned with (population density), I used that as the output and looked at the same factors. Wow. The R^2 values are much stronger.

Predictor          All      Frontier
Latitude           0.1355   0.9463
Rainfall           0.0484   0.4844
Frz Days           0.5628   0.9773
Ave Temp           0.2918   0.9475
Lowest Mnth Lo     0.6001   0.9577
Highest Mnth Hi    0.0117   0.8587
Ag Output/Area     0.0358   0.0125


First: agricultural output and population density do not seem to be strongly related. Furthermore if rainfall is the reason 100 W is significant, it's interesting that rainfall is a much weaker predictor than temperature. These are both problems for the traditional 100 W theory.

Second, for the "inputs" with R^2 values above 0.9, here's what they mean in terms of population density.

- For every day of the year where the temperature drops below freezing, the population density is lower by 0.197 people per square kilometer.

- For every degree lower the lowest monthly low is, the population density is lower by 0.863.

- For every degree the average annual temperature drops, the population density drops by 1.33.

- For every degree of latitude gained, the population density drops by 2.05.
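To make the slopes in the list above concrete, here is a small worked calculation. The slopes are the ones quoted; the 50-day difference is a made-up input, and treating all four slopes as people per square kilometer is an assumption (the unit is only stated for the first).

```python
# Drop in population density per unit change, as quoted in the list above.
# Assumed unit throughout: people per square kilometer.
DENSITY_DROP_PER_UNIT = {
    "freezing_days": 0.197,       # per additional day below freezing
    "lowest_monthly_low": 0.863,  # per degree the lowest monthly low falls
    "avg_annual_temp": 1.33,      # per degree the average annual temperature falls
    "latitude": 2.05,             # per degree of latitude gained
}


def density_drop(predictor, units):
    """Predicted density decrease for a change of `units` in one predictor."""
    return DENSITY_DROP_PER_UNIT[predictor] * units


# Hypothetical comparison: one place has 50 more freezing days than another.
drop = density_drop("freezing_days", 50)
print(f"predicted drop: {drop:.2f} people per square kilometer")
```

Each slope comes from its own single-variable regression, and the predictors obviously move together (further north means colder), so the four drops shouldn't be added up for the same pair of places.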

It seems strange that there could be such a strong connection between climate and population but not agricultural output, especially in states where a huge fraction of the economy depends on agriculture. That connection could very well be concealed here by the limited set of agricultural products I'm considering. Really, what I should be using is climate and agriculture output data at the county level, and the agriculture data should be the value of all commercial livestock and crops. But I leave that to someone with access to real data and real software. It would also be interesting to crunch these numbers for every decade from 1850, and see how the R^2 values and slopes for each of these change over time if at all. Maybe the weak predictiveness between climate factors and agricultural output is real, but it used to be much stronger.


PART 2. Other Possible Explanations

Explanation #1. The population drop-off is a result of timing, i.e. improvements in transportation and communication. Couldn't it just be that just as the Frontier Strip was getting settled, trains and telegraph lines were built? Finally, you could find out whether there were jobs in San Francisco, and you could get there. (Assuming that independent of mineral extraction revenues*, loss of population is equivalent to loss of economic growth, this is the root of productivity paradoxes that exist between areas with free movement of people and goods.)

If this were the case, you would expect to see a shift westward from the Frontier Strip at least by the 1880s. The Transcontinental railroad was completed in 1869 and it was connected to Los Angeles in 1876. In fact this is not what we see. First, here are the population densities of all Frontier Strip states and Pacific coast states, and then the average density of both groups.



After the 1920s, the Pacific states never looked back. Among the Frontier states, Texas is by far the strongest performer, owing probably to a longer growing season, early discovery of mineral wealth, and ports to make transportation cheaper (though I suspect immigration from Mexico is also a contributor there; Texas Hispanics are 35.9% of the population, compared to 6.1% in neighboring Oklahoma and comparable or lower northward). Furthermore, you can see there was a local maximum in every Frontier state except Texas sometime in the teens. I don't know of any events that would explain the drop that followed, nor whether people were heading further west. Bottom line, the fact that the drop happened so late argues against the railroads having been the cause, and the roads weren't yet developed enough that cars could have been the explanation.

Initially I was also skeptical because of Arizona. Arizona is clearly not a state blessed with rainfall, and yet it had (and still has) a strong citrus growing industry (thanks to irrigation), and its population density passed the Frontier Strip's in the 1980s and is still growing. It's worth asking why irrigation doesn't also cause a sudden increase in the relatively wetter Frontier Strip's population; climate and sunshine preference are likely to play a part, and we could make a guess based on sources of internal immigration for Arizona. This would suggest that in some ways, transportation and information about local conditions do make a difference, but the precondition is that the area can't have been settled prior to modern irrigation technology (because capital commitments are made in the agricultural infrastructure), and it has to have sunlight. I think that the real explanation will have something to do with the random impact of the historical timing of technological progress, if not transportation, then commitment to pre-existing agricultural infrastructure, shifting of economic importance to other sectors, and population preferences that make a greater difference in more recent history due to increased mobility.

Explanation #2. The Okie Effect. Can farm abandonment and consolidation during the Depression and Dustbowl explain it? See graphs above. These economic/agricultural events may have contributed but the trend was already underway at least a decade earlier.

*I give the caveat about mineral wealth because Texas and to a lesser extent Oklahoma have big oil sectors that impact their populations. In addition, most of North Dakota's counties are losing population and the state is just barely growing but recent oil discoveries have made the state profitable. Research suggestion: to what degree can sub-national entities be subject to the resource curse? What institutions or cultural features are protective against resource-curse type damage to economies?

Native-Born Californians Regain Majority Status

For the first time since the Gold Rush. LA Times story here. Interestingly during the 2000-2009 Census period, California LOST 1,509,708 people due to internal em/immigration. Only international immigration kept the population-due-to-em/immigration above water (by about 1% of the state's total population.) Many of these are Latin American, but a large number of high-skill well-to-do Asian immigrants are coming on 747s.

Suggested research*: relationship between proportion of ethnic populations who are legal immigrants or permanent legal residents vs. citizens, and per capita income in those populations; relationship between prevalence of international immigrants or incidence of international immigration in last decade and per capita income change. Also, distribution within California by country of origin of new international immigrants.

*"Suggested research" translates to "someone else do the work, I'm not a frickin demographics grad student over here".

Saturday, December 25, 2010

Let's Be More Like West Virginia

"Only 1.1% of the state's residents were foreign-born, placing West Virginia last among the 50 states in that statistic. It also has the lowest percentage of residents that speak a language other than English in the home (2.7%)." From Wikipedia.

"Let's be more like West Virginia!" I encourage the nativists of the U.S. to adopt that as their rallying cry. Catchy!

Note also that a) West Virginia is the eastern- and northern-most member state of the Bad Stripe (most recent post here), and b) never having had as strong an agricultural component to its economy, WV's population is 3.5% African-American vs. 20.5% for VA, which by most accounts has performed slightly better.

Unexpected Wisdom from Hayek

"The gullible do find agreement. Meanwhile, growing national confusion leads to protest meetings. The least educated - thrilled and convinced by fiery oratory, form a party."

- From the cartoon summary of The Road to Serfdom (at 2 minutes exactly).

Emphasis mine. Amazingly I've not yet seen Glenn Beck quote this.

Wujie: Top-Notch Internet Privacy Application

So it claims, so evaluate for yourself. Website here, in Mandarin.

Thursday, December 23, 2010

China's Development Policy in Africa

From Wikileaks, via Spiegel Online: "No matter whether it's war in Darfur, repression in Zimbabwe or corruption in Nigeria -- for the Chinese, it's not their problem. For example, instead of taking Zimbabwean dictator Robert Mugabe to task for his totalitarian policies and looting of his own country, they bestowed an honorary doctorate on him in 2005 and declared him 'China's No. 1 Friend.' Three years later, in 2008, they sent Mugabe the An Yue Jiang, a ship full of weapons and ammunition."

The Chinese government is at least internally consistent in their position on basic civil liberties: Memetjan Abdulla was just sentenced to life in prison for his role in the 2009 Uighur riots in Xinjiang. And what was his role, exactly? Reporting on the riots.

Sunday, December 19, 2010

The Bad Stripe and Sexual Curiosity


That's a heat map from okcupid showing the density of self-identified straight people who have had, or would like to have, a same-sex experience. Note the appearance of the Bad Stripe again: markedly less adventurous. Richard Florida has frequently shown positive correlations between gay-tolerant attitudes in cities and states and economic growth, innovation as measured by number of patents, education and property values, the assumption being that this reflects the post-scarcity values that promote innovation in a modern economy. (You can see the Bad Stripe jumping out at you again in this map, and others before it.) No surprise to most that West Virginia, Mississippi and Oklahoma are not the places to start your software company or make discoveries about yourself.