Friday, May 24, 2013

Platforms can tax externalities and generate costly signals


One thing that's great about platforms is that socially efficient, signal-generating Pigovian taxation like the kind proposed in this tweet is not a joke---you can actually do things like this, which may be one of the great advantages of markets mediated by a powerful third party.

Tuesday, May 21, 2013

Country-Specific Minimum Wage Data, Courtesy of Wikipedia

I was looking for some data on minimum wages in various countries and found that Wikipedia (perhaps unsurprisingly) has a very nice, well-annotated table. After downloading the data & cleaning it a bit (harder than it should be), I made several plots. There were too many countries for one plot, so I made one for each (approximate) quartile. At the end of the blog post is the R code I used for fetching the data & making the plots.

Fourth Quartile 

NB: Some countries have exemption policies for worker or occupation characteristics, so for a more complete understanding of, say, why Australia appears to have a minimum wage more than 2x the US minimum wage, check the Wikipedia table.




Third Quartile 




Second Quartile 





First Quartile 

Distribution of Minimum Wages



Below is some R code for grabbing the table of country-specific minimum wages from Wikipedia.



Tuesday, July 31, 2012

The Indian blackouts & oDesk

A nationwide blackout in India has left some 600 million people without electricity. Given that a large number of the contractors on oDesk are from India, I assumed that effects of the blackout would show up readily in the oDesk data. This evening, I wrote a query to get the hours worked each day by Indian contractors during the last month and the number of applications sent. I divided these counts by the respective totals for that day for all of oDesk. From this time series, we can get a sense of what was supposed to happen today and compare it to what actually happened. The time series for applications (top) and hours worked (bottom) are plotted below [1], with today annotated in red. Each percentage estimate has a 95% confidence interval.  
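The share-of-total calculation described above can be sketched roughly as follows. This is only an illustration in Python, not the query or plotting code behind the figures: the daily counts are made-up placeholders, the function name `share_with_ci` is hypothetical, and the normal-approximation interval is an assumption, since the post does not say how its 95% intervals were computed. The interval is most natural for count data such as applications.

```python
import math


def share_with_ci(group_count, total_count, z=1.96):
    """Return (share, lower, upper) for group_count / total_count.

    Uses a normal-approximation (Wald) interval; z=1.96 gives roughly 95%.
    This is an assumed method, not necessarily the one used for the plots.
    """
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    share = group_count / total_count
    std_err = math.sqrt(share * (1 - share) / total_count)
    lower = max(0.0, share - z * std_err)
    upper = min(1.0, share + z * std_err)
    return share, lower, upper


# Hypothetical daily counts: (date, applications from India, all applications).
# These numbers are placeholders, not oDesk data.
daily_counts = [
    ("2012-07-30", 2500, 10000),
    ("2012-07-31", 2450, 10000),
]

for day, india, total in daily_counts:
    share, lower, upper = share_with_ci(india, total)
    print(f"{day}: {share:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

Dividing by the site-wide total for the same day is what removes site-wide swings from the series, so only India-specific patterns (such as the weekend dip or a blackout) remain visible.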

Some observations
  • There is an easily detectable drop-off in the hours worked---my eyeball calculation says Indian contractors should have been responsible for around 22% of the hours worked today, while the actual number is closer to 17.5%. This is far less of a fall-off than we would naively predict from the "1/2 of Indians without power" headline. Presumably many contractors have access to private generators, or perhaps oDesk is over-represented in parts of the country that were less affected by the blackout.
  • There is no corresponding obvious drop-off in the fraction of applications. I don't have a good explanation for this, but perhaps non-affected Indian contractors have made up the difference and exploited the now-thinner market. If I can get some data on what parts of the country are actually being affected by the blackout, I could test this notion since I do have contractor locations down to the city level. 
  • Indian contractors take weekends off, both in terms of working and job finding (or at least more so than their oDesk counterparts from other countries). Remember that this time series is the fraction for a given day, so if everyone kept the same schedule there would be no reason for a strong weekend/weekday pattern. See oDesk Country Explorer for more of this kind of data.
  • Indian contractors are generally over-represented in the application pool, making up ~25% of applications but only ~20% of hours worked, though this could easily reflect differences in the kinds of categories Indian contractors work in---there is a great deal of variance in the average number of applications per opening across the different job categories.
Code for the plots (done in ggplot2):





Wednesday, July 25, 2012

Digitization of the supply side of the labor market

Note: This blog post also contains a short review of Google's new Consumer Surveys service. See the end of the blog post for details. 

On most electronic commerce sites, information about the supply side is digitized and publicly available while information about the demand side is generally not: Amazon, Expedia, iTunes, Etsy etc., all collect and display detailed data about the items for sale, but there is generally little or no information about the consumers with the demands. If we look at the labor market, the reverse is true, in that it is the demand side that's digitized. On online job boards like CareerBuilder, Monster.com, Indeed, SimplyHired etc., vacancies are described via detailed textual descriptions about the nature of the work, skills required, location and approximate salary, but the job seekers---the sellers---generally do not create profiles that describe themselves to the marketplace.


While we might think that there is some fundamental reason for this difference, I don't think this is the case, for the simple reason that in labor markets the supply side is being digitized, primarily through LinkedIn (in a big way) and through sites like oDesk (in a comparatively smaller, but more comprehensive way). On these sites, workers create permanent, searchable profiles for employers that contain rich, employment-relevant data about themselves.

With the rise of LinkedIn, we are witnessing an unprecedented, voluntary data collection and digitization of the supply side of the labor market. On LinkedIn, individuals can create public profiles and list their education, professional credentials, associations, skills, current and past work experiences and, critically, their other professional connections (indicated by approved links to other LinkedIn users). As of yesterday (July 24th, 2012), approximately 19% of the US-based Internet-using population had a LinkedIn profile [* see note below for interesting background for this 19% figure]. According to LinkedIn, as of March 12, 2012, over 160 million people have created profiles, and in many industries, a LinkedIn profile is expected of all applicants. I talked recently to oDesk's corporate recruiter, asking her how many candidates had LinkedIn profiles. She responded:

I'd say it is close to 100% (and certainly 100% for viable candidates).   I can't think of an example of someone who I have screened who didn't have a profile on LinkedIn. 

I think this supply digitization is likely to prove consequential, because once the supply side of the labor market is digitized, platforms can begin making data-driven, highly contextualized recommendations to both sides of the market. The recommendations made by a platform have the potential advantage of being informed by the platform's holistic perspective on the marketplace. In computer-mediated marketplaces, by necessity essentially every piece of data that goes into or is generated by the marketplace is captured in an electronic database that could conceivably be used to make recommendations.

Of course, job boards do try to make recommendations by suggesting vacancies to workers, but they are limited to conditioning those recommendations on whatever search terms and perhaps geographic and/or salary constraints a job-seeker enters in a relatively brief search session. The platform cannot condition its recommendations on a worker's employment history, educational background, skills, current employment status, professional connections, certifications, personality, test scores and other match-relevant factors, never mind try to balance recommendations to navigate the twin shoals of market thinness and market congestion.

Unfortunately, I think a lot of this work on recommendations will happen within companies in a state of semi-secrecy, but hopefully enough will be made public that others can contribute, à la the Netflix challenge. It's a little sad that to date society has expended more machine learning research effort trying to predict taste in movies than fit for jobs, despite the enormous welfare consequences of the labor market. However, I predict this will change and expect a lot more work on this topic from computer scientists and market designers in the coming years.

[*] The Origin of "19% of the US Population has a LinkedIn profile" Number

In writing this blog post, I wanted to get an accurate number for what fraction of the US population has a LinkedIn profile. This number was proving hard to come by, so I decided to try a relatively new service launched by Google called Google Consumer Surveys. For 10 cents an answer, you can pose questions to a supposedly representative sample of US-based Internet users. You also get some of the respondents' basic demographics, such as inferred age, gender and income. I launched a one-question survey and got 1511 responses in less than a day. The screenshot below shows the main results, but the service also includes some neat tools for looking at the data in different ways. I made the survey public---check it out here. I'm quite pleased with the service and plan to use it again.


Thursday, July 5, 2012

Shrimponomics, Complements & BPOs


Most relevant image available from doing a Google Image search for "Shrimp using a computer"

A few years ago, there was a Freakonomics post about how people reason about economic situations and phenomena. The phenomenon in question was shrimp consumption: the amount of shrimp people eat in the US per capita  tripled between 1982 and 2007. When asked to explain this rise, non-economists mainly give demand reasons (changes in preferences), while economists are more likely to also give supply reasons (improved fishing efficiency, rise of aquaculture etc.). 

If I had to offer an explanation for this focus on demand explanations, my guess is that demand explanations come more easily to us because demand is the side of the market that is more familiar to us: most of us have eaten shrimp & bought shrimp---very few of us have worked in commercial fishing. So when asked "why are people consuming more shrimp?" we start with "why might I consume more shrimp?" and although price is certainly a reason (and a path of thought that would help lead to a supply explanation), it's not as salient or even as interesting as things like changing tastes, health trends, exciting new shrimp-based dishes etc.

So this blog post isn't about shrimp and it isn't about supply & demand. It's about complements and substitutes. I think there is a similar psychological tendency to focus on goods-as-substitutes rather than goods-as-complements. At the individual level where we are making choices, we are usually thinking in terms of substitutes: do I want coffee or tea? Should I take a vacation to Las Vegas or Hawaii? Mac or PC? It's a bit more subtle to think about "if I had X, would it make Y more useful to me?", which is at the heart of all complementarity stories.

This is a long-winded introduction to my real topic, which is that in my last blog post, I made the argument that online work could disrupt the BPO industry by serving as a substitute for what BPOs offer. A point I didn't think of---but in retrospect seems pretty obvious---is that complementarity could just as easily be the dominant effect. After my blog post, my CTO at oDesk, Odysseas, emailed me with his thoughts:

The primary benefit of BPOs is not that of labor cost arbitrage. Thats typically the motive/benefit for offshore staff augmentation firms - but BPOs are business process outsourcers. BPO is ADP [Automated Data Processing] that outsources your payroll or a business that outsource your HR process etc... We often tend to think of BPOs as an offshore firm that does a little bit of everything having as sole pivot point its lower cost of labor - thats true, but its an abuse of the term and I would agree there that the particular type of business is going to be affected in the years to come from online labor.
This part is basically my substitutes story---now the complements part:
However, the more interesting effect would be the effect of online labor to the real BPOs..
There BPOs will not be negatively affected - the opposite. The availability of online labor would allow BPOs to become more flexible lower their overall fixed costs force them to become more automated and streamline (their virtual nature will require that), allowing them to lower even the cost per customer, allowing them to focus on smaller projects, smaller customers allowing to address smaller/different market segments.  They will become less relying on an enterprise sales force customer acquisition model which is dramatically affecting their cost structure.
We are seing examples of what the new BPOs will become in companies that outsource the process of testing (uTest) of seo writing (Mediapiston) etc.
He's of course exactly right---and he's a CS PhD, not an economist, so shame on me :). If you think of true BPOs in the sense that Odysseas is talking about, then the complementarity story becomes more important. These true BPOs would be big buyers in the inputs market and would benefit greatly from a liquid, efficient market for labor.


Tuesday, June 26, 2012

Will online labor markets disrupt the traditional BPO firm?

Today I spoke on a panel on something called "impact sourcing" at the BPO World Forum. The idea of impact sourcing, in a nutshell, is that online work is a tool for development and that for-profit firms outsourcing some part of their business should look beyond traditional BPO firms and consider non-profits like Samasource and Digital Divide Data. It was a good audience for this pitch, as many of the attendees were CIOs from big companies that are accustomed to signing multi-million dollar IT outsourcing deals with traditional BPO firms like Wipro, Infosys, Tata Consultancy etc.

After the panel, I was at a reception where I talked to someone fairly high up in a traditional BPO. When I gave my elevator pitch version of oDesk's business---clients post jobs, contractors make bids, clients make a hire, we intermediate the work and take a percentage---he said, literally, "what are you doing here at this conference? You guys are like the Antichrist." What he meant (in a half joking, half serious way) is that oDesk and similar companies threaten the model of the BPO.

My perception is that the traditional BPO model is possible because of two facts: (a) the enormous, purely place-based differences in wages and (b) the difficulty of actually arbitraging those differences without help. BPOs stand ready to help companies reap the benefits of (a) by giving the help necessitated by (b). The world is still very far away from (a) no longer being true, but if oDesk and similar companies can radically lower the barriers to arbitraging differences by making it easy to hire, manage and pay workers regardless of geography, then (b) starts to become less true. If we get to the point where the qualitative differences between online remote work and in-person work diminish, and assessing and hiring workers is simple and easy, it would obviate the need for much of what the BPO firm is selling.

This is not to say that there isn't still a huge space for IT consulting---outsourcing an entire process is hard and BPOs with lots of experience have something very valuable to offer. Furthermore, besides the pure cost level, one of the motivations for business process outsourcing is the ability to change cost structure, namely by turning a fixed cost into a variable cost. But these caveats aside, on the margin, the mediation aspect of the BPO role seems likely to get less attractive over time as technology improves and online labor markets mature.

Wednesday, June 6, 2012

Resources for online social science

The Economist recently had an article about the growing use of online labor markets as subject pools in psychology research; ReadWriteWeb wrote a follow-up. If you've been following this topic, there wasn't very much new, but if you're a researcher that would like to use these methods, the articles were pretty light on useful links. This blog post is an attempt to point out some of the resources/papers available. This is my own very biased, probably idiosyncratic view of the resources, so hopefully people will send me corrections/additions and I can update this post.

To start, let's have this medium pay tribute to itself by running through some blogs and their creators. 

Blogs 

  • There is the "Follow the Crowd" blog, which I believe is associated with the HCOMP conference. It's definitely more CS than Social Science, but I think it's filled with good examples of high-quality research done w/ MTurk and with other markets.
  • There's Gabriel Paolacci's (now at Erasmus University) "Experimental Turk" blog which was mentioned in the article and is probably the best resource for examples of social and psychological science research being done with MTurk.  
  • Panos Ipeirotis (at NYU, and now academic-in-residence at oDesk) has a great blog, "Behind-enemy-lines," that covers basically all things relating to online work.
  • The defunct "Deneme blog" by Greg Little (who also works at oDesk) and Lydia Chilton (at University of Washington). 

Guides / How-To (Academic Papers)

A number of researchers have written guides to using MTurk for research. I think the first stop for social scientists should be the paper by Jesse Chandler, Gabriel Paolacci and Panos Ipeirotis:

Chandler, J., Paolacci, G. and Ipeirotis, P. Running Experiments on Mechanical Turk,
Judgment and Decision Making (paper) (bibtex)

My own contribution is a paper with Dave Rand (who will be starting as a new assistant professor at Yale) and Richard Zeckhauser (at Harvard). The paper contains a few replication studies, but the real meat, and the part I think is most important, is the discussion of precisely why/how you can do valid causal inference online (I'm stealing this write-up/links of the paper from Dave's website):

Horton JJ, Rand DG, Zeckhauser RJ. (2011) The Online Laboratory: Conducting Experiments in a Real Labor Market. Experimental Economics 14, 399-425. (PDF) (bibtex)

Press: NPR's Morning Edition Marketplace [audio], The Atlantic, Berkman Luncheon Series [video], National Affairs, Crowdflower, Marginal Revolution, Experimental Turk, My Heart's in Accra, Joho blog, Veracities blog

Software 

Unfortunately there hasn't been too much sharing of software for doing online experiments. Since a lot of the experimentation is done by computer scientists who do not feel daunted by making their own one-off, ad hoc applications, there are a lot of one-off, ad hoc applications. Hopefully people know of other tools out there that they can open source or share links to.

"Randomizer"

Basically, it lets you provide subjects one link that will automatically redirect them (at random) to one of a collection of URLs you've specified. I made the first really crummy version of this and then got a real developer to re-do it so it runs on Google App Engine.
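To make the idea concrete, here is a minimal sketch in Python of the core step: picking one destination uniformly at random from a list of URLs. It is not the Randomizer's actual code; the function name `pick_destination` and the example.com URLs are hypothetical, and a real deployment would wrap this in a web handler that issues an HTTP redirect to the chosen URL.

```python
import random


def pick_destination(urls, rng=random):
    """Choose one URL uniformly at random from a non-empty list.

    `rng` can be the random module or a random.Random instance; passing a
    seeded instance makes assignments reproducible for testing.
    """
    if not urls:
        raise ValueError("need at least one destination URL")
    return rng.choice(urls)


# Hypothetical treatment and control pages for an experiment.
destinations = [
    "https://example.com/survey?arm=control",
    "https://example.com/survey?arm=treatment",
]

if __name__ == "__main__":
    # Each call simulates one subject arriving at the shared link.
    for subject in range(5):
        print(subject, pick_destination(destinations))
```

Independent uniform draws like this give each subject the same chance of landing in each arm, but arm sizes will only be approximately equal in small samples.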



"QuickLime"
This is a tool for quickly setting up a LimeSurvey installation (an open source alternative to Qualtrics & SurveyMonkey) on a new EC2 machine. This was made courtesy of oDesk research. I haven't fully tested it yet, so as with all this software, caveat oeconomus.


"oDesk APIs"
There haven't been a lot of experiments done on oDesk by social scientists, but there's no reason it cannot be done. While it currently is not as convenient or as low-cost as doing experiments on MTurk, I think long-term oDesk workers would make a better subject pool since you can more carefully control experiments, it's easier to get everyone online at the same time to participate in an experiment, there are no spammers etc. If you're looking for some ideas or pointers, feel free to email me.

"Boto"
This is a Python toolkit for working with Amazon Web Services (AWS). It's fantastic and saved me a lot of time when I was doing lots of MTurk experiments.

"Seaweed"
This was Lydia Chilton's master's thesis. The idea was to create tools for conducting economics experiments online. I don't think it ever moved beyond the beta stage, but if you (a) have some grant money and (b) are thinking about porting z-tree to the web, you should email Lydia and see where the codebase is & if anyone is working on it.

Here's a little JavaScript snippet I wrote for doing randomization within the page of an MTurk task.

People 

I'm not going to try to do a who's who of crowdsourcing, but if you're looking for some contacts of other people (particularly those in CS) who are doing work in this field, you can check out the list of recent participants at "CrowdCamp" which was a workshop prior to HCI.

History

Probably the first paper I'm aware of that pointed out that experiments (i.e., user studies) were possible on MTurk was by Ed Chi, Niki Kittur and Bongwon Suh. As far as I know, the first social science done on MTurk was Duncan Watts and Winter Mason's paper on financial incentives and the performance of crowds.