Online Labor

Saturday, November 30, 2013

Would a job by any other name pay as much?

I'm working on a project where it would be useful to know what an oDesk job is likely to pay at the time it is posted. Although there are plenty of structured predictors available (e.g., the category, skills, estimated duration etc.), presumably the job description and the job title contain lots of wage-relevant information. The title in particular is likely to identify the main skill needed, the task to be done and perhaps the quality of the person the would-be employer is looking for (e.g., "beginner", or "senior").

Unfortunately, I haven't done any natural language processing before, so I'm a bit out of my element. However, there are good tutorials online as well as R packages that can guide you through the rough parts. I thought writing up my explorations might be useful to others who want to get started with this approach. A gist of the code I wrote is available here.

What I did:

1) I took 20K recent hourly oDesk jobs where the freelancer worked at least 5 hours. I calculated the log wage over the course of the contract. Incidentally, oDesk wages---like real wages---are pretty well approximated by a log-normal distribution.

2) I used the RTextTools package to create a document term matrix from the job titles (this is just a matrix of 1 & 0 where the rows are jobs and the columns are relatively frequent words that are not common English words---if the job title contained that word, it gets a 1, otherwise a 0).

3) I fit a linear model using the lasso for regularization (using the glmnet package). I used cross validation to select the best lambda. A linear model probably isn't ideal for this, but at least it gives nicely interpretable coefficients.
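The three steps above can be sketched end to end in a few lines. This is a pure-Python illustration, not the R code from the gist: the stopword list, the titles, the wages, the fixed penalty `lam`, and every function name are assumptions made up for the example, and the penalty is set by hand rather than chosen by cross validation.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}

def tokenize(title):
    """Lowercase a title and keep alphanumeric words that are not stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOPWORDS]

def document_term_matrix(titles, min_titles=2):
    """Return (vocab, rows): rows[i][j] is 1.0 if title i contains vocab[j], else 0.0."""
    token_sets = [set(tokenize(t)) for t in titles]
    doc_freq = Counter(w for tokens in token_sets for w in tokens)
    vocab = sorted(w for w, n in doc_freq.items() if n >= min_titles)
    rows = [[1.0 if w in tokens else 0.0 for w in vocab] for tokens in token_sets]
    return vocab, rows

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; the result is zero when |z| <= lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

def lasso(X, y, lam, sweeps=500):
    """Coordinate descent for (1/2n) * sum((y - b0 - X b)^2) + lam * sum(|b|)."""
    n, p = len(X), len(X[0])
    b0 = sum(y) / n
    beta = [0.0] * p
    resid = [yi - b0 for yi in y]
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            scale = sum(c * c for c in col) / n
            if scale == 0.0:
                continue
            # Covariance of column j with the residual that excludes j's own contribution.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new_bj = soft_threshold(rho, lam) / scale
            delta = new_bj - beta[j]
            resid = [r - c * delta for c, r in zip(col, resid)]
            beta[j] = new_bj
        shift = sum(resid) / n  # the intercept is not penalized
        b0 += shift
        resid = [r - shift for r in resid]
    return b0, beta

# Made-up titles and log hourly wages, purely for illustration.
titles = [
    "SEO link building",
    "SEO expert needed",
    "iOS developer for app",
    "Senior iOS developer",
    "Data entry SEO",
    "Android developer needed",
    "Logo design needed",
]
log_wages = [1.0, 1.2, 3.0, 3.4, 1.1, 2.9, 2.0]

vocab, X = document_term_matrix(titles)
intercept, coefs = lasso(X, log_wages, lam=0.05)
for word, coef in zip(vocab, coefs):
    print(f"{word:>10s} {coef:+.3f}")
```

Raising `lam` pushes more coefficients to exactly zero, which is what makes the surviving words easy to read off.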

So, how does it do? Here is a sample of the coefficients that didn't get set to zero by the lasso, ordered by magnitude (point sizes are scaled by the log number of times that word appears in the 10K training sample):

The coefficients can be interpreted as % changes from the mean wage in the sample when that corresponding word (or word fragment) is present in the title. Nothing too surprising I think: at the extremes, SEO is a very low paying job, whereas developing true applications is high paying.
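Because the outcome is a log wage, a coefficient is strictly a change in log points; reading it as a percentage is a good approximation only when it is small. A minimal sketch of the exact conversion follows, using hypothetical coefficient values rather than numbers from the fitted model.

```python
import math

def pct_change(log_coef):
    """Exact percentage change in the wage implied by a coefficient on log wage."""
    return 100.0 * (math.exp(log_coef) - 1.0)

# Hypothetical coefficients, not values taken from the post.
for b in (-0.5, -0.1, 0.1, 0.5):
    print(f"coef {b:+.2f} -> {pct_change(b):+.1f}%")
```

The gap between the coefficient and the exact percentage grows with the size of the coefficient, so it matters most for the words at the extremes.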

In terms of out-of-sample prediction, the R-squared was a little over 0.30. I'll have to see how much of an improvement can be obtained from using some of the structured data available, but explaining 30% of the variation just using the titles is higher than I would have expected before fitting the model.
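For reference, one common way to compute an out-of-sample R-squared is sketched below. The function name and the toy numbers are assumptions for illustration, and conventions differ on whether the baseline is the mean of the held-out data (used here) or of the training data.

```python
def r_squared(actual, predicted):
    """1 - SS_res / SS_tot, with the held-out mean as the baseline prediction."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Toy held-out log wages and model predictions (made up).
held_out = [1.0, 2.0, 3.0, 4.0]
predictions = [1.5, 2.0, 2.5, 3.5]
print(r_squared(held_out, predictions))
```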

Sunday, September 22, 2013

Economics for skeptical social scientists

I recently gave a talk at the "Training school on Virtual Work," which was held at the University of Malta. The participants were mostly graduate students and junior faculty at European universities studying some aspect of virtual work, e.g., Wikipedia editors, gold farmers, Current TV contributors, MTurk workers etc. Most were coming from very different methodological backgrounds than my own and the people I usually work with---sociology, anthropology, media studies, gender studies etc. I think it is fair to say that most participants have a fairly dim view of economics.

One of the organizers felt that few participants would have encountered the economic perspective on online work. I was asked to present a kind of non-straw-man version of economics and present the basic tools for how economists think about labor markets. Below is the result---a kind of apologia for economics, combined with a smattering of basic labor economics. I'm not the best judge obviously, but I think it was reasonably well received.

Economics and Online Work (a slightly misleading title though - see description) from John Horton

PS - I should write more about the school later, but one of the main take-aways for me was (a) how pervasive the acceptance of the labor theory of value was among participants and (b) how this leads to very different conclusions about almost everything that matters with respect to online work. It would be interesting to try to analyze a couple of different online work phenomena using the LTV and the marginalist approach to value.

Impressions from a visit to a large call center

Paul Krugman has a piece of research advice, which is to "listen to the gentiles." What he means is to pay attention to what smart practitioners say about their business in order to get economic ideas and insights. I recently had the chance to take a tour of a large call center for a major global financial institution. It was interesting throughout and I thought I would share some of my notes.

Recruiting and Training
The company uses a tiered screening approach. They start with an online test. Some proceed to phone interviews and the final round is in-person interviews. The HR director felt that the simulated work environment test was the primary predictor of future job success---this matches up quite well with the industrial psychology literature on employer screening. He also felt that the best predictor of retention was how comfortable a worker seemed up front with the demands of shift-work.

The company made extensive use of their existing employees to recruit new ones. Referrals were highly valued because they were more likely to bring in candidates who understood the reality of shift-based call center work (and thus were less likely to turn over). It seemed to be less about bonding or reducing formal recruitment costs.

Long company-specific training period, but no general training (as Becker would have predicted). However, there are several other call centers in this region and the company does lose employees to them. The company-specific training period was surprisingly long (on the order of 3 months) and was conducted by more senior employees during low call volume periods.

Compensation

The company acts like a price taker with respect to wages. Compensation for new employees was determined by doing yearly market research into what competitors were paying. The company did have very high turn-over (though about in line with the industry), but there was no mention of raising wages as a solution. Their approach seems to be to wait for people who find that they cannot handle shift work to "sort out" of the job. I meant to ask about explicit performance incentives but didn't get a chance to. However, my impression was that rewards came through promotion and one-off bonuses rather than through relating payment to specific actions, despite performance being quite measurable.

They had surprisingly rich amenities. Although pay was not high, amenities were reminiscent of a Silicon Valley start-up: pleasant office, cheap and free food, free gym, concierge service etc. Some of these things seemed like amenities the firm could more cheaply offer than their competitors because of their larger size, since some were club goods. In other words, they could amortize a concierge over many more employees.

Operations

Customers are segmented by value and routed accordingly. The company is multi-national and has call centers in several locations, including Europe, Southwest Asia and East Asia. The company's clients are segmented based on value and routed to the call center that roughly corresponds to the skill level of the workers at the call center, e.g., the best customers get the European call center and low-tier customers do not.

They are highly sophisticated at demand and supply management. Perhaps unsurprisingly, they are good at forecasting call volumes and staffing accordingly---with all of this done semi-automatically. If demand spikes, they can adjust supply on the fly by calling off training, meetings etc., using building-wide announcements of status changes.

There was little evidence that much technologically-driven productivity improvement was on the horizon. The tasks are highly structured, but all the big gains from automation already occurred many years ago (e.g., the ubiquitous "Press 1 for 'Accounts'"). There was no talk of Watson-like automation of responses to customer queries. The one technology they really wanted---and that would radically reduce their costs---was some easy way to verify customer identities over the phone. This alone would increase their productivity by about 20-30%.

There was little evidence that this work could be easily distributed. Most of the firm's workplace policies seemed to be driven by concerns about regulatory compliance and fear of losing sensitive customer information and required a great deal of monitoring and control. It is difficult to imagine a substantial chunk of this work being done by a geographically distributed workforce.

Wednesday, July 10, 2013

You Can Sometimes Trust Research Done on Mechanical Turk, But It Depends on the Research Question

Dan Kahan has an interesting post on some of the validity problems with research conducted on Mechanical Turk (MTurk). I think I largely agree with his main point, which is that the evolution of the marketplace has been such that it's become less useful for conducting certain kinds of research. However, I do worry there's a potential baby/bathwater problem if researchers decide that "unrepresentative" or "experiment-savvy" means a useless subject pool (e.g., Andrew Gelman titled his blog post about Kahan's article "Don't Trust the Turk").

I haven't done MTurk research in several years, but the external validity issue raised by the blog post is something I thought about quite a bit when I was running experiments on the platform. I wrote a section about external validity in my ExpEcon paper with Richard Zeckhauser and Dave Rand. The key portion is excerpted below (the source code and data for that paper are available here):

Representativeness

People who choose to participate in social science experiments represent a small segment of the population. The same is true of people who work online. Just as the university students who make up the subjects in most physical laboratory experiments are highly selected compared to the U.S. population, so too are subjects in online experiments, although along different demographic dimensions.

The demographics of MTurk are in flux, but surveys have found that U.S.-based workers are more likely to be younger and female, while non-U.S. workers are overwhelmingly from India and are more likely to be male (Ipeirotis, 2010). However, even if subjects "look like" some population of interest in terms of observable characteristics, some degree of self-selection of participation is unavoidable. As in the physical laboratory, and in almost all empirical social science, issues related to selection and "realism" exist online, but these issues do not undermine the usefulness of such research (Falk, 2009).

Estimates of changes versus estimates of levels

Quantitative research in the social sciences generally takes one of two forms: it is either trying to estimate a level or a change. For "levels" research (for example, what is the infant mortality in the United States? Did the economy expand last quarter? How many people support candidate X?), only a representative sample can guarantee a credible answer. For example, if we disproportionately surveyed young people, we could not assess X's overall popularity.

For "changes" research (for example, does mercury cause autism? Do angry individuals take more risks? Do wage reductions reduce output?), the critical concern is the sign of the change's effect; the precise magnitude of the effect is often secondary. Once a phenomenon has been identified, "changes" research might make "levels" research desirable to estimate magnitudes for the specific populations of interest. These two kinds of empirical research often use similar methods and even the same data sources, but one suffers greatly when subject pools are unrepresentative, the other much less so.

Laboratory investigations are particularly helpful in "changes" research that seeks to identify phenomena or to elucidate causal mechanisms. Before we even have a well-formed theory to test, we may want to run experiments simply to collect more data on phenomena. This kind of research requires an iterative process of generating hypotheses, testing them, examining the data and then discarding hypotheses. More tests then follow and so on. Because the search space is often large, numerous cycles are needed, which gives the online laboratory an advantage due to its low costs and speedy accretion of subjects.
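The levels-versus-changes distinction in the excerpt can be made concrete with a little arithmetic. The sketch below uses a hypothetical two-group population in which every number is made up, and it assumes the treatment effect is the same in both groups; if the effect varied with the trait that drives selection into the sample, the change estimate would drift as well.

```python
# Hypothetical two-group population; all numbers are made up for illustration.
population_share = {"young": 0.3, "old": 0.7}
sample_share = {"young": 0.8, "old": 0.2}  # an unrepresentative online subject pool
baseline = {"young": 0.60, "old": 0.40}    # level of some outcome within each group
effect = {"young": 0.05, "old": 0.05}      # treatment effect, assumed equal across groups

def weighted(values, weights):
    """Average of per-group values using the given group weights."""
    return sum(values[g] * weights[g] for g in values)

level_population = weighted(baseline, population_share)
level_sample = weighted(baseline, sample_share)
effect_population = weighted(effect, population_share)
effect_sample = weighted(effect, sample_share)

print(f"level:  population {level_population:.2f}, sample {level_sample:.2f}")
print(f"change: population {effect_population:.2f}, sample {effect_sample:.2f}")
```

The level estimate moves with the composition of the sample, while the change estimate does not, as long as the equal-effect assumption holds.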

Friday, May 24, 2013

Platforms can tax externalities and generate costly signals

The word "URGENT" should cost at least $100 per usage.
— Merlin Mann (@hotdogsladies) May 24, 2013

One thing that's great about platforms is that socially efficient, signal-generating Pigovian taxation like the kind proposed in this tweet is not a joke---you can actually do things like this, which may be one of the great advantages of markets mediated by a powerful third party.

Tuesday, May 21, 2013

Country-Specific Minimum Wage Data, Courtesy of Wikipedia

I was looking for some data on minimum wages in various countries and found that Wikipedia (perhaps unsurprisingly) has a very nice, well-annotated table. After downloading the data & cleaning it a bit (harder than it should be), I made several plots. There were too many countries for one plot, so I made one for each (approximate) quartile. At the end of the blog post is the R code I used for fetching the data & making the plots.
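Separately from the R code, the quartile split itself is simple to sketch. The Python below is an illustration only: the country names and wage values are placeholders, not figures from the Wikipedia table, and `split_into_quartiles` is a hypothetical helper.

```python
import statistics

def split_into_quartiles(wages):
    """Group a {country: wage} mapping into four buckets using quartile cut points."""
    cuts = statistics.quantiles(wages.values(), n=4)  # three cut points
    buckets = {1: [], 2: [], 3: [], 4: []}
    for country, wage in sorted(wages.items(), key=lambda kv: kv[1]):
        quartile = 1 + sum(wage > c for c in cuts)
        buckets[quartile].append(country)
    return buckets

# Placeholder countries and wages, not real data.
wages = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 5.0, "F": 6.0, "G": 7.0, "H": 8.0}
for quartile, countries in split_into_quartiles(wages).items():
    print(quartile, countries)
```

Each bucket can then be passed to the plotting code on its own, which keeps every plot to a readable number of countries.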

Fourth Quartile

NB: Some countries have exemption policies for worker or occupation characteristics, so for a more complete understanding of, say, why Australia appears to have a minimum wage more than 2x the US minimum wage, check the Wikipedia table.

Third Quartile

Second Quartile

First Quartile

Distribution of Minimum Wages

Below is some R code for grabbing the table of country-specific minimum wages from Wikipedia.

Tuesday, July 31, 2012

The Indian blackouts & oDesk

A nationwide blackout in India has left some 600 million people without electricity. Given that a large number of the contractors on oDesk are from India, I assumed that effects of the blackout would show up readily in the oDesk data. This evening, I wrote a query to get the hours worked each day by Indian contractors during the last month and the number of applications sent. I divided these counts by the respective totals for that day for all of oDesk. From this time series, we can get a sense of what was supposed to happen today and compare it to what actually happened. The time series for applications (top) and hours worked (bottom) are plotted below [1], with today annotated in red. Each percentage estimate has a 95% confidence interval.
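The post does not say how the intervals were computed. One simple possibility, sketched here in Python, is the normal-approximation (Wald) interval for a proportion. It fits count data such as applications reasonably well; hours worked are not independent yes/no trials, so treat it as a rough guide there. The function name and the counts are made up.

```python
import math

def share_with_ci(group_count, total_count, z=1.96):
    """Return (share, lower, upper) using the normal approximation for a proportion."""
    share = group_count / total_count
    std_err = math.sqrt(share * (1.0 - share) / total_count)
    return share, share - z * std_err, share + z * std_err

# Made-up daily counts: applications from one country, and all applications.
share, lower, upper = share_with_ci(250, 1000)
print(f"{share:.3f} ({lower:.3f}, {upper:.3f})")
```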

Some observations

There is an easily detected drop-off in the hours worked---my eyeball calculation says they should have been responsible for around 22% of the hours worked today, while the actual number is closer to 17.5%. This is far less of a fall-off than we would naively predict from the "1/2 of Indians without power" headline. Presumably many contractors have access to private generators, or perhaps oDesk is over-represented in parts of the country that were less affected by the blackout.
There is no corresponding obvious drop-off in the fraction of applications. I don't have a good explanation for this, but perhaps non-affected Indian contractors have made up the difference and exploited the now-thinner market. If I can get some data on what parts of the country are actually being affected by the blackout, I could test this notion since I do have contractor locations down to the city level.
Indian contractors take weekends off, both in terms of working and job finding (or at least more so than their oDesk counterparts from other countries). Remember that this time series is the fraction for a given day, so there's no reason for a strong weekend/weekday pattern. See oDesk Country Explorer for more of this kind of data.
Indian contractors are generally over-represented in the application pool, making up ~25% of applications but only ~20% of hours worked, though this could easily reflect differences in the kinds of categories Indian contractors work in---there is a great deal of variance in the average number of applications per opening across the different job categories.
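To put the first observation in numbers: the figures are shares of all oDesk hours, so the fall in the share slightly understates the fall in Indian hours themselves. The Python sketch below works through that arithmetic with the eyeballed 22% and 17.5%. It assumes hours from all other countries came in as expected, which is an assumption for illustration and not something checked against the data.

```python
def relative_share_drop(expected_share, actual_share):
    """Proportional fall in the share itself."""
    return (expected_share - actual_share) / expected_share

def implied_hours_drop(expected_share, actual_share):
    """Proportional fall in the group's hours, holding everyone else's hours fixed."""
    other_hours = 1.0 - expected_share  # expected total normalized to 1
    actual_hours = actual_share * other_hours / (1.0 - actual_share)
    return 1.0 - actual_hours / expected_share

expected, actual = 0.22, 0.175  # eyeballed values quoted in the post
print(f"share fell by {relative_share_drop(expected, actual):.1%}")
print(f"hours fell by {implied_hours_drop(expected, actual):.1%}")
```

Either way the decline is well under the 50% that the headline figure would naively suggest.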

Code for the plots (done in ggplot2):