Indirect information about the number of contractors, the average hourly rates across skills, and the demand for those skills can currently be drawn only from oDesk tests. This information is spread across the pages that describe the tests, so it must be gathered in one place for later analysis. Scraping is tedious even for a Pythonista with requests and html5lib.
Analyzing data with Pandas
Pandas is a large data analysis library built on NumPy. If NumPy is called "Matlab in Python", then Pandas is "the R language in Python".
Let's start an interactive Python interpreter and load the data from the CSV file:
>>> import pandas as pd
>>> import numpy as np
>>> def default_value(typ, default, val):
...     # Convert val with typ; fall back to default if it cannot be parsed.
...     try:
...         return typ(val)
...     except ValueError:
...         return default
...
>>> def maybe_int(val):
...     # Strip thousands separators before converting.
...     return default_value(np.int64, None, val.replace(',', ''))
...
>>> def maybe_float(val):
...     return default_value(np.float64, None, val)
...
>>> tests = pd.read_csv('tests_apr7.csv', thousands=',', converters={
...     'hourly_rate_max': maybe_float,
...     'hourly_rate_avg': maybe_float,
...     'percent_independent': maybe_float,
...     'average_qualifications': maybe_float,
...     'taken_test': maybe_int,
...     'passed_test': maybe_int,
...     'tests_taken': maybe_int,
... })
>>> tests
<class 'pandas.core.frame.DataFrame'>
Int64Index: 440 entries, 0 to 439
Data columns:
hourly_rate_max 432 non-null values
hourly_rate_avg 432 non-null values
percent_independent 440 non-null values
title 440 non-null values
average_qualifications 440 non-null values
taken_test 440 non-null values
average_hours 435 non-null values
passed_test 440 non-null values
test_id 440 non-null values
tests_taken 440 non-null values
dtypes: float64(5), int64(4), object(1)
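To make the role of the converters concrete, here is a minimal, self-contained sketch that parses a tiny in-memory CSV the same way. The two sample rows and their values are hypothetical (they are not taken from tests_apr7.csv), and the fallback value is `np.nan` rather than `None`, which is an assumption made here so that the missing rate is explicitly a float.

```python
import io

import numpy as np
import pandas as pd


def default_value(typ, default, val):
    """Convert val with typ, falling back to default when parsing fails."""
    try:
        return typ(val)
    except ValueError:
        return default


def maybe_float(val):
    # Assumption: use NaN (not None) as the fallback for unparsable rates.
    return default_value(np.float64, np.nan, val)


# Hypothetical two-row sample, not real oDesk data.
sample_csv = io.StringIO(
    "title,hourly_rate_avg,passed_test\n"
    'Sample Test A,12.5,"1,234"\n'
    "Sample Test B,n/a,56\n"
)

sample = pd.read_csv(
    sample_csv,
    thousands=",",  # lets "1,234" parse as the integer 1234
    converters={"hourly_rate_avg": maybe_float},
)
print(sample)
print(sample.dtypes)
```

The converter receives each cell as a raw string, so a value such as "n/a" ends up as a missing value instead of aborting the whole load.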
Now some interesting statistics can be derived.
10 tests with most contractors
Guess which test is the most popular.
>>> tests.sort_values(
...     by='passed_test', ascending=False
... )[['test_id', 'title', 'passed_test']].head(10)
     test_id  title                                              passed_test
0        752  oDesk Readiness Test for Independent Contracto...       743081
439      511  U.S. English Basic Skills Test                          345213
438      688  English Spelling Test (U.S. Version)                    269360
435      545  Office Skills Test                                      114577
436      584  Windows XP Test                                         104314
434      693  English Vocabulary Test (U.S. Version)                   88943
437      753  oDesk Readiness Test for Agency Contractors              84282
433      506  Email Etiquette Certification                            60019
429      571  Telephone Etiquette Certification                        48861
428      484  Call Center Skills Test                                  44063
See also the oDesk Knowledgebase article "What is the oDesk Readiness Test?".
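A "top N by column" query like the one above can also be written with `DataFrame.nlargest`. The sketch below is only an illustration: it rebuilds four of the rows listed above in a small frame named `popular` (a hypothetical name) instead of loading the full CSV.

```python
import pandas as pd

# Four rows copied from the table above, deliberately out of order.
popular = pd.DataFrame({
    "test_id": [584, 511, 545, 688],
    "title": [
        "Windows XP Test",
        "U.S. English Basic Skills Test",
        "Office Skills Test",
        "English Spelling Test (U.S. Version)",
    ],
    "passed_test": [104314, 345213, 114577, 269360],
})

# nlargest(n, column) returns the n rows with the largest values in column,
# already sorted in descending order.
top3 = popular.nlargest(3, "passed_test")
print(top3[["test_id", "title", "passed_test"]])
```

For a small N this reads more directly than sorting the whole frame and slicing it, and the original index labels are preserved in the same way.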
10 tests with highest average hourly rates
This table shows an interesting relationship between the number of contractors and the average hourly rate: the tests with the highest rates tend to have few contractors who passed them.
>>> tests.sort_values(
...     by='hourly_rate_avg', ascending=False
... )[['title', 'hourly_rate_avg', 'passed_test']].head(10)
     title                                              hourly_rate_avg  passed_test
14   VB.NET Programming Skills Test (Hands-on progr...            49.50            5
253  Adobe FrameMaker 8 Test                                      47.75           36
131  Design Considerations for Mobile Web Applicati...            36.19           58
166  VLSI Test                                                    34.00           48
143  Checkpoint Security Test                                     29.00           68
248  RDF Test                                                     28.50           26
240  Knowledge of ColdFusion 9 Skills Test                        28.49           50
29   PostgreSQL Test                                              28.10          199
266  Web Services Test                                            27.95          301
53   Cocoa programming for Mac OS X 10.5 Test                     27.55          567
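One way to put a number on that relationship is a rank correlation. The sketch below is an illustration under a strong assumption: it uses only the ten rows printed above, which are a biased sample (the top of the rate ranking), so the result says nothing definitive about all 440 tests. It computes Spearman's coefficient as the Pearson correlation of the ranks, which needs nothing beyond pandas.

```python
import pandas as pd

# The ten (hourly_rate_avg, passed_test) pairs from the table above.
rates = pd.DataFrame({
    "hourly_rate_avg": [49.50, 47.75, 36.19, 34.00, 29.00,
                        28.50, 28.49, 28.10, 27.95, 27.55],
    "passed_test": [5, 36, 58, 48, 68, 26, 50, 199, 301, 567],
})

# Spearman rank correlation: rank both columns, then take the
# ordinary (Pearson) correlation of the ranks.
rate_rank = rates["hourly_rate_avg"].rank()
passed_rank = rates["passed_test"].rank()
rho = rate_rank.corr(passed_rank)
print(round(rho, 3))
```

For these ten rows the coefficient comes out negative, which matches the impression that higher rates go with fewer contractors. Running the same two lines on the full `tests` frame would be the meaningful check.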
10 tests with most worked hours
These figures can be used to estimate a lower bound on the total hours worked and the total amount of money earned up to Apr 8, 2013.
>>> tests['total_hours'] = tests['passed_test'] * tests['average_hours']
>>> tests['total_earnings'] = tests['total_hours'] * tests['hourly_rate_avg']
>>> tests[tests['total_hours'] > 0].sort_values(
...     by='total_hours', ascending=False
... )[['title', 'total_hours', 'total_earnings']].head(10)
     title                                              total_hours  total_earnings
0    oDesk Readiness Test for Independent Contracto...    308378615    2.692145e+09
439  U.S. English Basic Skills Test                       196080984    1.815710e+09
438  English Spelling Test (U.S. Version)                 144646320    9.922738e+08
435  Office Skills Test                                    70808586    4.935358e+08
436  Windows XP Test                                       61023690    5.339573e+08
434  English Vocabulary Test (U.S. Version)                52120598    3.841288e+08
437  oDesk Readiness Test for Agency Contractors           49642098    2.765065e+08
433  Email Etiquette Certification                         44474079    3.740270e+08
429  Telephone Etiquette Certification                     36499167    2.901684e+08
428  Call Center Skills Test                               33884447    2.232985e+08
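The arithmetic behind the two derived columns, and the reason for the `total_hours > 0` filter, can be seen on a toy frame. All values below are hypothetical and chosen for easy mental arithmetic. One row has a missing `average_hours`, as a few rows in the real data do (435 of 440 are non-null).

```python
import numpy as np
import pandas as pd

# Hypothetical values, not real oDesk data.
toy = pd.DataFrame({
    "title": ["Test A", "Test B", "Test C"],
    "passed_test": [100, 40, 10],
    "average_hours": [50.0, np.nan, 200.0],
    "hourly_rate_avg": [10.0, 20.0, 15.0],
})

# total_hours = contractors who passed * average hours per contractor
toy["total_hours"] = toy["passed_test"] * toy["average_hours"]
# total_earnings = total_hours * average hourly rate
toy["total_earnings"] = toy["total_hours"] * toy["hourly_rate_avg"]

# NaN > 0 evaluates to False, so rows with missing hours are dropped here.
ranked = toy[toy["total_hours"] > 0].sort_values(
    by="total_hours", ascending=False
)
print(ranked[["title", "total_hours", "total_earnings"]])
```

Test A yields 100 * 50 = 5000 hours and 5000 * 10 = 50000 in earnings. Test B propagates the missing value through both products and is removed by the filter.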