What is Data Science?

Here’s a pivotal article from O’Reilly titled “What is data science?”

Quora’s thread “What is data science?” is also a good starting point.

Can you show me good examples?

Data Visualisation

Hans Rosling’s TED Talk is probably the best place to start. His “200 countries, 200 years, 4 minutes” is an extraordinary historical narrative told through numbers. The BBC made a one-hour programme called The Joy of Stats, which is well worth watching. The statistics behind these can be explored at Gapminder.org.

The New York Times has produced excellent visualisations. Their gallery showcases visualisations built with their API. The personal sites of some of their team members, such as Matthew Ericson, Steven Heller, Shan Carter and Graham Roberts, are also full of excellent visualisations.

The Guardian is another publication that makes excellent use of visualisations. Their Datablog has most of their recent work. For more data-related material, see their Data Store.

Edward Tufte is widely recognised as an expert on visualisation. His writings and books are well worth reading; The Visual Display of Quantitative Information is a classic.

Stephen Few’s Information Dashboard Design is a good introduction to dashboard design.

Smashing Magazine has a good round-up of data visualisation approaches, infographics and a collection of resources.

Machine Learning

The Unreasonable Effectiveness of Data by Alon Halevy, Peter Norvig and Fernando Pereira [PDF]

How do I learn data science?

Quora’s thread “How do I become a data scientist?” has some useful tips.

Reading the best blogs about data can keep you up to date with the latest developments in the field.

Where can I find data?

Gapminder has a wide range of statistics about countries.

The Guardian Data Store has a miscellany of data on various topics, published as Excel documents.

The following links have not been vetted, so your mileage may vary:

http://theinfo.org/

http://knoema.com

http://opengovernanceindia.org

http://infochimps.org/datasets

http://ckan.org

http://www.datawrangling.com/some-datasets-available-on-the-web.html

http://opendatauganda.com

http://opendataforafrica.org/

http://www.reddit.com/r/datasets/

http://www.trustlet.org/wiki/Repositories_of_datasets

http://www.daniel-lemire.com/blog/data-for-data-mining/

http://www.quantlet.org/mdbase/

http://datamob.org

http://freebase.com

http://www.diggingintodata.org/Repositories/tabid/167/Default.aspx

http://www.quora.com/Where-can-I-get-large-datasets-open-to-the-public?q=dataset

http://wiki.dbpedia.org

http://radar.oreilly.com/2010/03/open-data-pointers.html

http://www.data.gov/

http://timetric.com/

http://lib.stat.cmu.edu/datasets/

http://www.kdnuggets.com/datasets/index.html

https://datamarket.azure.com/

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13

http://unstats.un.org/

http://wiki.openstreetmap.org/wiki/Planet.osm

http://jacquesmattheij.com/Free%2C+Public+Data+Sets

http://data.worldbank.org/developers

http://data.sunlightlabs.com/

http://musicbrainz.org/doc/MusicBrainz_Database

http://www.kinlane.com/data/

http://rdf.dmoz.org/

http://elev.at/

https://finances.worldbank.org/page/datasets

What tools can I use?

The Open Data Hackathon tools list is a good compilation.