## Wednesday, April 8, 2015

### Influential works in Data-Driven Discovery

The Gordon and Betty Moore Foundation completed an initiative last year to fund data-driven discoveries. Over 1,100 applications were received, and each application could cite five "influential works in the general field of 'Big Data' for scientific discovery". The citations were then analyzed to see which works were cited most often and which genres they came from. A paper summarizing the results was posted on arXiv on March 30, 2015, and it had some interesting findings that I wanted to share!

First, I was happy to see that I have read several of the most-cited works. The list also gave me a nice summer reading list of things I haven't read (I think these will basically be my #tbt (throwback Thursday) data-science papers for the next year)! It is such a great list that I wanted to share the most-cited influential works here (each cited at least 10 times):

1. MapReduce [Dean and Ghemawat, 2008] - 63 (citations)
2. Fourth Paradigm [Hey et al., 2009] - 51
3. Elements of Statistical Learning [Hastie et al., 2009] - 43
4. Initial sequencing of the human genome [Lander et al., 2001] - 30
5. A mathematical theory of communication [Shannon, 2001] - 24
6. Sloan Digital Sky Survey [York et al., 2000] - 23
7. BLAST [Altschul et al., 1990] - 20
8. Lasso [Tibshirani, 1996] - 19
9. Latent Dirichlet allocation [Blei et al., 2003] - 19
10. EM algorithm [Dempster et al., 1977] - 17
11. Support vector networks [Cortes and Vapnik, 1995] - 17
12. Random forest [Breiman, 2001] - 15
13. Pattern Recognition [Bishop, 2006] - 14
14. Anatomy of web search engine [Brin and Page, 1998] - 14
15. Numerical Recipes [Press et al., 2007] - 13
16. Bootstrap methods [Efron, 1979] - 11
17. Equation of state calculations [Metropolis et al., 1953] - 11
18. Exploratory data analysis [Tukey, 1977] - 11
19. Probabilistic reasoning [Pearl, 1988] - 11
20. PageRank [Page et al., 1999] - 10
21. Bayesian Data Analysis [Gelman et al., 2013] - 10
22. Unreasonable effectiveness of data [Halevy et al., 2009] - 10