Data Science

Research activities

Data Science is emerging as a disruptive consequence of the digital revolution. Based on the combination of “big data” availability, sophisticated data analysis techniques, and scalable computing infrastructures, Data Science is rapidly changing the way we do business, socialize, conduct research, and govern society. It is also changing the way scientific research is performed: model-driven approaches are now supplemented by data-driven approaches. A new paradigm has emerged, in which theories and models on one side and the bottom-up discovery of knowledge from data on the other mutually support each other. Experiments and analyses over massive datasets serve not only to validate existing theories and models, but also to discover patterns emerging from the data. These patterns can help scientists design better theories and models, yielding a deeper understanding of the complexity of social, economic, biological, technological, cultural and natural phenomena.
Data science is an interdisciplinary and pervasive paradigm aiming to turn data into knowledge, born at the intersection of a diversity of scientific and technological fields: databases and data mining, machine learning and artificial intelligence, complex systems and network science, statistics and statistical physics, information retrieval and text mining, natural language understanding, applied mathematics. Spectacular advances are occurring in data-driven pattern discovery, in automated learning of predictive models and in the analysis of complex networks.
Within this context, the Ph.D. in Data Science aims to educate a new generation of researchers who combine their disciplinary competences with those of a “data scientist”, able to exploit data and models to advance knowledge in their own disciplines or across diverse disciplines. To this end, the Ph.D. in Data Science develops a mix of knowledge and skills covering the methods and technologies for managing large, heterogeneous and complex data, for data sensing (how to harvest data), for data analysis and mining (how to make sense of data), for data visualization and storytelling (how to narrate data), and for understanding the ethical issues and social impact of Data Science. Ph.D. students will have the opportunity to develop data science projects in a variety of domains, including:

  • Data science for society and policy
  • Data science for economics and finance
  • Data science for culture and the humanities
  • Data science for industry and manufacturing
  • Data science for biology and health
  • Data science for the hard and environmental sciences
  • Data science ethics and legal aspects
  • Data science techniques and methods

The Ph.D. leverages the critical mass of data science labs and researchers that has accumulated in Pisa since the early 2000s, across the University of Pisa, the ISTI and IIT institutes of the CNR (National Research Council), Scuola Normale, Scuola Sant’Anna and Scuola IMT Lucca. These labs gave rise to pioneering European projects in big data analytics and data science, as well as to the earliest educational programs for data scientists at graduate and Ph.D. level. In 2015, the European Commission chose this hub as the coordinator of SoBigData, the European Research Infrastructure for Big Data Analytics & Social Mining. This initiative provides an ecosystem of data, analytics and competences to support interdisciplinary open data science and data-driven innovation, within an ethical framework of transparency, privacy, and responsibility. SoBigData provides a unique platform for doctoral education in Data Science, recognized by the Ministry of Education, University and Research[1], where Ph.D. students can carry out multidisciplinary data-driven research.
Applications from graduates of any discipline are welcome. Successful candidates are expected to have strong motivation, solid preparation, and a marked aptitude for quantitative study in their own field.
[1] Rapporto MIUR BigData, p. 33

Teaching activities

Teaching is organized along two lines: alignment of data science skills, to create a common ground for students with diverse backgrounds, and applications of data science in disciplinary and multidisciplinary contexts. For alignment, Ph.D. students will have the opportunity to take selected courses offered by the post-graduate Master in “Big Data Analytics and Social Mining” (Master Big Data) of the University of Pisa, run in collaboration with partners including CNR, Scuola Normale and Scuola Sant’Anna. Available courses cover the basics of Data Science and Big Data Analytics:

  • Big Data Sensing & Procurement (Analytical Web Crawling, Scraping, Web Search and Information Retrieval, Semantic Text Annotation, Big Data Sources, Crowdsensing)
  • Big Data Mining (Data Mining, Machine Learning and Statistical Learning, Network Science and Social Network Analysis, Mobility Data Analysis, Web Mining, Nowcasting, Sentiment Analysis and Opinion Mining)
  • Big Data Storytelling (Visualization, Visual analytics, Data Journalism)
  • Big Data Ethics (Privacy-by-design, Data Protection Regulations, Responsible Data Science, Legal aspects of Data Science)
  • Big Data Technologies (Data Management for Business Intelligence, High Performance & Scalable Analytics, NoSQL Big Data Platforms).

The participating institutions offer a wide variety of Ph.D. courses focusing on the multidisciplinary applications of data science, also in synergy with existing disciplinary Ph.D. programs. Students also have the opportunity to participate in summer schools organized in collaboration with international research institutions, and in the PhD+ program of the University of Pisa, which develops entrepreneurial and innovation skills.
Each year, Ph.D. students in Data Science will agree on a study plan with the Ph.D. Coordinator, to be presented to the Faculty Board. This document will specify the research and education activities planned for the relevant academic year. Courses will be chosen to broaden and align the student's background and to deepen specific aspects related to the Ph.D. thesis project. Ph.D. students are expected to take at least three courses (on top of the alignment activities) and to pass the corresponding exams. All courses will be taught in English.
The research labs of the partner institutions offer a broad network of international liaisons. SoBigData is a gateway to a network of data science centers in Europe and features a transnational student exchange program funded by the European Commission under the H2020 Excellent Science program. The SoBigData network currently comprises ETH Zurich, King’s College London, Fraunhofer, TU Delft, Aalto University, University of Sheffield, University of Tartu, and Leibniz Universität Hannover. International collaborations include Pennsylvania State University, Northeastern University Boston, MIT Cambridge (US), Central European University Budapest, and Dalhousie University Halifax (CA).

Courses that can be borrowed from the 2017-2018 offering of the regular program (corso ordinario)