Useful Resources

Author

Gigi Sung

Published

January 1, 2099


This page collects useful resources for data science, machine learning, and artificial intelligence, including datasets, books, courses, tutorials, and tools to help you learn and grow in the field.

  1. Datasets
    • Kaggle Datasets - A platform for data science and machine learning datasets
    • UCI Machine Learning Repository - A collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms
    • Google Dataset Search - A search engine for datasets
    • ArcGIS World Roads - A dataset representing the major roads, highways, local roads, and ferries of the world



  2. Books
  3. Courses & Tutorials
  4. People
  5. Tools

    • Jupyter Notebook - An open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text
    • Google Colab - A free Jupyter notebook environment provided by Google, offering GPU and TPU support for running machine learning experiments in the cloud
    • Kaggle - A platform for data science competitions, datasets, and notebooks, providing access to powerful computing resources and a community of data enthusiasts
    • Scikit-learn - A comprehensive library for machine learning in Python, offering various algorithms and tools for classification, regression, clustering, and more
    • TensorFlow - An open-source deep learning framework developed by Google, widely used for building and training neural networks
    • PyTorch - Another popular deep learning framework, known for its dynamic computation graph and ease of use
    • Matplotlib - A flexible plotting library for creating static, interactive, and animated visualizations in Python
    • Seaborn - Built on top of Matplotlib, Seaborn provides a high-level interface for creating attractive statistical graphics
    • Flask - A lightweight web framework for building RESTful APIs and web applications in Python
    • Django - A high-level web framework for rapid development and clean design, suitable for building complex web applications



Open data initiatives have made many public datasets available for data-driven decision-making. Here are some resources you can use to start searching for public datasets on your own:

  • The Google Cloud Public Datasets give data analysts access to high-demand public datasets and make it easy to uncover insights in the cloud.

  • Google Dataset Search can help you find available datasets online with keyword searches.

  • Kaggle has an Open Data search function that can help you find datasets to practice with.

  • Finally, BigQuery hosts 150+ public datasets you can access and use. 
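As a minimal sketch of what querying a BigQuery public dataset can look like, the Python below builds a standard SQL query against a table under the `bigquery-public-data` project and, optionally, runs it with the `google-cloud-bigquery` client library. The helper names `build_public_query` and `run_query` are hypothetical, and the `usa_names.usa_1910_current` table with its `name` and `number` columns is assumed to still be available; browse the public dataset catalog to confirm before relying on it. Actually running the query requires a Google Cloud project and credentials, which this sketch does not set up.

```python
import re

# Public datasets are published under the "bigquery-public-data" project.
PUBLIC_PROJECT = "bigquery-public-data"
# Simplified identifier check for this sketch (not BigQuery's full naming rules).
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_public_query(dataset, table, columns, limit=10):
    """Build a simple SELECT over a public table (hypothetical helper)."""
    if not columns:
        raise ValueError("at least one column is required")
    for name in (dataset, table, *columns):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    cols = ", ".join(columns)
    return f"SELECT {cols} FROM `{PUBLIC_PROJECT}.{dataset}.{table}` LIMIT {limit}"


def run_query(sql):
    """Run the query and return rows as dicts (hypothetical helper).

    Needs the google-cloud-bigquery package plus Google Cloud credentials.
    """
    # Imported lazily so build_public_query works without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    return [dict(row.items()) for row in client.query(sql).result()]


if __name__ == "__main__":
    # Assumed example table; confirm it exists in the public catalog first.
    sql = build_public_query("usa_names", "usa_1910_current", ["name", "number"], limit=5)
    print(sql)
    # rows = run_query(sql)  # uncomment once a project and credentials are configured
```

Keeping query construction separate from execution lets you inspect the SQL locally before spending any query quota.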

Public health datasets

  1. Global Health Observatory data: You can search for datasets from this page or explore featured data collections from the World Health Organization.  

  2. The Cancer Imaging Archive (TCIA) dataset: This dataset is hosted by Google Cloud Public Datasets and can be loaded into BigQuery.

  3. 1000 Genomes: This is another dataset from Google Cloud Public Datasets that can be loaded into BigQuery.

Public climate datasets

  1. National Climatic Data Center: The NCDC Quick Links page has a selection of datasets you can explore. 

  2. NOAA Public Dataset Gallery: The NOAA Public Dataset Gallery contains a searchable collection of public datasets.

Public social-political datasets

  1. UNICEF State of the World’s Children: This dataset from UNICEF includes a collection of tables that can be downloaded.

  2. CPS (Current Population Survey) Labor Force Statistics: This page links to several datasets you can explore.

  3. The Stanford Open Policing Project: This dataset can be downloaded as a .csv file for your own use.
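Once you have downloaded one of these .csv files (for example from the Stanford Open Policing Project), a quick first step is to tally the values in a column. The sketch below uses only Python's standard `csv` module; the sample rows, the helper name `count_by_column`, and the column names `date`, `subject_race`, and `outcome` are hypothetical stand-ins, so check the documentation that accompanies the file you download for its actual schema.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows standing in for a downloaded .csv file.
# The column names are assumptions, not the project's documented schema.
SAMPLE_CSV = """date,subject_race,outcome
2017-01-01,white,citation
2017-01-01,black,warning
2017-01-02,hispanic,citation
2017-01-03,white,
"""


def count_by_column(lines, column):
    """Tally the non-empty values of one column in CSV input."""
    reader = csv.DictReader(lines)
    # Accessing fieldnames reads the header row; it is None for empty input.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found")
    return Counter(row[column] for row in reader if row[column])


if __name__ == "__main__":
    counts = count_by_column(io.StringIO(SAMPLE_CSV), "outcome")
    print(counts.most_common())
    # With a real download, pass an open file instead (filename is hypothetical):
    # with open("downloaded_file.csv", newline="") as f:
    #     print(count_by_column(f, "outcome").most_common())
```

Rows with an empty value in the chosen column are skipped rather than counted, so missing data does not show up as its own category.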