Useful Resources
This page collects resources for data science, machine learning, and artificial intelligence: datasets, books, courses, tutorials, and tools to help you learn and grow in the field.
- Datasets
- Kaggle Datasets - A platform for data science and machine learning datasets
- UCI Machine Learning Repository - A collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms
- Google Dataset Search - A search engine for datasets
- ArcGIS World Roads - A layer representing the major roads, highways, local roads, and ferries of the world
- Books
- Python for Data Analysis by Wes McKinney
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron
- Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
- Python Machine Learning Essentials by Bernard Baah
- Python Machine Learning Case Studies: Five Case Studies for the Data Scientist by Danish Haroon
- Courses & Tutorials
- Machine Learning by Andrew Ng on Coursera
- TensorFlow Tutorials - Official TensorFlow tutorials for beginners and experts
- Dr. Qiusheng Wu’s YouTube Channel - Various tutorials on GIS, remote sensing, and Python programming
- UPenn’s MUSA 508 Lab - Tutorials and materials for MUSA 508 (Public Policy Analytics) at the University of Pennsylvania Weitzman School of Design
- Python Coding for Public Policy @ NYU Wagner - A course on Python programming for public policy at NYU Wagner, especially useful for working with APIs and web scraping
- Python for Public Policy - A course on Python programming for public policy at NYU Wagner, especially useful for working with APIs and web scraping
- People
- Anna Duan - MUSA
- Tools
- Jupyter Notebook - An open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text
- Google Colab - A free Jupyter notebook environment provided by Google, offering GPU and TPU support for running machine learning experiments in the cloud
- Kaggle - A platform for data science competitions, datasets, and notebooks, providing access to powerful computing resources and a community of data enthusiasts
- Scikit-learn - A comprehensive library for machine learning in Python, offering various algorithms and tools for classification, regression, clustering, and more
- TensorFlow - An open-source deep learning framework developed by Google, widely used for building and training neural networks
- PyTorch - Another popular deep learning framework, known for its dynamic computation graph and ease of use
- Matplotlib - A flexible plotting library for creating static, interactive, and animated visualizations in Python
- Seaborn - Built on top of Matplotlib, Seaborn provides a high-level interface for creating attractive statistical graphics
- Flask - A lightweight web framework for building RESTful APIs and web applications in Python
- Django - A high-level web framework for rapid development and clean design, suitable for building complex web applications
Open data initiatives have made many public datasets available that you can use to make data-driven decisions. Here are some resources for finding public datasets on your own:
Google Cloud Public Datasets give data analysts access to high-demand public datasets and make it easy to uncover insights in the cloud.
Dataset Search can help you find datasets available online through keyword searches.
Kaggle has an Open Data search function that can help you find datasets to practice with.
Finally, BigQuery hosts 150+ public datasets you can access and use.
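As a rough illustration of the keyword searches and BigQuery access mentioned above, the sketch below builds a Dataset Search URL and a preview query for a public BigQuery table. The base URL, the helper names `dataset_search_url` and `preview_sql`, and the example table `bigquery-public-data.samples.shakespeare` are assumptions made for illustration, not something this page prescribes.

```python
from urllib.parse import urlencode

# Assumed address of the Dataset Search front end.
DATASET_SEARCH_URL = "https://datasetsearch.research.google.com/search"


def dataset_search_url(*keywords: str) -> str:
    """Build a keyword-search URL for Dataset Search (hypothetical helper)."""
    # Drop empty keywords and join the rest into one query string.
    query = " ".join(k.strip() for k in keywords if k.strip())
    if not query:
        raise ValueError("at least one non-empty keyword is required")
    # urlencode escapes spaces and special characters for use in a URL.
    return f"{DATASET_SEARCH_URL}?{urlencode({'query': query})}"


def preview_sql(table_id: str, limit: int = 10) -> str:
    """Return a SQL string that previews a few rows of a BigQuery table.

    `table_id` is a fully qualified name such as
    "bigquery-public-data.samples.shakespeare" (an example table, assumed here).
    """
    if "`" in table_id:
        raise ValueError("table_id must not contain backticks")
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM `{table_id}` LIMIT {limit}"


if __name__ == "__main__":
    print(dataset_search_url("air quality", "hourly"))
    print(preview_sql("bigquery-public-data.samples.shakespeare", 5))
```

The first helper only produces a link to open in a browser. The second only produces a query string, so running it still requires a Google Cloud project and a BigQuery client, which this sketch deliberately leaves out.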
Public health datasets
Global Health Observatory data: You can search for datasets from this page or explore featured data collections from the World Health Organization.
The Cancer Imaging Archive (TCIA) dataset: This dataset is hosted by Google Cloud Public Datasets and can be uploaded to BigQuery.
1000 Genomes: This is another dataset from Google Cloud Public Datasets that can be uploaded to BigQuery.
Public climate datasets
National Climatic Data Center: The NCDC Quick Links page has a selection of datasets you can explore.
NOAA Public Dataset Gallery: The NOAA Public Dataset Gallery contains a searchable collection of public datasets.