Big Data

  • This page covers working with big data using Python, SQL, and other tools as I continue to learn.

  • Most of the CSV files and code are included.

March Madness Machine Learning Project

  • Here is a March Madness win-prediction project I built using several machine learning algorithms.

March Madness Project Link

PGA Golf Data Science Project

  • Here is a PGA data science project I am currently working on (in progress).

Golf Project Link

Day One: Introduction to Big Data

Day 1 Link

Day Two: Data Representation

Day 2 Link

Day Three: Data Collection

  • Includes scraping data from the Twitter API

Day 3 Link

Day Four: Working with SQL and NoSQL

Day 4: SQL Link

  • Working through a Citi Bike example using SQL and Python

More on SQL Link

JOINS in SQL

  • INNER, LEFT, RIGHT, and CROSS JOINs in SQL
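A minimal sketch of the four join types, run against an in-memory SQLite database through Python's built-in `sqlite3` module. The `riders` and `trips` tables are hypothetical. Native `RIGHT JOIN` only arrived in SQLite 3.39.0, so the sketch writes it as a `LEFT JOIN` with the tables swapped, which returns the same rows.

```python
import sqlite3

# Hypothetical tables for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE riders (rider_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE trips (trip_id INTEGER PRIMARY KEY, rider_id INTEGER, minutes INTEGER);
INSERT INTO riders VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO trips VALUES (10, 1, 12), (11, 1, 30), (12, 2, 7), (13, 4, 20);
""")

# INNER JOIN: only rows that match in both tables.
inner = conn.execute("""
SELECT r.name, t.minutes FROM riders r
INNER JOIN trips t ON r.rider_id = t.rider_id
ORDER BY t.trip_id
""").fetchall()

# LEFT JOIN: every rider, with None where no trip matches.
left = conn.execute("""
SELECT r.name, t.minutes FROM riders r
LEFT JOIN trips t ON r.rider_id = t.rider_id
ORDER BY r.rider_id, t.trip_id
""").fetchall()

# RIGHT JOIN emulated as a LEFT JOIN with the tables swapped:
# every trip, with None where the rider is unknown.
right = conn.execute("""
SELECT r.name, t.minutes FROM trips t
LEFT JOIN riders r ON r.rider_id = t.rider_id
ORDER BY t.trip_id
""").fetchall()

# CROSS JOIN: every rider paired with every trip (3 x 4 = 12 rows).
cross = conn.execute(
    "SELECT r.name, t.trip_id FROM riders r CROSS JOIN trips t"
).fetchall()

print(inner)       # [('Ana', 12), ('Ana', 30), ('Ben', 7)]
print(left)        # [('Ana', 12), ('Ana', 30), ('Ben', 7), ('Cy', None)]
print(right)       # [('Ana', 12), ('Ana', 30), ('Ben', 7), (None, 20)]
print(len(cross))  # 12
```

Cy has no trips, so he only survives the LEFT JOIN. Trip 13 points at a rider who does not exist, so it only survives the emulated RIGHT JOIN.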

Intermediate SQL

  • Topics include:
    • Window Functions
    • PARTITION BY vs. GROUP BY
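The difference between `GROUP BY` and `PARTITION BY` is easiest to see side by side. This minimal sketch uses a hypothetical `trips` table in SQLite through Python's `sqlite3` module. Window functions need SQLite 3.25.0 or newer. `GROUP BY` collapses each station into one row. `PARTITION BY` keeps every trip and attaches the per-station total and a per-station rank to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (station TEXT, trip_id INTEGER, minutes INTEGER);
INSERT INTO trips VALUES
  ('A', 1, 10), ('A', 2, 20), ('B', 3, 5), ('B', 4, 15), ('B', 5, 25);
""")

# GROUP BY: one summary row per station.
grouped = conn.execute("""
SELECT station, SUM(minutes) AS total
FROM trips GROUP BY station ORDER BY station
""").fetchall()

# PARTITION BY: every row is kept; the window result is repeated per partition.
windowed = conn.execute("""
SELECT station, trip_id, minutes,
       SUM(minutes) OVER (PARTITION BY station) AS station_total,
       ROW_NUMBER() OVER (PARTITION BY station ORDER BY minutes DESC) AS rank_in_station
FROM trips ORDER BY trip_id
""").fetchall()

print(grouped)   # [('A', 30), ('B', 45)]
for row in windowed:
    print(row)   # e.g. ('A', 1, 10, 30, 2) for the shorter of station A's two trips
```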

Day Five: Data Quality and Working with the pandas Library

Day 5: NoSQL Link

Day Six: Big Data Preprocessing

Day 6 Link

Includes:

  • Entropy
  • Feature Selection
  • Principal Component Analysis
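Entropy ties directly into feature selection. The Shannon entropy of a label column is $H(Y) = -\sum_c p_c \log_2 p_c$. A feature's information gain is how much splitting on it lowers that entropy, which gives one way to rank features. This minimal standard-library sketch uses hypothetical toy data. It covers entropy and information gain only, not PCA.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of `labels` minus the weighted entropy after splitting on the feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy data: did a rider renew their membership?
labels  = ["yes", "yes", "no", "no"]
weather = ["sun", "sun", "rain", "rain"]  # separates the labels perfectly
day     = ["mon", "tue", "mon", "tue"]    # carries no information about the labels

print(entropy(labels))                    # 1.0
print(information_gain(weather, labels))  # 1.0
print(information_gain(day, labels))      # 0.0
```

Ranking features by information gain would keep `weather` and drop `day`.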

Day Seven: Data Exploration

Day 7 Link

Day Eight: Predictive Modeling

Day 8 Link

  • Classification and Regression
  • Decision Trees (with an example)
  • 5-Fold Cross-Validation
  • Model Selection and Overfitting
  • k-Nearest Neighbors
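Two of the topics above fit together naturally: k-nearest neighbors as the model and 5-fold cross-validation to estimate its accuracy. This is a minimal standard-library sketch on a hypothetical two-cluster dataset, not the code from the linked notes. It uses Euclidean distance and a fixed shuffle seed so the folds are reproducible.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict x's label by majority vote among its k nearest training points."""
    # Sort (distance, label) pairs so the closest neighbours come first.
    neighbours = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    top_labels = [label for _, label in neighbours[:k]]
    return Counter(top_labels).most_common(1)[0][0]

def k_fold_indices(n, folds=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into `folds` roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::folds] for i in range(folds)]

def cross_val_accuracy(X, y, k=3, folds=5):
    """Mean k-NN accuracy over `folds`-fold cross-validation."""
    scores = []
    for test_idx in k_fold_indices(len(X), folds):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

# Hypothetical toy data: two well-separated groups of 2-D points.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
y = ["a"] * 5 + ["b"] * 5

print(knn_predict(X, y, (0.2, 0.3)))  # a
print(cross_val_accuracy(X, y))       # 1.0
```

Each fold is held out once while the model is fit on the other four. Comparing this averaged score across different values of `k` is one simple form of model selection.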

Day Nine: Market Basket Analysis

Day 9 Link
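Market basket analysis scores association rules of the form X → Y with three standard metrics: support, confidence, and lift. This minimal sketch computes them for one rule over hypothetical baskets. It does not show how candidate rules are generated, for example with Apriori.

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence relative to the consequent's baseline support (> 1 means positive association)."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "butter"}))       # 0.5
print(confidence(baskets, {"bread"}, {"butter"}))  # about 0.667
print(lift(baskets, {"bread"}, {"butter"}))        # about 1.333
```

Here butter appears in half of all baskets but in two-thirds of the baskets that contain bread, so the lift of bread → butter is above 1.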

Day Ten: Cluster Analysis

  • k-means
  • Hierarchical clustering
  • Density-based clustering
  • Cluster evaluation

Day 10 Link (Cluster Analysis)
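A minimal standard-library sketch of k-means using Lloyd's algorithm, which alternates between two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. Standard k-means seeds the centroids randomly, and k-means++ is the usual improvement. This sketch swaps in a deterministic farthest-first seeding so the result is reproducible. The points are hypothetical.

```python
import math

def farthest_first_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def k_means(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, clusters)."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
centroids, clusters = k_means(points, k=2)
print(centroids)  # [(1.5, 1.5), (8.5, 8.5)]
```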

Day Eleven: Anomaly Detection

Day 11 Link
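One of the simplest anomaly detectors is the z-score rule. It flags any value that lies more than a chosen number of standard deviations from the mean. This minimal sketch runs on hypothetical daily ride counts. It uses a threshold of 2.5 instead of the common 3 because a single outlier's z-score is capped in a sample this small. The mean and standard deviation are themselves pulled toward outliers, so robust variants based on the median are often preferred on real data.

```python
import statistics

def zscore_anomalies(values, threshold=3.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # every value is identical, so nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical daily ride counts with one suspicious spike.
rides = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
print(zscore_anomalies(rides, threshold=2.5))  # [50]
```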

Day Twelve: Collaborative Filtering

Day 12 Link
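A minimal sketch of user-based collaborative filtering. To predict a user's rating of an item, it averages other users' ratings of that item, weighted by how similar those users are. Here similarity is cosine similarity over co-rated items. The users, items, and ratings are hypothetical, and many real systems use mean-centered (Pearson) similarity instead.

```python
import math

# Hypothetical user -> {item: rating} data.
ratings = {
    "ana": {"m1": 5, "m2": 3, "m3": 4},
    "ben": {"m1": 4, "m2": 3, "m3": 4, "m4": 5},
    "cy":  {"m1": 1, "m2": 5, "m4": 2},
}

def cosine(u, v):
    """Cosine similarity of two users over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

print(round(predict("ana", "m4"), 2))  # 3.79
```

Ana's ratings agree closely with Ben's, so Ben's 5 for `m4` outweighs Cy's 2.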

Day Thirteen: Intro to A/B Testing

Day 13 Link
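A common way to analyze an A/B test with a conversion-rate metric is the two-proportion z-test. It asks whether the gap between variant A's and variant B's conversion rates is larger than chance would explain. This minimal sketch uses only the standard library (`statistics.NormalDist`, Python 3.8+), a pooled standard error, and a two-sided p-value. The visitor and conversion counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # rate under H0: no difference
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical test: 10.0% vs 12.5% conversion on 2,000 visitors each.
z, p = two_proportion_ztest(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(round(z, 2), round(p, 3))  # 2.5 0.012
```

At a 5% significance level, this made-up result would count as a statistically significant lift for variant B.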

Time Series

Time Series Link

ETL Pipeline

Day 20 Link: ETL Pipeline

  • ETL Pipeline example using MySQL, Twitter API, and Python
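The three ETL stages can be sketched with the standard library alone. This is not the MySQL and Twitter API pipeline from the linked notes. It is a stand-in under stated assumptions: an inline CSV string plays the role of the extracted API data, and an in-memory SQLite database replaces MySQL. The column names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export standing in for data pulled from an API.
RAW = """id,author,text,likes
1,ana,Loving the new bikes!,10
2,ben,  Station empty again ,3
3,ana,,7
"""

def extract(raw_text):
    """Extract: parse the raw CSV into a list of dicts."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """Transform: trim whitespace, drop empty posts, cast numeric fields."""
    cleaned = []
    for r in rows:
        text = r["text"].strip()
        if not text:
            continue
        cleaned.append((int(r["id"]), r["author"], text, int(r["likes"])))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a table (re-runs overwrite by id)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts "
        "(id INTEGER PRIMARY KEY, author TEXT, text TEXT, likes INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
loaded = conn.execute("SELECT id, author, text, likes FROM posts ORDER BY id").fetchall()
print(loaded)
# [(1, 'ana', 'Loving the new bikes!', 10), (2, 'ben', 'Station empty again', 3)]
```

Keeping each stage a separate function means the source or the target database can be swapped without touching the cleaning logic.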

Another ETL Pipeline Example

ETL Pipeline

Coming Soon (notes will be uploaded later)

Day : Intro to MapReduce and Hadoop

  • Watch videos on this topic
  • Using Amazon Web Services
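The MapReduce model behind Hadoop has three stages: mappers emit key/value pairs, a shuffle groups the pairs by key, and reducers combine each group. The classic word-count example can be simulated in plain Python on one machine. This minimal sketch shows the data flow only. It does not use Hadoop's API, and the input "splits" are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group every value emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: sum the counts for each word (keys sorted for stable output)."""
    return {key: sum(values) for key, values in sorted(groups.items())}

# Hypothetical input splits, as if stored in separate HDFS blocks.
splits = ["big data is big", "data pipelines move data"]
mapped = chain.from_iterable(map_phase(doc) for doc in splits)
counts = reduce_phase(shuffle_phase(mapped))
print(counts)  # {'big': 2, 'data': 3, 'is': 1, 'move': 1, 'pipelines': 1}
```

On a real cluster, the mappers run in parallel on different nodes, and the framework performs the shuffle over the network before handing each key's group to a reducer.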

Day : Intro to Pig

  • Watch videos on Pig

  • The main use of Pig is to help users transform data or compute summary statistics from it.

Day : Intro to Hive

  • Watch videos on Hive