Big Data
- This page collects my work with big data using Python, SQL, and other applications as I continue to learn.
- Most of the CSV files and code are included.
March Madness Machine Learning Project
- Here is a March Madness win-prediction project I built using several different machine learning algorithms
Golf PGA Data Science Project
- Here is a PGA data science project that I am currently working on (in progress)!
Day One: Introduction to Big Data
Day Two: Data Representation
Day Three: Data Collection
- Including scraping from the Twitter API
Day Four: Working with SQL and NoSQL
- Working with the Citi Bike example using SQL and Python
- INNER, LEFT, RIGHT, and CROSS JOINs in SQL
- Topics include:
- Window functions
- PARTITION BY vs. GROUP BY
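To illustrate the PARTITION BY vs. GROUP BY distinction, here is a minimal sketch using Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer). The `trips` table, its columns, and its values are hypothetical stand-ins, not the actual Citi Bike data.

```python
import sqlite3

# Hypothetical toy data: (station, trip duration in minutes).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (station TEXT, duration INTEGER)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("A", 10), ("A", 20), ("B", 5), ("B", 15), ("B", 40)],
)

# GROUP BY collapses each station into one summary row.
grouped = conn.execute(
    "SELECT station, AVG(duration) FROM trips GROUP BY station ORDER BY station"
).fetchall()

# PARTITION BY keeps every trip row and attaches its station's average to it.
windowed = conn.execute(
    """
    SELECT station, duration,
           AVG(duration) OVER (PARTITION BY station) AS station_avg
    FROM trips
    ORDER BY station, duration
    """
).fetchall()

print(grouped)   # [('A', 15.0), ('B', 20.0)]
print(windowed)  # five rows, one per trip, each carrying its station average
```

GROUP BY returns one row per station, while the window version returns all five trips with the station average repeated on each row, so per-row and per-group values can be compared side by side.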
Day Five: Data Quality and Working with the pandas Library
Day Six: Big Data Preprocessing
Includes:
- Entropy
- Feature Selection
- Principal Component Analysis
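As an example of how entropy can feed into feature selection, the sketch below computes Shannon entropy, H = -Σ p·log2(p), in bits, and the information gain of splitting the labels on a feature. Ranking features by information gain is one common approach, not necessarily the one used in these notes; the function names and the toy weather data are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H = -sum(p * log2(p)) of a list of class labels, in bits."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, feature):
    """Entropy of `labels` minus the weighted entropy after splitting on `feature`."""
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: which feature predicts whether we play?
play = ["yes", "yes", "no", "no"]
outlook = ["sunny", "sunny", "rainy", "rainy"]
windy = ["true", "false", "true", "false"]

print(entropy(play))                    # 1.0
print(information_gain(play, outlook))  # 1.0 (perfect split)
print(information_gain(play, windy))    # 0.0 (uninformative)
```

A feature with higher information gain removes more uncertainty about the label, so `outlook` would be kept ahead of `windy` here.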
Day Seven: Data Exploration
Day Eight: Predictive Modeling
- Classification and Regression
- Decision Trees with an Example
- 5-Fold Cross-Validation
- Model Selection and Overfitting
- k-Nearest Neighbors
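The sketch below ties k-nearest neighbors and 5-fold cross-validation together in plain Python: the rows are shuffled into five folds, each fold is held out once as the test set, and the fold accuracies are averaged. It assumes Euclidean distance and simple majority voting; the helper names and the two-cluster toy data are hypothetical.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def k_fold_indices(n, n_folds=5, seed=0):
    """Shuffle the row indices 0..n-1 and deal them into n_folds folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cross_val_accuracy(X, y, k=3, n_folds=5):
    """Average kNN accuracy when each fold is held out once as the test set."""
    scores = []
    for test_idx in k_fold_indices(len(X), n_folds):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / n_folds


# Hypothetical toy data: two well-separated clusters labelled "a" and "b".
X = [(i * 0.1, i * 0.1) for i in range(10)] + [(10 + i * 0.1, 10 + i * 0.1) for i in range(10)]
y = ["a"] * 10 + ["b"] * 10

print(cross_val_accuracy(X, y, k=3))  # 1.0 on this easily separable data
```

Running `cross_val_accuracy` for several values of `k` and keeping the best average score is one simple way to approach model selection without judging a model on the same rows it was trained on.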
Day Nine: Market Basket Analysis
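Market basket analysis typically scores association rules such as {diapers} -> {beer} by support, confidence, and lift. A minimal sketch, assuming each transaction is a Python set of item names; the baskets below are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence divided by the consequent's baseline support (1.0 means independence)."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Hypothetical baskets, each a set of item names.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

print(support(baskets, {"diapers", "beer"}))     # 0.6
print(confidence(baskets, {"diapers"}, {"beer"}))  # about 0.75
print(lift(baskets, {"diapers"}, {"beer"}))        # about 1.25
```

A lift above 1 means the two items appear together more often than they would if purchases were independent.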
Day Ten: Cluster Analysis
- k-means
- Hierarchical Cluster Analysis
- Density-Based Cluster Analysis
- Cluster Analysis Evaluation
Day 10 Link (Cluster Analysis)
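A minimal sketch of k-means (Lloyd's algorithm) in plain Python: each point is assigned to its nearest centroid, each centroid moves to the mean of its points, and the loop stops once the centroids stop moving. Initialising from k randomly chosen data points is an assumption made for simplicity; libraries such as scikit-learn default to k-means++ initialisation instead. The toy points are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm; returns (centroids, clusters)."""
    # Assumption: k distinct data points chosen at random serve as the initial centroids.
    centroids = random.Random(seed).sample(points, k)
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return centroids, clusters


# Hypothetical toy data with two obvious groups.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(pts, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```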
Day Eleven: Anomaly Detection
Day Twelve: Collaborative Filtering
Day Thirteen: Intro to A/B Testing
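A common first analysis in A/B testing compares the conversion rates of two groups with a two-proportion z-test. The sketch below assumes independent groups and sample sizes large enough for the normal approximation to hold; the function name and the conversion counts are illustrative, not taken from the course material.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between the conversion rates of groups A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # common rate assumed under H0: p_a == p_b
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical experiment: 200/1000 conversions in A vs. 250/1000 in B.
z, p = two_proportion_z_test(200, 1000, 250, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Here z is roughly 2.7, so at the conventional 0.05 significance level the difference would be treated as statistically significant.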
Time Series
ETL Pipeline
- ETL pipeline example using MySQL, the Twitter API, and Python
Another ETL Pipeline Example
Coming Soon (notes will be uploaded later)
Day : Intro to MapReduce and Hadoop
- Watch videos on this topic
- Using Amazon Web Services
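The classic introductory MapReduce job is word count. The sketch below simulates the map, shuffle, and reduce phases inside a single Python process so the data flow is easy to follow; it is not Hadoop's API, where the phases run distributed across a cluster, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle/sort: group every value emitted for the same key, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle_phase(mapped))


print(word_count(["the quick fox", "the lazy dog", "The end"]))
# {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because each mapper call only sees one line and each reducer call only sees one key, both phases can be spread across machines; the shuffle is the step that moves data between them.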
Day : Intro to Pig
- Watch videos on Pig
- The main use of Pig is to help users transform data or compute summary statistics from it
Day : Intro to Hive
- Watch videos on Hive