Machine Learning Models for Fake News
Finding Fake News
- Using the Kaggle Dataset
- Preprocess Data
- Cleaning Data
- Reviewing Data
Steps to Clean Tweets
- Remove external links
- Remove punctuation, numbers, and other non-alphabetic characters
- Remove indicators for names of news sources like New York Times, Reuters, Fox News, etc.
- Use a stemmer to map the same word in different tenses or plural forms to one token
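As a rough illustration of the third step, here is a minimal sketch of stripping outlet names with a single regular expression. The `SOURCE_NAMES` list and the `strip_source_names` helper are hypothetical names made up for this example and are not part of the notebook; adjust the list to whatever sources appear in your data.

```python
import re

# Hypothetical list of outlet names; extend it for the sources in your data.
SOURCE_NAMES = ["New York Times", "Reuters", "Fox News"]

# One case-insensitive pattern that matches any of the names as whole words.
SOURCE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(name) for name in SOURCE_NAMES) + r")\b",
    flags=re.IGNORECASE,
)

def strip_source_names(text):
    """Remove outlet names, then collapse the leftover whitespace."""
    without_names = SOURCE_PATTERN.sub(" ", text)
    return " ".join(without_names.split())

sample = "WASHINGTON (Reuters) - Officials told Fox News the vote was certified."
print(strip_source_names(sample))
```

Leftover punctuation such as the empty parentheses is not handled here; the punctuation step in the list takes care of that.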
Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
import string
from nltk.stem.porter import PorterStemmer
from nltk.corpus import stopwords
from wordcloud import WordCloud
fake=pd.read_csv('data/Fake_New.csv')
fake.head()
| | title | text | subject | date |
---|---|---|---|---|
0 | Donald Trump Sends Out Embarrassing New Year’... | Donald Trump just couldn t wish all Americans ... | News | 31-Dec-17 |
1 | Drunk Bragging Trump Staffer Started Russian ... | House Intelligence Committee Chairman Devin Nu... | News | 31-Dec-17 |
2 | Sheriff David Clarke Becomes An Internet Joke... | On Friday, it was revealed that former Milwauk... | News | 30-Dec-17 |
3 | Trump Is So Obsessed He Even Has Obama’s Na... | On Christmas day, Donald Trump announced that ... | News | 29-Dec-17 |
4 | Pope Francis Just Called Out Donald Trump Dur... | Pope Francis used his annual Christmas Day mes... | News | 25-Dec-17 |
real=pd.read_csv('data/True_New.csv')
real.head()
| | title | text | subject | date |
---|---|---|---|---|
0 | As U.S. budget fight looms, Republicans flip t... | WASHINGTON (Reuters) - The head of a conservat... | politicsNews | 31-Dec-17 |
1 | U.S. military to accept transgender recruits o... | WASHINGTON (Reuters) - Transgender people will... | politicsNews | 29-Dec-17 |
2 | Senior U.S. Republican senator: 'Let Mr. Muell... | WASHINGTON (Reuters) - The special counsel inv... | politicsNews | 31-Dec-17 |
3 | FBI Russia probe helped by Australian diplomat... | WASHINGTON (Reuters) - Trump campaign adviser ... | politicsNews | 30-Dec-17 |
4 | Trump wants Postal Service to charge 'much mor... | SEATTLE/WASHINGTON (Reuters) - President Donal... | politicsNews | 29-Dec-17 |
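stemmer = PorterStemmer()
# Build the stop-word set once; calling stopwords.words() for every word is very slow
stop_words = set(stopwords.words('english'))

def clean(text):
    # Real news often starts with a "CITY (Reuters) - " dateline; keep everything after it
    if "(Reuters)" in text:
        text = text.split("(Reuters)", 1)[1]
    # Remove @mentions ('\S' = non-whitespace; '[^s]' would mean "anything except the letter s")
    text = re.sub(r'@\S*', '', text)
    # Remove external links
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    # Drop tokens containing slashes and English stop words (case-sensitive, so "The" is kept)
    text = " ".join([wd for wd in text.split() if "\\" not in wd and "/" not in wd and wd not in stop_words])
    # Remove punctuation and digits
    text = "".join([c for c in text if c not in string.punctuation])
    text = "".join([c for c in text if not c.isdigit()])
    # Remove any remaining non-alphabetic characters (A-Z, not A-z, which also matches [ \ ] ^ _ `)
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    text = text.lower()
    # Stem each word so different tenses and plural forms map to the same token
    text = " ".join([stemmer.stem(wd) for wd in text.split()])
    return text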
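Two regex details in cleaning code like this are easy to get wrong, so here is a small sketch that compares them side by side. The sample string is made up for illustration. The class `[^s]` means "any character except the letter s", while `\S` means "any non-whitespace character"; likewise the range `A-z` covers more than letters, while `A-Z` does not.

```python
import re

# Made-up sample text with a handle and some square brackets.
text = "thanks @john for this [sic] post"

# '[^s]' stops only at the next letter "s", so ordinary words get swallowed.
greedy = re.sub(r'@[^s]*', '', text)

# '\S' stops at the first whitespace, so only the handle is removed.
handle_only = re.sub(r'@\S*', '', text)

# 'A-z' spans ASCII 65-122, which also lets [ \ ] ^ _ ` through.
loose = re.sub(r'[^a-zA-z\s]', '', text)

# 'A-Z' keeps letters (and whitespace) only.
strict = re.sub(r'[^a-zA-Z\s]', '', text)

for label, result in [("@[^s]*", greedy), ("@\\S*", handle_only),
                      ("A-z", loose), ("A-Z", strict)]:
    print(f"{label:8} -> {result!r}")
```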
alabama offici thursday certifi democrat doug jone winner state us senat race state judg deni challeng republican roy moor whose campaign derail accus sexual misconduct teenag girl jone vacant seat vote percentag point elect offici said that made first democrat quarter centuri win senat seat alabama the seat previous held republican jeff session tap us presid donald trump attorney gener a state canvass board compos alabama secretari state john merril governor kay ivey attorney gener steve marshal certifi elect result seat jone narrow republican major senat seat in statement jone call victori a new chapter pledg work parti moor declin conced defeat even trump urg so he stood claim fraudul elect statement releas certif said regret media outlet report an alabama judg deni moor request block certif result dec elect decis shortli canvass board met moor challeng alleg potenti voter fraud deni chanc victori hi file wednesday montgomeri circuit court sought halt meet schedul ratifi jone win thursday moor could ask recount addit possibl court challeng merril said interview fox news channel he would complet paperwork within time period show money challeng merril said weve notifi yet intent that merril said regard claim voter fraud merril told cnn case report weve adjud those we continu that said republican lawmak washington distanc moor call drop race sever women accus sexual assault misconduct date back teenag earli s moor deni wrongdo reuter abl independ verifi alleg
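The cleaned sample above still contains words such as "the", "that", "a" and "in", even though they are stop words. The likely reason is the order of operations: clean() filters stop words before it lowercases the text, and NLTK's English stop-word list is lowercase, so a capitalised "The" at the start of a sentence does not match. A minimal sketch of the effect, using a small hypothetical stop-word set in place of the NLTK list:

```python
# Hypothetical stand-in for NLTK's English stop-word list, which is lowercase.
stop_words = {"the", "was", "by", "that", "him", "a", "in"}

sentence = "The seat was previously held by the senator"

# Filter first, lowercase second: "The" != "the", so the capitalised word survives.
filter_first = " ".join(w for w in sentence.split() if w not in stop_words).lower()

# Lowercase first, filter second: both occurrences are removed.
lower_first = " ".join(w for w in sentence.lower().split() if w not in stop_words)

print(filter_first)
print(lower_first)
```

Whether this matters depends on the model; if you want every stop word gone, lowercasing before the filter is one option.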
Let's Clean Our Text
- Pass "text" into our clean() function and watch the magic happen!
- This is just for practice!
# Add a new "isfake" column to both datasets: 0 for real, 1 for fake
real["isfake"] = 0
fake["isfake"] = 1
# Combine the two datasets with pandas concat
allnews = pd.concat([real, fake])
# Apply clean() to every row of the "text" column
# This can take some time, depending on how large the dataset is
allnews['text'] = allnews['text'].apply(clean)
# Save the combined dataset (real and fake news plus the "isfake" label) to a CSV file
allnews.to_csv("output.csv", index=False)
# Read our newly created csv file into the dataframe
cleanedtext=pd.read_csv("output.csv")
cleanedtext
| | title | text | subject | date | isfake |
---|---|---|---|---|---|
0 | As U.S. budget fight looms, Republicans flip t... | the head conserv republican faction us congres... | politicsNews | 31-Dec-17 | 0 |
1 | U.S. military to accept transgender recruits o... | transgend peopl allow first time enlist us mil... | politicsNews | 29-Dec-17 | 0 |
2 | Senior U.S. Republican senator: 'Let Mr. Muell... | the special counsel investig link russia presi... | politicsNews | 31-Dec-17 | 0 |
3 | FBI Russia probe helped by Australian diplomat... | trump campaign advis georg papadopoulo told au... | politicsNews | 30-Dec-17 | 0 |
4 | Trump wants Postal Service to charge 'much mor... | presid donald trump call us postal servic frid... | politicsNews | 29-Dec-17 | 0 |
... | ... | ... | ... | ... | ... |
70 | ELECTION FRAUD: If It Happened in Michigan, Wi... | st centuri wire say on recent episod the sunda... | Middle-east | 15-Mar-16 | 1 |
71 | Patrick Henningsen LIVE with guest Ray McGover... | join patrick everi week wiretv news view analy... | US_News | 1-Dec-16 | 1 |
72 | Boiler Room EP #85.5 – Who’s Watching The ... | tune altern current radio network acr anoth li... | US_News | 30-Nov-16 | 1 |
73 | Washington Post attempts to smear Ron Paul Ins... | st centuri wire say as wire report saturday th... | US_News | 28-Nov-16 | 1 |
74 | Episode #162 – SUNDAY WIRE: ‘The Revolutio... | episod sunday wire show resum novemb host patr... | US_News | 27-Nov-16 | 1 |
75 rows × 5 columns