Photo by Sincerely Media on Unsplash

Exploratory Data Analysis for an NLP Project

“I hope this article can help anyone who is interested in natural language processing (NLP).”

1. Import Libraries and Load Data

# basic libraries
import pandas as pd
import numpy as np
# display several DataFrames side by side in a notebook
from IPython.display import display, HTML
# statistics and statistical plotting
import seaborn as sns
from scipy import stats
# plotting
import matplotlib.pyplot as plt
# progress bars for loops
from tqdm import tqdm
# load data
df_train = pd.read_csv('../Data/train.csv', encoding="ISO-8859-1")
df_test = pd.read_csv('../Data/test.csv', encoding="ISO-8859-1")
df_attributes = pd.read_csv('../Data/attributes.csv')
df_product_descriptions = pd.read_csv('../Data/product_descriptions.csv')

2. Exploratory Data Analysis

To-Do List:

list_df = [df_train, df_attributes, df_test, df_product_descriptions]
list_df_name = ['Training Data', 'Attributes', 'Test Data', 'Product Descriptions']
list_number_of_data = [5, 28, 6, 5]
row = 2
col = 2
fill = 'col'
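These arguments look like the inputs to a helper that shows the first few rows of each DataFrame in a 2 × 2 grid. The article does not show that helper, so the sketch below is only one possible version. The name `side_by_side_html` is hypothetical, and two things are assumptions: that `list_number_of_data` is the number of rows to show per DataFrame, and that `fill='col'` fills the grid column by column.

```python
import pandas as pd


def side_by_side_html(list_df, list_df_name, list_number_of_data,
                      row=2, col=2, fill="col"):
    # Hypothetical helper: the article's own function is not shown.
    if len(list_df) > row * col:
        raise ValueError("more DataFrames than grid cells")
    if fill not in ("col", "row"):
        raise ValueError("fill must be 'col' or 'row'")
    # One HTML cell per DataFrame: a caption plus its first n rows.
    cells = [
        f"<h4>{name}</h4>" + df.head(n).to_html()
        for df, name, n in zip(list_df, list_df_name, list_number_of_data)
    ]
    # Place cell i at grid position (r, c); unused slots stay empty.
    grid = [["" for _ in range(col)] for _ in range(row)]
    for i, cell in enumerate(cells):
        if fill == "col":
            r, c = i % row, i // row  # fill down a column first
        else:
            r, c = i // col, i % col  # fill across a row first
        grid[r][c] = cell
    rows_html = "".join(
        "<tr>"
        + "".join(f'<td style="vertical-align:top">{cell}</td>' for cell in grid_row)
        + "</tr>"
        for grid_row in grid
    )
    return f"<table>{rows_html}</table>"


# Tiny demo with hypothetical toy frames.
demo = side_by_side_html(
    [pd.DataFrame({"x": [1, 2, 3]}), pd.DataFrame({"y": [4, 5]})],
    ["A", "B"],
    [2, 1],
)
```

In a notebook, `display(HTML(side_by_side_html(list_df, list_df_name, list_number_of_data, row=2, col=2, fill='col')))` would render the grid using the `display` and `HTML` imports from step 1. The function returns a plain string rather than calling `display` itself, so it can also be used outside Jupyter.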
This data set contains a number of products and real customer search terms from Home Depot’s website.
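In this case, we need to convert the id and product_uid columns to the object type. They are identifiers, so numeric statistics such as their mean or standard deviation are meaningless.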
The image shows descriptive statistics for the training data.
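The following sketch shows one way to produce statistics like these and why the conversion matters. It runs on a few hypothetical rows in place of `df_train`, and the column names are assumed to follow the competition's train.csv. After the cast, `describe()` summarises only the numeric target, and `describe(include="object")` reports count, unique, top and freq for the identifier and text columns.

```python
import pandas as pd

# Hypothetical toy rows standing in for df_train.
df_train = pd.DataFrame({
    "id": [2, 3, 9, 16],
    "product_uid": [100001, 100001, 100002, 100005],
    "search_term": ["angle bracket", "l bracket", "deck over", "shower head"],
    "relevance": [3.0, 2.5, 3.0, 2.33],
})

# ids are labels, not quantities: store them as object so pandas does not
# report a meaningless mean or standard deviation for them.
df_train = df_train.astype({"id": "object", "product_uid": "object"})

# Numeric summary: only the target column remains.
numeric_summary = df_train.describe()

# Summary of label/text columns: count, unique, top, freq.
object_summary = df_train.describe(include="object")

print(numeric_summary)
print(object_summary)
```

The `unique` row for product_uid is a quick way to see how many distinct products appear, since one product can be matched by several search terms.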
The image shows the distribution of the target variable, relevance.
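As a numeric companion to that plot, you can count how often each relevance score occurs and check whether the scores cluster at the high end. The article's own plotting code is not shown. This sketch uses hypothetical scores in place of `df_train['relevance']` and relies only on pandas.

```python
import pandas as pd

# Hypothetical relevance scores standing in for df_train["relevance"].
relevance = pd.Series([3.0, 2.33, 3.0, 2.67, 1.0, 2.0, 3.0, 2.33], name="relevance")

# Frequency of each distinct score, highest score first.
counts = relevance.value_counts().sort_index(ascending=False)

# Shape of the distribution: a negative skew means most mass sits at the
# high end, with a longer tail toward low scores.
summary = {
    "mean": relevance.mean(),
    "median": relevance.median(),
    "skew": relevance.skew(),
}

print(counts)
print(summary)
```

On the real data, `sns.histplot(df_train['relevance'])` (seaborn 0.11 or later) draws a comparable histogram using the seaborn import from step 1.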


Learning From Data 😱 Email: