Exploratory Data Analysis With NLP Project

“I hope this article helps anyone who is interested in natural language processing (NLP).”

1. Import Libraries and Load Data

# basic libraries
import pandas as pd
import numpy as np
# display data frames side by side in a notebook
from IPython.display import display, HTML
# statistics and statistical plotting
import seaborn as sns
from scipy import stats
# plotting
import matplotlib.pyplot as plt
# progress bar for loops
from tqdm import tqdm
# load data
df_train = pd.read_csv('../Data/train.csv', encoding="ISO-8859-1")
df_test = pd.read_csv('../Data/test.csv', encoding="ISO-8859-1")
df_attributes = pd.read_csv('../Data/attributes.csv')
df_product_descriptions = pd.read_csv('../Data/product_descriptions.csv')

2. Exploratory Data Analysis

To-Do List:

list_df = [df_train, df_attributes, df_test,
list_df_name = ['Training Data', 'Attributes', 'Test Data',
                'Product Descriptions'],
list_number_of_data = [5, 28, 6, 5],
row = 2, col = 2, fill = 'col'
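The arguments above suggest a helper that shows the first rows of several data frames next to each other. The helper itself is not shown, so the following is only a minimal sketch of one way to do it: the function name frames_side_by_side, its parameters, and the demo frames are all hypothetical, and the row/col/fill grid options are left out.

```python
import pandas as pd

def frames_side_by_side(frames, names, n_rows):
    """Return one HTML string with the first rows of each frame laid out inline.

    Hypothetical helper, not the article's original function.
    """
    blocks = []
    for frame, name, n in zip(frames, names, n_rows):
        table = frame.head(n).to_html()
        blocks.append(
            '<div style="display:inline-block; vertical-align:top; margin-right:2em">'
            f"<h4>{name}</h4>{table}</div>"
        )
    return "".join(blocks)

# Tiny made-up frames; a real call would pass the loaded data frames instead.
demo_a = pd.DataFrame({"id": [1, 2, 3], "value": [3.0, 2.5, 1.0]})
demo_b = pd.DataFrame({"product_uid": [100001, 100002], "name": ["Brand", "Color"]})
html = frames_side_by_side([demo_a, demo_b], ["Demo A", "Demo B"], [2, 2])

# In a notebook, render it with:
#   from IPython.display import display, HTML
#   display(HTML(html))
```

Building the HTML as a plain string keeps the helper usable outside a notebook; only the final display call depends on IPython.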
This data set contains a number of products and real customer search terms from Home Depot’s website.
Because id and product_uid are identifiers rather than quantities, we must convert both columns to the object type so that they are not summarized as numbers.
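A minimal sketch of that conversion, assuming a small made-up frame in place of the training data (the score column and all values are illustrative only):

```python
import pandas as pd

# Stand-in for the training data; the values are made up for illustration.
demo = pd.DataFrame({
    "id": [2, 3, 9],
    "product_uid": [100001, 100001, 100002],
    "score": [3.0, 2.5, 3.0],
})

# Cast the identifier columns to object so pandas treats them as labels.
demo = demo.astype({"id": object, "product_uid": object})

# Numeric summary now covers only the truly numeric column.
numeric_summary = demo.describe()
# Object columns get count, unique, top, and freq instead of mean and std.
label_summary = demo.describe(include=[object])
```

After the cast, describe() no longer reports a mean or standard deviation for the identifiers, which would have no useful interpretation.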
The image displays the descriptive statistics of the training data.
The image shows the pattern of the target variable.
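One simple way to inspect the pattern of a target variable is to count how often each value occurs. The sketch below uses a made-up Series named target; the real column name and values are not shown in this article, so treat both as assumptions.

```python
import pandas as pd

# Made-up scores standing in for the target column.
target = pd.Series([3.0, 2.33, 3.0, 1.67, 2.33, 3.0], name="target")

# Count each distinct value, ordered by value rather than by frequency.
counts = target.value_counts().sort_index()

# Share of rows per value.
shares = (counts / counts.sum()).round(3)

# With matplotlib available, counts.plot(kind="bar") draws the same counts.
```

Sorting by index keeps the bars in score order, which makes skew toward high or low values easier to see than the default frequency ordering.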


