Data preparation and exploratory data analysis take a lot of time and effort from data professionals. Wouldn’t it be nice to have a package that lets you explore your data quickly — in a single line of code?
I’m going to show you the four best Python packages that can automate your data exploration and analysis. I’ll go over each one, what it does and how you can use it.
4 Ways to Speed Up Your EDA in Python
- DataPrep
- Pandas Profiling
- SweetViz
- AutoViz
1. DataPrep
DataPrep lets you prepare your data using a single library with a few lines of code. The DataPrep ecosystem currently consists of three components:
The Connector component enables simple data collection from web APIs by providing a standard set of operations. The EDA component handles exploratory data analysis, and the Clean API provides functions to efficiently clean and validate data.
For example, using the Philly Parking Violation Dataset, we can call plot() to preview EDA on a data frame, or plot correlations, with a single line of code.
You can also generate a detailed report with one line of code using DataPrep. Here is the create_report() function called on a data frame.
import pandas as pd
from dataprep.eda import create_report

df = pd.read_csv("parking_violations.csv")
create_report(df)
You will get a complete and interactive report for variables and correlations as well as interactions and missing values.
DataPrep reduces the time and effort you need as a data scientist to explore a dataset. With just one line of code, you get an overview of your dataset, including missing values, correlations and a statistical description.
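To make concrete what such a one-line report automates, here is a minimal pandas-only sketch of the same three ingredients: missing values, a statistical description and correlations. The tiny data frame and its column names (fine, days_overdue, state) are made up for illustration and are not taken from the Philly parking dataset.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame(
    {
        "fine": [36.0, 51.0, None, 76.0, 36.0],
        "days_overdue": [0, 3, 1, 10, 0],
        "state": ["PA", "NJ", "PA", None, "PA"],
    }
)

# Number of missing values in each column.
missing_counts = df.isna().sum()

# Summary statistics (count, mean, std, quartiles) for the numeric columns.
summary = df.describe()

# Pairwise Pearson correlations between the numeric columns.
correlations = df.select_dtypes("number").corr()

print(missing_counts)
print(summary)
print(correlations)
```

An automated report adds plots and interactivity on top of these tables, but the underlying numbers are the same kind of thing.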
To install DataPrep, run:
pip install dataprep
See the DataPrep documentation for more information.
2. Pandas Profiling
Pandas Profiling generates profile reports from a Pandas DataFrame and lets you perform the same kinds of EDA as the other packages I discuss here. It covers a wide range of use cases and has more tutorials than any of the other packages.
With a single line of code, you can generate an EDA report using Pandas Profiling with descriptive statistics, correlations, missing values, text analysis and more. Here, I call ProfileReport() on the Philly data frame to generate an EDA report.
from pandas_profiling import ProfileReport

profile = ProfileReport(df, title="Report")
profile
Pandas Profiling generates a similar report with an elegant user interface (UI).
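Outside a notebook, the report can be written to a standalone HTML file instead of being displayed inline. The sketch below assumes the pandas-profiling package is installed. The helper name save_profile and the default output path are hypothetical, while the minimal flag of ProfileReport and the to_file() method come from the library.

```python
from pathlib import Path


def save_profile(df, out_path="report.html", minimal=False):
    """Write a Pandas Profiling report for a data frame to an HTML file.

    save_profile is a hypothetical helper, not part of the library.
    """
    # Imported lazily so this module loads even where pandas-profiling is absent.
    from pandas_profiling import ProfileReport

    # minimal=True switches off the most expensive computations,
    # which helps on large data frames.
    profile = ProfileReport(df, title="Report", minimal=minimal)
    profile.to_file(out_path)
    return Path(out_path)
```

For example, save_profile(df, "parking_report.html", minimal=True) would produce a lighter report for a large data frame.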
You can install it using the pip package manager by running:
pip install pandas-profiling[notebook]
Be sure to visit the GitHub repository for more tutorials and documentation.
3. SweetViz
SweetViz offers in-depth EDA (target analysis, comparison, feature analysis, correlation) and interactive EDA in two lines of code! Additionally, SweetViz allows you to compare two datasets, such as training and test datasets for your machine learning projects.
To get a report from SweetViz you can run the following command on any data frame and it will generate an HTML report.
import sweetviz as sv

analyze_report = sv.analyze(df)
analyze_report.show_html('report.html', open_browser=False)
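The comparison of training and test datasets mentioned above can be sketched as follows. This assumes sweetviz is installed. sv.compare() and show_html() are library calls, while the toy data frame, the 80/20 split and the helper name compare_train_test are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset.
df = pd.DataFrame({"fine": range(10), "overdue": [i % 2 for i in range(10)]})

# Simple 80/20 split: sample the training rows, keep the rest for testing.
train = df.sample(frac=0.8, random_state=0)
test = df.drop(train.index)


def compare_train_test(train_df, test_df, out_path="compare.html"):
    """Build a SweetViz report comparing two data frames (hypothetical helper)."""
    # Imported lazily so the split above runs even without sweetviz installed.
    import sweetviz as sv

    # Each dataset is passed as [data frame, display name].
    report = sv.compare([train_df, "Train"], [test_df, "Test"])
    report.show_html(out_path, open_browser=False)
    return report
```

Calling compare_train_test(train, test) writes a single HTML report that shows both datasets side by side, which makes it easy to spot features whose distributions differ between the two.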
4. AutoViz
With AutoViz, you can automatically visualize a dataset of any size with a single line of code, and in much greater detail. Here is the code to generate a report with AutoViz using the Philly parking dataset.
from autoviz.AutoViz_Class import AutoViz_Class

AV = AutoViz_Class()
df_av = AV.AutoViz('parking.csv')
Note that you don’t even need Pandas to read the data. AutoViz will load it when you provide the dataset path.
AutoViz gives you many more plots (e.g., violin plots, box plots and more) as well as statistical and probability values. However, the UI isn’t as polished as the other packages’ reports, and you don’t get access to interactive plots.
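If the data is already loaded, AutoViz can also take a data frame instead of a path. In this sketch, the helper name autoviz_from_frame is hypothetical. The empty filename together with the dfte argument, and the depVar, verbose and chart_format parameters, follow the AutoViz README as I understand it, so treat the exact parameter set as an assumption to check against your installed version.

```python
def autoviz_from_frame(df, target=""):
    """Run AutoViz on an in-memory data frame (hypothetical helper)."""
    # Imported lazily so this module loads even where autoviz is absent.
    from autoviz.AutoViz_Class import AutoViz_Class

    AV = AutoViz_Class()
    # An empty filename tells AutoViz to read from dfte instead of from disk.
    return AV.AutoViz(
        filename="",
        dfte=df,
        depVar=target,  # optional target column; "" means no target
        verbose=0,
        chart_format="svg",
    )
```

Passing a target column name as the second argument asks AutoViz to plot the other variables against that column.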
To install AutoViz, run the following command:
pip install autoviz
All four packages offer similar functionality that lets you automate your EDA with simple, intuitive (often one-line!) code.
That said, of the four packages in this article, DataPrep provides a lot more functionality than just EDA. This can help you ingest more data sources and help you navigate large datasets faster.
Additionally, DataPrep’s clean API can help you clean up your dataset without too many hurdles.