Python Libraries Used in Data Analysis | NumPy & Pandas Libraries

İlayda Dastan
3 min read · Jul 2, 2021

Hello everyone! We all know how important data analysis is for every development process. In this article, I will explain two of the most important Python libraries used in data analysis.

The purpose of data analysis is to extract useful information from data and to make decisions based on that information. There are various data analysis techniques. The most commonly used ones are:

  • Text Analysis
  • Statistical Analysis
  • Diagnostic Analysis
  • Predictive Analysis
  • Prescriptive Analysis

Data analysis requires data to be processed and cleaned quickly, and Python has libraries that make this easier. The libraries I will talk about in this article are NumPy and Pandas.

NumPy, which supports a wide range of mathematical operations, is one of the best Python libraries for processing data and analyzing it in detail. Its core feature is creating and manipulating arrays.

The pip command is used to install NumPy.

pip install numpy

A NumPy array is used to model vectors and matrices during data analysis.

import numpy as np

array = np.array([2, 4, 6])
print(array)

When working on machine learning and deep learning applications, the shapes of the matrices tell you about your architecture. For this reason, you often need to check the shape of an array, which NumPy exposes through the shape attribute.

print(array.shape)
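As a minimal sketch of why the shape matters, the example below builds a one-dimensional vector and a two-dimensional matrix, prints their shapes, and runs two basic operations on them. The names `vector`, `matrix`, `doubled` and `product` are my own and only serve as illustration.

```python
import numpy as np

# A one-dimensional array (vector) with three elements
vector = np.array([2, 4, 6])

# A two-dimensional array (matrix) with 2 rows and 3 columns
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(vector.shape)   # (3,)
print(matrix.shape)   # (2, 3)

# Arithmetic is applied element by element
doubled = vector * 2
print(doubled)        # [ 4  8 12]

# A matrix-vector product needs compatible shapes: (2, 3) @ (3,) -> (2,)
product = matrix @ vector
print(product)        # [28 64]
```

If the shapes did not line up, for example a (2, 3) matrix multiplied by a vector of length 2, NumPy would raise an error, so printing the shape first is a cheap sanity check.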

We can think of Pandas as an extension of NumPy. It is built on top of NumPy and covers areas that NumPy leaves out, such as labeled, tabular data. With Pandas, tables can be extracted from web pages by reading their HTML table tags and then exported to the desired target (SQL, Excel, CSV, etc.).

The pip command is used to install Pandas.

pip install pandas

To perform data analysis, Pandas can read and write the following file formats, among others:

  • CSV, JSON, HTML, Excel
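Here is a small sketch of moving data between two of these formats. To keep it self-contained, it reads CSV text from memory instead of a file on disk. The sample data and the variable names are made up for illustration.

```python
import io
import pandas as pd

# A small CSV document held in memory (made-up sample data)
csv_text = "name,age\nIlayda,20\nJohn,21\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# Convert the same table to JSON, one object per row
json_text = df.to_json(orient="records")
print(json_text)

# Convert it back to CSV text, leaving out the row index
csv_again = df.to_csv(index=False)
print(csv_again)
```

The Excel and HTML readers (`pd.read_excel`, `pd.read_html`) follow the same pattern, but they depend on optional packages such as openpyxl or lxml that are installed separately.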

Libraries are loaded with the import statement. After the name of the library you want to load, you can use the as keyword to give it a shorter alias.

import pandas as pd
import numpy as np

Now let’s talk a little bit about the Pandas data structures. I focus on Pandas here, but keep in mind that Pandas and NumPy are often used together and complement each other. When working with data analysis libraries, pay attention to the data types you use. Pandas has two core data structures for storing data: Series and DataFrame.

A Series is a one-dimensional labeled array. A DataFrame can be thought of as a two-dimensional table in which each column is a Series.

Let’s create a simple sample Series.

import pandas as pd

series1 = pd.Series([2.7, 4.3, 2.6, -5.8])
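To see what a Series gives you beyond a plain list, here is a short sketch that inspects the same values. The second Series with the labels "a" to "d" and the name `positive` are my own additions for illustration.

```python
import pandas as pd

series1 = pd.Series([2.7, 4.3, 2.6, -5.8])

# Without an explicit index, pandas labels the values 0, 1, 2, ...
print(series1)
print(series1.dtype)      # float64
print(series1[0])         # 2.7

# Labels can also be given explicitly (these labels are made up)
series2 = pd.Series([2.7, 4.3, 2.6, -5.8], index=["a", "b", "c", "d"])
print(series2["d"])       # -5.8

# Boolean filtering keeps only the values that satisfy a condition
positive = series1[series1 > 0]
print(len(positive))      # 3
```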

DataFrames are organised into columns (each of which is a Series), and each column stores a single data type, such as floating point numbers, strings or boolean values. DataFrames can be indexed by either their row or column names.
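The sketch below illustrates both points: per-column data types, and indexing by row and column names. The sample values, the "Score" column and the row labels "r1" and "r2" are hypothetical.

```python
import pandas as pd

# Made-up sample data; the row labels "r1" and "r2" are hypothetical
df = pd.DataFrame(
    {"Name": ["Ilayda", "John"], "Age": [20, 21], "Score": [3.5, 4.0]},
    index=["r1", "r2"],
)

# Each column has its own data type
print(df.dtypes)

# Selecting one column by name returns a Series
ages = df["Age"]
print(type(ages).__name__)    # Series

# Selecting a row by its label with .loc
row = df.loc["r2"]
print(row["Name"])            # John

# Selecting a single cell by row label and column name
print(df.loc["r1", "Score"])  # 3.5
```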

A DataFrame can be created in different ways. Let’s start by creating one from a single list.

import pandas as pd

lst = ['ilayda', 'dastan', '21', 'student']

# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)

Output:

         0
0   ilayda
1   dastan
2       21
3  student

A DataFrame can also be created from a dictionary of ndarrays or lists.

data = {'Name': ['Ilayda', 'John'],
        'Age': [20, 21],
        'Job': ['engineer', 'teacher']}

df = pd.DataFrame(data)

print(df)

Output:

     Name  Age       Job
0  Ilayda   20  engineer
1    John   21   teacher
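Since NumPy and Pandas complement each other, here is a minimal sketch of passing data between the two: a NumPy array goes into a DataFrame and comes back out again. The numbers and the column names "a", "b" and "c" are made up for illustration.

```python
import numpy as np
import pandas as pd

# A 2 x 3 NumPy array of made-up numbers
values = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# Wrap it in a DataFrame; the column names are hypothetical
df = pd.DataFrame(values, columns=["a", "b", "c"])
print(df)

# Column-wise mean computed by pandas
means = df.mean()
print(means["a"])   # 2.5

# Convert back to a plain NumPy array
back = df.to_numpy()
print(back.shape)   # (2, 3)
```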

Using the analytical features of these two libraries together is an efficient way to set up the infrastructure for data analysis with Python.

References: https://cloudxlab.com/blog/numpy-pandas-introduction/
