Basics of pandas Series
Understand pandas Series, creation methods, and attributes.
pandas provides two core data structures, Series and DataFrame, which determine how data is organized, stored, and managed in pandas. Understanding these data structures is the key to learning pandas.
This lesson focuses on the pandas Series. You will learn what a pandas Series is, how to create one in Python, and what its attributes are.
What is a pandas Series?
A pandas Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floats, etc). It has two main components: index and data values.
Consider a pandas Series where values on the left side (0, 1, 2, 3, 4) are the ‘index’ of the pandas Series and the values on the right (London, New York, Tokyo, Paris, Beijing) are the actual ‘data values’ of the Series.
As can be observed, a pandas Series closely resembles a column of a spreadsheet table, where each value can be identified by an index.
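The index and data-value components described above can be sketched in a few lines. The variable name cities is only illustrative, and the data mirrors the city names mentioned earlier:

```python
import pandas as pd

# Illustrative data: the five city names used in the description above
cities = pd.Series(["London", "New York", "Tokyo", "Paris", "Beijing"])

# The two components of a Series
print(cities.index)   # default integer labels 0 to 4
print(cities.values)  # the underlying data values

# Each value can be identified by its index label
print(cities[2])
```

With the default index, the label 2 identifies the third value in the Series.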
How to create a pandas Series?
A pandas Series can be created using Python and NumPy data structures such as a Python List, Dictionary, NumPy arrays, scalar values, etc. We will be looking at a few methods for creating a pandas Series in the sections below.
In general, a pandas Series is created by calling the Series() constructor from pandas:
pandas.Series(data)
Here, data can be any value that is array-like, iterable, dict, or a scalar. It contains the data to be stored in the Series.
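As a minimal sketch of the less common input types, the snippet below builds one Series from a scalar and one from a generic iterable. The variable names are illustrative, and the index parameter used with the scalar supplies the labels for which the value is repeated:

```python
import pandas as pd

# A scalar is repeated once for every label supplied through index
from_scalar = pd.Series(7, index=["a", "b", "c"])
print(from_scalar)

# Any iterable works as data, for example a range object
from_range = pd.Series(range(3))
print(from_range)
```

The list, NumPy array, and dictionary cases are covered in the sections that follow.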
Creating a pandas Series from a Python List
# Importing the pandas library as pd
import pandas as pd
# Initializing a Python list
lst = ["Python", "Java", "C", "Ruby"]
# Creating a pandas Series from a Python List
series = pd.Series(lst)
# Printing the Series
print(series)
Here, we did not specify an index when creating the Series, so pandas assigned integer labels starting from 0. To create a Series with meaningful labels, specify the index parameter during Series initialization. Similarly, you can assign a name to the Series by specifying the name parameter.
# Initializing a Python list
lst = ["Python", "Java", "C", "Ruby"]
# Creating a pandas Series from a Python List
series = pd.Series(
    lst, index=["1st", "2nd", "3rd", "4th"], name="Programming Languages"
)
# Printing the Series
print(series)
Note that we can also convert a pandas Series into a Python List by calling the to_list() method from the Series itself.
# Converting pandas Series into a Python list
print(series.to_list())
Creating a pandas Series from a NumPy array
To create a Series from a NumPy array, we need to import the NumPy module.
# Importing the NumPy library as np
import numpy as np
# Initializing a NumPy array
np_array = np.array(["Apple", "Mango", "Grapes", "Pineapple"])
# Creating a pandas Series from a NumPy array
series = pd.Series(np_array, name="Fruits")
# Printing the Series
print(series)
Note that we can also convert a pandas Series into a NumPy array by calling the to_numpy() method from the Series itself.
# Converting pandas Series into a NumPy array
print(series.to_numpy())
Creating a pandas Series from a Python Dictionary
To create a Series from a Python Dictionary, we create a dictionary and pass it to the data parameter. In this case, the keys of the dictionary become the index labels, and the values of the dictionary become the corresponding data values.
# Initializing a Python Dictionary
dictionary = {"1st": "Newton", "2nd": "Einstein", "3rd": "Tesla", "4th": "Edison"}
# Creating a pandas Series from a Python Dictionary
series = pd.Series(dictionary)
# Printing the Series
print(series)
Note that we can also convert a pandas Series into a Python Dictionary by calling the to_dict() method from the Series itself.
# Converting pandas Series into a Python Dictionary
print(series.to_dict())
Attributes of a pandas Series
The attributes of a pandas Series define the intrinsic information of the Series. The following table shows a list of commonly accessed pandas Series attributes along with their meaning.
| Attribute | Definition |
|---|---|
| dtype | Return the dtype object of the underlying data. |
| shape | Return a tuple of the shape of the underlying data. |
| size | Return the number of elements in the underlying data. |
| values | Return Series as ndarray or ndarray-like depending on the dtype. |
| index | The index (axis labels) of the Series. |
| name | Return the name of the Series. |
| iloc | Purely integer-location based indexing for selection by position. |
| loc | Access a group of values by label(s) or a boolean array. |
The example below demonstrates the attributes listed above.
# Initializing a Python list
lst = [100, 200, 300, 400]
# Creating a pandas Series from a Python List
series = pd.Series(
    lst, index=["1st", "2nd", "3rd", "4th"], name="Numbers"
)
# Printing the Series
print(series)
# Returns the data type of the underlying data
print(series.dtype)
# Returns a tuple of the shape of the underlying data
print(series.shape)
# Returns the number of elements in the underlying data
print(series.size)
# Returns Series as ndarray or ndarray-like depending on the dtype
print(series.values)
# Returns the index of the Series
print(series.index)
# Returns the name of the Series
print(series.name)
Now, let us also look at iloc and loc, which are used for selecting data values from the Series.
iloc is used for purely integer-location based indexing for selection by position. Note that the indexing in pandas Series starts from 0.
# Printing the Series
print(series)
# Accessing the value at position 0
print(series.iloc[0])
# Accessing the value at position 2
print(series.iloc[2])
# Accessing the value at position -1 (the last value)
print(series.iloc[-1])
# Accessing the value at position -2 (the second-to-last value)
print(series.iloc[-2])
You can also perform Series slicing (similar to NumPy arrays and Python Lists).
# Accessing values at positions 0 to 2
print(series.iloc[0:3])
# Accessing values at positions 0 to 2 with a step of 2
print(series.iloc[0:3:2])
Next, loc is used to access a group of values by label(s) or a boolean array.
# Printing the Series
print(series)
# Accessing value at label="1st"
print(series.loc["1st"])
# Accessing value at label="3rd"
print(series.loc["3rd"])
# Accessing values at label="1st" and label="3rd"
print(series.loc[["1st", "3rd"]])
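The examples above cover label-based selection with loc but not the boolean-array case. The following sketch, which rebuilds the same Numbers Series and uses an arbitrary threshold of 150 chosen only for illustration, shows boolean selection and contrasts label slicing with positional slicing:

```python
import pandas as pd

series = pd.Series(
    [100, 200, 300, 400], index=["1st", "2nd", "3rd", "4th"], name="Numbers"
)

# A boolean Series (mask) selects the values where the condition is True
mask = series > 150
above = series.loc[mask]
print(above)

# A label slice with loc includes both endpoints
label_slice = series.loc["1st":"3rd"]
print(label_slice)

# A positional slice with iloc excludes the stop position
pos_slice = series.iloc[0:2]
print(pos_slice)
```

Note the difference in slicing behavior: loc keeps the value at the end label, whereas iloc follows the usual Python convention of excluding the stop position.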