Pandas is a widely used Python library built on top of NumPy. Much of the rest of this course is dedicated to learning about pandas and how it is used in finance.
What is Pandas?
Pandas is a Python library created by Wes McKinney, who originally built it to work with financial datasets in Python while he was employed at the investment firm AQR Capital Management.
According to the library's website, pandas is "a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language."
The name pandas is derived from 'panel data', a term from econometrics. Note that pandas is typically stylized as an all-lowercase word, although it is considered a best practice to capitalize its first letter at the beginning of sentences.
Pandas is an open source library, which means that anyone can view its source code and propose changes through pull requests. If you are curious about this, visit the pandas source code repository on GitHub.
The Main Benefit of Pandas
Pandas was designed to work with two-dimensional data (similar to Excel spreadsheets). Just as the NumPy library has a built-in data structure called an array with special attributes and methods, the pandas library has a built-in two-dimensional data structure called a DataFrame.
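As a minimal sketch of how a DataFrame relates to a NumPy array, assuming pandas and NumPy are both installed, the snippet below wraps a two-dimensional array in a DataFrame and adds row and column labels. The tickers `AAA` and `BBB`, the dates, and the prices are hypothetical placeholder values, not real market data.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices: rows are trading days, columns are tickers
prices = np.array([
    [101.5, 250.0],
    [102.0, 248.5],
    [100.75, 252.25],
])

# Wrap the 2-D array in a DataFrame, attaching labels to rows and columns
df = pd.DataFrame(
    prices,
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
    columns=["AAA", "BBB"],  # hypothetical ticker symbols
)

print(df)                # labeled, spreadsheet-like table
print(df.shape)          # (rows, columns), just like a 2-D NumPy array
print(df["AAA"].mean())  # select a column by its label and aggregate it
print(type(df["AAA"]))   # a single column is a pandas Series
print(df.to_numpy())     # the underlying values as a plain NumPy array
```

The labels are the main practical difference from a bare array: you can refer to `df["AAA"]` by name instead of remembering that the first ticker lives in column index 0.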
What We Will Learn About Pandas
As we mentioned earlier in this course, advanced Python practitioners spend much more time working with pandas than with NumPy.
Over the next several lessons, we will cover the following information about the pandas library:
- Pandas Series
- Pandas DataFrames
- How To Deal With Missing Data in Pandas
- How To Merge DataFrames in Pandas
- How To Join DataFrames in Pandas
- How To Concatenate DataFrames in Pandas
- Common Operations in Pandas
- Data Input and Output in Pandas
- How To Save Pandas DataFrames as Excel Files for External Users
Moving On
To start, let's move to our next lesson and begin learning about pandas Series, a special data structure available in the pandas library.