Pandas Basics I: Series and DataFrames

What is Pandas?

Pandas is a free software (software libre) data analysis library for the Python programming language. The library provides analysts and programmers with data structures optimized for working with large data sets, along with methods for examining and manipulating that data. It builds on another free software library, NumPy, for its underlying data structures, and uses Matplotlib's pyplot interface to generate plots such as graphs and histograms.
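As a minimal sketch of the two data structures named in the title, the snippet below builds a Series and a DataFrame and runs one simple operation on each. The labels and values are made up purely for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labeled array (illustrative values).
temps = pd.Series([21.5, 23.0, 19.8], index=["mon", "tue", "wed"], name="temp_c")

# A DataFrame is a two-dimensional table; each column is itself a Series.
df = pd.DataFrame({
    "item": ["a", "b", "c"],
    "weight_kg": [10.0, 0.7, 3.1],
})

# Examine the data: an aggregate over the Series.
mean_temp = temps.mean()

# Manipulate the data: keep only rows matching a boolean condition.
heavy = df[df["weight_kg"] > 1.0]

print(mean_temp)
print(heavy)
```

Selecting a single column, as in `df["weight_kg"]`, returns a Series, which is why the two structures are usually introduced together.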

SQL to Pandas Translation

I’m experienced in using SQL for data wrangling and analysis, but I have recently started using the Python Pandas library for similar tasks. What I really like about Pandas is the ability, combined with matplotlib, to plot and visualize the data once it has been curated. Coming from a SQL background, I tend to approach problems in SQL terms, so I’m documenting here some translations between SQL and Pandas queries. I’ll try to keep updating this as I continue to use Pandas.
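A minimal sketch of what such a translation can look like, assuming a hypothetical `orders` table with `customer` and `amount` columns (both the table and its contents are invented for illustration). Each Pandas expression is preceded by the SQL it roughly corresponds to.

```python
import pandas as pd

# Hypothetical "orders" table, standing in for a SQL table.
orders = pd.DataFrame({
    "customer": ["ann", "bob", "ann", "cy", "bob"],
    "amount": [20, 35, 15, 50, 5],
})

# SQL: SELECT customer, amount FROM orders WHERE amount > 10;
filtered = orders.loc[orders["amount"] > 10, ["customer", "amount"]]

# SQL: SELECT customer, SUM(amount) AS total
#      FROM orders GROUP BY customer ORDER BY total DESC;
totals = (
    orders.groupby("customer", as_index=False)["amount"]
    .sum()
    .rename(columns={"amount": "total"})
    .sort_values("total", ascending=False)
)

print(filtered)
print(totals)
```

Passing `as_index=False` keeps `customer` as an ordinary column, which matches the shape of a SQL result set more closely than the default grouped index.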
