I’m experienced in working with SQL for data wrangling and analysis, but have recently started using the Python Pandas library for similar tasks. The thing I really like about Pandas is the ability to (combined with matplotlib) to plot/visualize the data once it’s been successfully curated. Coming from the SQL background, I’ve been approaching problems thinking in terms of SQL. So I’m documenting here some translations between SQL and Pandas’ queries. I’ll try to keep updating this as I continue to use Pandas.

select x from data where y = 'z' data.x[data['y'] == 'z'] select x from data where y like 'z' data.x[data['y'].str.contains('z')] select x from data where y <= n data.x[data['y'] <= n]

## Select

Here are some examples of select statements. Notice the different strategies used to get desired data. These strategies can often be combined.

### Select Single Column

# select x from data where y = 'z'

data.x[data['y'] == 'z']

# select x from data where y like 'z'

data.x[data['y'].str.contains('z')]

# select x from data where y <= n

data.x[data['y'] <= n]

### Select Multiple Columns

# select n columns (the ... is not python code)

data[['col 1', 'col 2', ..., 'col n']]

# select cols 1 & 2 where col 2 is greater than 0

data[['col 1', 'col 2']][data['col 2' > 0]]

# select all columns where col 1 equals 'x' and col 2 is greater than 0

data[(data['col 1'] == 'x') & (data['col 2'] > 0)]