I had been using SQL extensively to work on the tabular data format. So, when I started learning python, it was essential for me to write SQL query in python. Knowing this, really helped me in using python extensively for data exploration process. I hope that people who are learning python for data analysis and are familiar with SQL will find this article very helpful in writing SQL query in Python.
We will be replicating SELECT query in this post.
I have used IMDB data from 2006 to 2016 (https://www.kaggle.com/PromptCloudHQ/imdb-data) for this purpose.
Importing the data set in Jupyter notebook
Now, we have imported the data set and will assume that same data is in a table format in SQL too. We will write SQL queries and python code on this data set to perform simple selecting activities.
- Let us look at the table so that we can see the columns and its values. We will look at few rows of the table.
SQL: Select top 5 * from data.
Python:
- Now, we will try to reference the whole table.
SQL: Select * from data
Python:
- If you want to look at a particular column of the table for e.g. Title
SQL: Select title from data.
Python:
- If you want to reference multiple columns at a time e.g. Title and Genre
SQL: Select title, genre from data
Python:
- While selecting values, most of the time a condition is necessary. In SQL we do this by using WHERE command e.g. Select all the values from the year 2014
SQL: Select * from data where year = 2014
Python:
- We can also pass more than one condition while selecting values e.g. Select all the rows for which year is 2014 and rating is more than 8.
SQL: Select * from data where year = 2014 and rating > 8
Python:
- We can also implement either or condition e.g. select all the values for which either year is 2014 or rating is more than 8
SQL: Select * from data where year = 2014 or rating > 8
Python:
- Checking for the null and not null values.
SQL: Select * from data where year is null
SQL: select * from data where year is not null
These are the basic commands everyone performs while doing data exploration and knowing this will make the whole process easier.