Load Twitter Data Into SQL Database

Problem Statement

The requirement is to extract the data from Twitter and load the extracted data into SQL Database. For this demonstration, I am going to read the Indian Film actor Kamal Haasan's tweets and load that into SQL Server 2016 Database.

Solution

The solution is to use Curl program to read from Twitter using Rest API then Load the output data into SQL Database

Let’s see the solution step by step

Step 1

First, we have to install and configure the curl.

“Curl ('Client for URLs') is a tool to transfer data from or to a server, using one of the supported protocols (HTTP, HTTPS, FTP, FTPS, SCP, SFTP, TFTP, DICT, TELNET, LDAP or FILE). The command is designed to work without user interaction“

Go to -> https://curl.haxx.se/download.html, download the latest version of Curl software under Win 32 - Generic or Win 64 – Generic based on your Windows OS bit.

SQL Server

SQL Server

Step 2

We don’t need to install the curl because the package is already built. Extract the downloaded zip file and copy curl.exe and ca-bundle.crt, paste that in “C:\Curl” (Create folder wherever you want).

SQL Server

Step 3

We have to setup the Environment variable path for Curl.

Go to My Computer -> Properties -> Advanced system settings -> Advanced -> Environment Variables -> Edit Path under System Variables section.

SQL Server

Click New -> Add the curl.exe folder location.

SQL Server

Run the below command in Windows command prompt to check whether Curl is installed and configured correctly.

SQL Server

Step 4

In order to read data from Twitter application, we have to create App in Twitter, Go to -> https://apps.twitter.com

Prerequisites

  • The user should have a Twitter account.
  • The user should have provided a mobile number in the Twitter account.

Create App by filling the required fields. If you do not have website details, give https://google.com/ a shot.

SQL Server

Click “Create your Twitter Application” button.

Step 5

Go to "Keys and Access Tokens" tab and note down Consumer Key (API Key) and Consumer Secret (API Secret).

SQL Server

Twitter supports OATH 2.0 Authentication so we have to generate Bearer token. For that, we need to convert API Key and API Secret in Base64 format. Go to https://www.base64encode.org/

Syntax

<Consumer Key (API Key)> : <Consumer Secret (API Secret)>

For example

ABCD:EFGH

SQL Server

Click Encode, then copy the encoded code - QUJDRDpFRkdI

Step 6

Enter the below command in Windows Command Prompt.

Syntax

Curl --compressed "https://api.twitter.com/oauth2/token" 

--header "Accept-Encoding: identity" --header "Authorization: Basic <Base64Code>"

--header "Content-Type: application/x-www-form-urlencoded; charset=UTF-8"

--data "grant type=client credentials"

Example

Curl --compressed "https://api.twitter.com/oauth2/token"  --header "Accept-Encoding: identity" --header "Authorization: Basic QUJDRDpFRkdI" --header "Content-Type: application/x-www-form-urlencoded;charset=UTF-8" --data "grant type=client credentials"

https://dev.twitter.com/oauth/application-only

SQL Server

The response will be in JSON format like below.

{"token_type":"bearer","access_token":"XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"}

Note down the access token.

Step 6

Now, I am going to fetch tweets of the film actor Kamal Haasan. So I want to know his twitter account name

SQL Server

Account name - iKamalHaasan

Curl Command

Curl --get --compressed "https://api.twitter.com/1.1/statuses/user_timeline.json" --data "screen_name=ikamalhaasan" --data "count=200" --header "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" -o Kamal.txt

Note

Enter your bearer token.

Refer

https://dev.twitter.com/rest/reference/get/statuses/user_timeline

SQL Server

Output file is saved in the C:\Users\Madhan\Kamal_Tweets.txt

Step 7

Run the below SQL statement to load JSON Data into SQL Table.

  1. CREATE TABLE[dbo].[Tweets_JSON]  
  2.     (  
  3.         [JSON_STRING][varchar](maxNULL)  
  4. BULK INSERT[dbo].[Tweets_JSON]  FROM 'C:\Users\Madhan\Kamal_Tweets.txt'

SQL Server

Step 8

Run the below query to parse JSON String.

  1. SELECT J.*FROM[dbo].[Tweets_JSON]  
  2. CROSS APPLY OPENJSON(JSON_STRING)  
  3. With(  
  4.     [id_str] Varchar(100), [created_at] Varchar(100), [text] NVarchar(600), [source] Varchar(100), [retweet_count] Varchar(100), [favorite_count] Varchar(100), [lang] Varchar(100)  

SQL Server

Thank You!!

Next Recommended Readings