Imbalance of data can cause issues.

In the real world we often come across classification problems where the data is imbalanced, that is, skewed towards a specific label or class. This means that one label is present in the data in much smaller quantity than the other label(s). Many real-world classification problems have an imbalanced target variable (class) distribution, such as fraud detection (a large number of genuine transactions), spam filtering (a large number of good emails), and medical diagnosis cases like cancer detection (a large number of patients do not have cancer). …
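A quick way to see this kind of skew is to count how often each label occurs. The sketch below is only an illustration: the labels and counts are made up, and `class_distribution` is a hypothetical helper, not something taken from the article.

```python
from collections import Counter

def class_distribution(labels):
    """Return each label's share of the data as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Made-up example: 990 genuine transactions and 10 fraudulent ones.
labels = ["genuine"] * 990 + ["fraud"] * 10
distribution = class_distribution(labels)
print(distribution)
```

A label whose share is far below the others, like the made-up `fraud` label here, is the minority class that the paragraph above describes.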


I had to write a Python script that calls a series of APIs based on business logic. The issue was that these APIs needed an authentication token to execute. Without the token I got a 403 Forbidden error because the API could not verify that the call (from the program) came from an authentic source.
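As a rough sketch of the idea, the token is usually sent along with each request in an HTTP header. The example below assumes the API accepts a bearer token in the `Authorization` header, which may not match the APIs described here; the URL, the token value, and the helper `build_authorized_request` are placeholders for illustration.

```python
import urllib.request

def build_authorized_request(url, token):
    """Build a GET request that carries the token in the Authorization header."""
    request = urllib.request.Request(url, method="GET")
    # Assumption: the API expects a bearer token. Other APIs may use a
    # different scheme or a custom header name.
    request.add_header("Authorization", f"Bearer {token}")
    return request

# Placeholder URL and token, for illustration only.
request = build_authorized_request("https://api.example.com/v1/orders", "my-secret-token")
print(request.get_header("Authorization"))
# Sending the request would look like: urllib.request.urlopen(request)
```

The request is only built here, not sent, so the sketch runs without network access.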

I googled how to call an API from Python code, but every article I saw assumed an open, public API that did not need any token-based authorization. …


Engineer creating Data Pipeline :D

In today’s world we get data from a variety of sources. Data is generated by internal sources such as IoT devices fitted in an automobile, POS machines in a store, and the inventory system of a retail store, and by external sources such as https://data.gov.in/ for India and https://www.data.gov/ for the US. (I am not considering data collected from direct sources such as physical surveys and interviews, only data that is generated or stored digitally in an internal or external system. Eventually, survey and interview data stored in a digital format becomes part of either an internal or an external system.) …


Most data scientists who code in Python are comfortable using Jupyter Notebook. One of the most basic skills that many of them lack is writing production-ready code.

Because Jupyter Notebook provides a seamless IDE for examining variables and displays print output right below the code, data scientists find it difficult to write a production script. A production-ready script should be generic and should write its debug and error statements to a file.
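Writing debug and error statements to a file can be sketched with Python's standard `logging` module. This is a minimal example under assumptions: the logger name, the log file location, and the helper `get_file_logger` are hypothetical and not taken from the article.

```python
import logging
import os
import tempfile

def get_file_logger(name, log_path, level=logging.DEBUG):
    """Return a logger that writes debug and error messages to log_path."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.FileHandler(log_path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger

# Hypothetical log location; a real script would read this from configuration.
log_path = os.path.join(tempfile.gettempdir(), "pipeline.log")
logger = get_file_logger("pipeline", log_path)
logger.debug("Loaded input file")
logger.error("Could not connect to the database")
```

Unlike `print` output in a notebook cell, these messages stay in the log file after the script finishes.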

What is generic code? Code should have minimal hard-coding. The program should not be changed if there is…


I wanted to display statistics for a specific district on a map. The first step was to find the latitude and longitude of the district so that pins could be displayed on the map. Based on the pins, I wanted to display various data related to the district.

In this article I will discuss the steps taken to get the latitude and longitude using Google’s Geocoding API.
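In outline, a lookup amounts to building a request URL that contains the address and the API key, then reading the coordinates from the JSON response. The sketch below is not the article's script. It assumes the Geocoding API's JSON endpoint and response layout (`results[0].geometry.location`); the district name, the API key, the coordinates, and both helper functions are placeholders.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"

def build_geocode_url(address, api_key):
    """Build the request URL for a single address lookup."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "key": api_key})

def extract_lat_lng(response):
    """Return (lat, lng) of the first result, or None if there are no results."""
    results = response.get("results", [])
    if not results:
        return None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]

# Placeholder district and key, for illustration only.
url = build_geocode_url("Pune, Maharashtra", "YOUR_API_KEY")
print(url)

# Hand-written stand-in for a parsed JSON response (placeholder coordinates).
sample_response = {"results": [{"geometry": {"location": {"lat": 1.5, "lng": 2.5}}}]}
print(extract_lat_lng(sample_response))
```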

Getting API Key

Before we write the script for getting the latitude and longitude of districts, we have to get an API key for the Google Geocoding API. To get the API key, first log in to…


One of the first things done in EDA is to analyze missing values. Depending on the data, many types of values can be considered missing. Sometimes cells containing only spaces are treated as missing, and other times empty cells are. At times, files contain blank lines that the program will also treat as missing values.

In this article, I will discuss how to convert cells containing only spaces and empty cells to NaN values (np.nan) so that they can be identified easily while exploring missing values. …
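One common way to do this conversion in pandas is a regular-expression replace. The sketch below may differ from the approach taken in the article, and the DataFrame contents are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data: some cells are empty strings, others contain only spaces.
df = pd.DataFrame({
    "city": ["Pune", "   ", "", "Delhi"],
    "sales": ["10", " ", "25", ""],
})

# ^\s*$ matches strings that are empty or consist only of whitespace.
cleaned = df.replace(r"^\s*$", np.nan, regex=True)
print(cleaned.isna().sum())
```

After the replace, `isna()` reports the formerly blank cells as missing, so they show up in the usual missing-value counts.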


Introduction

In this article, I will walk through Python code for web scraping, dealing with tables in a webpage, and downloading files from an FTP server.

Web scraping is a technique for extracting data from a website and storing it in a logical format in either a local file or the cloud. It works by traversing the HTML code of the website and extracting data based on its tags.
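Tag-based extraction can be sketched with Python's built-in `html.parser` module. This is an illustration rather than the article's code, which may use a different library. The HTML snippet, the FTP URLs, and the `LinkCollector` class are made up.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag encountered while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

# Made-up page content standing in for a downloaded webpage.
html = """
<html><body>
  <table>
    <tr><td><a href="ftp://ftp.example.com/data/file1.zip">File 1</a></td></tr>
    <tr><td><a href="ftp://ftp.example.com/data/file2.zip">File 2</a></td></tr>
  </table>
</body></html>
"""

collector = LinkCollector()
collector.feed(html)
print(collector.links)
```

The parser walks the HTML tag by tag and keeps only the `<a>` tags, which is the same traverse-and-filter idea described above.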

We will be scraping the Railroad Commission of Texas website to download some files from their FTP server. This is a fairly complex website with different datasets present…

Divij Sharma
