
Case Study : Sentiment analysis using Python

In this article, we walk you through an application of topic modelling and sentiment analysis to solve a real-world business problem. This approach requires a one-time effort to build a robust taxonomy, which can then be updated regularly as new topics emerge. It is widely used in topic mapping tools. Please note that this is not a replacement for topic modelling methodologies such as Latent Dirichlet Allocation (LDA); it goes beyond them.
Text Mining Case Study using Python


Case Study : Topic Modeling and Sentiment Analysis

Suppose you are the head of the analytics team at a leading hotel chain, “Tourist Hotel”. Each day, you receive hundreds of reviews of your hotel on the company’s website and multiple other social media pages. The business faces a challenge of scale in analysing such data and identifying areas of improvement. You use a taxonomy-based approach to identify topics and then use the built-in functionality of the Python NLTK package to attribute sentiment to the comments. This will help you identify what customers like or dislike about your hotel.

Data Structure

The customer review data consists of a serial number (an arbitrary identifier that identifies each review uniquely) and a text field that holds the customer review.
Example : Sentiment Analysis

Steps to topic mapping and sentiment analysis

1. Identify Topics and Sub Topics
2. Build Taxonomy
3. Map customer reviews to topics
4. Map customer reviews to sentiment

Step 1 : Identifying Topics
The first step is to identify the different topics in the reviews. You can use simple approaches such as Term Frequency–Inverse Document Frequency (TF-IDF) or more popular methodologies such as LDA to identify the topics in the reviews. In addition, it is good practice to consult a subject matter expert in that domain to identify the common topics. For example, the topics in the “Tourist Hotel” example could be “Room Booking”, “Room Price”, “Room Cleanliness”, “Staff Courtesy”, “Staff Availability”, etc.

Step 2 : Build Taxonomy

I. Build Topic Hierarchy

Based on the topics from Step 1, build a taxonomy. A taxonomy can be considered a network of topics, sub topics and key words.
Topic Hierarchy
II. Build Keywords
The taxonomy is built in a CSV file format. There are 3 levels of key words for each sub topic, namely Primary key words, Additional key words and Exclude key words. The keywords for the topics need to be manually identified and added to the taxonomy file. TF-IDF, bigram frequencies and LDA methodologies can help you identify the right set of keywords. Although there is no single best way of building key words, below is a suggested approach.

i. Primary key words are the key words that are mostly specific to the topic. These key words need to be mutually exclusive across different topics as far as possible.

ii. Additional key words are specific to the sub topic. These key words need not be mutually exclusive between topics, but it is advised to maintain exclusivity between sub topics under the same topic. To explain further, say there is a sub topic “Price” under both the topics “Room” and “Food”; then the additional key words will overlap. This will not create any issue as the primary key words are mutually exclusive.

iii. Exclude key words are used relatively less than the other two types. They are useful when two sub topics have some overlap of additional words, or when a sub topic picks up unrelated comments. For example, if the sub topic “booking” incorrectly maps comments about taxi bookings as room bookings, such key words can be placed in the exclude list to solve the problem.

Snapshot of sample taxonomy:

Sample Taxonomy

Note: while building the key word list, you can put an “*” at the end of a word; it acts as a wildcard character. For example, all the different inflections of “clean” such as “cleaned”, “cleanly” and “cleanliness” can be handled by the single keyword “clean*”. If you need to add a phrase or any keyword with a special character in it, wrap it in quotes. For example, “online booking”, “Wi-Fi” etc. need to be in double quotes.


Benefits of using taxonomic approach
Topic modelling approaches identify topics based on the keywords that are present in the content. Novel keywords that are related to a topic but only appear in the future are not identified. There are also use cases where businesses want to track certain topics that may not always surface as topics in topic modelling approaches. A curated taxonomy handles both situations.

Step 3 : Map customer reviews to topic

Each customer comment is mapped to one or more sub topics. Some of the comments may not be mapped to any topic. Such instances need to be manually inspected to check whether we missed any topics in the taxonomy so that it can be updated. Generally, about 90% of the comments have at least one topic. The rest of the comments could be vague. For example, “it was good experience” does not tell us anything specific and it is fine to leave it unmapped.
Snapshot of how the topics are mapped:
Topic Mapping

Below is the Python code that maps reviews to categories. First, import all the libraries needed for this task (install them if needed).
import pandas as pd
import numpy as np
import re
import string
import nltk
from nltk.tokenize import word_tokenize
from nltk.sentiment.vader import SentimentIntensityAnalyzer

Download Datafiles
Customer Review
Taxonomy

Import reviews data
df = pd.read_csv("D:/customer_reviews.csv")
Import taxonomy
df_tx = pd.read_csv("D:/taxonomy.csv")

Build functions to handle the various repetitive tasks during the mapping exercise. The first function treats taxonomy keywords ending with an asterisk (*) as wildcards. It takes the keyword set and the comment words as input and uses a regular expression to check whether any comment word matches a wildcard keyword.
def asterix_handler(asterixw, lookupw):
    mtch = "F"
    for word in asterixw:
        for lword in lookupw:
            if(word[-1:]=="*"):
                if(bool(re.search("^"+ word[:-1],lword))==True):
                    mtch = "T"
                    break
    return(mtch)
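As a quick check (this example is not part of the original code), a wildcard keyword such as "clean*" should match a comment word like "cleanliness":
print(asterix_handler({'clean*'}, ['the', 'room', 'cleanliness', 'was', 'poor']))  # prints 'T'
print(asterix_handler({'clean*'}, ['friendly', 'staff']))                          # prints 'F'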
The next function removes all punctuation, which helps with data cleaning. You can edit the list of punctuation characters inside the function for your own custom punctuation removal.
def remov_punct(withpunct):
    punctuations = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    without_punct = ""
    for char in withpunct:
        if char not in punctuations:
            without_punct = without_punct + char
    return(without_punct)

Function to remove just the quotes(""). This is different from the above as this only handles double quotes. Recall that we wrap phrases or key words with special characters in double quotes.
def remov_quote(withquote):
    quote = '"'
    without_quote = ""
    for char in withquote:
        if char not in quote:
            without_quote = without_quote + char
    return(without_quote)
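A quick illustration of both helpers (the inputs are hypothetical examples, not from the article's data):
print(remov_punct("Wi-Fi wasn't working!"))   # returns 'WiFi wasnt working'
print(remov_quote('"online booking"'))        # returns 'online booking'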

Split each document into sentences and append them one below the other for sentence-level topic mapping.
sentence_data = pd.DataFrame(columns=['slno','text'])
for d in range(len(df)):
    doc = df.iloc[d,1].split('.')
    for s in doc:
        temp = {'slno': [df['slno'][d]], 'text': [s]}
        sentence_data = pd.concat([sentence_data, pd.DataFrame(temp)])
    temp = ""

Drop empty text rows if any and export data
sentence_data['text'].replace('',np.nan,inplace=True);      
sentence_data.dropna(subset=['text'], inplace=True);

data = sentence_data
cat2list = list(set(df_tx['Subtopic']))
#data = pd.concat([data, pd.DataFrame(columns = list(cat2list))])
data['Category'] = 0
mapped_data = pd.DataFrame(columns = ['slno','text','Category'])
temp = pd.DataFrame()
for k in range(len(data)):
    comment = remov_punct(data.iloc[k,1])
    data_words = [str(x.strip()).lower() for x in str(comment).split()]
    data_words = list(filter(None, data_words))
    output = []

    for l in range(len(df_tx)):
        key_flag = False
        and_flag = False
        not_flag = False
        if (str(df_tx['PrimaryKeywords'][l]) != 'nan'):
            kw_clean = remov_quote(df_tx['PrimaryKeywords'][l])
        else:
            kw_clean = df_tx['PrimaryKeywords'][l]
        if (str(df_tx['AdditionalKeywords'][l]) != 'nan'):
            aw_clean = remov_quote(df_tx['AdditionalKeywords'][l])
        else:
            aw_clean = df_tx['AdditionalKeywords'][l]
        if (str(df_tx['ExcludeKeywords'][l]) != 'nan'):
            nw_clean = remov_quote(df_tx['ExcludeKeywords'][l])
        else:
            nw_clean = df_tx['ExcludeKeywords'][l]
        key_words = 'nan'
        and_words = 'nan'
        and_words2 = 'nan'
        not_words = 'nan'
        not_words2 = 'nan'

        if (str(kw_clean) != 'nan'):
            key_words = [str(x.strip()).lower() for x in kw_clean.split(',')]
            key_words2 = set(w.lower() for w in key_words)

        if (str(aw_clean) != 'nan'):
            and_words = [str(x.strip()).lower() for x in aw_clean.split(',')]
            and_words2 = set(w.lower() for w in and_words)

        if (str(nw_clean) != 'nan'):
            not_words = [str(x.strip()).lower() for x in nw_clean.split(',')]
            not_words2 = set(w.lower() for w in not_words)

        if (str(kw_clean) == 'nan'):
            key_flag = False
        elif set(data_words) & key_words2:
            key_flag = True
        elif asterix_handler(key_words2, data_words) == 'T':
            key_flag = True

        if (str(aw_clean) == 'nan'):
            and_flag = True
        elif set(data_words) & and_words2:
            and_flag = True
        elif asterix_handler(and_words2, data_words) == 'T':
            and_flag = True

        if (str(nw_clean) == 'nan'):
            not_flag = False
        elif set(data_words) & not_words2:
            not_flag = True
        elif asterix_handler(not_words2, data_words) == 'T':
            not_flag = True

        if (key_flag == True and and_flag == True and not_flag == False):
            output.append(str(df_tx['Subtopic'][l]))
            temp = {'slno': [data.iloc[k,0]], 'text': [data.iloc[k,1]], 'Category': [df_tx['Subtopic'][l]]}
            mapped_data = pd.concat([mapped_data, pd.DataFrame(temp)])
    #data['Category'][k] = ','.join(output)

#output mapped data
mapped_data.to_csv("D:/mapped_data.csv", index = False)

Step 4: Map customer reviews to sentiment
#read category mapped data for sentiment mapping
catdata = pd.read_csv("D:/mapped_data.csv")
Build a function that leverages the built-in NLTK functionality to identify sentiment. The output 1 means positive, 0 means neutral and -1 means negative. You can choose your own thresholds for positive, neutral and negative sentiment. If you have not used VADER before, you may need to run nltk.download('vader_lexicon') once.

def findpolar(test_data):
    sia = SentimentIntensityAnalyzer()
    polarity = sia.polarity_scores(test_data)["compound"]
    if(polarity >= 0.1):
        foundpolar = 1
    if(polarity <= -0.1):
        foundpolar = -1
    if(polarity > -0.1 and polarity < 0.1):
        foundpolar = 0
    return(foundpolar)
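A couple of illustrative calls (the review snippets are hypothetical; the exact compound scores depend on the VADER lexicon version):
print(findpolar("The staff were wonderful and very helpful"))  # expected 1 (positive)
print(findpolar("The booking process was a nightmare"))        # expected -1 (negative)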

Apply the function to the text of each mapped sentence (stored here in a new Sentiment column) and output the sentiment mapped data
catdata['Sentiment'] = catdata['text'].apply(findpolar)
catdata.to_csv("D:/sentiment_mapped_data.csv", index = False)
Output : Sentiment Analysis

Additional Reading

Polarity Scoring Explained: 

NLTK offers the Valence Aware Dictionary for sEntiment Reasoning (VADER) model, which identifies both the direction (polarity) and the magnitude (intensity) of the sentiment of a text. Below is a high-level explanation of the methodology.

VADER is a combination of lexical features and rules for identifying sentiment and intensity, so it does not need any training data. To explain further, for a sentence such as “the food is good”, it is easy to identify that the sentiment is positive. VADER goes a step further and identifies intensity using rule-based cues such as punctuation, capitalised words and degree modifiers.

The polarity scores for different variations of similar sentences are as follows:

Polarity Score
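You can reproduce this behaviour yourself; the sentences below are illustrative and the exact scores depend on your NLTK/VADER version:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# nltk.download('vader_lexicon')  # one-time download if the lexicon is missing
sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores("the food is good"))
print(sia.polarity_scores("the food is GOOD!!"))   # capitalisation and punctuation raise the intensity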

Use cases where training sentiment models is suggested over Sentiment Intensity Analyzer:

Although VADER works well across multiple domains, there could be some domains where it is preferable to build one’s own sentiment training models. Below are two examples of such use cases.
  1. Customer reviews of alcoholic beverages: It is common to see people using otherwise negative sentiment words to describe a positive experience. For example, the sentence “this sh*t is fu**ing good” means that the drink is good, but the VADER approach gives it a negative score, suggesting negative sentiment.

  2. Patient reviews regarding hospital treatment: A patient’s description of their problem is neutral in sentiment, but the VADER approach treats it as negative. For example, the sentence “I had an unbearable back pain and your medication cured me in no time” is given “-0.67”, suggesting negative sentiment.

Take Screenshot of Webpage using R

Programmatically taking screenshots of a web page is essential in a testing environment for checking how the page renders. The same capability can be used for automation, such as getting a screenshot of a news website into your inbox every morning or generating a report of candidates’ GitHub activities. This wasn’t possible from the command line until the rise of headless browsers and the JavaScript libraries supporting them. Even when such JavaScript libraries were made available, R programmers did not have any option to integrate that functionality into their code.
That is where webshot comes in: an R package that helps R programmers take web screenshots programmatically, with PhantomJS running in the backend.
Take Screenshot from R


What is PhantomJS?

PhantomJS is a headless webkit scriptable with a JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.

PhantomJS is an optimal solution for the following:
  • Headless website testing
  • Screen Capture
  • Page Automation
  • Network Monitoring

Webshot : R Package 

The webshot package allows users to take screenshots of web pages from R with the help of PhantomJS. It also can take screenshots of R Shiny App and R Markdown Documents (both static and interactive).

Install and Load Package

The stable version of webshot is available on CRAN hence can be installed using the below code:
install.packages('webshot')
library('webshot')

Also, the latest development version of webshot is hosted on github and can be installed using the below code:
#install.packages('devtools')
devtools::install_github('wch/webshot')

Initial Setup

As we saw above, the R package webshot works with PhantomJS in the backend, hence it is essential to have PhantomJS installed on the local machine where webshot package is used. To assist with that, webshot itself has an easy function to get PhantomJS installed on your machine.
webshot::install_phantomjs()
The above function automatically downloads PhantomJS from its website and installs it. Please note this is a one-time setup; once both webshot and PhantomJS are installed, these two steps can be skipped and the package can be used as described in the sections below.

Now, webshot package is installed and setup and is ready to use. To start with let us take a PDF copy of a web page.

Screenshot Function

The webshot package provides one simple function, webshot(), that takes a webpage URL as its first argument and saves the screenshot under the file name given as its second argument. It is important to note that the filename includes a file extension such as '.jpg', '.png' or '.pdf', which determines the format of the output file. Below is the basic structure of the function:
library(webshot)

#webshot(url, filename.extension)
webshot("https://www.listendata.com/", "listendata.png")

If no folder path is specified along with the filename, the file is downloaded in the current working directory which can be checked with getwd().

Now that we understand the basics of the webshot() function, it is time to work through our cases, starting with downloading/converting a webpage as a PDF copy.

Case #1: PDF Copy of WebPage

Let us assume, we would like to download Bill Gates' notes on Best Books of 2017 as a PDF copy.

#loading the required library
 library(webshot)

#PDF copy of a web page / article
 webshot("https://www.gatesnotes.com/About-Bill-Gates/Best-Books-2017",
 "billgates_book.pdf",
 delay = 2)

The above code generates a PDF whose (partial) screenshot is below:
Snapshot of PDF Copy

Dissecting the above code, we can see that the webshot() function is supplied with 3 arguments:
  1. URL from which the screenshot has to be taken. 
  2. Output Filename along with its file extensions. 
  3. Time to wait before taking screenshot, in seconds. Sometimes a longer delay is needed for all assets to display properly.
Thus, a webpage can be converted/downloaded as a PDF programmatically in R.

Case #2: Webpage Screenshot (Viewport Size)

Now, I'd like to get an automation script running to get screenshot of a News website and probably send it to my inbox for me to see the headlines without going to the browser. Here we will see how to get a simple screenshot of livemint.com an Indian news website.
#Screenshot of Viewport
webshot('https://www.livemint.com/','livemint.png', cliprect = 'viewport')
While the first two arguments are similar to the above function, there's a new third argument cliprect which specifies the size of the Clipping rectangle.

If cliprect is unspecified, a screenshot of the complete web page is taken (as in the previous case). Since we are interested only in the latest news (which is usually at the top of the website), we use cliprect with the value 'viewport', which clips only the viewport part of the browser, as below.

Screenshot of Viewport of Browser

Case #3: Multiple Selector Based Screenshots

So far we have taken simple screenshots of whole pages, dealing with one screenshot and one file, but that is not what usually happens when you are automating something. In most cases we end up performing more than one action, so this case deals with taking multiple screenshots and saving multiple files. Instead of taking screenshots of different URLs (which is quite straightforward), we will take screenshots of different sections of the same web page using different CSS selectors and save them in separate files.
#Multiple Selector Based Screenshots
webshot("https://github.com/hadley",
 file = c("organizations.png","contributions.png"),
 selector = list("div.border-top.py-3.clearfix","div.js-contribution-graph"))
In the above code, we take screenshots of two CSS selectors from the GitHub profile page of Hadley Wickham and save them in two PNG files: organizations.png and contributions.png.

Contributions.png

Organizations.png
Thus, we have seen how to use the R package webshot for taking screenshots programmatically in R. Hopefully this post helps fuel your automation needs and helps your organisation improve its efficiency.


Identify Person, Place and Organisation in content using Python

This article outlines the concept and Python implementation of Named Entity Recognition using StanfordNERTagger. Technical challenges such as installation issues, version conflicts and operating system issues that commonly affect this kind of analysis are out of scope for this article.

NER NLP using Python

Table of contents:

1. Named Entity Recognition defined
2. Business Use cases
3. Installation Pre-requisites
4. Python Code for implementation
5. Additional Reading: CRF model, Multiple models available in the package
6. Disclaimer

1. Named Entity Recognition Defined
Named Entity Recognition (NER) is the process of detecting and classifying proper names mentioned in a text. In simple words, it locates person names, organisations, locations and so on in the content. It is generally the first step in most Information Extraction (IE) tasks in Natural Language Processing.
NER Sample

2. Business Use Cases

There is a need for NER across multiple domains. Below are a few sample business use cases for your reference.
  1. Investment research: To identify company announcements, people’s reactions to them and their impact on stock prices, one needs to identify people and organisation names in the text
  2. Chat-bots in multiple domains: To identify places and dates for booking hotel rooms, air tickets etc.
  3. Insurance domain: To identify and mask people’s names in feedback forms before analysis. This is needed for regulatory compliance (example: HIPAA)

3. Installation Prerequisites
1. Download the Stanford NER tagger package (a zipped folder) from the Stanford NLP website.
2. Unzip the zipped folder and save it in a drive.
3. Copy “stanford-ner.jar” from the folder and save it just outside the folder as shown in the image.
4. Download the caseless models from https://stanfordnlp.github.io/CoreNLP/history.html by clicking on “caseless” as shown below. The standard models work as well; however, the caseless models help in identifying named entities even when they are not capitalised as required by formal grammar rules.
5. Save the folder in the same location as the Stanford NER folder for ease of access.
Stanford NER Installation - Step 1
NER Installation - Step 2
4. Python Code for implementation:
#Import all the required libraries.
import os
from nltk.tag import StanfordNERTagger
import pandas as pd

#Set environmental variables programmatically.
#Set the classpath to the path where the jar file is located
os.environ['CLASSPATH'] = "<path to the file>/stanford-ner-2015-04-20/stanford-ner.jar"

#Set the Stanford models to the path where the models are stored
os.environ['STANFORD_MODELS'] = '<path to the file>/stanford-corenlp-caseless-2015-04-20-models/edu/stanford/nlp/models/ner'

#Set the java jdk path
java_path = "C:/Program Files/Java/jdk1.8.0_161/bin/java.exe"
os.environ['JAVAHOME'] = java_path


#Set the path to the model that you would like to use
stanford_classifier  =  '<path to the file>/stanford-corenlp-caseless-2015-04-20-models/edu/stanford/nlp/models/ner/english.all.3class.caseless.distsim.crf.ser.gz'

#Build NER tagger object
st = StanfordNERTagger(stanford_classifier)

#A sample text for NER tagging
text = 'srinivas ramanujan went to the united kingdom. There he studied at cambridge university.'

#Tag the sentence and print output
tagged = st.tag(str(text).split())
print(tagged)

Output
[(u'srinivas', u'PERSON'), 
(u'ramanujan', u'PERSON'),
(u'went', u'O'),
(u'to', u'O'),
(u'the', u'O'),
(u'united', u'LOCATION'),
(u'kingdom.', u'LOCATION'),
(u'There', u'O'),
(u'he', u'O'),
(u'studied', u'O'),
(u'at', u'O'),
(u'cambridge', u'ORGANIZATION'),
(u'university', u'ORGANIZATION')]
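The tagger returns one tuple per token. If you want entity-level output, a small helper (not part of the original article) can group consecutive tokens that share the same tag:
from itertools import groupby

def extract_entities(tagged_tokens):
    entities = []
    # group consecutive tokens with the same tag and drop the 'O' (other) tag
    for tag, group in groupby(tagged_tokens, key=lambda x: x[1]):
        if tag != 'O':
            entities.append((' '.join(word for word, _ in group), tag))
    return entities

print(extract_entities(tagged))
# e.g. [('srinivas ramanujan', 'PERSON'), ('united kingdom.', 'LOCATION'), ('cambridge university', 'ORGANIZATION')]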

5. Additional Reading

The Stanford NER algorithm leverages a general implementation of linear chain Conditional Random Fields (CRFs) sequence models. CRFs may look similar to Hidden Markov Models (HMMs) but are quite different.

Below are some key points to note about the CRFs in general.
  1. It is a discriminative model, unlike the HMM, and thus models the conditional probability directly
  2. It does not assume independence of features, unlike the HMM. This means that the current word, previous word and next word can all be considered as features in the model
  3. Relative to HMMs or Maximum Entropy Markov Models, CRFs are the slowest

6. Disclaimer
This article explains the implementation of the Stanford NER algorithm for research purposes and does not promote it for commercial gain. For any questions on the commercial aspects of implementing this algorithm, please contact Stanford University.

Regex Tutorial with Examples

This tutorial covers various concepts of regular expression (regex) with hands-on examples. It also includes usage of regex using various tools such as R and Python.

Introduction

Regex is short for 'Regular Expression'. It is mainly used to extract sub-strings from a string by searching for a specific search pattern. The search pattern is defined by a regular expression.

The search pattern can be a single letter, a fixed string, or a complex pattern consisting of numeric, punctuation and character values.
Regular expressions can be used to search and replace text.
Regex Made Easy


Uses of Regular expression

There are several use-cases of regular expressions in the real world. Some of them are as follows -
  1. Fetch email addresses mentioned in the long paragraph
  2. Validate 10-digit phone number, Social Security Number and email address
  3. Extract text from HTML or XML code
  4. Rename multiple files at a single run
  5. Remove punctuation specified in the text
  6. Web scraping : Searching specific content from all the web pages that contain a specific string
  7. Replace complex pattern with blank or specific character


Let's start with the basics

1. Anchor and Word Boundaries

Symbol   Description
^        Beginning of line
$        End of line
\b       Whole word

Examples

1. ^abc matches the string that begins with abc in text 'abcd'

2. ^the matches the string that starts with the in text 'the beginning'

3. done$ matches the string that ends with done in text 'I am done'

4. \ban\b matches the whole word an in text 'Elephant an animal'
\ban\b does not match the 'an' inside 'Elephant' or 'animal' as it only performs whole-word matching.

2. OR Condition

OR condition can be defined by symbols | or [ ]. See the examples below.

1. the[m|n] matches strings them or then in text 'them then there theme'

2. the[mn] is equivalent to the[m|n]

3. \bthe[mn]\b matches the complete them or then in text 'them then there theme'

3. Case Insensitive

The search patterns mentioned in all of the above examples are case-sensitive. To make a pattern case-insensitive, we use the expression (?i)

1. (?i)abc matches both abc and ABC in text 'abc ABC'

2. (?i)a[bd]a performs insensitive match 'a' followed by either b or d and then a in text 'abc ABA Ada'

4. Quantifiers

It talks about quantity of element(s). In simple words, it means how often a particular regex element can occur.
Expression   Description
*            Item occurs zero or more times
+            Item occurs one or more times
?            Item occurs zero or one time
{A}          Item occurs A number of times
{A,B}        Item occurs between A and B times
.            Any character
.*           Matches zero or more of any character

1. def* matches strings that contain de followed by f zero or more times. Example - de def deff defff

2. def+ matches strings having de followed by f at least one time. Example - def deff defff

3. \bdef?\b matches whole words consisting of de followed by f zero or one time. Example - de def

4. \bdef{2}\b matches whole words consisting of de followed by f exactly two times. Example - deff

5. \bdef{2,}\b matches whole words consisting of de followed by f two or more times. Example - deff defff

6. \bdef{3,4}\b matches whole words consisting of de followed by f either 3 or 4 times. Example - defff deffff

7. a.* matches a and all characters after it

5. Create Grouping

By wrapping part of a regular expression inside ( ), you create a group, which lets you apply an OR condition to a portion of the regex or apply a quantifier to the entire group.

It also helps to extract a portion of information from strings.

ab(cd|de)* matches strings having ab followed by either cd or de zero or more times.

6. Back Reference

(name)\1 matches 'namename' : the group captures 'name' and the back-reference \1 then matches the same text again.

Replace (Substitution) using Back-reference

(ab|cd)e(fg|hi) matches either ab or cd then followed by e then either fg or hi
Enter \1\2 in substitution, it will return values of first and second group.
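The same substitution with group references can be done in Python with the re module (illustrative sketch):
import re

print(re.sub(r'(ab|cd)e(fg|hi)', r'\1\2', 'abefg cdehi'))  # returns 'abfg cdhi'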

7. Lazy Quantifier

A lazy (or non-greedy) quantifier matches a regex element as few times as possible, whereas a greedy quantifier matches it as many times as possible.
You can convert a greedy quantifier into a lazy quantifier by simply adding a ?

<.*?> matches the shortest possible string enclosed in < and >.
Regex lazy quantifier
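A short Python illustration of greedy versus lazy matching (the example string is hypothetical):
import re

html = "<b>bold</b> and <i>italic</i>"
print(re.findall("<.*>", html))   # greedy : ['<b>bold</b> and <i>italic</i>']
print(re.findall("<.*?>", html))  # lazy   : ['<b>', '</b>', '<i>', '</i>']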


8. How to match the literal meaning of dot and asterisk

By using a backslash \ you can escape special characters such as the asterisk and dot. In other words, it makes the regex treat the character literally.
abc\* matches the literal text abc*, whereas abc* would match ab, abc, abcc and so on.

9. POSIX Regular Expressions

POSIX character classes are written inside square brackets. Like regular expressions, they match characters, digits, punctuation and more.
POSIX       Description                                        ASCII
[:digit:]   Digits                                             [0-9]
[:lower:]   Lowercase letters                                  [a-z]
[:upper:]   Uppercase letters                                  [A-Z]
[:alpha:]   Lower and uppercase letters                        [a-zA-Z]
[:alnum:]   Lower and uppercase letters and digits             [a-zA-Z0-9]
[:blank:]   Space and tab                                      [ \t]
[:space:]   All whitespace characters, including line breaks   [ \t\r\n\v\f]
[:punct:]   Punctuation                                        [!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~]

Select string having first letter character followed by numeric
[[:alpha:]][[:digit:]]+
  1. [[:alpha:]] means any letter character
  2. [[:digit:]] means any digit
  3. + means the previous element one or more times

How to use regex with R and Python

R

1. grep(pattern, x)
Search for a particular pattern in each element of a vector x

2. gsub(pattern, replacement, x)
Replace a particular pattern in each element of a vector x
x = "sample text B2 testing B52"
gsub('[[:alpha:]][[:digit:]]+', '',x)

Python

The package re can be used for regular expressions in Python.

1. re.search(pattern, x)
Search for a particular pattern in each element of a vector x

2. re.sub(pattern, replacement, x)
Replace a particular pattern in each element of a vector x
import re
x = 'Welcome to Python3.6'
re.sub( '[a-zA-Z]+[0-9|.]+','', x)
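re.search() returns a match object (or None if there is no match); a minimal sketch of using it:
import re

x = 'Welcome to Python3.6'
m = re.search(r'[0-9]+\.[0-9]+', x)
if m:
    print(m.group())  # prints '3.6'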

Exercises : Regular Expression

1. Replace abbreviation of thousand (K) with 000?

x = "K 25K 2K"
Desired Output : K 25000 2000

Solution
gsub('([0-9])K', '\\1000',x)

Two backslashes are used because a single backslash is not allowed in R string literals

2. Remove extra characters

x = "var1_avg_a1 var1_a_avg_7"
Desired Output :var1 var1_a

Solution
gsub('_avg_.*?[0-9]', '',x)

The ? makes the quantifier non-greedy (lazy)

Install and Load Multiple R Packages

In an enterprise environment, we generally need to automate the process of installing multiple R packages so that users do not have to install them separately before submitting your program.

The function below performs the following operations -
  1. First it finds all the already installed R packages
  2. Checks whether the packages we want to install are already installed
  3. If a package is already installed, it does not install it again
  4. If a package is missing (not installed), it installs the package
  5. Loops through steps 2, 3 and 4 for all the packages we want to install
  6. Loads all the packages (both the already available ones and the new ones)

Install_And_Load <- function(packages) {
  k <- packages[!(packages %in% installed.packages()[,"Package"])];
  if(length(k))
  {install.packages(k, repos='https://cran.rstudio.com/');}

  for(package_name in packages)
  {library(package_name,character.only=TRUE, quietly = TRUE);}
}
Install_And_Load(c("fuzzyjoin", "quanteda", "stringdist", "stringr", "stringi"))

Explanation

1. installed.packages() returns details of all the already installed packages. installed.packages()[,"Package"] returns names of these packages.

To see version of the packages, submit the following command
installed.packages()[,c("Package","Version")]
2. You can use any of the following repositories (URLs of CRAN mirrors). Try another of these 3 repositories if one of them is blocked in your company due to firewall restrictions.
https://cloud.r-project.org
https://cran.rstudio.com
http://www.stats.ox.ac.uk/pub/RWin
3. quietly = TRUE tells R not to print errors/warnings if attaching (loading) a package fails.

How to check version of R while installation

In the program below, the package RDCOMClient is installed from the repository http://www.omegahat.net/R if the R version is greater than or equal to 3.5; otherwise it is installed from the repository http://www.stats.ox.ac.uk/pub/RWin.
if (length("RDCOMClient"[!("RDCOMClient" %in% installed.packages()[,"Package"])])) {
  if (as.numeric(R.Version()$minor)>= 5)
    install.packages("RDCOMClient", repos = "http://www.omegahat.net/R")
  else
    install.packages("RDCOMClient", repos = "http://www.stats.ox.ac.uk/pub/RWin")
}
library("RDCOMClient")

Add JavaScript and CSS in Shiny

In this tutorial, I will cover how to include your own JavaScript, CSS and HTML code in your R shiny app. By including them, you can make a very powerful professional web app using R.

First let's understand the basics of a Webpage

In general, a web page contains the following kinds of detail.
  1. Content (Header, Paragraph, Footer, Listing)
  2. Font style, color, background, border
  3. Images and Videos
  4. Popups, widgets, special effects etc.

HTML, CSS and JavaScript

These 3 web programming languages together take care of all the information a webpage contains, from the text to the special effects.
  1. HTML determines the content and structure of a page (header, paragraph, footer etc.)
  2. CSS controls how webpage would look like (color, font type, border etc.)
  3. JavaScript decides advanced behaviors such as pop-up, animation etc.
Make JavaScript, CSS work for Shiny
Fundamentals of Webpage
One of the most common web development terms you should know is rendering. It is the act of putting together a web page for presentation.
Shiny Dashboard Syntax

In this article, I will use the shinydashboard library as it gives a more professional and elegant look to the app. The structure of shinydashboard syntax is similar to the shiny library; both require ui and server components, but the functions are totally different. Refer to the code below, and make sure to install the library before running the program.
# Load Library
library(shiny)
library(shinydashboard)

# User Interface
ui =
dashboardPage(
dashboardHeader(title = "Blank Shiny App"),
dashboardSidebar(),
dashboardBody()
)

# Server
server = function(input, output) { }

# Run App
runApp(list(ui = ui, server = server), launch.browser =T)

Example : Create Animation Effect

The program below generates an animation in the web page. When the user hits the "Click Me" button, it triggers the demojs() JavaScript function, which starts the animation. It's a very basic animation; you can edit the code and make it as complex as you want.

HTML

CSS

#sampleanimation {
width: 50px;
height: 50px;
position: absolute;
background-color: blue;
}

#myContainer {
width: 400px;
height: 400px;
position: relative;
background: black;
}

JS

function demojs() {
  var elem = document.getElementById('sampleanimation');
  var position = 0;
  var id = setInterval(frame, 10);
  function frame() {
    if (position == 350) {
      clearInterval(id);
    } else {
      position++;
      elem.style.top = position + 'px';
      elem.style.left = position + 'px';
    }
  }
}

There are several ways to include custom JavaScript and CSS codes in Shiny. Some of the common ones are listed below with detailed explanation -

Method I : Use tags to insert HTML, CSS and JS Code in Shiny


HTML
tags$body(HTML("Your HTML Code"))
CSS
tags$head(HTML("<style type='text/css'>
Your CSS Code
</style>"))
OR

CSS code can also be defined using tags$style. 
tags$head(tags$style(HTML(" Your CSS Code ")))

JS
tags$head(HTML("<script type='text/javascript'>
Your JS Code
</script>"))

OR

JS code can be described with tags$script.
tags$head(tags$script(HTML(" Your JS Code ")))

Code specified in tags$head is included and executed inside <head> </head>. Similarly, tags$body can be used to make shiny run code within <body> </body>.

tags$head vs. tags$body

In general, JavaScript and CSS files are defined inside <head> </head>. Things which we want to display under body section of the webpage should be defined within <body> </body>.

Animation Code in Shiny



Important Note
In JS, CSS and HTML code, make sure to replace double quotation marks with single quotation marks inside shiny's HTML("") function, as a double quotation mark would be interpreted as closing the function's string argument.

Method II : Call JavaScript and CSS files in Shiny

You can use the includeScript( ) and includeCSS( ) functions to pull JS and CSS code from files saved in your local directory. You can save the files anywhere and pass their file locations to the functions.

How to create JS and CSS files manually
Open notepad and paste JS code and save it with .js file extension and file type "All files" (not text document). Similarly you can create css file using .css file extension.


When to use Method 2?
When you want to include a big (lengthy) piece of JS / CSS code, use method 2. Method 1 should be used for small code snippets, as RStudio does not support syntax colouring and error-checking of JS / CSS code inside strings. Inline code also makes the app unnecessarily lengthy, which makes it difficult to maintain.

Method III : Add JS and CSS files under www directory

Step 1 : 
Create an app using shinyApp( ) function and save it as app.R. Refer the code below.



Step 2 :
Create a folder named www in your app directory (where your app app.r file is stored) and save .js and .css files under the folder. Refer the folder structure below.
├── app.R
└── www
└── animate.js
└── animation.css

Step 3 :
Submit runApp( ) function. Specify path of app directory.
runApp(appDir = "C:/Users/DELL/Documents", launch.browser = T)

Method IV : Using Shinyjs R Package

The shinyjs package allows you to perform the most frequently used JavaScript tasks without knowing any JavaScript programming at all. For example, you can hide, show or toggle an element. You can also enable or disable inputs.

Example : Turn content on and off by pressing the same button

Make sure to install shinyjs package before loading it. You can install it by using install.packages("shinyjs").

Important Point : Use function useShinyjs( ) under dashboardBody( ) to initialize shinyjs library



In the above program, we have used toggle( ) function to turn content on and off.


Example : Enable or disable Numeric Input based on checkbox selection



Communication between R and JavaScript

You can also define and call your own JavaScript functions using the shinyjs package with the extendShinyjs( ) function inside dashboardBody( ).
  1. Make sure to define custom JavaScript function beginning with word shinyjs
  2. JS function should be inside quotes
  3. In server, you can call the function by writing js$function-name
The program below closes the app when the user clicks on the action button.



End Notes

With the huge popularity of JavaScript and many recent advancements, it is recommended to learn the basics of JavaScript so that you can use them in your R Shiny apps. According to surveys, JavaScript is used by about 95% of websites. Its huge popularity comes from its broad, active developer community and its use by big players like Google, Facebook and Microsoft.
Do comment on how you use shiny apps in the comment box below. If you are a beginner and want to learn to build a web app using shiny, check out this tutorial.

Install Python Package

Python is one of the most popular programming languages for data science and analytics. It is widely used for a variety of tasks in startups and in many multi-national organisations. The beauty of the language is that it is open-source, which means it is available for free and has a very active community of developers across the world. Python developers share their solutions in the form of packages or modules with other Python users. This tutorial explains the various ways to install a Python package.

Ways to Install Python Package


Method 1 : If Anaconda is already installed on your System

Anaconda is a data science platform that comes with popular Python packages pre-installed and a powerful IDE (Spyder) that has a user-friendly interface for writing Python scripts.

If Anaconda is installed on your system (laptop), click on Anaconda Prompt as shown in the image below.

Anaconda Prompt

To install a python package or module, enter the code below in Anaconda Prompt -
pip install package-name
Install Python Package using PIP Windows

Method 2 : NO Need of Anaconda


1. Open RUN box using shortcut Windows Key + R

2. Enter cmd in the RUN box
Command Prompt

Once you press OK, it will show command prompt screen.



3. Search for folder named Scripts where pip applications are stored.

Scripts Folder

4. In command prompt, type cd <file location of Scripts folder>

cd refers to change directory.

For example, folder location is C:\Users\DELL\Python37\Scripts so you need to enter the following line in command prompt :
cd C:\Users\DELL\Python37\Scripts 

Change Directory

5. Type pip install package-name

Install Package via PIP command prompt


Syntax Error : Installing Package using PIP

Some users face the error "SyntaxError: invalid syntax" when installing packages. To work around this issue, use the command line below -
python -m pip install package-name
python -m pip tells python to import a module for you, then run it as a script.

Install Specific Versions of Python Package
python -m pip install Packagename==1.3     # specific version
python -m pip install "Packagename>=1.3"  # version greater than or equal to 1.3

How to load or import package or module

Once a package is installed, the next step is to import it so it can be used. There are several ways to load a package or module in Python:

1. import math loads the module math. Then you can use any function defined in math module using math.function. Refer the example below -
import math
math.sqrt(4)

2. from math import * loads the module math. Now we don't need to prefix the module name to use its functions.
from math import *
sqrt(4)

3. from math import sqrt, cos imports the selected functions of the module math.

4. import math as m imports the math module under the alias m.
m.sqrt(4)

Other Useful Commands
Description                            Command
To uninstall a package                 pip uninstall package
To upgrade a package                   pip install --upgrade package
To search for a package                pip search "package-name"
To list all the installed packages     pip list

PIP connection Error : SSL CERTIFICATE VERIFY FAILED

The most common issue when installing a Python package on a company network is failure to verify the SSL certificate. Companies sometimes block certain websites on their network so that employees can't access them; whenever employees try to visit these websites, they see "Access Denied because of company's policy". This causes a connection error when reaching the main Python package index.

Error looks like this :

Could not fetch URL https://pypi.python.org/: connection error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:598)

PIP SSL Certification Issue


Solution :

Run the following command. Make sure to specify package name in <package_name>
pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org <package_name> -vvv
Suppose you want to install pandas package, you should submit the following line of command
pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org pandas -vvv

The --trusted-host option marks the host as trusted even when it does not have a valid HTTPS certificate (or any at all).

Create Dummy Data in Python

This article explains various ways to create dummy or random data in Python for practice. Like R, we can create dummy data frames using the pandas and numpy packages. Many analysts prepare data in MS Excel and later import it into Python to hone their data wrangling skills, which is not an efficient approach. The efficient approach is to prepare random data in Python and use it later for data manipulation.

1. Enter Data Manually in Editor Window

The first step is to load the pandas package and use the DataFrame function.
import pandas as pd
data = pd.DataFrame({"A" : ["John","Deep","Julia","Kate","Sandy"],
"MonthSales" : [25,30,35,40,45]})
       A  MonthSales
0 John 25
1 Deep 30
2 Julia 35
3 Kate 40
4 Sandy 45

Note : Character values should be defined in single or double quotes.

2. Prepare Data using sequence of numeric and character values

Let's import two popular Python packages for this task: string and numpy. The string package is used to generate a series of alphabet letters, whereas the numpy package is used to generate a sequence of numbers incremented by a specific value.
import pandas as pd
import string
import numpy as np
data2 = pd.DataFrame({"A": np.arange(1,10,2),
"B" : list(string.ascii_lowercase)[0:5],
})
   A  B
0 1 a
1 3 b
2 5 c
3 7 d
4 9 e

Explanation
1. np.arange(1,10,2) tells Python to generate values between 1 and 10 (excluding 10), incremented by 2.
2. string.ascii_lowercase returns abcdefghijklmnopqrstuvwxyz. list(string.ascii_lowercase)[0:5] is used to pick the first 5 letters.

3. Generate Random Data

In numpy, there are many functions to generate random values. The two most popular random functions are random.randint( ) and random.normal( )
import pandas as pd
import numpy as np
np.random.seed(1)
data3 = pd.DataFrame({"C" : np.random.randint(low=1, high=100, size=10),
"D" : np.random.normal(0.0, 1.0, size=10)
})
    C         D
0 38 -0.528172
1 13 -1.072969
2 73 0.865408
3 10 -2.301539
4 76 1.744812
5 6 -0.761207
6 80 0.319039
7 65 -0.249370
8 17 1.462108
9 2 -2.060141

Explanation
np.random.seed(1) tells Python to generate the same random values with this seed each time you run the code. np.random.randint(low=1, high=100, size=10) returns 10 random integers between 1 and 100. np.random.normal(0.0, 1.0, size=10) returns 10 random values drawn from a standard normal distribution with mean 0 and standard deviation 1.

Check mean and standard deviation of normal distribution
np.round(np.std(np.random.normal(0.0, 1.0, size=1000)))
np.round(np.mean(np.random.normal(0.0,1.0, size=1000)))
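Resetting the same seed reproduces the same values, which is handy when sharing practice data (short sketch):
import numpy as np

np.random.seed(1)
print(np.random.randint(low=1, high=100, size=5))
np.random.seed(1)
print(np.random.randint(low=1, high=100, size=5))  # identical output because the seed was reset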


4. Create Categorical Variables

In this step, we will create two types of categorical variables :
  • Categories ranging from 1 to 4
  • Binary variable (0 / 1)
import pandas as pd
import numpy as np
np.random.seed(1)
data4 =pd.DataFrame({"X" : np.random.choice(range(1,5), 20, replace=True),
"X1" : np.where(np.random.normal(0.0, 1.0, size=20)<=0,0,1)})
    X  X1
0 2 1
1 4 0
2 1 1
3 1 0
4 4 1
5 2 0
6 4 0
7 2 0
8 4 1
9 1 0
10 1 0
11 2 0
12 1 1
13 4 1
14 2 0
15 1 1
16 3 1
17 2 1
18 3 1
19 1 0

Explanation
  1. np.random.choice(range(1,5), 20, replace=True) generates 20 values from 1 to 4 (excluding 5) with replacement (i.e. repeated values are allowed).
  2. np.where(np.random.normal(0.0, 1.0, size=20)<=0,0,1) means: if the random value is zero or negative, make it 0, otherwise 1. np.where( ) is used to construct an IF-ELSE statement in Python.
Like R's factor( ) function, you can define variable(s) as categorical variables. See the code below.
data4.X  = data4.X.astype("category")
data4.X1 = data4.X1.astype("category")
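Putting the pieces together, here is a minimal sketch (column names are arbitrary) of a dummy dataset that also includes a date column via pd.date_range:
import pandas as pd
import numpy as np

np.random.seed(1)
n = 10
dummy = pd.DataFrame({
    "date"  : pd.date_range("2019-01-01", periods=n, freq="D"),
    "sales" : np.random.randint(low=1, high=100, size=n),
    "flag"  : np.random.choice([0, 1], size=n)
})
dummy["flag"] = dummy["flag"].astype("category")
print(dummy.head())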

5. Import CSV or Excel File

Using the pandas functions read_csv( ) and read_excel( ), you can read data from a CSV or Excel file into Python.
import pandas as pd
mydata= pd.read_csv("C:\\Users\\Deepanshu\\samplefile.csv")
mydata = pd.read_excel("C:\\Users\\Deepanshu\\samplefile.xlsx")

Detailed Tutorial : How to import data in Python

Loops in Python explained with examples

This tutorial covers various ways to execute loops in Python. Loops are an important concept in any programming language: they perform iterations, i.e. run specific code repeatedly until a certain condition is reached.

1. For Loop

Like R and the C programming language, you can use a for loop in Python. It is one of the most commonly used loop constructs for automating repetitive tasks.

How for loop works?

Suppose you are asked to print sequence of numbers from 1 to 9, increment by 2.
for i in range(1,10,2):
    print(i)
Output
1
3
5
7
9
range(1,10,2) means starts from 1 and ends with 9 (excluding 10), increment by 2.

Iteration over list
This section covers how to run for in loop on a list.
mylist = [30,21,33,42,53,64,71,86,97,10]
for i in mylist:
    print(i)
Output
30
21
33
42
53
64
71
86
97
10

Suppose you need to select every 3rd value of list.
for i in mylist[::3]:
    print(i)
Output
30
42
71
10
mylist[::3] is equivalent to mylist[0::3] which follows this syntax style list[start:stop:step]

Python Loop Explained with Examples

Example 1 : Create a new list with only items from list that is between 0 and 10
l1 = [100, 1, 10, 2, 3, 5, 8, 13, 21, 34, 55, 98]

new = [] #Blank list
for i in l1:
    if i > 0 and i <= 10:
        new.append(i)

new
Output: [1, 10, 2, 3, 5, 8]
This can also be done with the numpy package by converting the list into a numpy array. See the code below.
import numpy as np
k=np.array(l1)
new=k[np.where(k<=10)]

Example 2 : Check which alphabet (a-z) is mentioned in string

Suppose you have a string named k and you want to check which alphabet exists in the string k.
k = "deepanshu"

import string
for n in string.ascii_lowercase:
    if n in k:
        print(n + ' exists in ' + k)
    else:
        print(n + ' does not exist in ' + k)
string.ascii_lowercase returns 'abcdefghijklmnopqrstuvwxyz'.

Practical Examples : for in loop in Python

Create sample pandas data frame for illustrative purpose.
import pandas as pd
import numpy as np
np.random.seed(234)
df = pd.DataFrame({"x1"     : np.random.randint(low=1, high=100, size=10),
                   "Month1" : np.random.normal(size=10),
                   "Month2" : np.random.normal(size=10),
                   "Month3" : np.random.normal(size=10),
                   "price"  : range(10)
                   })

df
1. Multiply each month column by 1.2
for i in range(1,4):
    print(df["Month"+str(i)]*1.2)
range(1,4) returns 1, 2 and 3. The str( ) function is used to convert a number to a string, so "Month" + str(1) means Month1.
2. Store computed columns in new data frame
import pandas as pd
newDF = pd.DataFrame()
for i in range(1,4):
    data = pd.DataFrame(df["Month"+str(i)]*1.2)
    newDF = pd.concat([newDF, data], axis=1)
pd.DataFrame( ) is used to create blank data frame. The concat() function from pandas package is used to concatenate two data frames.

3. Check if value of x1 >= 50, multiply each month cost by price. Otherwise same as month.
import pandas as pd
import numpy as np
for i in range(1,4):
    df['newcol'+str(i)] = np.where(df['x1'] >= 50,
                                   df['Month'+str(i)] * df['price'],
                                   df['Month'+str(i)])
In this example, we are adding new columns named newcol1, newcol2 and newcol3. np.where(condition, value_if_condition_is_met, value_if_condition_is_not_met) is used to construct an IF-ELSE statement.

4. Filter data frame by each unique value of a column and store it in a separate data frame
mydata = pd.DataFrame({"X1" : ["A","A","B","B","C"]})

for name in mydata.X1.unique():
    temp = pd.DataFrame(mydata[mydata.X1 == name])
    exec('{} = temp'.format(name))
The unique( ) function returns the distinct values of a variable. The exec( ) function is used for dynamic execution of a Python statement. See the usage of the string format( ) function below -
s= "Your Input"
"i am {}".format(s)

Output: 'i am Your Input'
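Creating variables dynamically with exec( ) can be hard to debug; a common alternative (not from the original article) is to keep the subsets in a dictionary keyed by the unique value:
subsets = {name: mydata[mydata.X1 == name] for name in mydata.X1.unique()}
subsets["A"]   # the subset where X1 == "A"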

Loop Control Statements

Loop control statements change execution from its normal sequence of iterations. When execution leaves a loop's scope, all automatic objects created in that scope are destroyed.

Python supports the following control statements.
  1. Continue statement
  2. Break statement

Continue Statement
When a continue statement is executed, it skips the rest of the code inside the loop for the current iteration and continues with the next iteration.
In the code below, we prevent the letters a and d from being printed.
for n in "abcdef":
    if n == "a" or n == "d":
        continue
    print("letter :", n)
letter : b
letter : c
letter : e
letter : f
Break Statement
When a break statement runs, it stops the loop.
In this program, when n is either c or d, the loop stops executing.
for n in "abcdef":
    if n == "c" or n == "d":
        break
    print("letter :", n)
letter : a
letter : b

for loop with else clause

Using an else clause with a for loop is not common among the Python developer community.
The else clause executes after the loop completes normally, meaning the loop did not encounter a break statement.
The program below finds factors for the numbers between 2 and 10. The else clause prints the numbers that have no factors and are therefore prime:

for k in range(2, 10):
    for y in range(2, k):
        if k % y == 0:
            print(k, '=', y, '*', round(k/y))
            break
    else:
        print(k, 'is a prime number')
2 is a prime number
3 is a prime number
4 = 2 * 2
5 is a prime number
6 = 2 * 3
7 is a prime number
8 = 2 * 4
9 = 3 * 3

While Loop

A while loop executes code repeatedly as long as a condition is true. When the condition becomes false, the line immediately after the loop is executed.
i = 1
while i < 10:
    print(i)
    i += 2  # means i = i + 2
    print("new i :", i)
Output:
1
new i : 3
3
new i : 5
5
new i : 7
7
new i : 9
9
new i : 11

While Loop with If-Else Statement

If-Else statements can be used along with a while loop. See the program below -

counter = 1
while (counter <= 5):
    if counter < 2:
        print("Less than 2")
    elif counter > 4:
        print("Greater than 4")
    else:
        print(">= 2 and <=4")
    counter += 1

Python Lambda Function with Examples

This article covers detailed explanation of lambda function of Python. You will learn how to use it in some of the common scenarios with examples.

Table of Contents

Introduction : Lambda Function

In non-technical language, lambda is an alternative way of defining a function. It lets you define a function inline: you can apply a function to some data in one line of Python code. It is called an anonymous function because the function can be defined without a name.

Syntax of Lambda Function

lambda arguments: expression
A lambda function can have more than one argument but only one expression. The expression is evaluated and returned.
Example
addition = lambda x,y: x + y
addition(2,3) returns 5
In the above python code, x,y are the arguments and x + y is the expression that gets evaluated and returned.

Difference between Lambda and Def Function

By using both lambda and def, you can create your own user-defined function in python.
def square(x):
    return x**2

square(2) returns 4
square = lambda x:x**2

square(2) returns 4

There are some difference between them as listed below.

1. A lambda expression returns a function object without binding it to a name, whereas def creates a name in the local namespace
2. lambda functions are good for situations where you want to minimise lines of code, as you can create a function in one line of Python code; this is not possible using def
3. lambda functions are somewhat less readable for most Python users.
4. lambda functions can only be used once, unless assigned to a variable name.

Lambda Function : Examples

In this section of tutorial, we will see various practical examples of lambda functions. Let's create a pandas data frame for illustration purpose.
import pandas as pd
import numpy as np
np.random.seed(12)
df = pd.DataFrame(np.random.randn(5, 3), index=list('abcde'), columns=list('XYZ'))
          X         Y         Z
a 0.472986 -0.681426 0.242439
b -1.700736 0.753143 -1.534721
c 0.005127 -0.120228 -0.806982
d 2.871819 -0.597823 0.472457
e 1.095956 -1.215169 1.342356
Example 1 : Add 2 to each value of Data Frame
def add2(x):
    return x+2

df.apply(add2)
df.apply(lambda x: x+2)
Both return the same output, but the lambda function can be defined directly inside the apply( ) function.
          X         Y         Z
a 2.472986 1.318574 2.242439
b 0.299264 2.753143 0.465279
c 2.005127 1.879772 1.193018
d 4.871819 1.402177 2.472457
e 3.095956 0.784831 3.342356
Example 2 : Create function that returns result of number raised to power
def power(x,n):
    return x**n

df.apply(power, n=3)
df.apply(lambda x : x**3)
              X         Y         Z
a 1.058143e-01 -0.316414 0.014250
b -4.919381e+00 0.427201 -3.614836
c 1.347751e-07 -0.001738 -0.525523
d 2.368489e+01 -0.213657 0.105460
e 1.316375e+00 -1.794361 2.418820
Example 3 : Conditional Statement (IF-ELSE)
Suppose you want to create a new variable that is missing (blank) when the value of an existing variable is less than 90, and otherwise equal to the existing variable. Let's create a dummy data frame called sample which contains only one variable, var1. Condition: if var1 is less than 90, the function should return missing, else the value of var1.
import numpy as np
sample = pd.DataFrame({'var1':[10,100,40] })
sample['newvar1'] = sample.apply(lambda x: np.nan if x['var1'] < 90 else x['var1'], axis=1)
How to read the above lambda function
lambda x: value_if_condition_true if logical_condition else value_if_condition_false
axis=1 tells Python to apply the function to each row, using the row's column values. By default it is 0, which means the function is applied to each column.

There is one more way to write the above function without specifying axis option. It will be applied to series sample['var1']
sample['newvar1'] = sample['var1'].apply(lambda x: np.nan if x < 90 else x)

The same function can also be written using def. See the code below.
def miss(x):
    if x["var1"] < 90:
        return np.nan
    else:
        return x["var1"]

sample['newvar1'] = sample.apply(miss, axis=1)
   var1  newvar1
0 10 NaN
1 100 100.0
2 40 NaN
Example 4 : Multiple or Nested IF-ELSE Statement
Suppose you want to create a flag which is yes when the value of a variable is between 1 and 5 (inclusive), no when the value is equal to 7, and missing otherwise.
mydf = pd.DataFrame({'Names': np.arange(1,10,2)}) 
mydf["flag"] = mydf["Names"].apply(lambda x: "yes" if x>=1 and x<=5 else "no" if x==7 else np.nan)
   Names flag
0 1 yes
1 3 yes
2 5 yes
3 7 no
4 9 NaN
Lambda functions are often used along with built-in functions like map() and filter().

map() function

The map() function executes the function object (i.e. a function created with lambda or def) for each element of an iterable and returns an iterator of the modified elements. In the code below, we are multiplying each element by 2.
mylist = [1, 2, 3, 4]
map(lambda x : x*2, mylist)
It returns a map object, so you cannot see the returned values directly. To view the result, you need to wrap it in list( ).
list(map(lambda x : x*2, mylist))
Output : [2, 4, 6, 8]

filter() function

It returns the items for which the function returns True. If no element meets the condition, the result is empty. In the code below, we are keeping values greater than 2.
list(filter(lambda x : x > 2 , mylist))
Output : [3, 4]
filter( ) returns a filter object. To see the output values, you need to wrap the filter( ) call within list( ).
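map( ) and filter( ) can also be chained together. A small sketch, continuing with the same mylist, that first keeps the values greater than 2 and then doubles them:
mylist = [1, 2, 3, 4]
# keep values greater than 2, then multiply the remaining values by 2
list(map(lambda x: x * 2, filter(lambda x: x > 2, mylist)))
Output : [6, 8]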

Create Animation in R : Learn by Examples

This tutorial covers various ways you can create animated charts or plots using R. Animation is a very important element of data visualization. Animated charts are visually appealing and fetch the attention of the audience. There are many online data visualization tools available in the market which can generate animated charts, but most of them are paid tools. Another problem with online animation tools is that they ask you to upload data to their servers, which can be a data breach if you work with real-world client data. Since R is open source, you can download it for free and create animated charts without moving data to any external server.

Simple Animation in R

Let's create dummy data for illustration. In the program below, we are generating two columns of random observations. The first column, A, contains 50 observations ranging from 1 to 75. The second column, B, contains the same number of observations but drawn from a different range (1 to 100).
df = data.frame(A=sample(1:75, 50, replace=TRUE),
B=sample(1:100, 50, replace=TRUE),
stringsAsFactors = FALSE)
The gganimate package is used for animation in R. It is an extension of ggplot2, the popular package for graphics.
library(ggplot2)
library(tidyverse)
library(gganimate)
library(directlabels)
library(png)
library(transformr)
library(grid)

ggplot(df, aes(A, B)) +
geom_line() +
transition_reveal(A) +
labs(title = 'A: {frame_along}')
Animation R
geom_line() is used for creating a line chart. transition_reveal(A) lets the data appear gradually along A. frame_along gives the position that the current frame corresponds to.

What is frame and rendering in animation?

In animation, a frame is one of the many still images which compose the complete moving picture. Rendering is the computation that produces the final output. The gganimate package renders 100 frames by default. You can change the number of frames with the nframes= parameter of the animate() function.
p = ggplot(df, aes(A, B)) +
geom_line() +
transition_reveal(A) +
labs(title = 'A: {frame_along}')

animate(p, nframes=40)

How to save animated plot in GIF format file?

You can use anim_save(file_location,plot) function to export animated chart in GIF format.
anim_save("basic_animation.gif", p)

Frames per Second (fps)

It is the number of frames shown per second. You can set it with the fps parameter in the animate() function. By default, it is 10 frames per second.
animate(p, nframes=40, fps = 2)
Decreasing fps below 10 slows down the animation.

How to stop loop in animation?

Looping means the animation repeats over and over again. To stop the loop, you can use the renderer = gifski_renderer(loop = FALSE) option in the animate() function.
animate(p, renderer = gifski_renderer(loop = FALSE))

How to change layout of plot?

You can change the height and width of the plot by specifying the size in the animate( ) function.
animate(p, fps = 10, duration = 14, width = 800, height = 400)

Advanced Animation in R : Examples

Prepare Data for Example
In this example, we will create a bar chart showing the change in monthly sales figures of different products.
set.seed(123)
dates = paste(rep(month.abb[1:10], each=10), 2018)
df = data.frame(Product=rep(sample(LETTERS[1:10],10), 10),
Period=factor(dates, levels=unique(dates)),
Sales=sample(1:100,100, replace = TRUE))
head(df)
Product Period Sales order
1 E Jan 2018 15 1
2 H Jan 2018 34 2
3 F Jan 2018 42 3
4 E Jan 2018 49 4
5 J Jan 2018 49 5
6 C Jan 2018 60 6
# Ranking by Period and Sales
df = df %>%
arrange(Period, Sales) %>%
mutate(order = 1:n())

# Animation
p = df %>%
ggplot(aes(order, Sales)) +
geom_bar(stat = "identity", fill = "#ff9933") +
labs(title='Total Sales in {closest_state}', x=NULL) +
theme(plot.title = element_text(hjust = 0.5, size = 18)) +
scale_x_continuous(breaks=df$order, labels=df$Product, position = "top") +
transition_states(Period, transition_length = 1, state_length = 2) +
view_follow(fixed_y=TRUE) +
ease_aes('cubic-in-out')

animate(p, nframes=50, fps=4)
anim_save("bar_animation.gif", p)
Detailed Explanation
  1. transition_states() animates the plot by a categorical or discrete variable. "States" are the animation sequences that play; when a state transition is triggered, a new state's animation sequence runs. In this case, the state is the Period column. state_length is the relative length of the pause at each state, and transition_length is the relative length of the transition between states.
  2. view_follow(fixed_y=TRUE) means the y-axis stays fixed while the animation runs.
  3. ease_aes( ) controls the easing of the motion, e.g. 'cubic-in-out' starts quickly and then decelerates (or vice versa).
  4. You can set theme using theme_set(theme_minimal())

Indian General Election (1984 to 2019) Study

Recently the BJP secured a majority in the Lok Sabha election. They contested a Lok Sabha election for the first time in 1984. The INC (Indian National Congress) used to be the biggest political party in India a decade ago. Here we will look at the trend in the percentage of seats won by these two parties from 1984 to 2019. Source of data: Election Commission of India.
library(ggplot2)
library(tidyverse)
library(gganimate)
library(directlabels)
library(png)
library(transformr)
library(grid)

# Read Data
df = read.table(text =
" Year Perc_Seats Party
1984 0.79 INC
1989 0.38 INC
1991 0.45 INC
1996 0.27 INC
1998 0.27 INC
1999 0.22 INC
2004 0.28 INC
2009 0.4 INC
2014 0.09 INC
2019 0.1 INC
1984 0 BJP
1989 0.17 BJP
1991 0.23 BJP
1996 0.31 BJP
1998 0.35 BJP
1999 0.35 BJP
2004 0.27 BJP
2009 0.23 BJP
2014 0.52 BJP
2019 0.56 BJP
", header=TRUE)

# Set Theme
theme_set(theme_minimal())

# Plot and animate
p =
ggplot(data = df, aes(x= factor(Year), y=Perc_Seats, group=Party, colour=Party)) +
geom_line(size=2, show.legend = FALSE) +
scale_color_manual(values=c("#ff9933", "#006400")) +
scale_x_discrete(position = "top") +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
labs(title = 'Lok Sabha Election : % of seats won',
x = NULL, y = NULL) +
geom_text(aes(label=scales::percent(Perc_Seats, accuracy = 1),
vjust= -2), show.legend = FALSE) +
theme(plot.title = element_text(hjust = 0.5)) +
geom_dl(aes(label=Party), method="last.points") +
transition_reveal(Year) +
coord_cartesian(clip = 'off') +
ease_aes('cubic-in-out')

animate(p, fps = 10, width = 800, height = 400)
anim_save("election.gif", p)

How to save animated plot as video

Make sure ffmpeg is installed on your system before using the code below. It is available for download for all the operating systems.

animate(p, renderer = ffmpeg_renderer(), width = 800, height = 450)
anim_save("nations.mp4")

String Functions in Python with Examples

This tutorial outlines various string (character) functions used in Python. To manipulate strings and character values, Python has several built-in functions, which means you don't need to import or depend on any external package to deal with the string data type. That is one of the advantages of using Python over other data science tools. Dealing with string values is very common in the real world. Suppose you have customers' full names and your manager asks you to extract each customer's first and last name, or you want to fetch information on all products whose code starts with 'QT'.

Python String Functions

List of frequently used string functions

The table below shows many common string functions along with their descriptions and equivalent functions in MS Excel. Most of us use MS Excel in our workplace and are familiar with its functions, so the comparison of string functions in MS Excel and Python can help you learn the functions quickly and revise them before an interview.
Function | Description | MS EXCEL FUNCTION
mystring[:N] | Extract N characters from the start of the string | LEFT( )
mystring[-N:] | Extract N characters from the end of the string | RIGHT( )
mystring[X:Y] | Extract characters from the middle of the string, starting at position X and ending at Y | MID( )
str.split(separator) | Split a string | -
str.replace(old_substring, new_substring) | Replace a part of text with a different sub-string | REPLACE( )
str.lower() | Convert characters to lowercase | LOWER( )
str.upper() | Convert characters to uppercase | UPPER( )
str.contains('pattern', case=False) | Check if pattern matches (pandas function) | SQL LIKE operator
str.extract(regular_expression) | Return matched values (pandas function) | -
str.count('sub_string') | Count occurrences of a pattern in a string | -
str.find( ) | Return position of a sub-string or pattern | FIND( )
str.isalnum() | Check whether string consists of only alphanumeric characters | -
str.islower() | Check whether characters are all lower case | -
str.isupper() | Check whether characters are all upper case | -
str.isnumeric() | Check whether string consists of only numeric characters | -
str.isspace() | Check whether string consists of only whitespace characters | -
len( ) | Calculate length of string | LEN( )
cat( ) | Concatenate strings (pandas function) | CONCATENATE( )
separator.join(str) | Concatenate strings | CONCATENATE( )

LEFT, RIGHT and MID Functions

If you are an intermediate MS Excel user, you must have used the LEFT, RIGHT and MID functions. These functions are used to extract N characters or letters from a string.
1. Extract first two characters from beginning of string
mystring = "Hey buddy, wassup?"
mystring[:2]
Out[1]: 'He'
  1. string[start:stop:step] means items go from start (default 0) through (stop-1), stepping by step (default 1); see the step example after this list.
  2. mystring[:2] is equivalent to mystring[0:2].
  3. mystring[:2] tells Python to pull the first 2 characters from the mystring string object.
  4. Indexing starts from zero, so it includes the first and second characters and excludes the third.
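Since the step part of the slice rarely gets an example, here is a small illustrative sketch using the same mystring, taking every second character of the first ten:
mystring[0:10:2]
Out[1]: 'Hybdy'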
2. Find last two characters of string
mystring[-2:]
The above command returns p?. The -2 starts the range from the second-to-last position and goes through to the end of the string.
3. Find characters from middle of string
mystring[1:3]
Out[1]: 'ey'
mystring[1:3] returns second and third characters. 1 refers to second character as index begins with 0.
4. How to reverse string?
mystring[::-1]
Out[1]: '?pussaw ,yddub yeH'
A step of -1 tells Python to start from the end and move one character at a time from right to left.
5. How to extract characters from string variable in Pandas DataFrame?
Let's create a fake data frame for illustration. In the code below, we are creating a dataframe named df containing only 1 variable called var1
import pandas as pd
df = pd.DataFrame({"var1": ["A_2", "B_1", "C_2", "A_2"]})

var1
0 A_2
1 B_1
2 C_2
3 A_2
To deal with text data in a pandas DataFrame, we can use the str accessor. It can be used for slicing character values.
df['var1'].str[0]
In this case, we are fetching first character from var1 variable. See the output shown below.

Output
0 A
1 B
2 C
3 A

Extract Words from String

Suppose you need to extract word(s) instead of characters from a string. Generally we treat a single blank space as the delimiter between words.
1. Find first word of string
mystring.split()[0]
Out[1]: 'Hey'
How it works?
  1. split() function breaks string using space as a default separator
  2. mystring.split() returns ['Hey', 'buddy,', 'wassup?']
  3. 0 returns first item or word Hey
2. Comma as separator for words
mystring.split(',')[0]
Out[1]: 'Hey buddy'
3. How to extract last word
mystring.split()[-1]
Out[1]: 'wassup?'
4. How to extract word in DataFrame
Let's build a dummy data frame consisting of customer names and call it variable custname

mydf = pd.DataFrame({"custname": ["Priya_Sehgal", "David_Stevart", "Kasia_Woja", "Sandy_Dave"]})

custname
0 Priya_Sehgal
1 David_Stevart
2 Kasia_Woja
3 Sandy_Dave

#First Word
mydf['fname'] = mydf['custname'].str.split('_').str[0]

#Last Word
mydf['lname'] = mydf['custname'].str.split('_').str[1]
Detailed Explanation
  1. str.split( ) is similar to split( ). It applies the split function to each value of a column in a pandas data frame.
  2. In the code above, we created two new columns named fname and lname storing first and last name.

  3. Output
    custname fname lname
    0 Priya_Sehgal Priya Sehgal
    1 David_Stevart David Stevart
    2 Kasia_Woja Kasia Woja
    3 Sandy_Dave Sandy Dave

SQL LIKE Operator in Pandas DataFrame

In SQL, the LIKE operator is used to find out whether a character string matches or contains a pattern. We can implement similar functionality in Python using the str.contains( ) function.

df2 = pd.DataFrame({"var1": ["AA_2", "B_1", "C_2", "a_2"],
"var2": ["X_2", "Y_1", "Z_2", "X2"]})

var1 var2
0 AA_2 X_2
1 B_1 Y_1
2 C_2 Z_2
3 a_2 X2
How to find rows containing either A or B in variable var1?
df2['var1'].str.contains('A|B')
str.contains(pattern) is used to match pattern in Pandas Dataframe.

Output
0 True
1 True
2 False
3 False
The above command returns FALSE against fourth row as the function is case-sensitive. To ignore case-sensitivity, we can use case=False parameter. See the working example below.
df2['var1'].str.contains('A|B', case=False)
How to filter rows containing a particular pattern?
In the following program, we are asking Python to subset the data on the condition that var1 contains either A or B. It is equivalent to the WHERE clause in SQL.
df2[df2['var1'].str.contains('A|B', case=False)]

Output
var1 var2
0 AA_2 X_2
1 B_1 Y_1
3 a_2 X2
Suppose you want only those values that start with a single letter followed by '_'.

df2[df2['var1'].str.contains('^[A-Z]_', case=False)]
^ is a regular expression token that anchors the match to the beginning of the string.

var1 var2
1 B_1 Y_1
2 C_2 Z_2
3 a_2 X2

Find position of a particular character or keyword

str.find(pattern) is used to find position of sub-string. In this case, sub-string is '_'.

df2['var1'].str.find('_')

0 2
1 1
2 1
3 1

Replace substring

str.replace(old_text,new_text,case=False) is used to replace a particular character(s) or pattern with some new value or pattern. In the code below, we are replacing _ with -- in variable var1.

df2['var1'].str.replace('_', '--', case=False)

Output
0 AA--2
1 B--1
2 C--2
3 a--2
We can also use more complex patterns, as in the following program. + means the preceding item occurs one or more times; in this case, a letter occurring one or more times.

df2['var1'].str.replace('[A-Z]+_', 'X', case=False)

0 X2
1 X1
2 X2
3 X2

Find length of string

len(string) is used to calculate length of string. In pandas data frame, you can apply str.len() for the same.

df2['var1'].str.len()

Output
0 4
1 3
2 3
3 3
To find the count of occurrences of a particular character (let's say, how many times 'A' appears in each row), you can use the str.count(pattern) function.
df2['var1'].str.count('A')

Convert to lowercase and uppercase

str.lower() and str.upper() functions are used to convert string to lower and uppercase values.

#Convert to lower case
mydf['custname'].str.lower()

#Convert to upper case
mydf['custname'].str.upper()

Remove Leading and Trailing Spaces

  1. str.strip() removes both leading and trailing spaces.
  2. str.lstrip() removes leading spaces (at beginning).
  3. str.rstrip() removes trailing spaces (at end).

df1 = pd.DataFrame({'y1': [' jack', 'jill ', ' jesse ', 'frank ']})
df1['both']=df1['y1'].str.strip()
df1['left']=df1['y1'].str.lstrip()
df1['right']=df1['y1'].str.rstrip()

y1 both left right
0 jack jack jack jack
1 jill jill jill jill
2 jesse jesse jesse jesse
3 frank frank frank frank

Convert Numeric to String

With the use of str( ) function, you can convert numeric value to string.

myvariable = 4
mystr = str(myvariable)

Concatenate or Join Strings

By simply using +, you can join two string values.

x = "Deepanshu"
y ="Bhalla"
x+y

DeepanshuBhalla
In case you want to add a space between two strings, you can use x + ' ' + y, which returns Deepanshu Bhalla. Suppose you have a list containing multiple string values and you want to combine them. You can use the join( ) function.

string0 = ['Ram', 'Kumar', 'Singh']
' '.join(string0)

Output
'Ram Kumar Singh'
Suppose you want to combine or concatenate two columns of a pandas dataframe, keeping a space between the names.
mydf['fullname'] = mydf['fname'] + ' ' + mydf['lname']
OR
mydf['fullname'] = mydf[['fname', 'lname']].apply(lambda x: ' '.join(x), axis=1)

custname fname lname fullname
0 Priya_Sehgal Priya Sehgal Priya Sehgal
1 David_Stevart David Stevart David Stevart
2 Kasia_Woja Kasia Woja Kasia Woja
3 Sandy_Dave Sandy Dave Sandy Dave

SQL IN Operator in Pandas

We can use isin(list) function to include multiple values in our filtering or subsetting criteria.

mydata = pd.DataFrame({'product': ['A', 'B', 'B', 'C','C','D','A']})
mydata[mydata['product'].isin(['A', 'B'])]

product
0 A
1 B
2 B
6 A
How to apply NOT criteria while selecting multiple values?
We can use sign ~ to tell python to negate the condition.

mydata[~mydata['product'].isin(['A', 'B'])]

Extract a particular pattern from string

str.extract(r'regex-pattern') is used for this task.

df2['var1'].str.extract(r'(^[A-Z]_)')
r'(^[A-Z]_)' means the value starts with a letter A-Z followed by '_'.

0 NaN
1 B_
2 C_
3 NaN
To remove missing values, we can use dropna( ) function.

df2['var1'].str.extract(r'(^[A-Z]_)').dropna()

Free SQL Download to practice queries

In this tutorial, we will cover how you can download a relational database management system for free to practice SQL queries at home. Many people ask, "Like Python and R, is there any free software where I can learn and practice SQL queries?" The answer is yes. Before getting into the details of the installation process, we need to understand what SQL is and how it relates to a relational database management system.
What is SQL (Structured Query Language)?
SQL is a programming language mainly used to manipulate data stored in a relational database management system. We can select, create and modify data (rows and columns) in tables using SQL queries. We can also alter and delete tables using queries.
What is RDBMS (Relational Database Management System)?
A Relational Database Management System (RDBMS) is a software system that stores data in tabular form. Most databases used in businesses these days are relational databases, as opposed to CSV or Excel files. SQL is the language used for communicating with data in an RDBMS.

How to download SQL Server for free?

Microsoft SQL Server is a powerful relational database management system owned by Microsoft. It is one of the most popular RDBMSs, used in both small and big organizations. It is an enterprise system which is not available for free, but Microsoft offers a free version of it called SQL Server Express edition.
Benefits of using SQL SERVER Express Edition
  1. You can create SQL tables by simply importing CSV files. You don't need to create sample data manually.
  2. You can create and execute stored procedures.
  3. You will get a feel for how SQL is used in companies.
  4. It supports window functions like ROW_NUMBER, RANK, NTILE and DENSE_RANK.
Steps to download and install SQL Server Express Edition
  1. Go to Microsoft website and download SQL Server 2017 Express Edition. Click on Download now button as shown below.
    SQL Server Express
  2. After completing above step, click on the downloaded file. It will take you to the screen shown in the following image.

    Select the Basic install option. This new installation feature selects all of the most commonly used configuration options and is ideal for the beginning MSSQL user.
  3. It will install the software. When installation is completed, it will show information like connection settings, and file locations.
    The next step is to install SQL Server Management Studio (SSMS) by pressing the Install SSMS button. It is an IDE, like RStudio or Spyder, which helps you manage databases and write code with ease.
  4. Once you click on the above Install SSMS button, it will take you to the page shown below. Click on the Download SQL Server Management Studio 18.0 (GA) link. Downloading of the software will begin after that. It may take some time as it is roughly a 0.5 GB file.
  5. Install the SSMS software. Open it once installation is completed. It will show a screen asking you to connect to a server. Click on the Connect button.
    Server Name : PC_Name\SQLEXPRESS. In the following image, DELL is the PC_NAME.
Are you facing issue in connection and getting the following error?
Cannot connect to XXXXXX. A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections.
To fix this issue, follow the steps below.
  1. Open SQL Server 2017 Configuration Manager and then go to SQL Server Network configuration and then click on Protocols for SQLEXPRESS and make sure both Named Pipes and TCP/IP are enabled. Right-click to enable them.
  2. Right-click on TCP/IP and go to Properties. Now Select IP Addresses Tab and then go to the last item IP All and enter 1433 in TCP Port.
  3. Press the Windows + R shortcut to open the Run window and then type services.msc. It will open the Services window; search for SQL SERVER(SQLEXPRESS) and then start the service by right-clicking on it.
  4. Open SQL Server Management Studio again. If it's already opened, reopen it.
How to use SQL Server Management Studio?
1. Press CTRL + N to open New Query where you can write your SQL query.
2. Check databases by clicking on Databases folder shown under Object Explorer
3. How to check current database in use?

SELECT DB_NAME() AS [Current Database]
Run the above command and press F5 shortcut to execute or submit sql query.
4. How to create a fake dummy table

USE tempdb;
create table employeetbl (employee_id integer, first_name varchar(10), salary float)
insert into employeetbl (employee_id,first_name,salary) values (123, 'Deep', 44561)
Select * from employeetbl
USE tempdb refers to database you want to use.
5. How to import CSV File?
1. Right click on the Databases folder, click on the New Database option and then type any name you want to assign (let's say newdb).

2. Right click on newdb >> Tasks >> Import Flat File
3. Select CSV file and import it.
6. How to create a simple stored procedure?

CREATE PROCEDURE sampleproc
AS
BEGIN
    SELECT AGE, Attrition, HourlyRate
    FROM Employee_details
    ORDER BY HourlyRate;
END;

EXECUTE sampleproc;
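Since this blog focuses mainly on Python, you may also want to run such queries from Python itself. Below is a minimal, illustrative sketch using the pyodbc package (install it with pip install pyodbc). The driver name, the server name DELL\SQLEXPRESS and the table employeetbl are assumptions based on the setup above and may differ on your machine.
import pyodbc

# connection string assumes SQL Server Express with Windows authentication;
# replace DELL\SQLEXPRESS with your own PC_Name\SQLEXPRESS
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=DELL\\SQLEXPRESS;"
    "DATABASE=tempdb;"
    "Trusted_Connection=yes;"
)

cursor = conn.cursor()
cursor.execute("SELECT employee_id, first_name, salary FROM employeetbl")
for row in cursor.fetchall():
    print(row)

conn.close()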
Limitations of SQL SERVER Express Edition
  1. Maximum database size of 10 GB per database
  2. No SQL Server Agent service
  3. SQL Server Integration Services and Analysis Services are not available.

How to build login page in R Shiny App

This tutorial covers how you can build a login page where the user needs to enter a username and password for authentication in a Shiny app. RStudio offers paid products like Shiny Server and RStudio Connect which have an authentication feature to verify the identity of the user. But if you want this feature for free, you can follow the steps mentioned below.
Features of R Program shown in the tutorial below
  1. Dashboard will be opened only when user enters correct username and password
  2. You can hide or show functionalities of dashboard (like tabs, widgets etc) based on type of permission
  3. Passwords are hashed (via the sodium package) instead of being stored in plain text, which mitigates brute-force attacks
login form shiny

Steps to add login authentication feature in Shiny

Step 1 : Install the following packages by using the command install.packages("package-name")
  • shiny
  • shinydashboard
  • DT
  • shinyjs
  • sodium

Step 2 : Run the program below
library(shiny)
library(shinydashboard)
library(DT)
library(shinyjs)
library(sodium)

# Main login screen
loginpage <- div(id = "loginpage", style = "width: 500px; max-width: 100%; margin: 0 auto; padding: 20px;",
wellPanel(
tags$h2("LOG IN", class = "text-center", style = "padding-top: 0;color:#333; font-weight:600;"),
textInput("userName", placeholder="Username", label = tagList(icon("user"), "Username")),
passwordInput("passwd", placeholder="Password", label = tagList(icon("unlock-alt"), "Password")),
br(),
div(
style = "text-align: center;",
actionButton("login", "SIGN IN", style = "color: white; background-color:#3c8dbc;
padding: 10px 15px; width: 150px; cursor: pointer;
font-size: 18px; font-weight: 600;"),
shinyjs::hidden(
div(id = "nomatch",
tags$p("Oops! Incorrect username or password!",
style = "color: red; font-weight: 600;
padding-top: 5px;font-size:16px;",
class = "text-center"))),
br(),
br(),
tags$code("Username: myuser Password: mypass"),
br(),
tags$code("Username: myuser1 Password: mypass1")
))
)

credentials = data.frame(
username_id = c("myuser", "myuser1"),
passod = sapply(c("mypass", "mypass1"),password_store),
permission = c("basic", "advanced"),
stringsAsFactors = F
)

header <- dashboardHeader( title = "Simple Dashboard", uiOutput("logoutbtn"))

sidebar <- dashboardSidebar(uiOutput("sidebarpanel"))
body <- dashboardBody(shinyjs::useShinyjs(), uiOutput("body"))
ui<-dashboardPage(header, sidebar, body, skin = "blue")

server <- function(input, output, session) {

login = FALSE
USER <- reactiveValues(login = login)

observe({
if (USER$login == FALSE) {
if (!is.null(input$login)) {
if (input$login > 0) {
Username <- isolate(input$userName)
Password <- isolate(input$passwd)
if(length(which(credentials$username_id==Username))==1) {
pasmatch <- credentials["passod"][which(credentials$username_id==Username),]
pasverify <- password_verify(pasmatch, Password)
if(pasverify) {
USER$login <- TRUE
} else {
shinyjs::toggle(id = "nomatch", anim = TRUE, time = 1, animType = "fade")
shinyjs::delay(3000, shinyjs::toggle(id = "nomatch", anim = TRUE, time = 1, animType = "fade"))
}
} else {
shinyjs::toggle(id = "nomatch", anim = TRUE, time = 1, animType = "fade")
shinyjs::delay(3000, shinyjs::toggle(id = "nomatch", anim = TRUE, time = 1, animType = "fade"))
}
}
}
}
})

output$logoutbtn <- renderUI({
req(USER$login)
tags$li(a(icon("fa fa-sign-out"), "Logout",
href="javascript:window.location.reload(true)"),
class = "dropdown",
style = "background-color: #eee !important; border: 0;
font-weight: bold; margin:5px; padding: 10px;")
})

output$sidebarpanel <- renderUI({
if (USER$login == TRUE ){
sidebarMenu(
menuItem("Main Page", tabName = "dashboard", icon = icon("dashboard"))
)
}
})

output$body <- renderUI({
if (USER$login == TRUE ) {
tabItem(tabName ="dashboard", class = "active",
fluidRow(
box(width = 12, dataTableOutput('results'))
))
}
else {
loginpage
}
})

output$results <- DT::renderDataTable({
datatable(iris, options = list(autoWidth = TRUE,
searching = FALSE))
})

}

runApp(list(ui = ui, server = server), launch.browser = TRUE)
How to customize the program
  1. In the above program, two user names and passwords are defined
    Username : myuser, Password : mypass and Username : myuser1, Password : mypass1. To change them, you can edit the following code in the R program.

    credentials = data.frame(
    username_id = c("myuser", "myuser1"),
    passod = sapply(c("mypass", "mypass1"),password_store),
    permission = c("basic", "advanced"),
    stringsAsFactors = F
    )
  2. In order to modify sidebar section, you can edit the following section of code.
        if (USER$login == TRUE ){ 
    sidebarMenu(
    menuItem("Main Page", tabName = "dashboard", icon = icon("dashboard"))
    )
    }
    In order to edit main body of the app, you can make modification in the following section of code.
      if (USER$login == TRUE ) {
    tabItem(tabName ="dashboard", class = "active",
    fluidRow(
    box(width = 12, dataTableOutput('results'))
    ))
    }
    else {
    loginpage
    }
  3. Suppose you want to show multiple tabs when the permission level is set to "advanced" and a single tab otherwise. If you log in with the credentials Username : myuser1, Password : mypass1, you will see two tabs; otherwise only one tab named "Main Page" is shown. Replace the renderUI functions of output$sidebarpanel and output$body with the following script.
      output$sidebarpanel <- renderUI({
    if (USER$login == TRUE ){
    if (credentials[,"permission"][which(credentials$username_id==input$userName)]=="advanced") {
    sidebarMenu(
    menuItem("Main Page", tabName = "dashboard", icon = icon("dashboard")),
    menuItem("About Page", tabName = "About", icon = icon("th"))
    )
    }
    else{
    sidebarMenu(
    menuItem("Main Page", tabName = "dashboard", icon = icon("dashboard"))
    )

    }
    }
    })


    output$body <- renderUI({
    if (USER$login == TRUE ) {
    if (credentials[,"permission"][which(credentials$username_id==input$userName)]=="advanced") {
    tabItems(
    tabItem(
    tabName ="dashboard", class = "active",
    fluidRow(
    box(width = 12, dataTableOutput('results'))
    ))
    ,
    tabItem(
    tabName ="About",
    h2("This is second tab")
    )
    )
    }
    else {
    tabItem(
    tabName ="dashboard", class = "active",
    fluidRow(
    box(width = 12, dataTableOutput('results'))
    ))

    }

    }
    else {
    loginpage
    }
    })
Note
The Docker-based shinyproxy package is available for free and has an authentication feature along with some other great enterprise features, but you need to know Docker to use it and many users find it complicated.

Create Infographics with R

This tutorial explains how to create charts used for infographics in R. The word Infographics is made up of two words, Information and Graphics. It simply means a graphical, visual representation of information. Infographics are visually appealing and attract the attention of the audience. In presentations, they add a wow factor and make you stand out in a crowd.
Install the packages used for Infographic Charts
You can install these packages by running the command install.packages(). The package echarts4r.assets is not available on CRAN, so you need to install it from GitHub by running devtools::install_github("JohnCoene/echarts4r.assets").
  1. waffle
  2. extrafont
  3. tidyverse
  4. echarts4r
  5. echarts4r.assets

Waffle (Square Pie Chart)

In this section we will see how to create a waffle chart in R. Waffle charts are also known as square pie or matrix charts. They show the distribution of a categorical variable and are an alternative to the pie chart. They work best when the number of categories is small; the more categories there are, the harder the chart is to read. In the following example, we are showing the percentage of respondents who answered 'yes' or 'no' in a survey.

library(waffle)
waffle(
c('Yes=70%' = 70, 'No=30%' = 30), rows = 10, colors = c("#FD6F6F", "#93FB98"),
title = 'Responses', legend_pos="bottom"
)
waffle in r
Use Icon in Waffle
Steps to download and install fontawesome fonts
  1. First step is to load extrafont library by running this command library(extrafont)
  2. Download and install fontawesome fonts from this URL https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/fonts/fontawesome-webfont.ttf
  3. Import downloaded fontawesome font by using this command extrafont::font_import (path="C:\\Users\\DELL\\Downloads", pattern = "awesome", prompt = FALSE)
  4. Load fonts by using the command loadfonts(device = "win")
  5. Check whether font awesome is installed successfully by running this command fonts()[grep("Awesome", fonts())]. It should return FontAwesome
In the example below, we are showing the performance of girls in a particular subject. The option use_glyph= refers to the icon you want to show in the chart and glyph_size= to the size of the icon.

waffle(
c(`Poor=10` =10, `Average=18` = 18, `Excellent=7` =7), rows = 5, colors = c("#FD6F6F", "#93FB98", "#D5D9DD"),
use_glyph = "female", glyph_size = 12 ,title = 'Girls Performance', legend_pos="bottom"
)
waffle icon
How to align multiple waffle charts
By using the iron( ) function you can stack and left-align waffle plots. You can use ggplot2 functions to customize the plot (as done in the program below to center-align the title using plot.title =).

iron(
waffle(
c('TRUE' = 7, 'FALSE' = 3),
colors = c("pink", "grey70"),
use_glyph = "female",
glyph_size = 12,
title = "Female vs Male",
rows = 1,
legend_pos = "none"
) + theme(plot.title = element_text(hjust = 0.5))
,
waffle(
c('TRUE' = 8, 'FALSE' = 2),
colors = c("skyblue", "grey70"),
use_glyph = "male",
glyph_size = 12,
rows = 1,
legend_pos = "none"
)
)
multiple waffle plots

Pictorial Charts in R

Pictorial charts show data scaled in picture or image form instead of bars or columns. They are also called pictogram charts. Let's create fake data for illustrative purposes.

df22 <- data.frame(
x = sort(LETTERS[1:5], decreasing = TRUE),
y = sort(sample(20:80,5))
)

x y
1 E 27
2 D 29
3 C 45
4 B 46
5 A 78
e_pictorial(value, symbol) function is used for pictorial plots. The second parameter symbol refers to built-in symbols like circle, rect, roundRect, triangle, diamond, pin, arrow, icon, images and SVG Path. Built-in symbols can be used like symbol = "rect"

library(echarts4r)
library(echarts4r.assets)

df22 %>%
e_charts(x) %>%
e_pictorial(y, symbol = ea_icons("user"),
symbolRepeat = TRUE, z = -1,
symbolSize = c(20, 20)) %>%
e_theme("westeros") %>%
e_title("People Icons") %>%
e_flip_coords() %>%
# Hide Legend
e_legend(show = FALSE) %>%
# Remove Gridlines
e_x_axis(splitLine=list(show = FALSE)) %>%
e_y_axis(splitLine=list(show = FALSE)) %>%
# Format Label
e_labels(fontSize = 16, fontWeight ='bold', position = "right", offset=c(10, 0))
Add Images in Chart
If you are using images, make sure to prefix the image address with image://. In the code below, we have used the paste0( ) function to concatenate it with the image address.

Unity <- "https://im.rediff.com/news/2018/oct/29statue-of-unity.png"
Buddha <-"http://im.rediff.com/news/2018/oct/29spring-temple-buddha-china.png"

data <- data.frame(
x = c("Statue of Unity", "Spring Temple Buddha"),
value = c(182, 129),
symbol = c(paste0("image://", Unity),
paste0("image://", Buddha))
)

data %>%
e_charts(x) %>%
e_pictorial(value, symbol) %>%
e_theme("westeros") %>%
e_legend(FALSE) %>%
# Title Alignment
e_title("Statues Height", left='center', padding=10) %>%
e_labels(show=TRUE) %>%
e_x_axis(splitLine=list(show = FALSE)) %>%
e_y_axis(show=FALSE, min=0,max=200, interval=20, splitLine=list(show = FALSE))
Pencil Chart in R
Instead of bars, we are using pencil to show comparison of values.

df02 <- data.frame(
x = LETTERS[1:10],
y = sort(sample(10:80,10), decreasing = TRUE)
)

df02 %>%
e_charts(x) %>%
e_pictorial(y, symbol = paste0("image://","https://1.bp.blogspot.com/-klwxpFekdEQ/XOubIhkalyI/AAAAAAAAHlE/25psl9x4oNkbJoLc2CKTXgV2pEj6tAvigCLcBGAs/s1600/pencil.png")) %>%
e_theme("westeros") %>%
e_title("Pencil Chart", padding=c(10,0,0,50))%>%
e_labels(show = TRUE)%>%
e_legend(show = FALSE) %>%
e_x_axis(splitLine=list(show = FALSE)) %>%
e_y_axis(show=FALSE, splitLine=list(show = FALSE))

Fill Male, Female Icons based on percentage

To find an SVG path, download the desired SVG file from https://iconmonstr.com/, open it in Chrome and then find the path in the page source.

gender = data.frame(gender=c("Male", "Female"), value=c(65, 35),
path = c('path://M18.2629891,11.7131596 L6.8091608,11.7131596 C1.6685112,11.7131596 0,13.032145 0,18.6237673 L0,34.9928467 C0,38.1719847 4.28388932,38.1719847 4.28388932,34.9928467 L4.65591984,20.0216948 L5.74941883,20.0216948 L5.74941883,61.000787 C5.74941883,65.2508314 11.5891201,65.1268798 11.5891201,61.000787 L11.9611506,37.2137775 L13.1110872,37.2137775 L13.4831177,61.000787 C13.4831177,65.1268798 19.3114787,65.2508314 19.3114787,61.000787 L19.3114787,20.0216948 L20.4162301,20.0216948 L20.7882606,34.9928467 C20.7882606,38.1719847 25.0721499,38.1719847 25.0721499,34.9928467 L25.0721499,18.6237673 C25.0721499,13.032145 23.4038145,11.7131596 18.2629891,11.7131596 M12.5361629,1.11022302e-13 C15.4784742,1.11022302e-13 17.8684539,2.38997966 17.8684539,5.33237894 C17.8684539,8.27469031 15.4784742,10.66467 12.5361629,10.66467 C9.59376358,10.66467 7.20378392,8.27469031 7.20378392,5.33237894 C7.20378392,2.38997966 9.59376358,1.11022302e-13 12.5361629,1.11022302e-13',
'path://M28.9624207,31.5315864 L24.4142575,16.4793596 C23.5227152,13.8063773 20.8817445,11.7111088 17.0107398,11.7111088 L12.112691,11.7111088 C8.24168636,11.7111088 5.60080331,13.8064652 4.70917331,16.4793596 L0.149791395,31.5315864 C-0.786976655,34.7595013 2.9373074,35.9147532 3.9192135,32.890727 L8.72689855,19.1296485 L9.2799493,19.1296485 C9.2799493,19.1296485 2.95992025,43.7750224 2.70031069,44.6924335 C2.56498417,45.1567684 2.74553639,45.4852068 3.24205501,45.4852068 L8.704461,45.4852068 L8.704461,61.6700801 C8.704461,64.9659872 13.625035,64.9659872 13.625035,61.6700801 L13.625035,45.360657 L15.5097899,45.360657 L15.4984835,61.6700801 C15.4984835,64.9659872 20.4191451,64.9659872 20.4191451,61.6700801 L20.4191451,45.4852068 L25.8814635,45.4852068 C26.3667633,45.4852068 26.5586219,45.1567684 26.4345142,44.6924335 C26.1636859,43.7750224 19.8436568,19.1296485 19.8436568,19.1296485 L20.3966199,19.1296485 L25.2043926,32.890727 C26.1862111,35.9147532 29.9105828,34.7595013 28.9625083,31.5315864 L28.9624207,31.5315864 Z M14.5617154,0 C17.4960397,0 19.8773132,2.3898427 19.8773132,5.33453001 C19.8773132,8.27930527 17.4960397,10.66906 14.5617154,10.66906 C11.6274788,10.66906 9.24611767,8.27930527 9.24611767,5.33453001 C9.24611767,2.3898427 11.6274788,0 14.5617154,0 L14.5617154,0 Z'))

gender %>%
e_charts(gender) %>%
e_x_axis(splitLine=list(show = FALSE),
axisTick=list(show=FALSE),
axisLine=list(show=FALSE),
axisLabel= list(show=FALSE)) %>%
e_y_axis(max=100,
splitLine=list(show = FALSE),
axisTick=list(show=FALSE),
axisLine=list(show=FALSE),
axisLabel=list(show=FALSE)) %>%
e_color(color = c('#69cce6','#eee')) %>%
e_pictorial(value, symbol = path, z=10, name= 'realValue',
symbolBoundingData= 100, symbolClip= TRUE) %>%
e_pictorial(value, symbol = path, name= 'background',
symbolBoundingData= 100) %>%
e_labels(position = "bottom", offset= c(0, 10),
textStyle =list(fontSize= 20, fontFamily= 'Arial',
fontWeight ='bold',
color= '#69cce6'),
formatter="{@[1]}% {@[0]}") %>%
e_legend(show = FALSE) %>%
e_theme("westeros")

Show icon as label in plot

In label =, mention unicode of the fontawesome icon.

library(ggplot2)
ggplot(mtcars) +
  geom_text(aes(mpg, wt, colour = factor(cyl)),
            label = "\uf1b9",
            family = "FontAwesome",
            size = 7)

Python Pandas : Drop columns from Dataframe

In this tutorial, we will cover how to remove or drop one or multiple columns from pandas dataframe.
What is pandas in Python?
pandas is a python package for data manipulation. It has several functions for the following data tasks:
  1. Drop or Keep rows and columns
  2. Aggregate data by one or more columns
  3. Sort or reorder data
  4. Merge or append multiple dataframes
  5. String Functions to handle text data
  6. DateTime Functions to handle date or time format columns
drop columns python
Import or Load Pandas library
To make use of any Python library, we first need to load it by using the import command.
import pandas as pd
import numpy as np
Let's create a fake dataframe for illustration
The code below creates 4 columns named A through D.
df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))
          A         B         C         D
0 -1.236438 -1.656038 1.655995 -1.413243
1 0.507747 0.710933 -1.335381 0.832619
2 0.280036 -0.411327 0.098119 0.768447
3 0.858730 -0.093217 1.077528 0.196891
4 -0.905991 0.302687 0.125881 -0.665159
5 -2.012745 -0.692847 -1.463154 -0.707779

Drop a column in python

In pandas, the drop( ) function is used to remove column(s). axis=1 tells Python that you want to apply the function on columns instead of rows.
df.drop(['A'], axis=1)
Column A has been removed. See the output shown below.
          B         C         D
0 -1.656038 1.655995 -1.413243
1 0.710933 -1.335381 0.832619
2 -0.411327 0.098119 0.768447
3 -0.093217 1.077528 0.196891
4 0.302687 0.125881 -0.665159
5 -0.692847 -1.463154 -0.707779
In order to create a new dataframe newdf storing remaining columns, you can use the command below.
newdf = df.drop(['A'], axis=1)
To delete the column permanently from original dataframe df, you can use the option inplace=True
df.drop(['A'], axis=1, inplace=True)
#Check columns in df after dropping column A
df.columns

Output
Index(['B', 'C', 'D'], dtype='object')

Remove Multiple Columns in Python

You can specify all the columns you want to remove in a list and pass it in drop( ) function.
Method I
df2 = df.drop(['B','C'], axis=1)
Method II
cols = ['B','C']
df2 = df.drop(cols, axis=1)
Select or Keep Columns
If you wish to select a column (instead of drop), you can use the command
df['A']
To select multiple columns, you can submit the following code.
df[['A','B']]

How to drop column by position number from pandas Dataframe?

You can find the name of the first column by using the command df.columns[0]. Indexing in Python starts from 0.
df.drop(df.columns[0], axis =1)
To drop multiple columns by position (first and third columns), you can specify the position in list [0,2].
cols = [0,2]
df.drop(df.columns[cols], axis =1)

Drop columns by name pattern

df = pd.DataFrame({"X1":range(1,6),"X_2":range(2,7),"YX":range(3,8),"Y_1":range(2,7),"Z":range(5,10)})
   X1  X_2  YX  Y_1  Z
0 1 2 3 2 5
1 2 3 4 3 6
2 3 4 5 4 7
3 4 5 6 5 8
4 5 6 7 6 9

Drop column whose name starts with letter 'X'

df.loc[:,~df.columns.str.contains('^X')]
How it works?
  1. ^X is a regular expression which matches column names beginning with the letter 'X'.
  2. df.columns.str.contains('^X') returns the array [True, True, False, False, False]:
    True where the condition is met, otherwise False.
  3. The ~ sign negates the condition.
  4. df.loc[ ] is used to select columns.
It can also be written like :
df.drop(df.columns[df.columns.str.contains('^X')], axis=1)
Other Examples
#Removing columns whose name contains string 'X'
df.loc[:,~df.columns.str.contains('X')]

#Removing columns whose name contains string either 'X' or 'Y'
df.loc[:,~df.columns.str.contains('X|Y')]

#Removing columns whose name ends with string 'X'
df.loc[:,~df.columns.str.contains('X$')]

Drop columns where percentage of missing values is greater than 50%

df = pd.DataFrame({'A':[1,3,np.nan,5,np.nan],
'B':[4,np.nan,np.nan,5,np.nan]
})
The percentage of missing values can be calculated as the mean of NaNs in each column.
cols = df.columns[df.isnull().mean()>0.5]
df.drop(cols, axis=1)
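To see the intermediate missing-value ratios before dropping anything, you can print df.isnull().mean(); a short sketch with the values implied by the data frame above:
df.isnull().mean()
# A    0.4
# B    0.6
# dtype: float64

# only column B crosses the 50% threshold, so only B is dropped
df.drop(df.columns[df.isnull().mean() > 0.5], axis=1)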

Python Matplotlib Tutorial – Learn Plotting in 3 hours

This tutorial outlines how to perform plotting and data visualization in python using Matplotlib library. The objective of this post is to get you familiar with the basics and advanced plotting functions of the library. It contains several examples which will give you hands-on experience in generating plots in python.

What is Matplotlib?

It is a powerful Python library for creating graphics or charts. It takes care of all of your basic and advanced plotting requirements in Python. It took inspiration from the MATLAB programming language and provides a similar MATLAB-like interface for graphics. The beauty of this library is that it integrates well with the pandas package, which is used for data manipulation. With the combination of these two libraries, you can easily perform data wrangling along with visualization and get valuable insights out of data. Like the ggplot2 library in R, matplotlib is the standard plotting library in Python and the most widely used library for charts.
visualization python

Basics of Matplotlib

The first step is to install and load the matplotlib library. It is already installed if you used Anaconda to set up your Python environment.
Install library
If matplotlib is not already installed, you can install it by using the command
pip install matplotlib
Import / Load Library
We will import Matplotlib's pyplot module and use the alias (short form) plt.
from matplotlib import pyplot as plt
Elements of Graph
Different elements or parts of a standard graph are shown in the image below -
basics of plot
Figure
You can think of the figure as a big canvas on which one or more sub-plots are drawn. In the graphics world, it is called the 'canvas'.
figure vs axes
Axes
You can call them 'sub-plots'.
Axis
It's the same x-axis or y-axis that you studied in school or college. A standard graph shows marks on the axis; in the matplotlib library these are called ticks, and the text or values at the ticks are called ticklabels.
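To see these elements in code, here is a small, purely illustrative sketch that creates one figure containing two axes (sub-plots) and customizes the ticks and ticklabels of the second one:
from matplotlib import pyplot as plt

# one figure (the canvas) containing two axes (sub-plots) side by side
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2)

ax1.plot([1, 2, 3, 4], [10, 20, 25, 30])
ax1.set_title("First sub-plot")

ax2.bar([1, 2, 3], [3, 7, 5])
ax2.set_xticks([1, 2, 3])               # tick positions on the x-axis
ax2.set_xticklabels(["A", "B", "C"])    # ticklabels shown at those ticks
ax2.set_title("Second sub-plot")

plt.show()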
Basic Plot
x = [1, 2, 3, 4, 5]
y = [5, 7, 3, 8, 4]
plt.bar(x,y)
plt.show()
bar plot python
If you are using Jupyter Notebook, you can run the command %matplotlib inline once to display plots automatically, without needing to call plt.show() after generating each plot.

Functions used for different types of plots

The following table lists different graph types along with the matplotlib functions used to create them.
Type of Plot | Function
line plot (default) | plt.plot( )
vertical bar plot | plt.bar( )
horizontal bar plot | plt.barh( )
histogram | plt.hist( )
box plot | plt.boxplot( )
area / stacked plot | plt.stackplot( )
scatter plot | plt.scatter( )
pie plot | plt.pie( )
hexagonal bin plot | plt.hexbin( )
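As a quick, illustrative sketch of two of these functions (the data values are random and made up for the example):
import numpy as np
from matplotlib import pyplot as plt

data = np.random.randn(500)

# histogram of the random values
plt.hist(data, bins=20)
plt.show()

# scatter plot of two slices of the same data
plt.scatter(data[:100], data[100:200])
plt.show()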

Python : How to read CSV file with pandas

This tutorial explains how to read a CSV file in Python using the read_csv function of the pandas package. Without the read_csv function, it is not straightforward to import a CSV file using plain Python. Pandas is a powerful Python package for data manipulation and supports various functions to load and import data from different formats. Here we cover how to deal with common issues when importing a CSV file.

Install and Load Pandas Package
Make sure you have the pandas package installed on your system. If you set up Python using Anaconda, it comes with pandas so you don't need to install it again. Otherwise you can install it with the command pip install pandas. The next step is to load the package by running the following command. pd is an alias for the pandas package; we will use it instead of the full name "pandas".
import pandas as pd
Create Sample Data for Import
The program below creates a sample pandas dataframe which can be used further for demonstration.

dt = {'ID': [11, 12, 13, 14, 15],
'first_name': ['David', 'Jamie', 'Steve', 'Stevart', 'John'],
'company': ['Aon', 'TCS', 'Google', 'RBS', '.'],
'salary': [74, 76, 96, 71, 78]}
mydt = pd.DataFrame(dt, columns = ['ID', 'first_name', 'company', 'salary'])
The sample data looks like below -

ID first_name company salary
0 11 David Aon 74
1 12 Jamie TCS 76
2 13 Steve Google 96
3 14 Stevart RBS 71
4 15 John . 78
Save data as CSV in the working directory
Check working directory before you save your datafile.

import os
os.getcwd()
In case you want to change the working directory, you can specify it in the os.chdir( ) function. A single backslash is treated as an escape character in Python, so use two backslashes (or a raw string) when specifying a Windows file location.

os.chdir("C:\\Users\\DELL\\Documents\\")
The following command tells python to write data in CSV format in your working directory.

mydt.to_csv('workingfile.csv', index=False)

Example 1 : Read CSV file with header row

It's the basic syntax of read_csv() function. You just need to mention the filename. It assumes you have column names in first row of your CSV file.

mydata = pd.read_csv("workingfile.csv")
It stores the data the way it should be, as we have headers in the first row of our data file. It is important to highlight that header=0 is the default value, so we don't need to mention the header= parameter; it means the header is taken from the first row, as indexing in Python starts from 0. The above code is equivalent to pd.read_csv("workingfile.csv", header=0).
Inspect data after importing

mydata.shape
mydata.columns
mydata.dtypes
It returns 5 rows and 4 columns. The column names are ['ID', 'first_name', 'company', 'salary'].

See the column types of data we imported. first_name and company are character variables. Remaining variables are numeric ones.


ID int64
first_name object
company object
salary int64

Example 2 : Read CSV file with header in second row

Suppose you have column or variable names in second row. To read this kind of CSV file, you can submit the following command.
mydata = pd.read_csv("workingfile.csv", header = 1)
header=1 tells Python to pick the header from the second row, i.e. it sets the second row as the header. It's not a realistic example; I used it for illustration so that you get an idea of how to solve this situation. To make it practical, you can add random values to the first row of the CSV file and then import it again.

11 David Aon 74
0 12 Jamie TCS 76
1 13 Steve Google 96
2 14 Stevart RBS 71
3 15 John . 78
Define your own column names instead of header row from CSV file

mydata0 = pd.read_csv("workingfile.csv", skiprows=1, names=['CustID', 'Name', 'Companies', 'Income'])
skiprows=1 means we are ignoring the first row, and the names= option is used to assign variable names manually.

CustID Name Companies Income
0 11 David Aon 74
1 12 Jamie TCS 76
2 13 Steve Google 96
3 14 Stevart RBS 71
4 15 John . 78
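In the sample data above, the '.' in the company column looks like a missing-value marker rather than a real company name. A small sketch, assuming you want '.' treated as missing, using the na_values parameter of read_csv:
mydata1 = pd.read_csv("workingfile.csv", na_values=['.'])

   ID first_name company  salary
0  11      David     Aon      74
1  12      Jamie     TCS      76
2  13      Steve  Google      96
3  14    Stevart     RBS      71
4  15       John     NaN      78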

Python list comprehension with Examples

This tutorial covers how list comprehension works in Python. It includes many examples which would help you to familiarize the concept and you should be able to implement it in your live project at the end of this lesson.

What is list comprehension?

Python is an object-oriented programming language; almost everything in it is treated consistently as an object. Python also supports a functional programming style, which is very similar to the mathematical way of approaching a problem: you pass inputs to a function and you always get the same output for the same input values. Given a function f(x) = x², f(x) will always return the same result for the same x value. Such a function has no "side effects", meaning the operation has no effect on any variable or object outside its intended usage; a "side effect" refers to a leak in your code which modifies a mutable data structure or variable.
List comprehension is a part of functional programming which provides a crisp way to create lists without writing a for loop.
list comprehension python
In the image above, the for clause iterates through each item of the list. The if clause filters the list and keeps only those items where the filter condition is met. The if clause is optional, so you can omit it if you don't have a conditional statement.

[i**3 for i in [1,2,3,4] if i>2] means: take the items of the list [1,2,3,4] one by one and check whether each is greater than 2. If yes, take its cube; otherwise ignore the value. The result is a list of the cubes of 3 and 4. Output : [27, 64]
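For comparison, here is a sketch of the same cube example written as a plain for loop:
cubes = []
for i in [1, 2, 3, 4]:
    if i > 2:
        cubes.append(i**3)

cubes
Output : [27, 64]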

List Comprehension vs. For Loop vs. Lambda + map()

All three are different programming styles for iterating through each element of a list, but they serve the same purpose and return the same output. There are some differences between them, as shown below.
1. List comprehension is more readable than For Loop and Lambda function.
List Comprehension

[i**2 for i in range(2,10)]
For Loop

sqr = []
for i in range(2,10):
sqr.append(i**2)
sqr
Lambda + Map

list(map(lambda i: i**2, range(2, 10)))

Output
[4, 9, 16, 25, 36, 49, 64, 81]
List comprehension performs a loop operation and combines the items into a list in just a single line of code. It is more understandable and clearer than the for loop and lambda versions.

range(2,10) returns 2 through 9 (excluding 10).

**2 refers to the square (the number raised to the power of 2). sqr = [] creates an empty list, and the append( ) function stores the output of each iteration (i.e. the squared value) of the for loop.

map( ) applies the lambda function to each item of the iterable (list). Wrap it in list( ) to generate a list as output.
