Channel: ListenData

Excel : Count Unique values in a column

This tutorial explains how to count unique values in a column.

Sample Data
Formula
=SUMPRODUCT(1/COUNTIF(B3:B15,B3:B15))
Logic
The text "Jhonson" appears 3 times, so each occurrence contributes 1/3 and together they count as one unique value: (1/3) + (1/3) + (1/3) = 1

How Formula works

COUNTIF counts the number of times each value appears.
COUNTIF Formula Evaluation
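Then 1 is divided by each count, and SUMPRODUCT sums the resulting fractions. The fractions for each distinct value add up to exactly 1, so the total equals the number of unique values.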
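The same logic can be sketched outside Excel. The Python snippet below is only an illustration of the 1/COUNTIF idea; the list of names is made up and the helper name count_unique is an assumption, not part of the tutorial.

```python
from collections import Counter
from fractions import Fraction

def count_unique(values):
    """Mimic =SUMPRODUCT(1/COUNTIF(range, range))."""
    counts = Counter(values)  # plays the role of COUNTIF for every cell
    # Each cell contributes 1/count; Fraction keeps the arithmetic exact.
    return sum(Fraction(1, counts[v]) for v in values)

# Hypothetical sample column (not the tutorial's data)
names = ["Jhonson", "Smith", "Jhonson", "Lee", "Jhonson", "Smith"]
print(count_unique(names))  # 3
```

"Jhonson" contributes 3 x (1/3), "Smith" 2 x (1/2) and "Lee" 1, which sums to 3 unique values.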

Count Unique Values (Ignoring Blank Cells)
Data with Blank cells
Formula (Ignoring Blank cells)
=SUMPRODUCT((B3:B15<>"")/COUNTIF(B3:B15,B3:B15&""))

Related Articles
1. 3 Ways to extract unique values
2. Count Unique values in multiple columns
3. Select and Count Duplicate values in Excel

Count Unique values based on multiple columns

This tutorial explains how to count unique values based on multiple columns in Excel.

Sample Data
Formula
=SUMPRODUCT((1/COUNTIFS(B3:B15,B3:B15,C3:C15,C3:C15)))
Logic
The combination 1 and Jhonson appears 2 times, so it counts as one unique combination: (1/2) + (1/2) = 1
How Formula Works

COUNTIFS counts the number of times the values appear based on multiple criteria.
COUNTIFS Formula Evaluation
Then 1 is divided by each count, and SUMPRODUCT sums the resulting fractions, so every distinct combination adds exactly 1 to the total.

Count Unique Values (More than 2 columns)
Sample Data
=SUMPRODUCT((1/COUNTIFS(B3:B15,B3:B15,C3:C15,C3:C15,D3:D15,D3:D15)))
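As a rough cross-check of the COUNTIFS approach, the sketch below treats each row as a tuple and applies the same 1/count sum. The two columns are hypothetical and count_unique_rows is an illustrative helper name.

```python
from collections import Counter

def count_unique_rows(*columns):
    """Mimic =SUMPRODUCT(1/COUNTIFS(col1, col1, col2, col2, ...))."""
    rows = list(zip(*columns))   # one tuple per worksheet row
    counts = Counter(rows)       # COUNTIFS across all columns at once
    # round() absorbs floating-point error from summing fractions like 1/3
    return round(sum(1 / counts[r] for r in rows))

# Hypothetical data: an ID column and a name column
ids = [1, 1, 2, 2, 3]
names = ["Jhonson", "Jhonson", "Jhonson", "Smith", "Smith"]
print(count_unique_rows(ids, names))  # 4
```

Passing a third list corresponds to adding a third range pair to COUNTIFS.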

Related Articles
1. 3 Ways to extract unique values
2. Count Unique values in a column
3. Select and Count Duplicate values in Excel

Excel : Select and Count Duplicate Values

This tutorial explains how to select and count duplicate values in Excel.

Sample Data
Select Duplicate Values

Formula : See the Duplicate Values

Step I :
Paste =COUNTIF($B$3:$B$15,B3) in cell C3 and copy it down to cell C15
Step II :
Apply a filter on column C and uncheck 1. Only the duplicate values remain visible.
Count Duplicate Values
=SUMPRODUCT((1/COUNTIF(B3:B15,B3:B15)<1)*((1/COUNTIF(B3:B15,B3:B15))))
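Read literally, the formula above keeps only the cells whose 1/COUNTIF fraction is below 1 (values that occur more than once) and sums those fractions, which gives the number of distinct values that are duplicated. A minimal Python sketch of that reading, with made-up data and an assumed helper name:

```python
from collections import Counter

def count_duplicated_values(values):
    """Number of distinct values that appear more than once."""
    counts = Counter(values)
    return sum(1 for c in counts.values() if c > 1)

# Hypothetical sample column
names = ["Jhonson", "Smith", "Jhonson", "Lee", "Jhonson", "Smith"]
print(count_duplicated_values(names))  # 2
```

Here "Jhonson" and "Smith" are the two duplicated values; "Lee" appears once and is ignored.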
Related Articles
1. 3 Ways to extract unique values
2. Count Unique values in a column
3. Count Unique values in multiple columns

Send SAS Output to Excel

This tutorial explains how to send SAS results (output) to Excel.

Example 1 :

Create a new sheet for each unique value in the grouping variable (By Group)
ods tagsets.excelxp file="C:\Users\Deepanshu\test.xls"
options(embedded_titles="yes"
autofilter="1-3"
frozen_headers="3"
frozen_rowheaders="1"
absolute_column_width="8.5,11,7,9,8,8"
autofit_height="yes"
sheet_interval="bygroup" sheet_label=""
suppress_bylines="yes") style=normal;

proc print data=sashelp.shoes noobs;
title "Detail of Region #byval(region)";
by region;
run;

ods tagsets.excelxp close;
Create Multi-Sheet Excel File
The SHEET_INTERVAL= option is used to define the interval in which to create new worksheets.

Example 2 :

Define names of sheets manually 
ods tagsets.excelxp file='C:\Users\Deepanshu\Documents\multitable.xls' style=STATISTICAL
options(sheet_name='Summary' skip_space='1,0,0,0,1' EMBEDDED_TITLES='yes' sheet_interval='none');

Title " First File";
proc freq data = sashelp.class;
table sex;
run;

Title " Second File";
proc print data = sashelp.cars;
run;

ods tagsets.excelxp options(sheet_name='FREQ' skip_space='1,0,0,0,1' EMBEDDED_TITLES='yes' sheet_interval='none');
Title " Third File";

proc freq data = sashelp.cars;
table make;
run;

ods tagsets.excelxp close;
Example 3 :

Apply Custom Format of Excel
data temp;
pct= 0.75;
number= -45;
run;

ods tagsets.excelxp file="C:\Users\Deepanshu\temp.xls";

proc print data=temp noobs;
var pct;
var number / style(data)={tagattr="format:$#,##0_);[Red]($#,##0)"};
format pct percent5.2;
run;

ods tagsets.excelxp close;
Excel's Custom Format via SAS
Important Note
ODS TAGSETS.EXCELXP does not support graphs (charts). SAS 9.4 introduced a new destination, ODS EXCEL, that supports both graphs and tables.

ODS EXCEL
ods excel file="c:\test.xlsx"
options(start_at="B5"
tab_color="red"
absolute_row_height="15"
embedded_titles="yes");

ods text="Sales report for company X";
proc print data=sashelp.orsales;
title "Sample title showing new features";
run;

ods excel close;
ODS Excel- PROC MSChart
proc sql;
create table summary as
select(region), sum(sales) format=dollar14.2 as sales
from sashelp.shoes
group by region;
quit; 
ods excel file="c:\temp.xlsx";
title "Sales by Region";
proc mschart data=work.summary category=region width=4in position="$D$1";
where region in("Africa","Asia","Canada","Pacific","United States");
vcolumn sales;
run;
ods excel close; 

Excel Array Formulas Examples

This tutorial explains how to use array formulas in real-world data problems.

Array Formulas
Array formulas are confirmed by pressing Ctrl+Shift+Enter rather than Enter alone.
Download Workbook

Example I

Suppose you are asked to calculate the sum of the products of column A and column B.
Array Formula Examples
=SUM((A3:A8)*(B3:B8)) is equivalent to [(A3*B3)+ (A4*B4)+....+ (A8*B8)]
Press CTRL+SHIFT+ENTER to confirm =SUM((A3:A8)*(B3:B8)). If it was entered correctly, you will see the formula wrapped in curly brackets {}.
 Example II

Suppose you are asked to find the maximum value based on multiple criteria. The data are shown below -
Maximum Value based on Multiple Conditions
=MAX(IF((A3:A11="High")*(B3:B11="Low"),C3:C11))
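Multiplying the two comparisons acts as a logical AND, so IF passes a value from column C to MAX only when both conditions hold. The sketch below mirrors that in Python; the rows are invented for illustration and max_if is an assumed helper name.

```python
def max_if(rows, first, second):
    """Largest value among rows whose first two fields match both criteria."""
    # The generator plays the role of IF(cond1*cond2, values)
    return max((v for a, b, v in rows if a == first and b == second), default=None)

# Hypothetical rows: (column A, column B, column C)
rows = [
    ("High", "Low", 40),
    ("High", "High", 90),
    ("Low", "Low", 70),
    ("High", "Low", 55),
]
print(max_if(rows, "High", "Low"))  # 55
```

The value 90 is ignored because only its first condition is met.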
Example III

Suppose you are asked to find the top 3 scorers in a class, given that students can have the same score.
Array Formulas Examples
=INDEX($A$3:$A$9,MATCH(LARGE($B$3:$B$9-ROW($B$3:$B$9)/10^5,ROWS(A$1:A1)),$B$3:$B$9-ROW($B$3:$B$9)/10^5,0))

1. INDEX-MATCH looks up a value in one column and returns the value from the same row of another column you specify.

2. MATCH always returns the first occurrence, so tied scores must first be made unique. Subtracting a tiny row-based amount from each score does this:
$B$3:$B$9-ROW($B$3:$B$9)/10^5
See how the above formula evaluates -
B3 - ROW(B3)/10^5 = 77-(3/100000)
B4 - ROW(B4)/10^5 = 95-(4/100000)
.
.
B9 - ROW(B9)/10^5 = 85-(9/100000)

It gives us unique values -
(76.99997, 94.99996,88.99995,52.99994,94.99993,48.99992,84.99991)
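The tie-breaking trick can be tried in a few lines of Python. The scores below are the ones implied by the adjusted values above; the student names are placeholders, since the original names are only in the screenshot.

```python
# Scores in B3:B9 as implied by the adjusted values; names are hypothetical.
names = ["A", "B", "C", "D", "E", "F", "G"]
scores = [77, 95, 89, 53, 95, 49, 85]
sheet_rows = range(3, 10)  # worksheet rows 3..9

# B3:B9 - ROW(B3:B9)/10^5 makes every score unique
adjusted = [s - r / 10**5 for s, r in zip(scores, sheet_rows)]

def kth_top(k):
    target = sorted(adjusted, reverse=True)[k - 1]  # LARGE(adjusted, k)
    return names[adjusted.index(target)]            # MATCH(..., 0) then INDEX

print([kth_top(k) for k in (1, 2, 3)])  # ['B', 'E', 'C']
```

Both students with 95 are returned, the one in the earlier row first, because a smaller row number subtracts less.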

Example IV

We need to extract UNIQUE names from the list.
Extract Unique List
Enter the following formula in cell C3
=INDEX($A$3:$A$9,MATCH(0,COUNTIF($C$2:$C2,$A$3:$A$9),0))

Example V : Sum of Digits
Sum of Digits
=SUM(MID(B3,ROW(INDIRECT("1:"& LEN(B3))),1)*1)
INDIRECT converts a text string into a reference. Here the string is "1:" joined with the number of digits in the cell, so it refers to rows 1 through that number. The ROW function then returns the row numbers of that reference as a multi-cell column array.
ROW(INDIRECT("1:"& LEN(B3))) evaluates to ROW(1:5) as the number of digits in cell B3 is 5.
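A minimal Python sketch of the same idea, assuming the cell holds only digits (the value 12345 is an example, not taken from the workbook):

```python
def sum_of_digits(value):
    """Mimic =SUM(MID(B3, ROW(INDIRECT("1:" & LEN(B3))), 1) * 1)."""
    text = str(value)
    positions = range(1, len(text) + 1)             # ROW(INDIRECT("1:" & LEN))
    return sum(int(text[p - 1]) for p in positions)  # MID(text, p, 1) * 1

print(sum_of_digits(12345))  # 15
```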

Example VI : Count Uppercase and lowercase letters
Count Uppercase and Lowercase Letters
Count Uppercase letters
=SUMPRODUCT(LEN(A3)-LEN(SUBSTITUTE(A3,CHAR(ROW(INDIRECT("65:90"))),""))) 
CHAR(ROW(INDIRECT("65:90"))) generates capital A through Z.

Count Lowercase letters
=SUMPRODUCT(LEN(A3)-LEN(SUBSTITUTE(A3,CHAR(ROW(INDIRECT("97:122"))),""))) 
CHAR(ROW(INDIRECT("97:122"))) generates lowercase a through z.
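Both formulas remove one letter at a time and add up how much shorter the text becomes. The sketch below repeats that in Python; the sample text and the helper name count_in_range are assumptions made for illustration.

```python
def count_in_range(text, first_code, last_code):
    """Sum of LEN(text) - LEN(SUBSTITUTE(text, CHAR(code), "")) over a code range."""
    total = 0
    for code in range(first_code, last_code + 1):
        total += len(text) - len(text.replace(chr(code), ""))
    return total

sample = "Listen Data"  # hypothetical cell content
print(count_in_range(sample, 65, 90))   # uppercase A-Z -> 2
print(count_in_range(sample, 97, 122))  # lowercase a-z -> 8
```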

Download Workbook

Send Emails with Attachments via SAS

This tutorial explains how to send emails with attachments via SAS.

Send Emails with Attachments and cc Options
data _null_;
file sendit email
from="abc@provider.com"
to=("xyz@provider.com")
cc=("uvxyz@provider.com"
"pqrs@provider.com")
subject="Important Document"
importance="High"
attach=("C:\xxxx\Proc_logistic.xls");
put "Please find attached the file";
put;
put "Thanks!";
run;
Note : You can also add the BCC option, e.g. bcc="abc@site.com"

If You Get an Error Sending XLSX Files
"Excel found unreadable content in "temp.xlsx". Do you want to recover the contents of this workbook? If you trust the source of this workbook, click yes. Excel cannot open the file "temp.xlsx" because the file format or the file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file."

Solution : Use the content_type option in the attach statement.
attach=("C:\xxxx\Proc_logistic.xlsx" content_type="application/xlsx")
List of Content_type options with file extensions

Extension    Content_type
bmp          image/bmp
csv          application/vnd.ms-excel
doc          application/msword
exe          application/octet-stream
gif          image/gif
htm          application/html
html         application/html
jpeg         image/jpeg
jpg          image/jpeg
log          text/plain
pdf          application/pdf
png          image/png
ppt          application/vnd.ms-powerpoint
sas7bdat     application/sas
tar          application/x-tar
text         text/plain
txt          text/plain
xls          application/excel
xlsx         application/xlsx
zip          application/x-zip-compressed

Send Multiple Attachments
data _null_;
file sendit email
from="abc@provider.com"
to=("xyz@provider.com")
cc=("uvxyz@provider.com"
"pqrs@provider.com")
subject="Important Document"
importance="High"
attach=("C:\Deepanshu\Proc_logistic.xlsx" content_type="application/xlsx"
"C:\Deepanshu\Summary.pdf" content_type="application/pdf"
"C:\Deepanshu\Summary.doc" content_type="application/msword");
put "Please find attached the file";
put;
put "Thanks!";
put;
run;

Excel : Custom Number Formats Examples

In this tutorial, you will learn some useful custom number formats in Excel.

1. Add prefix "X" to cells.
Custom Format : "X"#
How to Use : Press CTRL+1, open the Number tab, select the Custom category and type the above custom format.
Custom Format in Excel

2. Add suffix "X" to cells.
Custom Format : #"X"
How to Use : Press CTRL+1, open the Number tab, select the Custom category and type the above custom format.

3. Round the number to nearest whole number.
Custom Format : #
How to Use : Press CTRL+1, open the Number tab, select the Custom category and type the above custom format.

4. Color numeric value as red and text as green.
Custom Format : [Red]#;[Green]@
How to Use : Press CTRL+1, open the Number tab, select the Custom category and type the above custom format.

5. Number >= 100 colored as green else colored as red
Custom Format : [Green][>=100]#;[Red][<100]#
How to Use : Press CTRL+1, open the Number tab, select the Custom category and type the above custom format.


6. Amounts shown as whole numbers and fractions shown in percent format
Custom Format : [>=1] #,##0;[>0] 0%;0
How to Use : Press CTRL+1, open the Number tab, select the Custom category and type the above custom format.

7. Large numbers shown in millions
Custom Format : 0.0,," millions"
How to Use : Press CTRL+1, open the Number tab, select the Custom category and type the above custom format.

8. Large numbers shown in thousands
Custom Format : 0.0," K"
How to Use : Press CTRL+1, open the Number tab, select the Custom category and type the above custom format.

9. Format numbers in thousands only when they are >= 1000
Custom Format : [>=1000]#,##0,"K";0
How to Use : Press CTRL+1, open the Number tab, select the Custom category and type the above custom format.

10. Format number (20152210) as date (2015-22-10)
Custom Format : #-##-##
How to Use : Press CTRL+1, open the Number tab, select the Custom category and type the above custom format.

11. Format number 22102015 as date (22-10-2015)
Custom Format : ##-##-####
How to Use : Press CTRL+1, open the Number tab, select the Custom category and type the above custom format.

12. Convert a number to real date value format

Suppose 20111208 is entered in cell A1. Enter the formula =--TEXT(A1,"0000\-00\-00")
It returns the date serial number 40885. Change the cell format to Date to display it as a date.
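The serial number can be checked independently. The sketch below assumes the usual convention that, for dates from March 1900 onward, an Excel serial number is the number of days since 1899-12-30; the function name is illustrative.

```python
from datetime import date, datetime

EXCEL_EPOCH = date(1899, 12, 30)  # assumption: valid for dates from 1900-03-01 onward

def yyyymmdd_to_excel_serial(number):
    """Parse a number such as 20111208 and return the Excel date serial."""
    parsed = datetime.strptime(str(number), "%Y%m%d").date()
    return (parsed - EXCEL_EPOCH).days

print(yyyymmdd_to_excel_serial(20111208))  # 40885
```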

Automatically Create Model Formula in R

The following method builds a model formula automatically, using every column except the dependent variable (here "Class") as a predictor.
names(mydata) <- make.names(names(mydata))
y <- "Class"
x <- names(mydata)[!names(mydata) %in% y]
mymodel <- as.formula(paste(y, paste(x, collapse="+"), sep="~"))
lm(mymodel, data=mydata)
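To make the string manipulation easier to follow, here is the same idea sketched in Python. The column names are hypothetical, and make_name is only a rough stand-in for R's make.names (it replaces invalid characters with a dot and does nothing else).

```python
import re

def make_name(name):
    # Rough stand-in for make.names: invalid characters become "."
    return re.sub(r"[^A-Za-z0-9._]", ".", name)

# Hypothetical column names of mydata
columns = [make_name(c) for c in ["Class", "age", "income", "credit score"]]

y = "Class"
x = [c for c in columns if c != y]   # names(mydata)[!names(mydata) %in% y]
formula = y + "~" + "+".join(x)      # paste(y, paste(x, collapse="+"), sep="~")
print(formula)  # Class~age+income+credit.score
```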

Connect to Teradata using SAS

This tutorial explains how to connect to Teradata using SAS.

Write queries with Teradata SQL syntax

In simple words, we write Teradata SQL statements and pass them to the Teradata server for execution. Only Teradata SQL functions work inside the "connection to teradata" block. For example, the SAS INPUT and PUT functions do not work there; use the Teradata CAST function to change a variable's type instead.
proc sql;
   connect to teradata (user="youruserid" password="yourpassword" server="servername" mode=teradata);
   create table temp as
   select * from connection to teradata (
      select a.ID
           , a.Product
           , b.Income
      from tdbase.customer a
      join tdbase.income b
      on a.ID=b.ID
      );
  disconnect from teradata;
quit;
Note :
  1. user = provide username of your teradata account.
  2. password =  provide password of your teradata account.
  3. server = provide server name

Creating Teradata Volatile Tables
proc sql;
 connect to teradata (user="youruserid" password="yourpassword" mode=teradata  server="servername" connection=global);
 execute(
 create volatile table temp as (
 select id
 , region
 , sector
 , income
 from ls_policy_inter
 group by 1,2,3,4
 )
 with data primary index (id)
 on commit preserve rows
 ) by teradata;
quit;

Important Teradata Functions inside SAS

The following clause works in queries passed to Teradata through the EXECUTE ... BY TERADATA statement.
qualify rank() over ( partition by region order by income desc ) = 1
  1. QUALIFY - filters on the result of a window function (similar to a HAVING clause)
  2. RANK()- rank values
  3. OVER - define the criteria
  4. PARTITION - similar to GROUP BY
  5. ROW_NUMBER - row number (similar to _N_ in data step)
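The effect of the QUALIFY clause above can be pictured with a small Python sketch. The rows are invented; only the column names id, region and income come from the tutorial.

```python
# Hypothetical rows; only the column names come from the tutorial.
rows = [
    {"id": 1, "region": "East", "income": 500},
    {"id": 2, "region": "East", "income": 900},
    {"id": 3, "region": "West", "income": 700},
    {"id": 4, "region": "West", "income": 700},
]

# PARTITION BY region ORDER BY income DESC: find the top income per region.
top_income = {}
for row in rows:
    region = row["region"]
    top_income[region] = max(top_income.get(region, row["income"]), row["income"])

# QUALIFY RANK() = 1: keep rows holding the top income (ties are all kept).
qualified = [row["id"] for row in rows if row["income"] == top_income[row["region"]]]
print(qualified)  # [2, 3, 4]
```

With ROW_NUMBER() instead of RANK(), only one of the two tied "West" rows would be kept.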

SAS : WildCard Character

In this tutorial, you will learn how to use the wildcard character (colon) in SAS.

Example 1 : Keep all the variables starting with 'X'

DATA READIN;
INPUT ID X1 X_T $;
CARDS;
2 3 01
3 4 010
4 5 022
5 6 021
6 7 032
;
RUN;
DATA READIN2;
SET READIN (KEEP = X:);
RUN;

The COLON (:) tells SAS to select all the variables starting with the character 'X'.

Example 2 : Subset data using wildcard character
DATA READIN2;
SET READIN;
IF X_T =: '01';
RUN;
In this case, the COLON (:) tells SAS to select all the cases whose value starts with the characters '01'.

Example 3 : Use of WildCard in IN Operator
DATA READIN2;
SET READIN;
IF X_T IN: ('01', '02');
RUN;
In this case, the COLON (:) tells SAS to select all the cases whose value starts with '01' or '02'.

Example 4 : Use of WildCard in GT LT (> <) Operators
DATA READIN2;
SET READIN;
IF X_T >: '01';
RUN;
In this case, the COLON (:) tells SAS to compare only the first two characters, so it selects all the cases whose leading characters sort after '01'.
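For readers more familiar with Python, the three comparisons in Examples 2 to 4 behave roughly like the prefix tests below, applied to the X_T values from the READIN data set. This is an approximation: SAS truncates the longer operand to the length of the shorter one, which matches these tests only because every value here has at least two characters.

```python
# X_T values from the READIN data set above
x_t = ["01", "010", "022", "021", "032"]

starts_01 = [v for v in x_t if v.startswith("01")]             # IF X_T =: '01'
starts_01_or_02 = [v for v in x_t if v.startswith(("01", "02"))]  # IF X_T IN: ('01', '02')
after_01 = [v for v in x_t if v[:2] > "01"]                    # IF X_T >: '01'

print(starts_01)        # ['01', '010']
print(starts_01_or_02)  # ['01', '010', '022', '021']
print(after_01)         # ['022', '021', '032']
```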

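Example 5 : WildCard in Function
Here temp2 is the transposed data set built in Example 6 below. SUM(OF height:) adds up every variable whose name starts with 'height'.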
data example3;
set temp2;
total =sum(of height:);
run;

Example 6 : WildCard in Array

proc transpose data = sashelp.class out=temp;
by name sex;
var height weight;
run;

proc transpose data = temp delimiter=_ out=temp2(drop=_name_);
by name;
var col1;
id _name_ sex;
run;

proc sql noprint;
select CATS('new_',name) into: newnames separated by " "
from dictionary.columns
where libname = "WORK" and memname = "TEMP2" and upcase(name) like "HEIGHT%";
quit;

data temp2;
set temp2;
array h(*) height:;
array newh(*) &newnames.;
do i = 1 to dim(h);
newh{i} = h{i}*2;
end;
drop i;
run;

SAS : Count Distinct Values of Variables

This tutorial explains how to count distinct values of variables using PROC SQL and PROC FREQ. We will also check the performance of these two approaches.

PROC SQL : Count Distinct Values
proc sql;
create table t as
select count(distinct make) as n_make, count(distinct type) as n_type
,count(distinct origin) as n_origin
from sashelp.cars;
quit;

PROC FREQ : Count Distinct Values
ods output nlevels = charlevels;
proc freq data=sashelp.cars (keep = make type origin) nlevels;
tables make type origin / nopercent nocol nocum nofreq noprint;
run;
ods output close;

Performance Testing : PROC SQL vs. PROC FREQ

I stacked the raw data set multiple times (500 to 5000 times) to check the performance of these two procedures. In addition, I added NOPRINT and ODS OUTPUT CLOSE to improve the efficiency of the PROC FREQ code.
Count Distinct Values of a Variable
%macro stack (inputdata = sashelp.cars, iterations=, out=);

data temp;
length x $45.;
do i = 1 to &iterations.;
if i then x = "&inputdata.";
output;
end;
drop i;
run;

proc sql noprint;
select x into: n separated by ' '
from temp;
quit;

data &out.;
set &n;
run;

ods output nlevels = charlevels;
proc freq data=&out. (keep = make type origin) nlevels;
tables make type origin / nopercent nocol nocum nofreq noprint;
run;
ods output close;

proc sql;
create table t as
select count(distinct make) as n_make, count(distinct type) as n_type
,count(distinct origin) as n_origin
from &out.;
quit;

%mend;

%stack (inputdata = sashelp.cars, iterations= 500, out= outdata);
%stack (inputdata = sashelp.cars, iterations= 1000, out= outdata);
%stack (inputdata = sashelp.cars, iterations= 2000, out= outdata);
%stack (inputdata = sashelp.cars, iterations= 5000, out= outdata);

Excel : OFFSET Function with Examples

In this tutorial, you will learn how to apply the OFFSET function in MS Excel.

The Excel syntax for the OFFSET function is:

=OFFSET(reference, rows, columns, [height], [width])
It returns a reference to a range, from a given starting point with given height and width in cells.

1. Reference - Starting point in range.

2. Rows - Number of rows you want Excel to move from the starting point.

3. Columns - Number of columns you want Excel to move from the starting point.

4. Height [Optional] - Number of rows tall the returned range should be.

5. Width [Optional] - Number of columns wide the returned range should be.


Examples : 

1. OFFSET(A1,1,0)

It means move down 1 row i.e. A2. Since height and width components are optional in OFFSET formula, you can skip their references.

2. OFFSET(A1,0,2)

It means move right 2 columns i.e. C1.

3. OFFSET(A1,0,0,2)

From A1 returns 2 rows tall range. That means A1:A2 

4. OFFSET(A1,0,0,1,2)

From A1 returns 2 columns wide range. That means A1:B1 

5. OFFSET(A1,0,0,2,2)

From A1 returns 2 rows tall and 2 columns wide range. That means A1:B2 

6. SUM(OFFSET(A1,1,2,2,1))

It means move down 1 row and right 2 columns (to C2), then select a range 2 rows tall and 1 column wide, i.e. C2:C3. SUM adds up that range.
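The six examples above can be reproduced with a small address calculator. This Python sketch only computes which range OFFSET would point to, for a single-cell starting reference and non-negative moves; the function names are illustrative.

```python
import re

def col_to_num(col):
    """'A' -> 1, 'B' -> 2, ..., 'AA' -> 27."""
    n = 0
    for ch in col:
        n = n * 26 + ord(ch) - 64
    return n

def num_to_col(n):
    """1 -> 'A', 2 -> 'B', ..., 27 -> 'AA'."""
    s = ""
    while n:
        n, r = divmod(n - 1, 26)
        s = chr(65 + r) + s
    return s

def offset(ref, rows, cols, height=1, width=1):
    """Address of the range OFFSET(ref, rows, cols, height, width) refers to."""
    m = re.fullmatch(r"([A-Z]+)(\d+)", ref)
    col = col_to_num(m.group(1)) + cols   # move right
    row = int(m.group(2)) + rows          # move down
    start = f"{num_to_col(col)}{row}"
    end = f"{num_to_col(col + width - 1)}{row + height - 1}"
    return start if start == end else f"{start}:{end}"

print(offset("A1", 1, 0))        # A2
print(offset("A1", 0, 0, 2, 2))  # A1:B2
print(offset("A1", 1, 2, 2, 1))  # C2:C3
```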

Real World Examples

Suppose you want to extract alternate values from a list.
Excel OFFSET Function
Formula

Suppose data starts from cell B3 (as shown in the image above). Type the following formula in cell C3.
=SUM(OFFSET(B$3,ROW(A1)*2-2,0))

ROW Function

It returns the row number of a reference.

Examples :

1. ROW(A1) returns 1.

2. ROW(A2) returns 2.

Scenario 2

Suppose you are asked to calculate cumulative sales, and the figures should be displayed across columns. Hence the range being summed should grow by one row each time the formula is copied one column to the right.


Download the sample workbook

OFFSET - COLUMN

=SUM(OFFSET($A$1,0,0,COLUMN(B1)))

Evaluate Formula :

Let's translate the following formula into English :

=SUM(OFFSET($A$1,0,0,COLUMN(B1)))

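Since COLUMN(B1) returns 2, the above formula means =SUM(the range starting at A1 that is 2 rows tall). That is equivalent to =SUM(A1:A2).

When you copy the above formula across columns, the reference inside COLUMN shifts by one column, so the height grows by one row. The next formula is equivalent to =SUM(A1:A3):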

=SUM(OFFSET($A$1,0,0,COLUMN(C1)))

Creating Infographics with Excel

In this tutorial, you will learn how to create infographics with Excel.
Create Infographics with Excel

How to create Infographics with Excel

1. Type 1 in cell B2

2. Select cell B2 and Insert Bar Chart [Go to Insert tab >> Click on Bar >> Select 2D Clustered Chart]

3. Click on Bar and then select Format Data Point

4. Under 'Series Options', make Gap Width 0%
Format Data Point
5. Change the Chart Background to Light White [First Shade of White].
[ Go to Format tab >> Shape Fill]

6. Resize the height of chart so that image fits on the chart completely.

7. Copy Paste Image to Chart 10 times to cover the bar chart.

8. Align the images to chart [Select all the images >>  Go to Format tab >> Align Top]

9. Change cell B2 value to 0.8

Download Workbook

Creating Infographics with Excel Part II

In this tutorial, you will learn how to create Infographics with Excel. Check out Part I of this series -
Creating Infographics with Excel
Infographics with Excel
Steps

1. Enter data in excel (as shown below)
Enter Data
2. Go to Insert tab and then click on Bar Chart >> 2D Bar

3. Right Click on Chart and then click on Select Data

4. In the Legend Entries (Series) box, click Add. In the Series Values box, select the cells where the percentages (40%, 60%, 70%) are entered

5. Under Horizontal Axis Labels section, click on Edit button and then select cells wherein 2012 2013 2014 values are entered.

6. Select Picture Icon and copy

7. Select Bars and right click on it >> Format Data Series >> Click on Picture >> Clipboard

8. Select the 'Stack and Scale with' option and type 0.05 in the box

Free Data Sources for Predictive Modeling and Text Mining

The following is a list of free data sources that can be used for predictive modeling, machine learning and text mining projects.

1. Datasets for Regression and Classification

It contains many datasets that can be used for solving regression and classification problems. It is maintained by the University of Toronto.


2. Datasets for Text Mining

It contains data and R code for a book named "R and Data Mining".

Link : http://www.rdatamining.com/data


3. Kaggle Competition

It is one of the best places to discover and analyze publicly available data. It is a repository of hundreds of publicly available datasets.

Link : https://goo.gl/fHKuII

4. UCI Machine Learning Repository

It is one of the biggest repositories of public data sets that can be used for regression, classification and machine learning projects.

5. Amazon Public Datasets

It provides a centralized repository of public data sets, free of cost, for the analytics community.

Link : http://goo.gl/jRMHro

6. Microsoft Research

It is a repository of many useful big datasets that can be used for practicing any data science and machine learning technique. For example, there is a dataset that identifies 38M tweets collected for the analysis of social media messages related to the 2012 U.S. Presidential election.


7. Yahoo Datasets

Yahoo has released the largest ever machine learning dataset for researchers and engineers.


8. IMDB Database

It is a database of IMDB files which can be used for text mining and other data science projects.

Link : http://www.imdb.com/interfaces

9. AppliedPredictiveModeling (R package)

It is an R package containing the datasets mentioned in the book "Applied Predictive Modeling", written by the developer of 'caret', one of the most popular R packages.


10. Machine Learning Data Set Repository

It is a repository of machine learning data. It contains hundreds of datasets for various streams.


11. Million Song Datasets

It is a collaboration between The Echo Nest and Columbia University’s LabROSA (Laboratory for the Recognition and Organization of Speech and Audio).


12. Doing Data Science Book

It is the sample dataset that accompanies Doing Data Science by Cathy O'Neil and Rachel Schutt.


13. Revolution R Datasets

It is a repository of sample datasets used in Revolution R (now Microsoft R).


Please share in the comment box below the name of any data source that you found useful and would like to add to this list.

Free Ebooks on R, Python and Data Science

The following is a list of free ebooks (PDFs with data sets and codes) on R programming, Python and data science.
Please read the disclaimer about the Free Ebooks in this article at the bottom.

Free R Ebooks with Data Sets

1. R in a Nutshell

It provides a quick and practical guide to just about everything you can do with the open source R language and software environment. You’ll learn how to write R functions and use R packages to help you prepare, visualize, and analyze data.

Ebook - Link
Data sets - Link

2. Introduction to Statistical Learning with R

It covers some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more.

Ebook : Statistical Modeling with R
Data Sets and R Codes : Data Sets for Data Analysis


3. Machine Learning with R

It helps you to learn R from the machine learning perspective. It introduces R-Weka package – Weka is another open source software used extensively in academic research.

Ebook : Machine Learning with R

4. Elegant graphics with R

It covers 'ggplot2', the most popular R package for graphics, in detail.

Documentation and Data sets

Free Python Ebooks with Data Sets

1. Python for Data Analysis


It covers topics on data preparation, data munging, data wrangling. It introduces a friendly interface IPython to code. In addition, it also covers NumPy and Pandas.

Ebook - Python for Data Analysis

Data and Code

2. Practical Data Science Codebook

It covers a variety of situations with examples in the two most popular programming languages for data analysis - R and Python.

Ebook - Practical Data Science Codebook

3. Text Processing in Python

It is an illustrative guide that walks you through text processing techniques in a step-by-step manner. It demystifies the advanced features of text analysis and text mining.

Ebook - Text Processing in Python

4. Natural Language Processing with Python

It offers a highly accessible introduction to Natural Language Processing, the field that underpins a variety of language technologies ranging from predictive text and email filtering to automatic summarization and translation.


Disclaimer :
Deepanshu Bhalla or ListenData has no affiliation to either the authors of the books or the web-sites hosting these PDF books shared in this post. We are not responsible for any content on other sites we link to. Most of the PDF links were gathered via Google search results in the first or second page and we assume they are hosted on either the authors' webpages or university sites. Please let me know if you think any PDF link posted is a copyright infringement, I will remove that link.

SAS Visual Analytics : Change Variable Classification

This tutorial explains how to change variable type (classification) from category to measure in SAS Visual Analytics.

Question
Suppose you have a numeric (continuous) variable that is classified as "Category" in the data set, and you want to convert it to "Measure". By default, you cannot change "Category" variables to "Measure" variables.

Solution

Step 1 :
Select Data >> New Calculated Item (See the image below)

SAS Visual Analytics : New Calculated Item
Step 2 :
Enter a Name for the calculated item
Step 3 :

Enter the Parse function, found under the 'Text' section. In the following function, 'Order' is the variable that we want to change from category to measure.
Parse('Order'n, 'F12.')
Change Variable Classification

SAS Visual Analytics : Dynamic Bar Chart

This tutorial explains how to create a dynamic bar chart with drop down in SAS Visual Analytics. It is very easy to use drop-down list under "section" prompt area and make interactive graphs or tables. However, it is a little tricky to use drop-down below section prompt area.

Steps to create drop down below section prompt area

1. Create a new parameter

Create New Parameter
2. Enter a name of the parameter and select Character under Type: box and make Current Value blank.
SAS Visual Analytics : Dynamic Dropdown

3. Select "Drop-Down list" under Containers option in Objects section.

4. Drag and drop "drop-down list" to area below section prompt

5. Select the Drop-Down and assign roles (See the image below)
SAS VA : Dynamic DropDown
6. Create a bar chart

7. Go to Filters and then click on Add Filter

8. Write a function 'Product Brand'n = 'Parameter 1'p (See the image below)
SAS Visual Analytics : Filters


Difference between WHERE and HAVING Clause in GROUP BY

This tutorial explains the difference between WHERE and HAVING clause in GROUP BY in SQL.

Sample Data
SQL : WHERE vs. HAVING
Task

First we need to filter out all the rows having a product code greater than 100 and then sum up the sales by ID. Then we keep only those IDs whose total sales are less than or equal to 5000.

Create Sample Data in SAS
data temp;
input ID Sale ProductCode;
cards;
1 2500 35
1 3000 75
2 5000 65
2 3500 125
3 2500 25
3 2000 255
;
run;
SQL Code : Subsetting Data 
proc sql;
select ID, sum(Sale) as total_sale
from temp
where ProductCode <= 100
group by ID
having total_sale <= 5000;
quit;
Output Data
Key Difference
The WHERE condition is applied before the grouping occurs, whereas the HAVING condition is applied after the grouping occurs.
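The order of the two filters can be traced by hand with the sample data. The Python sketch below is only an illustration of the evaluation order, not how PROC SQL is implemented.

```python
# (ID, Sale, ProductCode) rows from the sample data above
rows = [
    (1, 2500, 35), (1, 3000, 75),
    (2, 5000, 65), (2, 3500, 125),
    (3, 2500, 25), (3, 2000, 255),
]

# WHERE ProductCode <= 100 is applied row by row, before grouping
totals = {}
for id_, sale, code in rows:
    if code <= 100:
        totals[id_] = totals.get(id_, 0) + sale   # GROUP BY ID, SUM(Sale)

# HAVING total_sale <= 5000 is applied to the grouped totals
result = {id_: total for id_, total in totals.items() if total <= 5000}
print(result)  # {2: 5000, 3: 2500}
```

ID 1 totals 5500 after the WHERE step and is therefore dropped by HAVING.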

Import CSV File in Python

This tutorial explains how to import a CSV file into Python using pandas. It walks through several examples of loading a CSV file.

Import Module
import pandas as pd
Create Dummy Data for Import
dt = {'ID': [11, 12, 13, 14, 15],
            'first_name': ['David', 'Jamie', 'Steve', 'Stevart', 'John'],
            'company': ['Aon', 'TCS', 'Google', 'RBS', '.'],
            'salary': [74, 76, 96, 71, 78]}
mydt = pd.DataFrame(dt, columns = ['ID', 'first_name', 'company', 'salary'])
Sample Data
Save data as csv in the working directory
mydt.to_csv('workingfile.csv', index=False)
Example 1 : Read CSV file with header row
mydata  = pd.read_csv("workingfile.csv")
Example 2 : Read CSV file without header row
mydata0  = pd.read_csv("workingfile.csv", header = None)
If you specify header=None, pandas names the columns with integers from 0 to (number of columns - 1), and the original header line is read as the first data row. See the output shown below -
Output
Example 3 : Specifying missing values
mydata00  = pd.read_csv("workingfile.csv", na_values=['.'])
Example 4 : Setting Index Column to ID
mydata01  = pd.read_csv("workingfile.csv", index_col ='ID')
Python : Setting Index Column

Example 5 : Read CSV File from URL
mydata02  = pd.read_csv("http://winterolympicsmedals.com/medals.csv")

Example 6 : Skip First 5 Rows While Importing CSV
mydata03  = pd.read_csv("http://winterolympicsmedals.com/medals.csv", skiprows=5)
It skips the first 5 lines of the file, so the 6th line becomes the header row.

Example 7 : Skip Last 5 Rows While Importing CSV
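mydata04  = pd.read_csv("http://winterolympicsmedals.com/medals.csv", skipfooter=5, engine="python")
It excludes the last 5 rows. The option is spelled skipfooter (the older skip_footer spelling was removed from pandas), and it is supported by the Python parsing engine.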

Example 8 : Read only first 5 rows
mydata05  = pd.read_csv("http://winterolympicsmedals.com/medals.csv", nrows=5)
Example 9 : Interpreting "," as thousands separator
mydata06 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", thousands=",")
Example 10 : Read only specific columns
mydata07 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", usecols=[1, 5, 7])
Column positions are zero-based, so the above code reads only the columns placed at the second, sixth and eighth positions.
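The options above can be tried without a network connection. The sketch below feeds the dummy data from the start of this tutorial to read_csv through io.StringIO (an assumption made so the example is self-contained) and shows zero-based usecols and skipfooter.

```python
import io
import pandas as pd

# Same dummy data as in the tutorial, held in memory instead of a file
csv_text = (
    "ID,first_name,company,salary\n"
    "11,David,Aon,74\n"
    "12,Jamie,TCS,76\n"
    "13,Steve,Google,96\n"
    "14,Stevart,RBS,71\n"
    "15,John,.,78\n"
)

# usecols positions are zero-based: 0 is ID, 3 is salary
subset = pd.read_csv(io.StringIO(csv_text), usecols=[0, 3])
print(list(subset.columns))  # ['ID', 'salary']

# skipfooter drops rows from the end and needs the Python engine
trimmed = pd.read_csv(io.StringIO(csv_text), skipfooter=2, engine="python")
print(len(trimmed))  # 3
```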