Create Dummy Data in Python

This article explains several ways to create dummy or random data in Python for practice. As in R, we can build dummy data frames using the pandas and numpy packages. Many analysts prepare data in MS Excel and then import it into Python to practise data wrangling. This is an inefficient approach. It is more efficient to generate random data directly in Python and use it for data manipulation.

1. Enter Data Manually in Editor Window

The first step is to load the pandas package and use the DataFrame function.

import pandas as pd

# Build a data frame from a dictionary of column name -> list of values
data = pd.DataFrame({"A": ["John", "Deep", "Julia", "Kate", "Sandy"],
                     "MonthSales": [25, 30, 35, 40, 45]})
print(data)

       A  MonthSales
0   John          25
1   Deep          30
2  Julia          35
3   Kate          40
4  Sandy          45

Note : Character values must be enclosed in single or double quotes.
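
To confirm that the columns were stored with the intended types, the frame can be inspected right after it is built. The following is a minimal sketch that rebuilds the same frame; the name total_sales is introduced here only for illustration.

```python
import pandas as pd

# Rebuild the small frame from the example above.
data = pd.DataFrame({
    "A": ["John", "Deep", "Julia", "Kate", "Sandy"],
    "MonthSales": [25, 30, 35, 40, 45],
})

print(data.shape)   # (number of rows, number of columns)
print(data.dtypes)  # quoted values form a text column, numbers an integer column

# total_sales is a hypothetical helper variable for this sketch.
total_sales = int(data["MonthSales"].sum())
print(total_sales)
```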

2. Prepare Data using sequence of numeric and character values

Let's import two Python packages for this task: string and numpy. The string module (part of the standard library) provides the sequence of letters, whereas numpy generates a sequence of numbers incremented by a specific value.

import pandas as pd
import string
import numpy as np
data2 = pd.DataFrame({"A": np.arange(1, 10, 2),
                      "B": list(string.ascii_lowercase)[0:5]})

Explanation
1. np.arange(1,10,2) tells Python to generate values starting at 1 and stopping before 10, incremented by 2, i.e. 1, 3, 5, 7, 9.
2. string.ascii_lowercase returns abcdefghijklmnopqrstuvwxyz. list(string.ascii_lowercase)[0:5] picks the first 5 letters.
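
The two building blocks can also be run on their own to see exactly what each one produces. This is a small sketch; the names numbers and letters are chosen here for illustration.

```python
import string
import numpy as np

numbers = np.arange(1, 10, 2)               # start at 1, stop before 10, step 2
letters = list(string.ascii_lowercase[:5])  # first five lowercase letters

print(numbers.tolist())  # [1, 3, 5, 7, 9]
print(letters)           # ['a', 'b', 'c', 'd', 'e']
```

Both sequences have five elements, which is why they fit into the same data frame.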

3. Generate Random Data

numpy offers many functions to generate random values. Two of the most commonly used are np.random.randint( ) and np.random.normal( ).

import pandas as pd
import numpy as np
np.random.seed(1)
data3 = pd.DataFrame({"C": np.random.randint(low=1, high=100, size=10),
                      "D": np.random.normal(0.0, 1.0, size=10)})

    C         D
0  38 -0.528172
1  13 -1.072969
2  73  0.865408
3  10 -2.301539
4  76  1.744812
5   6 -0.761207
6  80  0.319039
7  65 -0.249370
8  17  1.462108
9   2 -2.060141

Explanation
np.random.seed(1) fixes the seed so that the same random values are generated every time the code is run. np.random.randint(low=1, high=100, size=10) returns 10 random integers from 1 to 99 (the upper bound 100 is excluded). np.random.normal(0.0, 1.0, size=10) returns 10 random values drawn from the standard normal distribution, which has mean 0 and standard deviation 1.
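
The effect of the seed can be checked by drawing the same values twice. The sketch below assumes nothing beyond numpy; the names first, second and same are illustrative.

```python
import numpy as np

np.random.seed(1)
first = np.random.randint(low=1, high=100, size=10)

np.random.seed(1)  # resetting the seed restarts the same sequence
second = np.random.randint(low=1, high=100, size=10)

same = bool(np.array_equal(first, second))
print(same)  # True

# All values lie between 1 and 99 because high=100 is excluded.
print(first.min() >= 1, first.max() <= 99)
```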

Check mean and standard deviation of normal distribution

np.round(np.std(np.random.normal(0.0, 1.0, size=1000)))
np.round(np.mean(np.random.normal(0.0,1.0, size=1000)))
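
Rounding to a whole number hides how close the sample really is to the target values. As a rough sketch, the check can be done on a single larger sample and rounded to two decimals instead; the sample size of 100,000 and the variable names are assumptions made for this illustration.

```python
import numpy as np

np.random.seed(1)
sample = np.random.normal(0.0, 1.0, size=100_000)

sample_mean = float(np.mean(sample))  # expected to be close to 0
sample_std = float(np.std(sample))    # expected to be close to 1

print(round(sample_mean, 2), round(sample_std, 2))
```

The larger the sample, the closer these two numbers should get to 0 and 1.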

4. Create Categorical Variables

In this step, we will create two types of categorical variables :

Categories ranging from 1 to 4
Binary variable (0 / 1)

import pandas as pd
import numpy as np
np.random.seed(1)
data4 = pd.DataFrame({"X": np.random.choice(range(1, 5), 20, replace=True),
                      "X1": np.where(np.random.normal(0.0, 1.0, size=20) <= 0, 0, 1)})

Explanation

np.random.choice(range(1,5), 20, replace=True) draws 20 values from 1 to 4 (5 is excluded) with replacement, so values can repeat.
np.where(np.random.normal(0.0, 1.0, size=20)<=0,0,1) returns 0 where the random value is zero or negative and 1 otherwise. np.where( ) works as a vectorized IF-ELSE statement in Python.

Similar to R's factor( ) function, pandas lets you define variables as categorical with astype("category"). See the code below.

data4.X = data4.X.astype("category")
data4.X1 = data4.X1.astype("category")
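
After the conversion it is worth checking which levels each variable actually has. The following sketch rebuilds data4 with the same seed and inspects it; the .cat accessor and value_counts( ) are standard pandas features, while the choice of what to print is just one possible way to check the result.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
data4 = pd.DataFrame({
    "X": np.random.choice(range(1, 5), 20, replace=True),
    "X1": np.where(np.random.normal(0.0, 1.0, size=20) <= 0, 0, 1),
})
data4["X"] = data4["X"].astype("category")
data4["X1"] = data4["X1"].astype("category")

print(data4.dtypes)                            # both columns are now 'category'
print(data4["X"].cat.categories.tolist())      # distinct levels, similar to levels( ) in R
print(data4["X"].value_counts().sort_index())  # how often each level occurs
```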

5. Import CSV or Excel File

Using the pandas functions read_csv( ) and read_excel( ), you can read data from a CSV or Excel file into Python.

import pandas as pd
mydata = pd.read_csv("C:\\Users\\Deepanshu\\samplefile.csv")
mydata = pd.read_excel("C:\\Users\\Deepanshu\\samplefile.xlsx")
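
The paths above exist only on one particular machine. To try read_csv( ) without any file on disk, one option is to write a small frame to an in-memory text buffer and read it back. This is a sketch under that assumption; the frame sample and the buffer are made up for the illustration.

```python
import io
import pandas as pd

# A tiny hypothetical frame standing in for the contents of a CSV file.
sample = pd.DataFrame({"A": ["John", "Deep"], "MonthSales": [25, 30]})

# Write the frame as CSV text into an in-memory buffer instead of a file.
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)  # rewind so that read_csv starts at the beginning

mydata = pd.read_csv(buffer)
print(mydata)
```

read_csv( ) accepts a file path or a file-like object, so the same call works once the buffer is replaced with a real path.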

Detailed Tutorial : How to import data in Python
