Python - Geek went Freak!


Loading sklearn datasets

Toy datasets, as sklearn calls them, reside in the sklearn.datasets package.

A dataset can be loaded by calling the matching sklearn.datasets.load_*() function.

In this post, let us consider the iris dataset, which can be loaded using sklearn.datasets.load_iris().

By default, sklearn returns a dataset as a Bunch object (sklearn.utils.Bunch in current releases; older releases exposed it as sklearn.datasets.base.Bunch).

from sklearn.datasets import load_iris
irisData = load_iris()

The Bunch structure is convenient since it holds data, target, feature_names and target_names. The data and target fields are both numpy.ndarray, containing the independent and dependent variables respectively.

from sklearn.datasets import load_iris
irisData = load_iris()
print(type(irisData.data), type(irisData.target))

sklearn’s dataset loaders can also return the features and targets directly, as a (data, target) tuple of numpy.ndarray, when called with return_X_y=True.

from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)  # unpack the (features, targets) tuple

Loading statsmodels datasets

statsmodels comes with some sample datasets built-in. In this tutorial, we are going to learn how to use datasets in statsmodels.

The built-in datasets are available in the statsmodels.api.datasets package.

In this tutorial, let's explore statsmodels.api.datasets.fair.

Data can be loaded from a dataset either as a numpy.recarray or as a pandas.core.frame.DataFrame.

statsmodels.api.datasets.fair.load().data provides data as numpy.recarray.

statsmodels.api.datasets.fair.load_pandas().data provides data as pandas.core.frame.DataFrame.

The following code will display the dataset as a table in an IPython notebook.

import statsmodels.api as sm
dta = sm.datasets.fair.load_pandas().data
dta  # a notebook renders the last expression of a cell as a table

Negative values in numpy's randn

If you are used to the rand function, which generates uniformly distributed random numbers in the range [0, 1), you may be surprised when you use randn for the first time, for two reasons:

  1. randn generates negative numbers
  2. randn generates numbers greater than 1 and less than -1



import numpy as np
lRandom = np.random.randn(10)
print(lRandom[lRandom < 0])  # boolean mask keeps only the negative samples

The above code produced the following output during a sample run:

[-0.52004631 -0.4080691 -0.04164258 -0.46942423 -0.84344794 -0.01001501]

Greater than 2

lRandom = np.random.randn(500)
print(lRandom[lRandom > 2])

The above code produced the following output during a sample run:

[ 2.09666448 2.29351194 2.16025808 2.78635893 2.3467666 2.54232853 2.35466425 2.26961216 2.62167745 2.0261606 2.00743211]


This is because randn, unlike rand, draws its samples from the standard normal distribution (mean = 0, variance = 1). About half of the samples are therefore negative, and roughly 32% fall outside [-1, 1].
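To make these proportions concrete, the sketch below estimates them empirically. The seed and the sample size are arbitrary choices made for reproducibility, not values from the sample runs above.

```python
import numpy as np

# Fixed seed and sample size are assumptions made only for reproducibility
rng = np.random.RandomState(0)
samples = rng.randn(100000)

# The mean of a boolean mask is the fraction of True entries
frac_negative = np.mean(samples < 0)         # expected to be close to 0.5
frac_outside = np.mean(np.abs(samples) > 1)  # expected to be close to 0.32
frac_beyond_2 = np.mean(samples > 2)         # expected to be close to 0.023

print(frac_negative, frac_outside, frac_beyond_2)
```

The last fraction agrees with the run above, where 11 of 500 samples (2.2%) were greater than 2.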

If you plot the histogram of the samples from randn, it becomes quite obvious:

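import numpy as np
import matplotlib.pyplot as plt

lRandom = np.random.randn(5000)

# lBin holds the bin edges, so it has one more entry than the counts in lHist
lHist, lBin = np.histogram(lRandom)

plot = plt.plot(lBin[:-1], lHist, 'r--', linewidth=1)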

The above code produced the following output during a sample run:

Histogram of randn samples

Binary arithmetic using Python

Convert unsigned integer to binary string
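A minimal sketch of this conversion, using only the built-in bin and format functions. The value 10 is an arbitrary example, not taken from the original post.

```python
value = 10  # arbitrary example value

with_prefix = bin(value)             # '0b1010'
without_prefix = format(value, "b")  # '1010'
padded = format(value, "08b")        # '00001010', zero-padded to 8 digits

print(with_prefix, without_prefix, padded)
```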



Convert binary string to unsigned integer
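The reverse direction can be sketched with the built-in int, whose second argument is the base. The input string is an arbitrary example.

```python
bit_string = "1010"  # arbitrary example input

# int() parses the string as a base-2 number
value = int(bit_string, 2)

print(value)  # 10
```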



It also works if the binary string carries the ‘0b’ prefix.
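A short illustration of the prefixed form, again with an arbitrary example value. Passing base 0 is an extra shown here for comparison; it asks int to infer the base from the prefix.

```python
prefixed = int("0b1010", 2)  # the '0b' prefix is accepted when the base is 2
plain = int("1010", 2)       # same value without the prefix
inferred = int("0b1010", 0)  # base 0 infers the base from the prefix

print(prefixed, plain, inferred)  # 10 10 10
```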


Convert signed integer to binary string

Negative numbers are a little more difficult to deal with. Converting them the same way we did with unsigned numbers doesn’t work as expected.
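The behaviour can be reproduced with any negative value; -10 is used here because the masking example further down uses it too.

```python
negative = bin(-10)

print(negative)  # -0b1010

# The result is just the positive number's binary string with a sign in front
same_as_signed_positive = negative == "-" + bin(10)
print(same_as_signed_positive)  # True
```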



You might have expected a two’s complement number as the output, but bin just prints the binary string of the positive number with a ‘-’ prefix. This can be fixed by masking the value to the number of bits you want in the output.

bin(-10 & 0xff)
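Spelled out step by step, the mask keeps only the low 8 bits, which turns -10 into its 8-bit two's complement representation:

```python
masked = -10 & 0xff  # keep only the low 8 bits

print(masked)       # 246, which is 256 - 10
print(bin(masked))  # 0b11110110
```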


If you want the bit length to be dynamic, build the mask from the bit count instead of hard-coding 0xff:

int("1" * 8, 2)
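The expression above only builds the mask (255 for 8 bits). One way to combine it with the masking step is sketched below; the helper name to_twos_complement and its parameters are hypothetical, not from the original post.

```python
def to_twos_complement(value, bits):
    """Return value as a two's complement binary string of the given width."""
    mask = (1 << bits) - 1  # same number as int("1" * bits, 2)
    # Masking maps negative values onto their two's complement bit pattern
    return format(value & mask, "0{}b".format(bits))


print(to_twos_complement(-10, 8))   # 11110110
print(to_twos_complement(10, 8))    # 00001010
print(to_twos_complement(-10, 16))  # 1111111111110110
```

Values that do not fit in the requested width are silently truncated by the mask, so the caller has to pick a width large enough for the input.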

Convert signed binary string to signed integer

I am not sure if there is a direct way to do this in Python. If you find one, please let me know! I have written a small function to do it.
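The function from the original post is not reproduced here. The following is a hypothetical sketch of one way to do the conversion, assuming the string is a two's complement number without the ‘0b’ prefix and that its length is the bit width; the name from_twos_complement is made up for illustration.

```python
def from_twos_complement(bit_string):
    """Interpret bit_string as a two's complement number of len(bit_string) bits."""
    bits = len(bit_string)
    value = int(bit_string, 2)  # unsigned interpretation first
    if bit_string[0] == "1":
        # A set sign bit means the unsigned value is 2**bits too large
        value -= 1 << bits
    return value


print(from_twos_complement("11110110"))  # -10
print(from_twos_complement("00001010"))  # 10
```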