Python - Geek went Freak!

# Python

Datasets, or "toy datasets" as sklearn calls them, reside in the sklearn.datasets package.

A dataset can be loaded with the matching sklearn.datasets.load_*() function.

In this post, let us consider the iris dataset, which can be loaded using sklearn.datasets.load_iris().

By default, sklearn returns a dataset as a Bunch object. Older releases report its type as sklearn.datasets.base.Bunch; recent releases expose the same class as sklearn.utils.Bunch.

from sklearn.datasets import load_iris
irisData = load_iris()  # returns a Bunch by default
print(type(irisData))


The Bunch structure is convenient since it holds data, target, feature_names and target_names. The data and target fields are both numpy.ndarray objects, containing the independent and dependent variables respectively.

from sklearn.datasets import load_iris
irisData = load_iris()
print(type(irisData))
print(type(irisData.data), type(irisData.target))
print(irisData.feature_names)
print(irisData.target_names)
print(irisData.data)
print(irisData.target)
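As a small illustration of how the fields fit together, the sketch below checks the array shape and turns the integer class ids in target into readable labels by indexing target_names with them. The variable name species is only an illustrative choice.

```python
from sklearn.datasets import load_iris

irisData = load_iris()

# data has one row per flower and one column per feature
print(irisData.data.shape)

# target stores integer class ids; indexing target_names with them
# yields the class label for every row
species = irisData.target_names[irisData.target]
print(species[:3])
```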


sklearn's dataset loaders can also return the features and targets directly as a tuple of numpy.ndarray objects when called with return_X_y=True.

from sklearn.datasets import load_iris
irisData = load_iris(return_X_y=True)  # tuple: (data, target)
print(irisData[0])
print(irisData[1])
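Since the return value is a two-element tuple, it can also be unpacked straight into two names. A minimal sketch, where X and y are just the conventional names for features and targets:

```python
from sklearn.datasets import load_iris

# Unpack the (data, target) tuple in one step
X, y = load_iris(return_X_y=True)

print(X.shape)  # one row per sample, one column per feature
print(y.shape)  # one class id per sample
```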


statsmodels comes with some sample datasets built-in. In this tutorial, we are going to learn how to use datasets in statsmodels.

The built-in datasets are available in the statsmodels.api.datasets package.

In this tutorial, let's explore statsmodels.api.datasets.fair.

One can load data from a dataset either as a numpy.recarray or as a pandas.core.frame.DataFrame.

statsmodels.api.datasets.fair.load().data provides data as numpy.recarray (newer statsmodels releases may return pandas objects here as well).

statsmodels.api.datasets.fair.load_pandas().data provides data as pandas.core.frame.DataFrame.

The following code will display the dataset as a table in an IPython notebook.

import statsmodels.api as sm
dta = sm.datasets.fair.load_pandas().data
dta


# Negative values in numpy's randn

If you are used to the rand function, which generates uniformly distributed random numbers in the range [0, 1), you will be surprised when you use randn for the first time, for two reasons:

1. randn generates negative numbers
2. randn generates numbers greater than 1 and less than -1

## Examples

### Negative

import numpy as np

lRandom = np.random.randn(10)
print(lRandom[lRandom < 0])  # keep only the negative samples


The above code produced the following output during a sample run:

[-0.52004631 -0.4080691 -0.04164258 -0.46942423 -0.84344794 -0.01001501]

### Greater than 2

import numpy as np

lRandom = np.random.randn(500)
print(lRandom[lRandom > 2])  # keep only the samples above 2


The above code produced the following output during a sample run:

[ 2.09666448 2.29351194 2.16025808 2.78635893 2.3467666 2.54232853 2.35466425 2.26961216 2.62167745 2.0261606 2.00743211]

## Reason

This is because randn, unlike rand, draws its samples from the standard normal distribution, which has mean 0 and variance 1.
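The difference shows up directly in summary statistics. The sketch below draws a large sample from each function and compares their ranges, then checks the mean and variance of the randn sample. The fixed seed and the sample size of 100000 are arbitrary choices made only so the run is repeatable.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the sketch repeatable

uniform = np.random.rand(100000)   # uniform on [0, 1)
normal = np.random.randn(100000)   # standard normal

# rand never leaves [0, 1); randn spreads on both sides of 0
print(uniform.min(), uniform.max())
print(normal.min(), normal.max())

# mean should be close to 0 and variance close to 1
print(normal.mean(), normal.var())
```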

If you plot the histogram of the samples from randn, it becomes quite obvious:

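import numpy as np
import matplotlib.pyplot as plt

lRandom = np.random.randn(5000)

# Count samples per bin (10 equal-width bins by default)
lHist, lBin = np.histogram(lRandom)

# Plot each bin's count against its left edge
plot = plt.plot(lBin[:-1], lHist, 'r--', linewidth=1)
plt.show()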


During a sample run, the above code drew a bell-shaped curve centred near 0.
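The 11 values above 2 in the earlier 500-sample run are also about what a standard normal distribution predicts. As a rough check, the sketch below computes the upper-tail probability with the complementary error function from the standard library. The helper name upper_tail is hypothetical.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p = upper_tail(2)
print(round(p, 4))        # 0.0228
print(round(500 * p, 1))  # 11.4 expected samples above 2 out of 500
```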

# Binary arithmetic using Python

## Convert unsigned integer to binary string

bin(10)


'0b1010'
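If the '0b' prefix is unwanted, or a fixed width is needed, the built-in format function is an alternative to bin. A short sketch:

```python
n = 10

plain = format(n, 'b')         # no prefix
padded = format(n, '08b')      # zero-padded to 8 digits
prefixed = format(n, '#010b')  # prefix kept; width 10 includes the '0b'

print(plain)     # 1010
print(padded)    # 00001010
print(prefixed)  # 0b00001010
```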

## Convert binary string to unsigned integer

int('1010',2)


10

It also works if you try the binary string with the prefix ‘0b’. For example,

int('0b1010',2)
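Both spellings give the same integer, and passing base 0 lets int infer the base from the prefix. A small sketch that also round-trips a value through bin:

```python
without_prefix = int('1010', 2)
with_prefix = int('0b1010', 2)
inferred_base = int('0b1010', 0)  # base 0: the '0b' prefix selects base 2

print(without_prefix, with_prefix, inferred_base)  # 10 10 10

# bin and int undo each other for non-negative integers
round_trip = int(bin(10), 2)
print(round_trip)  # 10
```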


## Convert signed integer to binary string

Negative numbers are a little harder to deal with. Converting them the same way as unsigned numbers doesn't work as expected:

bin(-10)


'-0b1010'

You might have expected a two's complement number as the output, but bin just prints the binary string of the positive number with a '-' prefix. This can be fixed by masking the value to the number of bits you want in the output; here 0xff keeps the lowest 8 bits.

bin(-10 & 0xff)


'0b11110110'

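If you want the length to be dynamic, the mask can be built from the number of bits. For 8 bits, the following evaluates to 255, the same value as 0xff: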

int("1" * 8, 2)
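Putting the mask and the conversion together, a minimal sketch of a width-aware helper could look like this. The function name to_twos_complement is hypothetical, and the mask is built with a shift instead of a string of ones, which gives the same value.

```python
def to_twos_complement(value, bits):
    """Return the two's complement bit string of value, using `bits` digits."""
    mask = (1 << bits) - 1  # e.g. 0xff for bits=8
    return format(value & mask, '0{}b'.format(bits))

print(to_twos_complement(-10, 8))  # 11110110
print(to_twos_complement(10, 8))   # 00001010
print(to_twos_complement(-1, 4))   # 1111
```

Values that do not fit into the requested width are silently truncated by the mask, so the caller has to pick a large enough bit count.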


## Convert signed binary string to signed integer

I am not sure if there is a direct way to do this in Python. If you find one, please let me know! I have written a small function to do it,
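One way such a function might look, as a minimal sketch and not necessarily the author's own code: read the string as an unsigned number, then subtract 2 to the power of the bit width if the sign bit is set. The name from_twos_complement is hypothetical, and the bit width is assumed to equal the length of the string.

```python
def from_twos_complement(bit_string):
    """Interpret bit_string as a two's complement number of its own width."""
    if bit_string.startswith('0b'):
        bit_string = bit_string[2:]  # drop the optional prefix
    bits = len(bit_string)
    value = int(bit_string, 2)       # unsigned interpretation
    if value & (1 << (bits - 1)):    # sign bit set: value is negative
        value -= 1 << bits
    return value

print(from_twos_complement('11110110'))  # -10
print(from_twos_complement('00001010'))  # 10
print(from_twos_complement('0b1111'))    # -1
```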