Python Pandas: May 2020

Sunday, 17 May 2020

Python NumPy Performing Certain Operations Blog 3

Chapter 1 Python NumPy Blog 3

Performing Operations on NumPy  Slicing / Joining / Arithmetic / Statistical

Learning Objectives:

* Developing a logical approach to deciding when to extract an element
* Extracting a subset or a grid from an existing Array
* How to add/ change / delete an element or a subset of the array

Text highlighted in blue colour is to be penned down in the IP register along with the code.

Operations On NumPy

1. Slicing
2. Joining
3. Arithmetic
4. Statistical

SLICING

Extracting an element or a set of elements from an existing 1D array or a subset of elements from a 2D array is called Slicing.

Ways in which slicing can be performed

1. Operator ':'
2. slice( )

Operator ':'
Syntax  array_name[ start : stop : step] or array_name[ from: to: jump]

The slice returns the data from the 'start' index up to one element before the 'stop' index; the element at the 'stop' index itself is not included.
If a step value is mentioned, every step-th element is taken, counting from the start index address.

In the above example nums is an array with data 10 to 90

Now in the code line 4 some_nums[2:7] means data stored at index address 2 to index address 6 will be sliced/extracted; the data at the stop value 7 is left out.
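The code being described is not shown here, so the lines below are only a minimal sketch of it. The use of numpy.arange to build nums and the name some_nums for the result are assumptions.

```python
import numpy as np

# Assumed reconstruction: an array holding 10, 20, ..., 90
nums = np.arange(10, 100, 10)

# start = 2, stop = 7 -> index addresses 2, 3, 4, 5, 6 (index 7 is excluded)
some_nums = nums[2:7]
print(some_nums)        # [30 40 50 60 70]
```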

Let us see and understand with a few more code examples

In the above code line In[5]:

10 20 30 40 50 60 70 80 90

0 1 2 3 4 5 6 7 8

start value = 0 = start index address of slicing = 0

stop value = 9 = stop index address of slicing = 9-1 = 8

so slicing will be done on index range from 0 to 8

that is exactly our array [10 20 30 40 50 60 70 80 90]
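The index arithmetic above can be checked with a short sketch. The array is assumed to be the same nums (10 to 90) built with numpy.arange.

```python
import numpy as np

nums = np.arange(10, 100, 10)    # assumed: 10, 20, ..., 90 at index 0 to 8

whole = nums[0:9]                # index addresses 0 to 8 (9 - 1 = 8)
print(whole)                     # [10 20 30 40 50 60 70 80 90]
print(len(whole) == len(nums))   # True
```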

Q^ WAP in NumPy to show the first 5 elements from the array of first 10 natural numbers.

Q^ WAP in NumPy to show the first 9 elements from the array of first 10 odd natural numbers.

Q^ Now, WAP to accept the limit value 'n'(<50) from user to show the first n elements of multiples of '5'.

Q^ WAP to accept the limit value 'n'(<20) from user to show the first n elements of multiples of '11'.

Negative Index Slicing

Always remember the stop value will show data/element up to one previous index address

Here we gave the slicing range as [-11 : -6] but the values/elements are shown only up to index -7.

Because the last index address that gets sliced is always 1 less than the stop value: -6 - 1 = -7

In the above example In[1]

start value = -11

stop value = -6

will slice data from index address -11 to -7 (-6 -1 = -7)
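The array used in the original example is not shown, so here is a small sketch with a hypothetical 12-element array, just to see the range -11 to -7 in action.

```python
import numpy as np

# Hypothetical array: 1 to 12, so the negative indexes run from -12 to -1
arr = np.arange(1, 13)

part = arr[-11:-6]      # index addresses -11, -10, -9, -8, -7
print(part)             # [2 3 4 5 6]
print(arr[-7])          # 6, the last element that is included
print(arr[-6])          # 7, the stop index itself is excluded
```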

Let us see some more examples -

P.S. Negative indexes allow us to easily take n-last elements of a NumPy array.

The below code extracts the last three elements of the first 10 natural numbers.

Start Value = -3 which is negative index

stop Value = not given, so by default the slicing runs till the last index address -1 (nothing is subtracted here)

So will slice array from index address -3 to -1
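A minimal sketch of this case, assuming the array of the first 10 natural numbers is built with numpy.arange:

```python
import numpy as np

a = np.arange(1, 11)    # first 10 natural numbers

last_three = a[-3:]     # start at index -3, stop left blank
print(last_three)       # [ 8  9 10]
```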

Q^ WAP in NumPy to extract and show the last 5 elements of an array which holds first 10 even natural numbers.

Q^ WAP in NumPy to extract and show the last 3 elements of an array which holds multiple of 12. (upto 20th multiples of 12)

P.S. Negative indexes also allow us to take all the elements except the last 'n' elements.

Q^ WAP in NumPy to extract and show other than the last 3 elements of an array which holds first 10 natural numbers.

Start Value = default (0), which is index address -10 in negative indexing

Stop Value = -3 (-3 -1 = -4)

Slice array value range = from index address -10 to -4
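The same reasoning as a short sketch (the array construction with numpy.arange is an assumption):

```python
import numpy as np

a = np.arange(1, 11)        # first 10 natural numbers

all_but_last3 = a[:-3]      # start left blank, stop = -3
print(all_but_last3)        # [1 2 3 4 5 6 7]

# Writing the start explicitly as -10 gives the same slice
same_slice = a[-10:-3]
print(same_slice)           # [1 2 3 4 5 6 7]
```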

Q^ WAP in NumPy to extract and show other than the last 6 elements of an array which holds first 10 odd natural numbers.

Q^ WAP in NumPy to extract and show other than the last 2 elements of an array which holds first 10 even natural numbers.

Q^ Now, WAP to accept the limit value 'n'(<20) from user to show other than the last n elements of multiples of '100' (limit upto 20 = 100x1 ..... 100x20)

Some advanced functionality of slicing operator

Syntax --> array_name[ start : stop : step ] or array_name[ from : to : jump]

If a step value is mentioned, then after the element at the start index address every step-th element is picked, and the elements in between are skipped.

** The step is a gap between positions; it is not an index address.

In the above example a[ : : 2 ] means the start value and stop value are blank (which means the extraction of data will be from index address '0' and till the last index address 9)

But with step 2 only every 2nd element of the array 'a', counted from the start, is displayed; the data in between is stepped up / jumped over.

In the above example a[1::2] means start value is 1 which is the index address 1 and the value stored at index address 1 is 22

stop value is not mentioned (there is no number between the 2 colons) so the array will be read till the last index address.

step value is 2 which means every 2nd (alternate) element is taken from the start value 22, so the element right after 22 is skipped, the next one is taken, and so on.
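To try both step examples, here is a sketch with an assumed array. The text only tells us that index address 1 holds 22 and the last index address is 9, so the multiples of 11 used below are a guess.

```python
import numpy as np

# Assumed data: 11, 22, ..., 110 (index address 1 holds 22, last index is 9)
a = np.arange(11, 111, 11)

every_second = a[::2]       # index addresses 0, 2, 4, 6, 8
print(every_second)         # [11 33 55 77 99]

from_22 = a[1::2]           # index addresses 1, 3, 5, 7, 9
print(from_22)              # [ 22  44  66  88 110]
```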

P.S. A stop value and a step value can be used together when some of the last elements need not be extracted and the first few elements are to be extracted at a particular gap.

start value = index address = 1 so start data value = 2

stop value = index address = -3 so the last data value that can be read = 7 (at -4)

step value = 2 = every alternate data from the start data value

start from 2, leave the next, take 4, leave the next, take 6, then stop before the stop index

output = [ 2 4 6 ]
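A sketch of this slice, assuming the array is the first 10 natural numbers (which matches the data values quoted above):

```python
import numpy as np

a = np.arange(1, 11)        # 1 to 10

picked = a[1:-3:2]          # index addresses 1, 3, 5 (index -3 is excluded)
print(picked)               # [2 4 6]
```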

Q^ WAP in NumPy to extract the data stored at index address 3 and 6 only.

We can use a negative step value to obtain a reversed list:

Start at index 4 so start value = 5

End index = not given

step value = -1 means reverse from the start value

(towards the left ) So [ 5 4 3 2 1 ]
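A minimal sketch of the reversed slice, again assuming the array of the first 10 natural numbers:

```python
import numpy as np

a = np.arange(1, 11)        # 1 to 10

backwards = a[4::-1]        # start at index 4 and walk towards the left
print(backwards)            # [5 4 3 2 1]

whole_reversed = a[::-1]    # no start, no stop: the whole array reversed
print(whole_reversed)       # [10  9  8  7  6  5  4  3  2  1]
```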

2. slice( )

The slice( ) function returns a slice object that can be used to slice an array.

Syntax  slice( start, stop, step)

start (optional) - Starting integer where the slicing of the object starts. Defaults to None if not provided.

stop - Integer until which the slicing takes place. The slicing stops at index stop - 1 (the last element included).

step (optional) - Integer value which determines the increment between each index for slicing. Defaults to None if not provided.

In the above example no argument value is passed in slice( ), so the result is a TypeError.

In the above example when only one argument value is passed in the slice( ) then this argument value is treated as stop value.

Which means the start value = first index address = 0

stop value = one less than the specified index address in the argument value = 7-1 = 6

step value = 1 which is the default jump value if not mentioned (start index address + step value)

So the slicing will start from index address 0 to index address 6 = A R M A G E D

0 1 2 3 4 5 6

In the above example

start value = index address = 7 (Data at index address 7 = D)

stop value = one less than the specified index address in the argument value = 10-1 = 9

step value = 1 which is the default jump value if not mentioned.

So the slicing will start from index address 7 to index address 9 = D O N

7 8 9
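The slice( ) cases discussed so far can be tried with the sketch below. The screenshots are not available, so applying the slice objects to the string "ARMAGEDDON" (and to an array of its letters) is an assumption based on the letters shown above.

```python
import numpy as np

word = "ARMAGEDDON"

# No argument at all: Python refuses to build the slice object
try:
    slice()
except TypeError as err:
    print("TypeError:", err)

first_part = slice(7)            # only stop given -> index 0 to 6
print(word[first_part])          # ARMAGED

last_part = slice(7, 10)         # start 7, stop 10 -> index 7 to 9
print(word[last_part])           # DON

# The same slice object also works on a NumPy array of the letters
letters = np.array(list(word))
print("".join(letters[last_part]))   # DON
```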

In the above examples when only one argument value is passed in the slice( ) then this argument value is treated as stop value.

start value = the first negative index address = -10 (start / jump value skipped)

stop value = one less than the specified index address in the argument value = -3 -1 = -4

step value = 1 which is the default jump value if not mentioned.

So the slicing will start from index address -10 to index address -4 = A R M A G E D

-10 -9 -8 -7 -6 -5 -4

In the above example

start value = the specified index address = -8

stop value = one less than the specified index address in the argument value = -4 -1 = -5

step value = 1 which is the default jump value if not mentioned.

So the slicing will start from index address -8 to index address - 5 = M A G E

-8 -7 -6 -5

In the above example

start value = the specified index address = -10

stop value = one less than the specified index address in the argument value = -5 -1 = -6

step value = 3 is the jump value ( -10 + 3 = -7 .... -7 + 3 = - 4 ...... - 4 + 3 = - 1 )

A	R	M	A	G	E	D	D	O	N
-10	-9	-8	-7	-6	-5	-4	-3	-2	-1
Start				Stop		x			x
Step to 3	Step to 1	Step to 2	Step to 3	Step to 1	Step to 2	Step to 3	Step to 1	Step to 2	Step to 3

So the slicing will start from index address -10

and will stop at index address -6, but it will not move to the immediate next value; it jumps to every third value in the array, so only the data at index addresses -10 and -7 (A and A) is sliced.
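The negative-index cases of slice( ) can be checked the same way. This is only a sketch, assuming the slice objects are applied to the string "ARMAGEDDON".

```python
word = "ARMAGEDDON"

only_stop = slice(-3)            # index addresses -10 to -4
print(word[only_stop])           # ARMAGED

start_stop = slice(-8, -4)       # index addresses -8 to -5
print(word[start_stop])          # MAGE

with_step = slice(-10, -5, 3)    # index addresses -10 and -7 only
print(word[with_step])           # AA
```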

Slicing in 2-D Array

What will be the output for the given code?

import numpy

Arr1=numpy.arange(1,16,1).reshape(3,5)

print(Arr1)

print()

print(Arr1[2,0])

Ans --

Now if I want an entire Row to be sliced

What will be the range to extract the third row in the above array?

Arr1[x:y] ??

WAP to slice the data [12 13 14] from the Arr1-

Row Value = start from index 0 upto 1 (1 less than stop value )

Column Value=start from index 3 upto 4 (1 less than stop value)
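In a 2-D array the row range comes before the comma and the column range after it. The sketch below uses the Arr1 defined above. Which slice the original screenshot showed is not known, so both the slice that gives [12 13 14] and the slice for rows 0 to 1 and columns 3 to 4 are included.

```python
import numpy

Arr1 = numpy.arange(1, 16, 1).reshape(3, 5)

print(Arr1[2, 0])            # 11 -> row index 2, column index 0

# [12 13 14] lies in row index 2, column indexes 1 to 3 (stop value 4)
print(Arr1[2, 1:4])          # [12 13 14]

# Rows 0 to 1 (stop value 2) and columns 3 to 4 (stop value 5)
corner = Arr1[0:2, 3:5]
print(corner)                # the values 4 5 and 9 10
```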

Stay Safe! Stay Healthy!!!

Will keep you posted soon with the next blog!!

Blog Post 1 Unit 1 - Data Handling Intro to Python

Class 12 IP CBSE

AISSCE 24 Batch Revised Syllabus

Syllabus content in red font colour has been eliminated (But we will cover all these topics)

Data handling using Pandas – I

Topic 1 --
Introduction to Python libraries - Pandas, Matplotlib.
Data structures in Pandas - 1. Series and 2. Data Frames.
1. Series: Creation of Series from – ndarray, dictionary, scalar value;
mathematical operations; Series attributes
Head and Tail functions; Selection, Indexing and Slicing.

Topic 2 --
2. Data Frames: creation - from dictionary of Series, list of dictionaries, Text/CSV files, display; iteration; Operations on rows and columns: add(insert/append), select, delete(drop column and row), rename;

Head and Tail functions; Indexing using Labels, Boolean Indexing; Joining, Merging and Concatenation.
Importing/Exporting Data between CSV files and Data Frames.

Topic 3 --
Data handling using Pandas – II
Descriptive Statistics: max, min, count, sum, mean, median, mode, quartile, Standard deviation, variance.
DataFrame operations: Aggregation, group by,
Sorting, Deleting and Renaming Index, Pivoting.
Handling missing values – dropping and filling.
Importing/Exporting Data between MySQL database and Pandas.

Topic 4 --
Data Visualization
Purpose of plotting;
Drawing and saving following types of plots using Matplotlib –
1. line plot 2. bar graph 3. histogram 4. pie chart 5. frequency polygon 6. box plot and 7. scatter plot.
Customizing plots: color, style (dashed, dotted), width; adding label, title, and legend in plots.

Learning objectives of this blog -
* What is Python, and what are the views of its developer Guido van Rossum
* RAD projects and python
* What are libraries, Python libraries, their purpose
* Introduction to Pandas

Text highlighted in blue is to be written in the register.

Let us start with

Python is an interpreted, object-oriented, high-level programming language with dynamic semantics.

· Interpreter – is a translator which converts and runs the program/code line by line. You might have noticed that when you run code in Python, execution stops at the first line that has an error, and you fix that error before the program can move on to the next line.

· Object-oriented programming – is a concept or a paradigm with the help of which we create instances of modules/classes (predefined in the library packages of Python) in our program rather than calling/using them directly. When we make an instance of a module/class we are free to call/use all or some of its properties and built-in functions as per the need of our program.

· High-level programming – any programming which can be done with an easy set-up, is independent of platform specification, and is friendlier to use (writing, understanding, support and execution).

· Dynamic semantics – Semantics are tools which help a programmer to make her program user interactive. Dynamic semantics are the ways/features through which a programmer can make her program, for instance, update the data automatically or save memory space. These tools are: objects, which are constructs we create as instances of the modules/classes to bind them with their properties and functions; variables which can be assigned multiple values; and variable declaration, which happens only at run-time in the program.

Python’s high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together.

Data structures – are containers which hold data in particular patterns (some linear, non-linear, heterogeneous, homogeneous, tree-like, etc.) to establish relationships among the data and to perform certain operations on the data in order to obtain a desired result.

Dynamic typing – A variable is not declared in the Python program (independent of

/* Coding in C language to find the sum of two given numbers */

#include <stdio.h>

int main( )

{

int num1=2, num2=5, sum;

sum=num1+num2;

printf("%i", sum);

return 0;

}

/* Coding in Python to find the sum of two given numbers */

num1=2; num2=5

sum=num1+num2

print(sum)

Dynamic binding - binding means connecting an object with the function being called on it (as objects are instances of modules/classes). In Python this connection is made only at run-time, from the type of the object.
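Dynamic typing and dynamic binding are easier to see in a few lines of code. The following is only an illustrative sketch; the variable names are made up for this example.

```python
# Dynamic typing: the name 'value' is never declared with a data type
value = 7
first_type = type(value).__name__       # 'int'
value = "seven"                         # the same name now holds a string
second_type = type(value).__name__      # 'str'
print(first_type, second_type)          # int str

# Dynamic binding: which count() runs is decided at run-time,
# from the type of the object stored in 'container'
counts = []
for container in ([10, 20, 10], (10, 30)):
    counts.append(container.count(10))  # list.count first, then tuple.count
print(counts)                           # [2, 1]
```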

Rapid Application Development -

Steps in Rapid Application Development

1. Define the requirements

2. Prototype

3. Receive Feedback

4. Finalize Software

1. Define the Requirements

At the very beginning, rapid application development sets itself apart from traditional software development models. It doesn’t require you to sit with end users and get a detailed list of specifications; instead, it asks for a broad requirement.

2. Prototype

This is where the actual development takes place. Instead of following a strict set of requirements, developers create prototypes with different features and functions as fast as they can. These prototypes are then shown to the clients who decide what they like and what they don’t.

3. Receive Feedback

In this stage, feedback on what’s good, what’s not, what works, and what doesn’t is shared. Feedback isn’t limited to just pure functionality, but also visuals and interfaces.

4. Finalize Software

Here, features, functions, aesthetics, and interface of the software are finalized with the client. Stability, usability, and maintainability are of paramount importance before delivering to the client.

Scripting Language - A script or scripting language is a computer language with a series of commands within a file that can be executed without being compiled, because it is interpreted. It brings new functions to applications and glues complex systems together.

Glue Language - the extension ("glue") modules are required because Python cannot call C/C++ functions directly; the glue extensions handle conversion between Python data types and C/C++ data types, error checking, and translating error return values into Python exceptions.

Q What is the purpose of this glue…?

To develop an application we may need to combine desirable qualities: the speed of C and Java (internally faster because they use compilers as translators) with the ease of use of Python (highly user-friendly because of dynamic semantics, but internally slower because it uses an interpreter as translator). It turns out that executing C/Java code from Python is not that hard, so it became a practice to run fast C/Java code through Python. The "through Python" part is why it is called a "glue" language.

Summary

Python's simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed.

1. Pandas

Library in Computer Languages – Library is a collection of various packages which contain purpose-alike pre-defined modules/ classes/ subclasses and their built-in functions which a programmer may use in her code as per the task requirement. (Just like we have dictionaries in our spoken languages to refer to). Most programming languages have a standard library.

Python’s standard library is very extensive, offering a wide range of built-in modules (written in C) that provide access to system functionality such as file I/O that would otherwise be inaccessible to Python programmers.

Pandas

The name is derived from the term "Panel Data"

Example –

Stud_id	Age	Height	Class	Eyesight
A101	15	160	11	6/6
A101	14	158	10	6/6
A101	13	158	9	6/6
A102	13	162	9	2/3
A102	14	164	10	2/3
A102	15	164	11	2/3

In the above example a dataset with a panel structure is shown.

Individual characteristics ( age, height, eyesight) are collected for different students in their different classes. Here the two students (A101, A102) are observed in each class 9, 10 & 11.

The above is a Balanced Data Panel.

Stud_id	Age	Height	Class	Eyesight
A201	11	154	7	6/6
A201	9	148	5	6/6
A201	10	150	6	6/6
A202	9	146	5	6/6
A203	10	164	6	2/3
A202	8	164	4	2/3

In the above example a dataset with a panel structure is shown.

Here the data of 3 students is observed, but not in a balanced way.

The three students are observed in different periods of time (classes), and the number of periods is not the same for all three.

This is an example of an Unbalanced Data Panel.
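The difference between the two panels can also be checked in code. The sketch below copies only the Stud_id and Class columns from the two tables; the helper is_balanced is a hypothetical function written for this illustration, not a pandas feature.

```python
import pandas as pd

# The two tables above, reduced to the student id and the class (time period)
balanced = pd.DataFrame({
    "Stud_id": ["A101", "A101", "A101", "A102", "A102", "A102"],
    "Class":   [11, 10, 9, 9, 10, 11],
})
unbalanced = pd.DataFrame({
    "Stud_id": ["A201", "A201", "A201", "A202", "A203", "A202"],
    "Class":   [7, 5, 6, 5, 6, 4],
})

def is_balanced(panel):
    """Hypothetical helper: True when every student is observed in every class."""
    all_classes = set(panel["Class"])
    classes_per_student = panel.groupby("Stud_id")["Class"].apply(set)
    return all(classes == all_classes for classes in classes_per_student)

print(is_balanced(balanced))      # True
print(is_balanced(unbalanced))    # False
```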

2. Series

We can build, edit and compare data in pandas through –

1. Series

2. DataFrames

Series is a data structure of Pandas which is used to create a one-dimensional (1-D) homogeneous array; its size is immutable but its values are mutable.

Series: Creation of Series from – ndarray(numpy array), dictionary, scalar value,

Creation of Series from – 1. ndarray(numpy array),

NumPy (Numerical Python) -

Data which needs to be calculated and manipulated is first stored in the simplest form, called an Array, and then operations are performed on it to get the desired result.

Let’s create an ndarray (Numpy Array) !!

Open the site of Jupyter.org

Scroll Down the page to The Jupyter Notebook section

and click the orange button with caption – “Try it in your browser”

Now File--> New Notebook --> Python 3

1-D array in Numpy can be created in 2 ways- i) through numpy.array(obj)

ii) through numpy.fromstring(objs)

i. with array(object_name) - is a method / function of the numpy module which converts the object specified in the argument to an ndarray.

The object_name can be any valid sequence which holds data in it, like a list or a tuple (a dictionary has to be turned into a list of its keys or values first).

Step 1 -- > List

Can consist of elements belonging to different data types
No need to explicitly import a module for declaration
Cannot directly handle element-wise arithmetic operations

MyList = [ 1 , 2 , 3 , 4 , 5 ]

print("Check the list :", MyList)

Step 2 -- > Convert the list into ndarray.

Array

Only consists of elements belonging to the same data type

Need to explicitly import a module for declaration

Can directly handle arithmetic operations

** Since array( ) belongs to the numpy package, numpy should be imported in the program.

import statement has four parts –

1. 'import' keyword

2. Module name – here 'numpy'

3. 'as' keyword

4. object-name – a short name (alias) for the module chosen by the user, so that the user can use the submodules / classes / built-in functions of that particular module through it.

import numpy

MyList= [1, 2, 3, 4, 5]

print(numpy.array(MyList))

OR

arr1=numpy.array(MyList)

print(arr1)

Or import the NumPy library -

import numpy as <object_name> #object name is a user-defined name and should abide by the identifier naming rules.

Eg -- > import numpy as np

MyList= [1, 2, 3, 4, 5]

print(np.array(MyList))

** why do we need to convert a list to an array !! why can’t we directly use a list instead!!

A list can hold numbers, alphabets and characters together, but arithmetic operators do not work on its numbers element by element: list + list joins the two lists and list * 2 repeats the list. So for mathematical operations on number values we need to convert the list into an array.
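A quick comparison shows why the conversion matters. This is a minimal sketch using the same MyList as above.

```python
import numpy as np

MyList = [1, 2, 3, 4, 5]

# On a list, * repeats and + joins; no arithmetic happens on the numbers
print(MyList * 2)           # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
print(MyList + MyList)      # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

# On an ndarray the same operators work element by element
arr1 = np.array(MyList)
doubled = arr1 * 2
print(doubled)              # [ 2  4  6  8 10]
print(arr1 + arr1)          # [ 2  4  6  8 10]
```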

import numpy

MyList = [1, 2, 3, 4, 5]

arr1=numpy.array(MyList)

print("The ndarray from the list object is :", arr1, "\n")

A Pandas Series is a labeled (indexed) array that holds data.

Series(data [, index]) - is the constructor / method / function of the library Pandas (so always remember to import pandas to use this method).

This method converts the data (ndarray / list / dictionary / scalar value) specified in its arguments into a series. index allows us to rearrange / assign new data labels to the elements / items of the series.

import pandas

series1=pandas.Series(arr1)

print("The Series from ndarray is :")

print(series1) OR series1

The Series will be created now. Notice the data structure appearance.

O/P -->

The ndarray from the list object is  : [1 2 3 4 5] 

The Series from ndarray is :
0    1
1    2
2    3
3    4
4    5
dtype: int64

The ndarray is created, now create a series from this ndarray!!

import numpy

import pandas

list1=['Naman','Abhishek','Prakhar']

print("Check the list :",list1)

arr1=numpy.array(list1)

print()

print("Check the ndarray created from list :",arr1)

series1=pandas.Series(arr1)

print()

print("Check the Series created from ndarray :")

print(series1)

O/P -->

Check the list : ['Naman', 'Abhishek', 'Prakhar']

Check the ndarray created from list : ['Naman' 'Abhishek' 'Prakhar']

Check the Series created from ndarray :
0       Naman
1    Abhishek
2     Prakhar
dtype: object

0, 1 and 2 are the data labels assigned by the pandas for identifying each  element uniquely.

(Can it be called index address too??)

Indexes are of two types: positional index and labelled index. Positional index takes an integer value that corresponds to its position in the series starting from 0, whereas labelled index takes any user-defined label as index.

*** Can we change the index? What if we change it?? Why change it??

dtype - data type of the series. Pandas works it out from the type of the data/element/value, so it changes with the data: int64 for the numbers and object for the names in the outputs above (in older pandas versions an empty series was float by default).

** How to find the dtype??? Can we change it???

Series is a 1-D array which appears in a vertical manner.
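The questions about the index and the dtype can be explored with a short sketch. The labels "a" to "e" and 10 to 50 are made up for this illustration.

```python
import pandas as pd

# A series with user-defined (labelled) index instead of 0, 1, 2, ...
series1 = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

print(series1["c"])         # 3 -> labelled index
print(series1.iloc[2])      # 3 -> positional index

# Finding and changing the dtype
print(series1.dtype)        # int64
as_float = series1.astype(float)
print(as_float.dtype)       # float64

# Changing the index labels of an existing series
series1.index = [10, 20, 30, 40, 50]
print(series1.loc[30])      # 3
```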

Creation of a Series from  2. Scalar Value

A scalar value is a single value of one data type, for example the integer 100.

In the example below every element of the list is an integer scalar, so the series gets an integer data type.

Ex -

import pandas

scalarvalue=[100, 200, 300, 400, 500]

series2=pandas.Series(scalarvalue)

print("The Series from the Scalar Value is :")

print(series2)

O/P-->

The Series from the Scalar Value is :
0    100
1    200
2    300
3    400
4    500
dtype: int64
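A series can also be created from one single scalar value. In that case pandas repeats the value for every label given in index. A minimal sketch follows; the labels "a", "b", "c" and the name series3 are made up for this example.

```python
import pandas as pd

# One scalar value, repeated for each index label
series3 = pd.Series(100, index=["a", "b", "c"])

print("The Series from a single scalar value is :")
print(series3)
```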

ii. with fromstring(string_data, [ dtype,] sep) - this method / function is used to create a 1-D array from string data.

dtype - is the keyword used to define the data type of the array; the default data type is float.

sep - is the separator keyword which tells which character separates the numbers in the string;

values assigned to the separator can be a comma, a period or a blank space.

Eg -- > import numpy as np

print(np.fromstring('1234'))

Observe the output for each of the different arguments.

In the code In[21] :

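when fromstring( ) is used without the second argument which is 'sep' (separator)

then the output is a ValueError, which means the size of the data passed as an argument is smaller than the required data length: without 'sep' the string is read as raw bytes, and the 4 characters of '1234' cannot fill even one float value.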

Imagine if we tried to put a Great Dane (dog) into a Chihuahua’s kennel. This would be a problem with the value of the dog, because although they are both of type ‘dog’, a Chihuahua’s kennel would not be able to accept a dog the size of a Great Dane.

So here, the string size is smaller than the size required.

In the code In[20] :

When the second argument of the method fromstring( ) is the ‘sep’ keyword with the value ‘,’ (comma), the output array shows the number from the string ending with a decimal point, because the default data type is float.

In the code In[3] :

When the second argument of the method fromstring( ) is the ‘sep’ keyword with the value ‘ ’ (blank space), the elements of the string that are separated by blank spaces become separate elements of the array.

In the code In[19] :

When the second argument of the method fromstring( ) is the ‘sep’ keyword with the value ‘.’ (dot), the output array again shows the number ending with a decimal point.
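The screenshots with the actual inputs are not available, so the strings below are assumptions chosen to show how 'sep' and 'dtype' change the result.

```python
import numpy as np

# No comma inside the string: it is read as one float number
one_number = np.fromstring('1234', sep=',')
print(one_number)           # [1234.]

# Numbers separated by blank spaces become separate float elements
spaced = np.fromstring('1 2 3 4', sep=' ')
print(spaced)               # [1. 2. 3. 4.]

# dtype=int gives integers (no decimal point)
whole_numbers = np.fromstring('1,2,3,4', dtype=int, sep=',')
print(whole_numbers)        # [1 2 3 4]
```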

array(object_name) - is a method / function of the numpy module which converts the object specified in the argument to an ndarray.

The object_name can be any valid sequence which holds data in it, like a list or a tuple (a dictionary has to be turned into a list of its keys or values first).

Case 1:

** in the above example 3 lists are converted into one array of 3 rows and 5 columns by the

list1=[1,2,3,4]

list2=[11,12,13]

** in the above example the 3 lists do not have the same number of elements, so they are converted into an array of list objects (a nested list) and not into a 2-D array.
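Both cases can be reproduced with the sketch below. The list values are hypothetical, since the original screenshots are missing. Recent NumPy releases raise a ValueError for lists of unequal length unless dtype=object is given, so it is passed explicitly here.

```python
import numpy as np

# Case 1: three lists of equal length -> one 2-D array of 3 rows and 5 columns
list1 = [1, 2, 3, 4, 5]
list2 = [11, 12, 13, 14, 15]
list3 = [21, 22, 23, 24, 25]
grid = np.array([list1, list2, list3])
print(grid.shape)           # (3, 5)

# Case 2: lists of unequal length -> a 1-D array that just holds the lists
ragged = np.array([[1, 2, 3, 4], [11, 12, 13]], dtype=object)
print(ragged.shape)         # (2,)
print(ragged[1])            # [11, 12, 13]
```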

ii. empty( [rows,columns], dtype=data_type) - is a method / function of the numpy module which creates an array without setting its values, so the array shows whatever arbitrary (random-looking) values were lying in memory.

[rows, columns] - to specify the total number of rows and columns of the array

dtype - is used to specify which type of data is to be generated; by default the data type is float.

Example -- numpy.empty( [ 3, 2 ], dtype=int )

** In the above program empty( ) has generated a 3x2 matrix without setting its values, so it shows arbitrary values, displayed here as integers. Kindly remember that these values may be different each time the program is executed.

** if dtype is not given, the output shows such arbitrary numbers as type float (often long exponential type numbers)
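Since the values from empty( ) cannot be predicted, a sketch can only check the shape and the data type of the result:

```python
import numpy

arr_int = numpy.empty([3, 2], dtype=int)
print(arr_int.shape)        # (3, 2)
print(arr_int.dtype.kind)   # i -> an integer type

arr_float = numpy.empty([3, 2])      # dtype not given -> float
print(arr_float.dtype)      # float64

# The values themselves are arbitrary, so they are not shown as expected output
print(arr_int)
```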

iii. numpy.zeros( [rows, columns], dtype=data_type) - this method/ function is used to create an array of the specified rows and columns, filled with zeros of the data type specified.

[rows, columns] - to specify the total number of rows and columns of the array

dtype - is used to specify which type of data is to be generated; by default the data type is float.

Example --

In the above example 5 columns and 1 row has been generated for the 2-D array all with the value '0' and of type integer (which means without the decimal dot.)

In the above example 3 columns and 2 rows have been generated for the 2-D array all with the value '0' and of type float(which means each zero value is suffixed with the decimal dot.)
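The two outputs described above can be reproduced with the sketch below. The exact calls in the original screenshots are not known, so the shapes [1, 5] and [2, 3] are taken from the descriptions.

```python
import numpy

# 1 row and 5 columns of integer zeros (no decimal dot)
int_zeros = numpy.zeros([1, 5], dtype=int)
print(int_zeros)            # [[0 0 0 0 0]]

# 2 rows and 3 columns; dtype not given, so the zeros are floats
float_zeros = numpy.zeros([2, 3])
print(float_zeros)
print(float_zeros.dtype)    # float64
```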

Stay Safe Stay Healthy!!!