3. Python
Python, R, and MATLAB are perhaps the most widely used, de facto standard programming languages for data analysis. Of these, Python and R are free and supported by most cloud services. Many companies have chosen Python as the programming language for data-based projects because of its versatility and ease of use.
Python's strengths are generally considered to be:
- simple and easy-to-read syntax
- comprehensive data structures
- many extension libraries (including for machine learning) that are constantly evolving
- versatile
- quick to create "prototypes"
NOTE! This section covers only the Python syntax needed for this course module; it is not a full introduction to Python programming.
Python Versions
Currently, Python is at version 3.11. A major change in syntax occurred in 2008 with the transition to version 3.0. Code written for 2.x versions generally does not work in 3.x versions. The most visible change is probably that print became a function:
print 'Hello World' # 2.x versions
print('Hello World') # 3.x versions
The official Python tutorial
You can find out the version of Python in Jupyter Notebook as follows:
from platform import python_version
print(python_version())
3.7.13
The version of Python in the VLE's Jupyter Hub is currently 3.9.13.
Python Syntax
Python differs from many other programming languages in its syntax:
- Commands do not end with semicolons; instead, commands are separated by line breaks
- Program blocks are not delimited by curly braces, but by indentations at the beginning of lines
- Indent with spaces, not tabs
- Comments are written after the #-symbol
- There is no separate syntax for multi-line comments
# Example code, NOTE this is a comment line because it starts with a #-symbol
def summa(a, b): # calculates the sum of numbers
    summa = a + b
    return summa
for i in range(1, 5): # the range function produces the numbers 1, 2, 3, 4 (5 is not included)
    print(summa(i, 2*i))
print("Done")
3
6
9
12
Done
Variables
Variables do not need to be explicitly declared as in many other programming languages. Their type does not need to be declared either; it is determined automatically when a value is assigned to the variable (dynamic typing).
However, you may need to convert the type of a variable using the following functions:
- int(x)
- float(x)
- str(x)
You can find out the type of a variable with the type() function.
# Example: Variable Types
# integers
x = 5
y = -35656222554887711
print(type(x))
print(type(y))
# floating-point numbers
x = 1.10
y = -4e45 # scientific notation
print(type(x))
print(type(y))
# boolean values
x = True
y = False
print(type(x))
print(type(y))
# strings
x = 'Cyber threat information'
y = "Data analytics"
print(type(x))
print(type(y))
# type conversions
x = 1
y = str(x)
print(type(x))
print(type(y))
<class 'int'>
<class 'int'>
<class 'float'>
<class 'float'>
<class 'bool'>
<class 'bool'>
<class 'str'>
<class 'str'>
<class 'int'>
<class 'str'>
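The conversion functions listed above can be sketched a little further. The following is a minimal illustration rather than part of the original course material; the variable names are hypothetical. It also shows what happens when a string cannot be interpreted as a number.

```python
# Converting between types with int(), float(), and str()
count = int("42")        # string -> integer
ratio = float("3.5")     # string -> floating-point number
truncated = int(3.9)     # float -> integer; the fractional part is dropped
label = str(2022)        # integer -> string

print(count, type(count))
print(ratio, type(ratio))
print(truncated, type(truncated))
print(label, type(label))

# A string that does not look like a number cannot be converted
try:
    int("abc")
except ValueError as error:
    print("Conversion failed:", error)
```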
Strings
Strings can be written in several ways:
- Using single quotes (')
- Using double quotes (")
- Using three single or double quotes (''' or """), which allows a string to span multiple lines
- Special characters inside strings (', ", \) are escaped with a backslash, for example \' or \" or \\
- Alternatively, the ' character can be used as such inside a string enclosed by ", and vice versa
- Strings are concatenated using the + operator. Strings can also be repeated with the * operator
- Variables can also be combined with a string using the str.format function or with C-like %-syntax (examples later)
# Example: Strings and their concatenation in output
a = 5
b = ' wrestlers went under '
c = 'the scale '
print(3*(str(a) + b + c)) # note that the integer a must be converted
# to a string so that it can be concatenated with strings
5 wrestlers went under the scale 5 wrestlers went under the scale 5 wrestlers went under the scale
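The str.format function and the %-syntax mentioned in the list above could look roughly like this. This is a small sketch with hypothetical variable names, reusing the text of the previous example; neither form requires converting the integer with str() first.

```python
# Combining variables and strings without a manual str() conversion
count = 5
event = "wrestlers went under the scale"

# str.format: each {} is replaced by the corresponding argument
with_format = "{} {}".format(count, event)

# C-like %-syntax: %d marks an integer, %s marks a string
with_percent = "%d %s" % (count, event)

print(with_format)
print(with_percent)
```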
Characters can be extracted from strings using indices; the index of the first character is zero.
NOTE! Strings cannot be modified, so for example a[2] = 'b' will not work.
string = "Hello, World!"
print(string[1]) # returns the second character
print(string[2:5]) # returns a string of 3 characters, with characters from positions 2, 3, and 4. So position 5 is not included.
print(string[:5]) # the same as [0:5]
print(string[5:]) # the same as [5:end]
print(string[-3:-1]) # negative index is read from the end, the last character is -1 (not included here)
print(string[:-3]) # the last three characters are omitted
print(string[1:-3])
e
llo
Hello
, World!
ld
Hello, Wor
ello, Wor
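The note above about strings being unmodifiable can be tried out as follows. This is a minimal sketch with hypothetical variable names: assigning to an index raises a TypeError, so a changed string has to be built as a new string from slices of the old one.

```python
text = "Hello, World!"

# Item assignment is not allowed on strings
try:
    text[2] = "b"
except TypeError as error:
    print("Cannot modify:", error)

# Instead, build a new string from slices of the old one
changed = text[:2] + "b" + text[3:]
print(changed)
print(text)  # the original string is unchanged
```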
Strings also have various methods, some of which are presented next.
# Example: Changing a string to UPPERCASE with the upper() function
mjono1 = 'data analytics'
print(mjono1)
# The upper method does not change the string
mjono1.upper()
print(mjono1)
# but returns a new string on which it has performed the operations
mjono2 = mjono1.upper()
print(mjono2)
data analytics
data analytics
DATA ANALYTICS
# Example: Searching for a space character and reversing the name
a = 'Helen Maroulis'
print(a)
n = a.find(' ') # Searching for the position of the space character
firstname = a[:n]
lastname = a[n+1:]
print(lastname + ' ' + firstname)
# another way
a = 'Helen Maroulis'
parts = a.split(' ')
print(parts[1]+" "+parts[0])
Helen Maroulis
Maroulis Helen
Maroulis Helen
Operators
The following is a list of some of Python's most important operators:
- = assignment
- + addition
- - subtraction
- * multiplication
- ** exponentiation
- / division
- // "floor division"
- % modulo
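- == equal to (comparison)
- < less than
- > greater than
- <= less than or equal to
- >= greater than or equal to
- != not equal to
- not logical negation
- and logical and
- or logical or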
Shorthand assignment operators can also be used:
- x += 3 means x = x + 3
- however, x++ does not work (Python has no increment operator)
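The difference between the division operators in the list above is easy to miss, so here is a brief sketch with hypothetical values. It is only an illustration of the listed operators, not an exhaustive reference.

```python
a = 17
b = 5

quotient = a / b         # true division always returns a float
floor_quotient = a // b  # floor division rounds down to a whole number
remainder = a % b        # modulo gives the remainder of the division
power = b ** 2           # exponentiation

x = 10
x += 3                   # same as x = x + 3

print(quotient, floor_quotient, remainder, power, x)

# Comparison and logical operators produce boolean values
print(a != b, a >= b and not b > 10)
```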
Reading data from the user
User input can be read with the input command. For example, the command
celsius = input("Enter temperature: ")
first prints the text "Enter temperature: " on the screen, then reads the text entered by the user and assigns it to the variable celsius. The text is always stored as a string, so for calculations, it must be converted to a number type using the functions int() or float().
celsius = input("Enter temperature: ")
print("The temperature",celsius,"Celsius degrees is in Kelvin ",float(celsius)+273.15)
Enter temperature: 22.2
The temperature 22.2 Celsius degrees is in Kelvin 295.34999999999997
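Because the user can type anything, the conversion with float() may fail. One way to guard against this is sketched below. The helper celsius_to_kelvin is a hypothetical name introduced for illustration, and rounding to two decimals is an assumption made here to tidy the floating-point result seen above.

```python
def celsius_to_kelvin(text):
    """Convert a temperature given as text to Kelvin, or return None if the text is not a number."""
    try:
        celsius = float(text)
    except ValueError:
        return None
    return round(celsius + 273.15, 2)

# In a notebook, the text would come from input("Enter temperature: ")
print(celsius_to_kelvin("22.2"))
print(celsius_to_kelvin("warm"))
```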
Data Structures
Data structures are extremely important when dealing with data and its analysis. Let's go through the built-in data structures in Python, although later in the course we will mostly use more versatile structures found in different libraries, such as NumPy array, Pandas Series, and especially Pandas DataFrame.
List
Python has a few different built-in data structures, the most common of which is the list.
A list is defined inside square brackets, and its items are separated by commas. The elements of a list can be of different data types or even other lists. In the example below, [False, True] is a list within a list.
lista1 = [1, 2.2, 'cat', 3, [False, True]]
Elements of the list are referenced by indices, just like in strings:
print(lista1[2:3])
Lists are dynamic, meaning they can be modified after creation. The following example shows some ways of handling lists.
# Example: Lists
lista1 = [1, 2.2, 'dog', 3, [False, True]]
print("Printing the list: \n", lista1)
print("Printing an element from the list lista1[1]: \n", lista1[1])
print("Printing an element from the inner list lista1[-1][0]: \n", lista1[-1][0])
# Changing the value of a list element
print("\nChanging the value of a list element")
lista1[1] = 2022
print(lista1)
# adding one element in between
print("\nAdding one element in between with the insert() function")
lista1.insert(3, 'cat')
print(lista1)
#removing the second last element
print("\nRemoving the second last element")
del lista1[-2]
print(lista1)
print('\n')
# iterating through the list
for i in lista1:
    print(str(i) + " of type " + str(type(i)))
Printing the list:
[1, 2.2, 'dog', 3, [False, True]]
Printing an element from the list lista1[1]:
2.2
Printing an element from the inner list lista1[-1][0]:
False
Changing the value of a list element
[1, 2022, 'dog', 3, [False, True]]
Adding one element in between with the insert() function
[1, 2022, 'dog', 'cat', 3, [False, True]]
Removing the second last element
[1, 2022, 'dog', 'cat', [False, True]]
1 of type <class 'int'>
2022 of type <class 'int'>
dog of type <class 'str'>
cat of type <class 'str'>
[False, True] of type <class 'list'>
# LIST FUNCTIONS
# Create an empty list:
prime_numbers = []
# Create a new list and add elements to it:
prime_numbers = [2, 3, 5, 7, 11, 13, 17]
# Create a new list with a certain number of elements:
prime_numbers = [5] * 6
# Add a new element to the end of the list:
prime_numbers.append(19)
# Add a new element at a specific position:
prime_numbers.insert(2, 23)
# Remove an element from the list:
prime_numbers.remove(5)
# Is the element in the list:
if 19 in prime_numbers:
    print("Yes")
# Number of repeating elements in the list:
count = prime_numbers.count(5)
# Find the position or index of an element in the list:
index = prime_numbers.index(23)
# Sort the list:
prime_numbers.sort()
# Reverse the order of the list:
prime_numbers.reverse()
# Total number of elements:
elements = len(prime_numbers)
# Modifying an element:
prime_numbers[5] = 29
# Retrieving a specific element:
number = prime_numbers[4]
# Repeating a list
list1 = [1, 2, 3]
list2 = list1 * 2
print(list2)
# Combining lists
list1 = [1, 2, 3]
list2 = [4, 5, 6]
list3 = list1 + list2
list3 = list3 + [7]
print(list3)
# The split function of a string returns a list of strings
equation = "2+15+3+8"
parts = equation.split("+")
print(parts)
# Here list2 refers to the same object as list1
list1 = [1, 2, 3]
list2 = list1
list2[0] = 0
print(list1)
# Here, on the other hand, a copy of list1 is made, so modifying list2 does not affect list1
list1 = [1, 2, 3]
list2 = list1[:]
list2[0] = 0
print(list1)
Yes
[1, 2, 3, 1, 2, 3]
[1, 2, 3, 4, 5, 6, 7]
['2', '15', '3', '8']
[0, 2, 3]
[1, 2, 3]
Tuple
Tuples are structurally similar to lists, but there is one significant difference: they are immutable, meaning a tuple cannot be modified after it has been created, and tuples have no methods that change their contents.
A tuple is defined within regular parentheses, while a list is defined within square brackets.
Tuples are often used when you want to ensure that a list-like parameter given to a function will not change at any point.
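The properties of tuples described above could be demonstrated as follows. This is a minimal sketch with hypothetical variable names: it defines tuples, reads an element, shows that modification fails, and unpacks a tuple into separate variables.

```python
point = (3, 5)    # a tuple with two elements
single = (42,)    # a one-element tuple needs a trailing comma

print(point[0])   # elements are read by index, as with lists
print(len(point))

# Tuples cannot be changed after creation
try:
    point[0] = 10
except TypeError as error:
    print("Cannot modify:", error)

# A tuple can be unpacked into separate variables
x, y = point
print(x, y)
```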
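Dictionary
A dictionary (often also called an associative array) is a collection whose elements are referred to by unique keys instead of numeric indices. Each element is a key-value pair. Since Python 3.7, a dictionary preserves the order in which its keys were inserted. The key must be unique, meaning a dictionary cannot contain the same key twice.
Defining a dictionary
A dictionary is defined inside curly braces as comma-separated key : value pairs, for example dictionary1 = {'card' : 'Visa', 'limit' : 2000, 'name' : 'Maroulis Helen'}. The key-value pairs in this example are: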
- card and Visa
- limit and 2000
- name and Maroulis Helen
A dictionary element is referenced by giving the element's key instead of an index.
There are several options for iteration:
- for x in dictionary1 iterates through keys, as does for x in dictionary1.keys()
- for x in dictionary1.values() iterates through values
- for x,y in dictionary1.items() iterates through both keys and values
A dictionary can also be created using a slightly different syntax
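The text does not show which alternative syntax is meant. One possibility, assumed here, is the built-in dict() constructor, which accepts either keyword arguments or a list of (key, value) tuples. The variable names are hypothetical.

```python
# Literal syntax with curly braces
card_literal = {'card': 'Visa', 'limit': 2000}

# The built-in dict() constructor with keyword arguments
card_keywords = dict(card='Visa', limit=2000)

# dict() also accepts a list of (key, value) tuples
card_pairs = dict([('card', 'Visa'), ('limit', 2000)])

# All three definitions produce equal dictionaries
print(card_literal == card_keywords == card_pairs)
```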
```python
#Example: Dictionary
dictionary1 = {'card' : 'Visa', 'limit' : 2000, 'name' : 'Maroulis Helen'}
print(dictionary1['limit'])
print('------------\n')
for x in dictionary1:
    print(x)
print('------------\n')
for x in dictionary1.keys():
    print(x)
print('------------\n')
for x in dictionary1:
    print(dictionary1[x])
print('------------\n')
for x in dictionary1.values():
    print(x)
print('------------\n')
for x,y in dictionary1.items():
    print(x + " : " + str(y))
print('------------\n')
dictionary1['expires'] = '11/19'
for x,y in dictionary1.items():
    print(x + " : " + str(y))
```
2000
------------
card
limit
name
------------
card
limit
name
------------
Visa
2000
Maroulis Helen
------------
Visa
2000
Maroulis Helen
------------
card : Visa
limit : 2000
name : Maroulis Helen
------------
card : Visa
limit : 2000
name : Maroulis Helen
expires : 11/19
Conditional and Loop Structures
Programs often need to repeat an operation several times in a row or branch into different paths depending on, for example, the value of a variable. Repetition is implemented with loop (iteration) structures, that is, while or for statements, and branching with conditional statements such as the if-else or if-elif-else structure.
if...elif...else
The basis of the if-else structure is a logical statement, where if the statement is true, i.e., True, the code associated with the if statement is executed. If the condition is false, i.e., False, the code associated with the else structure is executed. If the else structure is missing, the execution continues forward at the same level. The structure also defines elif (else if) sections, which allow chaining several if statements in a row. This enables testing multiple options in the same if statement. There can be an arbitrary number of elif statements in an if statement, but elif and else statements cannot exist without an if statement.
#Example: Conditional Structure
a = 0
if a > 0:
    print('a is greater than 0')
elif a < 0:
    print('a is less than 0')
else:
    print('a is equal to 0')
a is equal to 0
Conditions can be combined in conditional statements with the and, or, and not operators.
#Example: Conditional structure and operators
a = 5
b = 'dog'
if a > 0 and b == 'dog':
    print('a is greater than 0 and b is a dog')
# parentheses clarify
if (a > 0) and (b == 'dog'):
    print('a is greater than 0 and b is a dog')
a is greater than 0 and b is a dog
a is greater than 0 and b is a dog
if...elif...else structures can also be written on one line as a conditional expression; compare with the ternary operator of other languages (a > b ? a : b).
#Example: Conditional structure
a = 3
b = 5
if a > b: print("a is greater than b")
print("A") if a > b else print("B")
print("A") if a > b else print("=") if a == b else print("B")
# parentheses clarify
(print("A") if a > b else (print("=") if a == b else print("B")))
# shorter way
print(("A" if a > b else ("=" if a == b else "B")))
B
B
B
B
WHILE loop structure
The while statement is executed as long as its condition evaluates to True. The structure of the while statement is very free-form, and the interpreter does not track its progress in any way. For this reason, care should be taken to avoid creating infinite loops.
# Example: WHILE loop structure
i = 1
while i < 6:
    print(i)
    i += 1
print('------------\n')
i = 1
while i < 6:
    print(i)
    if i == 3:
        break
    i += 1
print('------------\n')
i = 0
while i < 6:
    i += 1
    if i == 3:
        continue
    print(i)
1
2
3
4
5
------------
1
2
3
------------
1
2
4
5
6
For loop structure
The for structure is another way in Python to implement a loop. The differences in the structure of the for statement compared to the while structure are:
- The number of iterations in the loop is known in advance
- The for statement can be used to iterate through lists and other multi-element structures
The range function is often used in for loops as well:
- range(6) gives the numbers 0, 1, 2, 3, 4, 5 (but not the number 6)
- range(2,6) gives 2,3,4,5 (but not 6)
- range(2,50,10) gives 2, 12, 22, 32, 42, i.e., every 10th number
- range(6,1,-1) gives 6,5,4,3,2
# Example: For loop structure
for i in range(6):
    print(i)
print('------------\n')
for i in range(2,6):
    print(i)
print('------------\n')
for i in range(2,50,10):
    print(i)
print('------------\n')
for i in range(6,1,-1):
    print(i)
0
1
2
3
4
5
------------
2
3
4
5
------------
2
12
22
32
42
------------
6
5
4
3
2
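The examples above use only range, although the text also mentions iterating through lists. A short sketch of that case follows, with hypothetical variable names; enumerate() is a built-in function that yields both the index and the element.

```python
scores = [12, 7, 30]

# Iterating directly over the elements of a list
total = 0
for score in scores:
    total += score
print(total)

# enumerate() gives both the index and the element
for index, score in enumerate(scores):
    print(index, score)

# Iterating over the characters of a string works the same way
letters = []
for letter in "data":
    letters.append(letter)
print(letters)
```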
Functions
A function is a piece of code that is executed only when it is called. You have already used several predefined functions and methods, such as range(), append(), and insert().
In Python, you can define your own functions with the def keyword. Splitting a program into functions in this way is called modular programming, and it makes the code significantly clearer. Functions written earlier can be reused later.
#Example: Defining and calling a function
def my_function():
    print("Hello from a function")
# calling the function
my_function()
Hello from a function
Parameters in a function
Values can be passed to functions with parameters, and they can also be given default values.
#Example: Defining and calling a function
def hello_function(name):
    print("Hello " + name)
hello_function("Helen Maroulis")
Hello Helen Maroulis
hello_function("Yui Susaki")
Hello Yui Susaki
# Example: A function that adds together 3 numbers given as parameters
def print_sum(a, b, c):
    print("The sum is "+ str(a + b + c))
print_sum(3, 5, 8)
The sum is 16
# Example: A function that calculates the power of the sum of two added numbers
# NOTE! c=2 is a default value that is used if no value is given for c when calling
def print_power_of_sum(a, b, c = 2):
    print((a + b)**c)
print_power_of_sum(3, 5)
print_power_of_sum(3, 5, 3)
def print_power_of_sum2(a, b = 1, c = 2):
    print((a + b)**c)
print_power_of_sum2(3)
print_power_of_sum2(3, c = 3)
print_power_of_sum2(3, b = 3, c = 4)
64
512
16
64
1296
Namespaces
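Variables created inside a function are local to it: if a variable x is created inside a function, it is only available within that function. This means that there is no connection between the local variables of different functions. A function can read a variable defined outside of it, but assigning a value to the same name inside the function creates a separate local variable and leaves the outer variable unchanged.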
However, for example, passing a list as a parameter to a function passes a reference to the original list, which means that modifying it within the function also affects it outside the function. A copy of the list can be obtained with the syntax lista2 = lista1[:]
# Example: Example of namespace
def function(x):
    print("x is upon entering the function", x)
    x = 2
    print("x changed inside the function to", x)
x = 50
print("x before the function", x)
function(x)
print("x after calling the function is still", x)
x before the function 50
x is upon entering the function 50
x changed inside the function to 2
x after calling the function is still 50
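The behaviour of lists described above differs from the number example, so a separate sketch may help. The function add_marker and the variable names are hypothetical; the copy is made with the [:] syntax mentioned in the text.

```python
def add_marker(items):
    # append() modifies the list object that was passed in
    items.append('changed')

original = [1, 2, 3]
add_marker(original)
print(original)           # the caller's list now has a new element

protected = [1, 2, 3]
add_marker(protected[:])  # the function receives a copy
print(protected)          # the caller's list is unchanged
```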
Return Value
A function can return a value with the return statement.
There can also be multiple return values, which are returned as a tuple.
# Example: Function return value
def maximum(x, y):
    if x > y:
        return x
    else:
        return y
number1 = 100
number2 = 50
larger = maximum(number1, number2)
print("The larger value is", larger)
The larger value is 100
# Example: Function return value
def largest_and_smallest(values): # the parameter is not named list, to avoid hiding the built-in list type
    values.sort()
    return values[0], values[-1]
mylist = [4, 5, 8, 0, -3, 11]
smallest, largest = largest_and_smallest(mylist)
print(str(smallest) + " " + str(largest))
print(mylist)
-3 11
[-3, 0, 4, 5, 8, 11]
From the last line of output, we notice that the function has changed the list defined in the main program (with the sort function). Python always passes a reference to the object given as a parameter. Numbers and strings (int, float, str) are immutable, so a function cannot change the caller's value, and the result resembles call-by-value. A list is mutable, so changes made to it inside the function are visible outside the function as well, which resembles call-by-reference.
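If the side effect on the caller's list is not wanted, one option is to sort a copy instead. The sketch below uses the built-in sorted() function, which returns a new list; the function name smallest_and_largest is a hypothetical variant of the example above.

```python
def smallest_and_largest(values):
    ordered = sorted(values)  # sorted() returns a new list and leaves values untouched
    return ordered[0], ordered[-1]

mylist = [4, 5, 8, 0, -3, 11]
smallest, largest = smallest_and_largest(mylist)
print(smallest, largest)
print(mylist)  # the original order is preserved
```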