11. Timestamps and Time Series

Time Series

Data analysis often examines how variables develop over time, so datasets frequently contain time information. The data can be recorded at fixed time intervals, e.g., "every 15 seconds" or "once a month", or at irregular intervals.

Time information can be in different formats in the data:

timestamps, e.g., 2019-04-05 11:23
periods, e.g., the year 2019, January 2019
a period can also be given as two timestamps (start and end)
elapsed time, e.g., seconds since the start of an experiment

Pandas has many tools and algorithms for processing time series. For example, aggregating data at desired time intervals is easy.
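As a minimal sketch of aggregating at a chosen interval, the example below builds a small series of made-up readings taken every 15 seconds and averages them per minute with resample. The variable names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical readings taken every 15 seconds (values are made up)
index = pd.date_range("2022-01-01 00:00:00", periods=8, freq="15s")
readings = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=index)

# Aggregate to one-minute intervals by taking the mean of each minute
per_minute = readings.resample("1min").mean()
print(per_minute)
```

Each minute contains four readings, so the result has two rows: the mean of the first four values and the mean of the last four.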

datetime and time data types in basic Python

Python's datetime module has data types for date and time information:

date (year, month, and day)
time (time of day: hours, minutes, seconds, and microseconds)
datetime (date and time)
timedelta (the difference between two datetime values, stored as days, seconds, and microseconds)
tzinfo (time zone)

from datetime import datetime
now = datetime.now()
print(now)


sometime = datetime(2019,5,6,11,23,45)
print(sometime)


another_time = datetime(2019,8,31)
print(another_time)

2022-11-23 15:03:10.621445
2019-05-06 11:23:45
2019-08-31 00:00:00

from datetime import date


today_date = date.today()
print(today_date)

2022-11-23

from datetime import timedelta


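difference = sometime - now          # subtracting two datetimes gives a timedelta
print(difference)
print(type(difference))
print(difference.days)               # whole days (negative because sometime is earlier than now)
print(difference.seconds)            # remaining seconds, always between 0 and 86399
print(now)
print(now + timedelta(10))           # 10 days later
print(now + timedelta(1, 10))        # 1 day and 10 seconds later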

-1298 days, 20:20:34.378555
<class 'datetime.timedelta'>
-1298
73234
2022-11-23 15:03:10.621445
2022-12-03 15:03:10.621445
2022-11-24 15:03:20.621445

You cannot add datetimes together, but you can add a timedelta to a datetime.
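The sketch below illustrates both points with fixed, made-up dates so the results are reproducible. It also shows how a negative timedelta is stored: the days field is negative and the seconds field stays non-negative, which is why total_seconds() is often the easier value to work with.

```python
from datetime import datetime, timedelta

start = datetime(2019, 5, 6, 11, 23, 45)
end = datetime(2019, 5, 5, 10, 0, 0)    # earlier than start

diff = end - start                      # a negative timedelta
print(diff)                             # -2 days, 22:36:15
print(diff.days, diff.seconds)          # -2 81375
print(diff.total_seconds())             # -91425.0

# Adding two datetimes is not defined and raises TypeError
try:
    start + end
except TypeError as exc:
    print("TypeError:", exc)

# Adding a timedelta to a datetime works
print(start + timedelta(days=10))       # 2019-05-16 11:23:45
```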

datetime <-> string

A datetime object, like the pandas Timestamp object introduced later, can be converted to a string with str or with the strftime method:

now = datetime.now()


print(now)
print(str(now))


print(now.strftime('%d.%m.%Y'))
print(now.strftime('%m.%d.%Y'))
print(now.strftime('%c'))

2022-11-23 15:03:10.664621
2022-11-23 15:03:10.664621
23.11.2022
11.23.2022
Wed Nov 23 15:03:10 2022

strftime formatting codes:

%Y Four-digit year
%y Two-digit year
%m Two-digit month [01, 12]
%d Two-digit day [01, 31]
%H Hour (24-hour clock) [00, 23]
%I Hour (12-hour clock) [01, 12]
%M Two-digit minute [00, 59]
%S Second [00, 61] (seconds 60, 61 account for leap seconds)
%w Weekday as integer [0 (Sunday), 6]
%U Week number of the year [00, 53]; Sunday is considered the first day of the week, and days before the first Sunday of the year are “week 0”
%W Week number of the year [00, 53]; Monday is considered the first day of the week, and days before the first Monday of the year are “week 0”
%z UTC time zone offset as +HHMM or -HHMM; empty if time zone naive
%F Shortcut for %Y-%m-%d (e.g., 2012-04-18)
%D Shortcut for %m/%d/%y (e.g., 04/18/12)
%a Abbreviated weekday name
%A Full weekday name
%b Abbreviated month name
%B Full month name
%c Full date and time (e.g., ‘Tue 01 May 2012 04:20:57 PM’)
%p Locale equivalent of AM or PM
%x Locale-appropriate formatted date (e.g., in the United States, May 1, 2012 yields ’05/01/2012’)
%X Locale-appropriate time (e.g., ’04:24:12 PM’)

Using the same formatting codes, a string can be interpreted as a datetime object with the datetime.strptime method:

string = '22.11.2022'
time = datetime.strptime(string, '%d.%m.%Y')
print(time)

2022-11-22 00:00:00
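Because strptime and strftime share the same codes, one format string can be used in both directions. The following sketch, with an example string chosen for illustration, parses a string, formats it back, and shows that a string not matching the format raises ValueError.

```python
from datetime import datetime

text = "22.11.2022 14:15"
fmt = "%d.%m.%Y %H:%M"

parsed = datetime.strptime(text, fmt)   # string -> datetime
print(parsed)                           # 2022-11-22 14:15:00
print(parsed.strftime(fmt))             # datetime -> string: 22.11.2022 14:15

# A string that does not match the format raises ValueError
try:
    datetime.strptime("2022-11-22", fmt)
except ValueError as exc:
    print("ValueError:", exc)
```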

To avoid writing formatting codes, you can use the parse function from dateutil.parser, which recognizes most common date/time formats:

from dateutil.parser import parse


then = parse('22.11.2022 14:15')
print(then)


at_that_time = parse('2.12.22')
print(at_that_time) # wrong if 2 is meant as the day: by default the month is read first, as is common in the United States


at_that_time = parse('2.12.22',  dayfirst=True)
print(at_that_time)

2022-11-22 14:15:00
2022-02-12 00:00:00
2022-12-02 00:00:00

pandas.to_datetime

The pandas to_datetime function similarly parses recognizable strings into pandas Timestamp objects.

import pandas as pd


tt = pd.to_datetime('1.4.34', dayfirst=True)
print(tt)
print(type(tt))

2034-04-01 00:00:00
<class 'pandas._libs.tslibs.timestamps.Timestamp'>

When the argument is a list, pd.to_datetime returns a DatetimeIndex object; when it is a Series, it returns a Series, and so on. Strings that contain only a time of day are given the current date.

times = ['12:23', '13:23', '23:34']
print(pd.to_datetime(times))

DatetimeIndex(['2022-11-23 12:23:00', '2022-11-23 13:23:00', '2022-11-23 23:34:00'], dtype='datetime64[ns]', freq=None)

timesS = pd.Series(['12:23', '13:23', '23:34'])
print(pd.to_datetime(timesS))

0   2022-11-23 12:23:00
1   2022-11-23 13:23:00
2   2022-11-23 23:34:00
dtype: datetime64[ns]

You can also tell to_datetime the format used:

pd.to_datetime('1.4.34', format='%H.%M.%S')

Timestamp('1900-01-01 01:04:34')
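An explicit format is especially useful for day-first dates, where automatic parsing can guess wrong. A minimal sketch, using made-up date strings in a Series:

```python
import pandas as pd

# Hypothetical day.month.year strings
raw = pd.Series(["1.4.2034", "15.4.2034", "2.5.2034"])

# The explicit format removes any ambiguity about day and month order
parsed = pd.to_datetime(raw, format="%d.%m.%Y")
print(parsed)
print(parsed.dt.month.tolist())   # [4, 4, 5]
```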

pd.to_timedelta creates a time difference, which can be added to a timestamp, for example.

df = pd.read_csv('Datasets/time.txt')
print(df)

      broadcast  duration
0  2019-01-01      5
1  2019-01-02      7
2  2019-01-02     12
3  2019-01-03      8
4  2019-01-08      3
5  2019-01-11      4

df['broadcast'] = pd.to_datetime(df['broadcast'])  # convert strings to timestamps
df['delivery'] = df['broadcast'] + pd.to_timedelta(df['duration'], unit='D')  # add 'duration' days ('D') to each timestamp


print(df)

     broadcast  duration   delivery
0 2019-01-01      5 2019-01-06
1 2019-01-02      7 2019-01-09
2 2019-01-02     12 2019-01-14
3 2019-01-03      8 2019-01-11
4 2019-01-08      3 2019-01-11
5 2019-01-11      4 2019-01-15
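The operation also works in reverse: subtracting one timestamp column from another gives a column of time differences. The sketch below rebuilds a few rows of the table inline (instead of reading the file, whose exact contents are an assumption here) and recovers the duration in days with the .dt.days accessor.

```python
import pandas as pd

# A few rows in the same layout as the table above, built inline for illustration
df = pd.DataFrame({"broadcast": ["2019-01-01", "2019-01-02", "2019-01-08"],
                   "duration": [5, 7, 3]})
df["broadcast"] = pd.to_datetime(df["broadcast"])
df["delivery"] = df["broadcast"] + pd.to_timedelta(df["duration"], unit="D")

# Subtracting two timestamp columns gives a timedelta column
elapsed = df["delivery"] - df["broadcast"]
print(elapsed)
print(elapsed.dt.days.tolist())   # [5, 7, 3]
```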

Timestamps in the index

A common representation for time series is a Series or DataFrame whose index consists of timestamps (Timestamp objects).

from datetime import datetime
import numpy as np


dates = [datetime(2022, 1, 2), datetime(2022, 1, 5), datetime(2022, 1, 7), 
         datetime(2022, 4, 8), datetime(2023, 1, 10), datetime(2023, 1, 12)]


ts = pd.Series(np.random.randn(6), index=dates)


print(ts)

2022-01-02   -0.234078
2022-01-05    1.457706
2022-01-07   -0.193775
2022-04-08   -1.387376
2023-01-10    0.833841
2023-01-12   -0.517104
dtype: float64

Such a time series can be indexed or sliced just like any other Series or DataFrame.

Additionally, any string that can be interpreted as a date or time can be used as a label, as can a partial date such as just the year, or the year and month.

When slicing, the given date/time does not need to appear in the time series.

print(ts.iloc[2])  # select the third value by position

print('\n--------\n')
print(ts['1.2.2022'])  # parsed month-first: January 2, 2022


print('\n--------\n')
print(ts['2022'])


print('\n--------\n')
print(ts['2022/1'])


print('\n--------\n')
print(ts['2022/2':'2023'])


print('\n--------\n')
print(ts[:'20220110'])

-0.19377497567135307


--------


-0.23407829942228328


--------


2022-01-02   -0.234078
2022-01-05    1.457706
2022-01-07   -0.193775
2022-04-08   -1.387376
dtype: float64


--------


2022-01-02   -0.234078
2022-01-05    1.457706
2022-01-07   -0.193775
dtype: float64


--------


2022-04-08   -1.387376
2023-01-10    0.833841
2023-01-12   -0.517104
dtype: float64


--------


2022-01-02   -0.234078
2022-01-05    1.457706
2022-01-07   -0.193775
dtype: float64
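The same selections can also be written with .iloc and .loc, which make explicit whether a lookup is by position or by label. The sketch below uses the same dates but fixed, made-up values so that the results are reproducible.

```python
import pandas as pd

dates = pd.to_datetime(["2022-01-02", "2022-01-05", "2022-01-07",
                        "2022-04-08", "2023-01-10", "2023-01-12"])
ts = pd.Series([10, 20, 30, 40, 50, 60], index=dates)

print(ts.iloc[2])                  # by position: 30
print(ts.loc["2022-01"])           # all of January 2022
print(ts.loc["2022-02":"2023"])    # slice endpoints need not exist in the index
print(ts.loc[:"2022-01-10"])       # everything up to and including January 10, 2022
```

A slice that ends with a partial date such as "2023" includes every timestamp within that year.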

parse_dates in read_csv

When a dataset is read with read_csv, the desired columns can be parsed as timestamps right away with the parse_dates parameter, which accepts:

boolean: if True, try parsing the index.
list of ints or names: e.g., [1, 2, 3] tries parsing columns 1, 2, and 3 each as a separate date column.
list of lists: e.g., [[1, 3]] combines columns 1 and 3 and parses them as a single date column.
dict: e.g., {'foo': [1, 3]} parses columns 1 and 3 as a date and calls the result 'foo'.
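A minimal sketch of the list-of-names form is shown below. It reads a small inline CSV through io.StringIO rather than a file. The column names and values are assumptions for illustration, and dayfirst=True is passed because the dates are assumed to be written day first.

```python
import io
import pandas as pd

# Hypothetical CSV content with day.month.year dates
csv_text = """date,value,value2
1.1.2022,15,-5
2.1.2022,14,-3
3.1.2022,11,1
"""

# Parse the 'date' column as timestamps while reading
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], dayfirst=True)
print(df.dtypes)
print(df)
```

Without parse_dates, the date column would be read as plain strings.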

File:

date,value,value2
1.1.2022,15,-5
2.1.2022,14,-3
3.1.2022,11,1
4.1.2022,11,2
5.1.2022,20,1
6.1.2022,16,-2
7.1.2022,11,-1
8.1.2022,13,0

df = pd.read_csv('Datasets/times.txt')
df