Python Reference

Python logging JSON Formatter

Turn log messages into JSON in Python using the built-in logging module:

import logging
import json


class JSONFormatter(logging.Formatter):
    def __init__(self):
        super().__init__()

    def format(self, record):
        record.msg = json.dumps(record.msg)
        return super().format(record)

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
loggingStreamHandler = logging.StreamHandler()
# loggingStreamHandler = logging.FileHandler("test.json",mode='a') #to save to file
loggingStreamHandler.setFormatter(JSONFormatter())
logger.addHandler(loggingStreamHandler)
logger.info({"data":123})

Source: https://everythingtech.dev/2021/03/python-logging-with-json-formatter/

Logging - inherit contextual information

# myapp.py
import logging
import mylib


class ContextFilter(logging.Filter):
    def __init__(self, filter_name, extra):
        super(ContextFilter, self).__init__(filter_name)
        self.connid = extra

    def filter(self, record):
        record.connid = self.connid
        return True


def main():
    logging.basicConfig(filename='myapp.log',level=logging.INFO,
                        format='%(levelname)s:%(name)s:[%(connid)s] %(message)s')
    logger = logging.getLogger('test')
    cf = ContextFilter(filter_name='add_conn_id', extra='123')
    logger.addFilter(cf)
    logger.info('Started')
    mylib.do_something()
    logger.info('Finished')

if __name__ == '__main__':
    main()

My log output now looks like this:

INFO:test:[123] Started
INFO:test:[123] Doing something
INFO:test:[123] Finished

Source: https://stackoverflow.com/questions/46895678/python-logging-inherit-contextual-information

Solving psutil error while running Jupyter Notebooks

If you run into the error ModuleNotFoundError: No module named 'psutil' while running Jupyter Notebooks, uninstall and reinstall psutil using pip.

Marshmallow - Object Serialization

Source: Marshmallow Documentation

Attrs Schema Validation Examples

Source:

Attrs - Dataclasses

Source: Attrs, Dataclasses and Pydantic

Templating SQL Queries with JinjaSQL

ImportError: cannot import name ‘Markup’ from ‘jinja2’
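The usual cause: Jinja2 3.1 removed names such as Markup that were previously re-exported from jinja2; they now live in the markupsafe package (a Jinja2 dependency). A sketch of the fix:

```python
# Jinja2 3.1 removed the re-exported Markup; import it from markupsafe instead.
from markupsafe import Markup

snippet = Markup("<b>bold</b>")  # marks the string as HTML-safe; no escaping applied
print(snippet)
```

If the failing import comes from a third-party library you cannot change, pinning jinja2<3.1 in your requirements is the other common workaround.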

JSON/Dictionary Validation

Conda - unable to completely delete environment

Command-line options can only go so far, unless you get very specific; perhaps the simplest approach is to delete things manually:

  1. Locate Anaconda folder; I’ll use "D:\Anaconda\"
  2. In envs, delete environment of interest: "D:\Anaconda\envs\myenv". Are you done? Not quite; even while in myenv, conda will still sometimes install packages to the base environment, in "D:\Anaconda\pkgs\"; thus, to clean traces of myenv,
  3. Delete packages installed to myenv that ended up in "D:\Anaconda\pkgs\"
  4. (If above don’t suffice) Anaconda Navigator -> Environments -> myenv -> Remove
  5. (If above don’t suffice) Likely corrupted Anaconda; make note of installed packages, completely uninstall Anaconda, reinstall. Note: step 3 is redundant for the goal of simply removing myenv, but it’s recommended to minimize future package conflicts.

Source: https://stackoverflow.com/questions/58736579/conda-unable-to-completely-delete-environment

How do I prevent Conda from activating the base environment by default?

conda config --set auto_activate_base false

Source: https://stackoverflow.com/questions/54429210/how-do-i-prevent-conda-from-activating-the-base-environment-by-default?rq=1

How do I format a string using a dictionary in python-3.x?

geopoint = {'latitude':41.123,'longitude':71.091}
print('{latitude} {longitude}'.format(**geopoint))

Source: https://stackoverflow.com/questions/5952344/how-do-i-format-a-string-using-a-dictionary-in-python-3-x

Keep your SQL queries DRY with Jinja templating

JinjaSQL

Jinja Templating

Difference between “raise” and “raise e”?

There is no difference in this case. raise without arguments will always raise the last exception thrown (which is also accessible with sys.exc_info()).
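A small sketch showing the bare raise re-raising the in-flight exception (function name made up):

```python
import sys

def reraise():
    try:
        1 / 0
    except ZeroDivisionError:
        # sys.exc_info() returns (type, value, traceback) of the active exception
        exc_type, exc_value, _ = sys.exc_info()
        assert exc_type is ZeroDivisionError
        raise  # re-raises the same exception object with its original traceback

try:
    reraise()
except ZeroDivisionError as e:
    print(type(e).__name__)  # ZeroDivisionError
```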

The reason the bytecode is different is because Python is a dynamic language and the interpreter doesn’t really “know” that e refers to the (unmodified) exception that is currently being handled. But this may not always be the case, consider:

try:
    raise Exception()
except Exception as e:
    if foo():
        e = OtherException()
    raise e

What is e now? There is no way to tell when compiling the bytecode (only when actually running the program).

In simple examples like yours, it might be possible for the Python interpreter to “optimize” the bytecode, but so far no one has done this. And why should they? It’s a micro-optimization at best and may still break in subtle ways in obscure conditions. There is a lot of other fruit that is hanging a lot lower than this and is more nutritious to boot ;-)

Abstract base classes

Source: abc — Abstract Base Classes

Return libraries the current shell has imported

import sys

print(sys.modules)

Source: https://www.geeksforgeeks.org/python-sys-module/

JSONDecodeError: Expecting value (raised as: raise JSONDecodeError("Expecting value", s, err.value) from None)

Use json.dumps() to produce a valid JSON string, then read it back with json.loads().
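A minimal sketch of the round trip; serializing with json.dumps() guarantees the text parses back cleanly:

```python
import json

data = {"a": 1, "nested": [1, 2, 3]}
text = json.dumps(data)          # produce a valid JSON string
assert json.loads(text) == data  # parses back without JSONDecodeError
print(text)
```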

Get a random value from range

random.randint(a, b)  # returns a random integer N with a <= N <= b (both endpoints inclusive)

Randomly choose ‘n’ items from sequence

random.sample(seq, n)
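A quick sketch of both calls (seeded here for repeatability):

```python
import random

random.seed(0)
roll = random.randint(1, 6)           # integer N with 1 <= N <= 6, both ends inclusive
picks = random.sample(range(100), 3)  # 3 unique items; the source is not modified
assert 1 <= roll <= 6
assert len(set(picks)) == 3
print(roll, picks)
```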

Analyzing Python Code with Python

How to get current date and time in Python?

Get today’s date

from datetime import date

today = date.today()
print("Today's date:", today)

Current date in different formats

from datetime import date

today = date.today()

# dd/mm/YY
d1 = today.strftime("%d/%m/%Y")
print("d1 =", d1)

# Textual month, day and year	
d2 = today.strftime("%B %d, %Y")
print("d2 =", d2)

# mm/dd/y
d3 = today.strftime("%m/%d/%y")
print("d3 =", d3)

# Month abbreviation, day and year	
d4 = today.strftime("%b-%d-%Y")
print("d4 =", d4)

Output:

d1 = 16/09/2019
d2 = September 16, 2019
d3 = 09/16/19
d4 = Sep-16-2019

Get the current date and time

from datetime import datetime

# datetime object containing current date and time
now = datetime.now()
 
print("now =", now)

# dd/mm/YY H:M:S
dt_string = now.strftime("%d/%m/%Y %H:%M:%S")
print("date and time =", dt_string)	

output:

now = 2021-06-25 07:58:56.550604
date and time = 25/06/2021 07:58:56

Source: How to get current date and time in Python?

Fugue and DuckDB: Fast SQL Code in Python

List all available modules

print(help('modules'))

For very old versions of pip (pre-10), the internal API below also worked; it has since been removed:

import pip
installed_packages = pip.get_installed_distributions()
installed_packages_list = sorted(["%s==%s" % (i.key, i.version)
     for i in installed_packages])
print(installed_packages_list)
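On modern Python (3.8+), the standard-library importlib.metadata gives the same listing without relying on pip internals; a sketch:

```python
from importlib import metadata

# build "name==version" entries for every installed distribution
installed = sorted(
    f"{dist.metadata['Name']}=={dist.version}" for dist in metadata.distributions()
)
print(installed[:5])  # first few entries
```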

Source:

Convert string dictionary to dictionary

Using json.loads()

import json

string_dict = '{"a": "apple", "z": "zebra"}'
d = json.loads(string_dict)

print(type(d))

Using ast.literal_eval()

import ast

string_dict = '{"a": "apple", "z": "zebra"}'
d = ast.literal_eval(string_dict)

print(type(d))

Source: [Python Convert string dictionary to dictionary](https://www.geeksforgeeks.org/python-convert-string-dictionary-to-dictionary/)

Calculate Size of all Installed Packages

import os
import pkg_resources

def calc_container(path):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size



dists = [d for d in pkg_resources.working_set]

for dist in dists:
    try:
        path = os.path.join(dist.location, dist.project_name)
        size = calc_container(path)
        if size / 1000 > 1.0:
            print(f"{dist}: {size / 1000} KB")
            print("-" * 40)
    except OSError:
        print(f"{dist.project_name} no longer exists")

Pandas - Drop Duplicates

df.drop_duplicates()

# drop all duplicates except the last occurrence
df.drop_duplicates(keep="last")

# drop all duplicates
df.drop_duplicates(keep=False)

# reset index
df.drop_duplicates(keep=False, ignore_index=True)

# drop duplicate rows based on column
df.drop_duplicates(subset=["Name"])
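A runnable sketch of the variants above (df and the Name column are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ann", "Ann", "Bob"], "Score": [1, 1, 2]})

assert len(df.drop_duplicates()) == 2                 # keeps the first of each duplicate row
assert len(df.drop_duplicates(keep=False)) == 1       # drops every row that has a duplicate
assert len(df.drop_duplicates(subset=["Name"])) == 2  # dedupe on one column only
```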

Source: https://theprogrammingexpert.com/drop-duplicates-pandas/

Pandas - “ValueError: columns overlap but no suffix specified: Index([], dtype=’object’)"

df1.join(df2, how = 'left', lsuffix = '_left', rsuffix = '_right')

# Alternative: use merge, which joins on the columns the two frames share,
# so the overlapping columns are combined rather than suffixed.
df1.merge(df2, how = 'left')

Source: https://www.roelpeters.be/solve-pandas-columns-overlap-but-no-suffix-specified/

Pandas - append dataframes

combined = pd.concat([df1, df2, df3], ignore_index=True)

Source: https://www.statology.org/pandas-append-two-dataframes/

Abstract Data Loader

Create and use an abstract data-loader class so that calling code does not need to know about environment differences.

Abstract Base Class

Source: https://docs.python.org/3/library/abc.html

PyPDF - Python Library for working with PDFs

Source: https://pypi.org/project/PyPDF2/

Round to 2 decimals using fstring

number = 3.1415926
print(f"The number rounded to two decimal places is {number:.2f}")

Source: https://stackoverflow.com/questions/20457038/how-to-round-to-2-decimals-with-python

Pretty Print a Dictionary in Python

Using pprint:

import pprint

dct_arr = [
  {'Name': 'John', 'Age': '23', 'Country': 'USA'},
  {'Name': 'Jose', 'Age': '44', 'Country': 'Spain'},
  {'Name': 'Anne', 'Age': '29', 'Country': 'UK'},
  {'Name': 'Lee', 'Age': '35', 'Country': 'Japan'}
]

pprint.pprint(dct_arr)

Using json.dumps():

import json

dct_arr = [
  {'Name': 'John', 'Age': '23', 'Country': 'USA'},
  {'Name': 'Jose', 'Age': '44', 'Country': 'Spain'},
  {'Name': 'Anne', 'Age': '29', 'Country': 'UK'},
  {'Name': 'Lee', 'Age': '35', 'Country': 'Japan'}
]

print(json.dumps(dct_arr, sort_keys=False, indent=4))

Using yaml.dump():

import yaml

dct_arr = [
  {'Name': 'John', 'Age': '23', 'Residence': {'Country':'USA', 'City': 'New York'}},
  {'Name': 'Jose', 'Age': '44', 'Residence': {'Country':'Spain', 'City': 'Madrid'}},
  {'Name': 'Anne', 'Age': '29', 'Residence': {'Country':'UK', 'City': 'England'}},
  {'Name': 'Lee', 'Age': '35', 'Residence': {'Country':'Japan', 'City': 'Osaka'}}
]

print(yaml.dump(dct_arr, sort_keys=False, default_flow_style=False))

Source: https://www.delftstack.com/howto/python/python-pretty-print-dictionary

How to apply a function to two columns of Pandas dataframe

df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)
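For instance, with a hypothetical two-argument function f that adds the columns:

```python
import pandas as pd

df = pd.DataFrame({"col_1": [1, 2], "col_2": [10, 20]})

def f(a, b):
    return a + b  # stand-in for any two-argument function

# axis=1 applies the lambda row by row
df["col_3"] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)
assert df["col_3"].tolist() == [11, 22]
```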

Source: https://stackoverflow.com/questions/13331698/how-to-apply-a-function-to-two-columns-of-pandas-dataframe

Convert a string representation of list into list

Using strip() and split():

ini_list = "[1, 2, 3, 4, 5]"
  
# printing initialized string of list and its type
print ("initial string", ini_list)
print (type(ini_list))
  
# Converting string to list
res = ini_list.strip('][').split(', ')
  
# printing final result and its type
print ("final list", res)
print (type(res))

Using ast.literal_eval():

import ast
  
# initializing string representation of a list
ini_list = "[1, 2, 3, 4, 5]"
  
# printing initialized string of list and its type
print ("initial string", ini_list)
print (type(ini_list))
  
# Converting string to list
res = ast.literal_eval(ini_list)
  
# printing final result and its type
print ("final list", res)
print (type(res))

Using json.loads():

import json
  
# initializing string representation of a list
ini_list = "[1, 2, 3, 4, 5]"
  
# printing initialized string of list and its type
print ("initial string", ini_list)
print (type(ini_list))
  
# Converting string to list
res = json.loads(ini_list)
  
# printing final result and its type
print ("final list", res)
print (type(res))

Source: https://www.geeksforgeeks.org/python-convert-a-string-representation-of-list-into-list/

Pandas/Python: Set value of one column based on value in another column

One way to do this is to use indexing with .loc.

Example

In the absence of an example dataframe, I’ll make one up here:

import numpy as np
import pandas as pd

df = pd.DataFrame({'c1': list('abcdefg')})
df.loc[5, 'c1'] = 'Value'

>>> df
      c1
0      a
1      b
2      c
3      d
4      e
5  Value
6      g

Assuming you wanted to create a new column c2, equivalent to c1 except where c1 is Value, in which case, you would like to assign it to 10:

First, you could create a new column c2, and set it to equivalent as c1, using one of the following two lines (they essentially do the same thing):

df = df.assign(c2 = df['c1'])
# OR:
df['c2'] = df['c1']

Then, find all the indices where c1 is equal to ‘Value’ using .loc, and assign your desired value in c2 at those indices:

df.loc[df['c1'] == 'Value', 'c2'] = 10

And you end up with this:

>>> df
      c1  c2
0      a   a
1      b   b
2      c   c
3      d   d
4      e   e
5  Value  10
6      g   g

If, as you suggested in your question, you would perhaps sometimes just want to replace the values in the column you already have, rather than create a new column, then just skip the column creation, and do the following:

df.loc[df['c1'] == 'Value', 'c1'] = 10
# the chained form df['c1'].loc[df['c1'] == 'Value'] = 10 also works,
# but it can raise SettingWithCopyWarning; prefer the single .loc call above

Giving you:

>>> df
      c1
0      a
1      b
2      c
3      d
4      e
5     10
6      g

Source: https://stackoverflow.com/questions/49161120/pandas-python-set-value-of-one-column-based-on-value-in-another-column

Select rows from a Pandas DataFrame based on values in a column

import pandas as pd

# Create some dummy data
raw_data = {'name': ['Willard Morris', 'Al Jennings', 'Omar Mullins', 'Spencer McDaniel'],
'age': [20, 19, 22, 21],
'favorite_color': ['blue', 'blue', 'yellow', "green"],
'grade': [88, 92, 95, 70]}

df = pd.DataFrame(raw_data)
df.head()
'''
age	favorite_color	grade	name
0	20	blue	88	Willard Morris
1	19	blue	92	Al Jennings
2	22	yellow	95	Omar Mullins
3	21	green	70	Spencer McDaniel
'''
# Select rows based on column value:
#To select rows whose column value equals a scalar, some_value, use ==:
df.loc[df['favorite_color'] == 'yellow']
'''
age	favorite_color	grade	name
2	22	yellow	95	Omar Mullins
'''
# Select rows whose column value is in an iterable array:
#To select rows whose column value is in an iterable array, which we'll define as array, you can use isin:
array = ['yellow', 'green']
df.loc[df['favorite_color'].isin(array)]
'''
age	favorite_color	grade	name
2	22	yellow	95	Omar Mullins
3	21	green	70	Spencer McDaniel
'''
# Select rows based on multiple column conditions:
#To select a row based on multiple conditions you can use &:
array = ['yellow', 'green']
df.loc[(df['age'] == 21) & df['favorite_color'].isin(array)]
'''
age	favorite_color	grade	name
3	21	green	70	Spencer McDaniel
'''

# Select rows where column does not equal a value:
#To select rows where a column value does not equal a value, use !=:
df.loc[df['favorite_color'] != 'yellow']
'''
age	favorite_color	grade	name
0	20	blue	88	Willard Morris
1	19	blue	92	Al Jennings
3	21	green	70	Spencer McDaniel
'''
# Select rows whose column value is not in an iterable array:
#To return rows where the column value is not in an iterable array, use ~ in front of the condition:
array = ['yellow', 'green']
df.loc[~df['favorite_color'].isin(array)]
'''
age	favorite_color	grade	name
0	20	blue	88	Willard Morris
1	19	blue	92	Al Jennings
'''

Source: https://www.interviewqs.com/ddi-code-snippets/rows-cols-python

How to filter Pandas dataframe using ‘in’ and ‘not in’ like in SQL

You can use pd.Series.isin.

For “IN” use: something.isin(somewhere)

Or for “NOT IN”: ~something.isin(somewhere)

As a worked example:

import pandas as pd

>>> df
  country
0        US
1        UK
2   Germany
3     China
>>> countries_to_keep
['UK', 'China']
>>> df.country.isin(countries_to_keep)
0    False
1     True
2    False
3     True
Name: country, dtype: bool
>>> df[df.country.isin(countries_to_keep)]
  country
1        UK
3     China
>>> df[~df.country.isin(countries_to_keep)]
  country
0        US
2   Germany

Source: https://stackoverflow.com/questions/19960077/how-to-filter-pandas-dataframe-using-in-and-not-in-like-in-sql

Ways to filter Pandas DataFrame by column values

options = ['Science', 'Commerce'] 
    
# selecting rows based on condition 
rslt_df = dataframe[dataframe['Stream'].isin(options)] 
    
print('\nResult dataframe :\n',
      rslt_df)
# selecting rows based on condition
rslt_df = dataframe.loc[dataframe['Percentage'] > 70]
	
print('\nResult dataframe :\n',
	rslt_df)

Source: https://www.geeksforgeeks.org/ways-to-filter-pandas-dataframe-by-column-values/

Python JSON Benchmarking - orjson, ujson

  • ujson is 3 times faster than the standard json library
  • orjson is over 6 times faster than the standard json library

Source: https://dollardhingra.com/blog/python-json-benchmarking/

pandas-stubs: Public type stubs for pandas

pip install pandas-stubs

Debugging Python with pdb

The pdb module provides powerful features for debugging Python code, including:

  • Pausing the program
  • Stepping through the execution of each line of code
  • Checking the values of variables

The module ships with the Python standard library, so it only needs to be imported. A few things to know before using it:

  1. Import it with import pdb.
  2. Set a breakpoint with pdb.set_trace(). Since Python 3.7, the built-in breakpoint() does the same thing.
  3. It can be run from any terminal or IDE console (e.g. Python IDLE).
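A minimal sketch (the function and values are made up); uncommenting the set_trace() line pauses execution there so you can inspect state:

```python
import pdb

def running_total(values):
    total = 0
    for v in values:
        # pdb.set_trace()  # uncomment to pause here and inspect v and total
        total += v
    return total

print(running_total([1, 2, 3]))  # 6
```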

Sources:

How to create a density plot in matplotlib?

import numpy as np
import seaborn as sns
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
sns.set_style('whitegrid')
sns.kdeplot(np.array(data), bw_adjust=0.5)  # seaborn >= 0.11; older versions used bw=0.5

Source: https://stackoverflow.com/questions/4150171/how-to-create-a-density-plot-in-matplotlib

“Rich” library in Python - Progress, Color, Font

Track progress

import time
from rich.progress import track

for i in track(range(20), description="Processing..."):
    time.sleep(1)  # Simulate work being done
import time

from rich.progress import Progress

with Progress() as progress:

    task1 = progress.add_task("[red]Downloading...", total=1000)
    task2 = progress.add_task("[green]Processing...", total=1000)
    task3 = progress.add_task("[cyan]Cooking...", total=1000)

    while not progress.finished:
        progress.update(task1, advance=0.5)
        progress.update(task2, advance=0.3)
        progress.update(task3, advance=0.9)
        time.sleep(0.02)

References:

  • https://www.freecodecamp.org/news/use-the-rich-library-in-python/

Pandas – Append a List as a Row to DataFrame

row = ["Bigdata", 27000, "40days", 2800]
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df2 = pd.concat([df, pd.DataFrame([row],
     columns=["Courses", "Fee", "Duration", "Discount"])],
     ignore_index=True)

You can also append a list as a row at a specified index using loc:

df.loc[index] = row
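A runnable sketch of the loc-based append (column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark"], "Fee": [22000]})
df.loc[len(df)] = ["Bigdata", 27000]  # append a list as the next row
assert df.shape == (2, 2)
assert df.loc[1, "Courses"] == "Bigdata"
```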

Source: https://sparkbyexamples.com/pandas/pandas-append-list-as-a-row-to-dataframe/

Creating Pandas dataframe using list of lists

# Import pandas library
import pandas as pd

# initialize list of lists
data = [[1, 5, 10], [2, 6, 9], [3, 7, 8]]

# Create the pandas DataFrame
df = pd.DataFrame(data)

# specifying column names
df.columns = ['Col_1', 'Col_2', 'Col_3']

# print dataframe.
print(df, "\n")

# transpose of dataframe
df = df.transpose()
print("Transpose of above dataframe is-\n", df)

Source: https://www.geeksforgeeks.org/creating-pandas-dataframe-using-list-of-lists/

How to Validate Your JSON Using JSON Schema

  • https://towardsdatascience.com/how-to-validate-your-json-using-json-schema-f55f4b162dce
  • https://python-jsonschema.readthedocs.io/en/stable/validate/
  • https://json-schema.org/understanding-json-schema/reference/string.html
  • https://akaphenom.medium.com/json-schema-to-validate-objects-for-downstream-consumers-5708147de2be
  • https://antenna.io/blog/2018/12/keep-your-sanity-and-use-json-schema-to-validate-nested-json-documents/
  • https://cswr.github.io/JsonSchema/spec/basic_types/
  • https://json-schema.org/understanding-json-schema/reference/object.html
  • https://stackoverflow.com/questions/26532137/jsonschema-multiple-values-for-string-property

Pydantic Automatic JSON Schema Creation

https://pydantic-docs.helpmanual.io/usage/schema/

Python Requests - No connection adapters

You need to include the protocol scheme:

'http://192.168.1.61:8080/api/call'

Without the http:// part, requests has no idea how to connect to the remote server.

Note that the protocol scheme must be all lowercase; if your URL starts with HTTP:// for example, it won’t find the http:// connection adapter either.

Source: https://stackoverflow.com/questions/15115328/python-requests-no-connection-adapters

Get response statuscode

response.status_code

What’s the best way to parse a JSON response from the requests library?

import json
import requests

response = requests.get(...)
json_data = json.loads(response.text)
# or, equivalently, use the built-in helper:
json_data = response.json()

Source: https://stackoverflow.com/questions/16877422/whats-the-best-way-to-parse-a-json-response-from-the-requests-library

How to extract HTTP response body from a Python requests call?

r = requests.get("http://www.google.com")
print(r.content)

Source: https://stackoverflow.com/questions/9029287/how-to-extract-http-response-body-from-a-python-requests-call

importlib and reloading within session

import importlib
importlib.reload(module)  # reload a single, already-imported module

# deep (recursive) reload via IPython; on Python 3 use builtins instead of __builtin__
import builtins
from IPython.lib import deepreload
builtins.reload = deepreload.reload

Source: https://ipython.org/ipython-doc/3/api/generated/IPython.lib.deepreload.html

Configure git

$ git config --global user.name "John Doe"
$ git config --global user.email johndoe@example.com

Push git branch to remote

$ git push -u origin feature

Git Squash All Commits on a Branch

Soft-reset all changes

$ git log
commit a856ee456967a942ab379b27a4839962f88b92ce (HEAD -> feature/long-features)
Author: Cuong Nguyen
Date:   Mon Dec 27 20:53:18 2021 +0700

    Feature 2.3

commit 6f1599a18691906ed148dc40d2d290aaeceeaa5c
Author: Cuong Nguyen
Date:   Mon Dec 27 20:53:03 2021 +0700

    Subfeature 2

commit 94e35bae85f395c62fdaaa1aeaedbb11d2c94375
Author: Cuong Nguyen
Date:   Mon Dec 27 20:52:39 2021 +0700

    Subfeature 1

commit 9265e3bd97863fde0a13084f04163ceceff9a9d0 (grafted, tag: v1.0.0, branch-off-from-tag-v1.0.0)
Author: Cuong Nguyen
Date:   Sun Dec 19 19:33:07 2021 +0700

    Merge pull request #1 from stwarts/feature/shared-branch
$ git reset --soft 9265e3bd97863fde0a13084f04163ceceff9a9d0
$ git status
On branch feature/long-features
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	new file:   sub_feature_1.txt
	new file:   sub_feature_2.txt

Add Back the Changes

git add -A

The final step is to use git commit -m to generate a new commit.

$ git commit -m 'Squash 3 commits into 1'
[feature/long-features 8cc336c] Squash 3 commits into 1
 2 files changed, 2 insertions(+)
 create mode 100644 sub_feature_1.txt
 create mode 100644 sub_feature_2.txt
$ git log
commit 8cc336c6d1b2e6ed55470f99b040d6835ec655e5 (HEAD -> feature/long-features)
Author: Cuong Nguyen <cuong.nguyen@oivan.com>
Date:   Mon Dec 27 21:07:54 2021 +0700

    Squash 3 commits into 1

commit 9265e3bd97863fde0a13084f04163ceceff9a9d0 (grafted, tag: v1.0.0, branch-off-from-tag-v1.0.0)
Author: Nguyễn Phú Cường <npcuong.011308@gmail.com>
Date:   Sun Dec 19 19:33:07 2021 +0700

    Merge pull request #1 from stwarts/feature/shared-branch

Source: https://www.delftstack.com/howto/git/git-squash-all-commits-on-a-branch/

Regression Testing Script

Source: https://www.oreilly.com/library/view/programming-python-second/0596000855/ch04s04.html

4 Types of Comprehension in Python

  1. List Comprehension: my_list = [<expression> for <item> in <iterable> if <condition>]
  2. Dictionary Comprehension: my_dict = {<key>:<value> for <item> in <iterable> if <condition>}
  3. Set Comprehension: my_set = {<expression> for <item> in <iterable> if <condition>}
  4. Generator Comprehension: my_gen = (<expression> for <item> in <iterable> if <condition>)
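The four forms side by side:

```python
nums = [1, 2, 3, 4]

squares_list = [n * n for n in nums if n % 2 == 0]  # [4, 16]
squares_dict = {n: n * n for n in nums}             # {1: 1, 2: 4, 3: 9, 4: 16}
squares_set = {n * n for n in nums}                 # {1, 4, 9, 16}
squares_gen = (n * n for n in nums)                 # lazy; values produced on demand

assert squares_list == [4, 16]
assert squares_dict[3] == 9
assert list(squares_gen) == [1, 4, 9, 16]
```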

Source:

A Deep Dive into Python Stack Frames

Python Stack Frame Evaluation

Frame Evaluation Python Code

def PyEval_EvalFrameEx(f):
    code = f.f_code
    while True:
        op = next_instruction()
        if op == CALL_FUNCTION:
            call_function(...)
    t.frame = f.f_back

def call_function(...):
    fast_function()

def fast_function():
    t = PyThreadState_GET()
    f = PyFrameObject()
    f.f_back = t.frame
    t.frame = f
    ...
    PyEval_EvalFrameEx(f)
    del f

Source: https://www.youtube.com/watch?v=smiL_aV1SOc&ab_channel=PyGotham2018

Python Threading Lock - Guide to Race-condition

Race condition example:

import threading
import time
 
x = 10
 
def increment(increment_by):
    global x  # x is a shared resource
 
    local_counter = x
    local_counter += increment_by
 
    time.sleep(1)  # we make the function sleep for a second so that both threads have enough time to finish execution
 
    x = local_counter  # update the shared resource
    print(f'{threading.current_thread().name} increments x by {increment_by}, x: {x}')
 
# creating threads
t1 = threading.Thread(target=increment, args=(5,))
t2 = threading.Thread(target=increment, args=(10,))
 
# starting the threads
t1.start()
t2.start()
 
# waiting for the threads to complete
t1.join()
t2.join()
 
print(f'The final value of x is {x}')

Let’s understand the code above:

  1. We import the threading and time modules in the first lines.
  2. The variable x = 10 acts as a shared resource for threads t1 and t2.
  3. Two threads t1 and t2 are created using the threading module, with the target function pointing to increment.
  4. t1 and t2 each try to modify the value of x inside increment, by 5 and 10 respectively, as specified in the args tuple.
  5. start() initiates the threads, while join() waits for them to finish execution (each sleeps for 1 second inside increment).
  6. t1 increments its local copy of x to 15 and stores it back in x.
  7. t2 increments its copy of x (which is still 10 for t2) to 20 and stores it back in x.
  8. We end up with x = 20, because t2 overwrites t1’s incremented value.
  9. This is the race condition: t1 and t2 race to see which one writes the value last.

Output:

Thread-1 increments x by 5, x: 15
Thread-2 increments x by 10, x: 20
The final value of x is 20 

Solution using threading’s Lock

A race condition makes the shared variable/resource unpredictable, because the outcome depends on the order in which the threads run. If each thread executes its critical work one at a time, the intended outcome can be achieved.

We need a mechanism that ensures only one thread has access to the shared resource (here x) at a time; this region is called the critical section. It is made possible through locking.

Locking is a synchronization mechanism between two or more threads. One thread can lock the shared resource and make it inaccessible to others while operating on it.

A lock has two states:

  • Locked – the critical section is occupied (binary 1)
  • Unlocked – the critical section is vacant (binary 0)

import threading
from threading import Lock
import time
 
x = 10
 
def increment(increment_by, lock):
    global x  # x is a shared resource
 
    lock.acquire()
 
    local_counter = x
    local_counter += increment_by
 
    time.sleep(1) # thread gets locked while sleeping
 
    x = local_counter  # update the shared resource
    print(f'{threading.current_thread().name} increments x by {increment_by}, x: {x}')
 
    lock.release()
 
lock = Lock()
 
# creating threads
t1 = threading.Thread(target=increment, args=(5, lock))
t2 = threading.Thread(target=increment, args=(10, lock))
 
# starting the threads
t1.start()
t2.start()
 
# waiting for the threads to complete
t1.join()
t2.join()
 
print(f'The final value of x is {x}')

Let’s try to understand the code above:

  1. To avoid the race condition we import the Lock class of the threading module and create an instance of it, named lock.
  2. Lock has the methods acquire and release, which, as the names suggest, acquire and release the lock.
  3. In the increment function, t1 acquires the lock and with it the right to operate on the shared resource (here x). t2 cannot modify or interfere with the operation until t1 releases the lock.
  4. t1 completes its increment first, and then t2 completes its increment on x, hence we obtain the intended value as the result.

Output:

Thread-1 increments x by 5, x: 15
Thread-2 increments x by 10, x: 25
The final value of x is 25

Alternatively, we can take the lock using a context manager.

Context managers are a way of allocating and releasing some sort of resource exactly where you need it.

import threading
from threading import Lock
import time
 
x = 10
 
lock = Lock()
 
def increment(increment_by,lock):
    global x
 
    with lock:
        local_counter = x
        local_counter += increment_by
 
        time.sleep(1)
 
        x = local_counter
        print(f'{threading.current_thread().name} increments x by {increment_by}, x: {x}')
 
# creating threads
t1 = threading.Thread(target=increment, args=(5,lock))
t2 = threading.Thread(target=increment, args=(10,lock))
 
# starting the threads
t1.start()
t2.start()
 
# waiting for the threads to complete
t1.join()
t2.join()
 
print(f'The final value of x is {x}')

Source: https://www.pythonpool.com/python-threading-lock/

Race condition with a shared variable

Source: https://superfastpython.com/thread-race-condition-shared-variable/

Global Interpreter Lock (GIL)

Lock Critical Sections

Source: https://www.geeksforgeeks.org/python-how-to-lock-critical-sections/

Implementing Python Lock in Various Circumstances

Source: https://www.pythonpool.com/python-lock/

How to lock a variable in Python

Source: https://superfastpython.com/lock-variable-in-python/

Global Variables are bad

This has nothing to do with Python; global variables are bad in any programming language.

However, global constants are not conceptually the same as global variables; global constants are perfectly harmless. In Python the distinction between the two is purely by convention: CONSTANTS_ARE_CAPITALIZED and globals_are_not.

The reason global variables are bad is that they enable functions to have hidden (non-obvious, surprising, hard to detect, hard to diagnose) side effects, leading to an increase in complexity, potentially leading to Spaghetti code.

However, sane use of global state is acceptable (as is local state and mutability) even in functional programming, either for algorithm optimization, reduced complexity, caching and memoization, or the practicality of porting structures originating in a predominantly imperative codebase.

All in all, your question can be answered in many ways, so your best bet is to just google “why are global variables bad”.

If you want to go deeper and find out what side effects are all about, and many other enlightening things, you should learn functional programming.

Source: https://stackoverflow.com/questions/19158339/why-are-global-variables-evil

Pandas qcut - Quantile-based discretization function

Source: https://pandas.pydata.org/docs/reference/api/pandas.qcut.html

Get list of holidays for each country

Holidays Python package: https://pypi.org/project/holidays/

Type hinting/annotation for numpy.ndarray

Source: https://stackoverflow.com/questions/35673895/type-hinting-annotation-pep-484-for-numpy-ndarray

Easy way to test if each element in an numpy array lies between two values

import numpy as np
a = np.array([1, 2, 3, 4, 5])
(a > 1) & (a < 5)
# array([False,  True,  True,  True, False])

Source: https://stackoverflow.com/questions/10542240/easy-way-to-test-if-each-element-in-an-numpy-array-lies-between-two-values

SQL on Pandas

Pandas DataFrames stored in local variables can be queried as if they are regular tables within DuckDB.

import duckdb
import pandas

# connect to an in-memory database
con = duckdb.connect()

my_df = pandas.DataFrame.from_dict({'a': [42]})

# query the Pandas DataFrame "my_df"
results = con.execute("SELECT * FROM my_df").df()

Source: https://duckdb.org/docs/guides/python/sql_on_pandas

Pandas melt

Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.

This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, ‘variable’ and ‘value’.

pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=True)

Source: https://pandas.pydata.org/docs/reference/api/pandas.melt.html
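
A short wide-to-long example (the column names are illustrative):

```python
import pandas as pd

# One row per name in wide form; melt stacks the year columns into rows
wide = pd.DataFrame({"name": ["A", "B"], "2021": [1, 2], "2022": [3, 4]})
long_df = pd.melt(wide, id_vars=["name"], var_name="year", value_name="count")
print(long_df)  # 4 rows with columns: name, year, count
```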

Get absolute path

import os
print(os.path.abspath("."))  # abspath requires a path; "." gives the cwd

Pandas DataFrame info

df.info(verbose=True)

Print a concise summary of a DataFrame.

This method prints information about a DataFrame including the index dtype and columns, non-null values and memory usage.

Source: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html

Working with Python virtual environments: the complete guide

Source: https://blog.teclado.com/python-virtual-environments-complete-guide/

Automate Python Virtual Environment with a Script

Source: https://tech.serhatteker.com/post/2022-04/automate-python-virtualenv/

fitter - Python Package

The fitter package provides a simple class to identify the distribution a data sample was generated from. It uses 80 distributions from SciPy and lets you plot the results to check which distribution is most probable and what the best parameters are.

Source: https://fitter.readthedocs.io/en/latest/

sweetviz - Automated EDA in Python

Source: https://towardsdatascience.com/sweetviz-automated-eda-in-python-a97e4cabacde

Interface

Simple example of an informal interface

class Validator:
    def validate(self, data):
        pass

class StringValidator(Validator):
    def validate(self, data):
        return isinstance(data, str)

class IntValidator(Validator):
    def validate(self, data):
        return isinstance(data, int)

sv = StringValidator()
print(sv.validate("abc")) # True
print(sv.validate(2)) # False

iv = IntValidator()
print(iv.validate(2345)) # True
print(iv.validate("2345")) # False
print(iv.validate(2345.2)) # False

Data Validation using dataclasses

from dataclasses import dataclass, asdict

@dataclass
class MyData:
    x: int
    y: str
    z: list[str]

def main():
    a = MyData(10, 10, 10)  # no error: dataclasses don't check types at runtime
    print(a)
    print(asdict(a))

if __name__ == "__main__":
    main()
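
Note that MyData(10, 10, 10) above succeeds even though y and z are annotated str and list[str]: dataclasses don't enforce type hints at runtime. A sketch of adding simple checks yourself in __post_init__ (plain isinstance checks only; assumes the annotations are actual classes, not strings):

```python
from dataclasses import dataclass, fields

@dataclass
class Point:
    x: int
    y: int

    def __post_init__(self):
        # Manually enforce the annotated types; dataclasses won't do this
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(f"{f.name} must be {f.type.__name__}")

p = Point(1, 2)    # fine
# Point(1, "2")    # raises TypeError: y must be int
```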

Using a hash function

Python’s builtin hash() is randomized per interpreter process (to defend against hash-flooding attacks on dictionaries), so it doesn’t produce consistent results across runs or machines.

Instead use a deterministic hash function like md5 or sha256 from the hashlib library.
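
For example, sha256 from hashlib yields the same digest on every machine and every run:

```python
import hashlib

# Deterministic across processes and machines, unlike the builtin hash()
digest = hashlib.sha256("hello".encode("utf-8")).hexdigest()
print(digest)
# 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```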

Method chaining

https://www.tutorialspoint.com/Explain-Python-class-method-chaining
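
The idea is simply that each method returns self; a sketch with an illustrative QueryBuilder class:

```python
class QueryBuilder:
    """Each method returns self so calls can be chained."""
    def __init__(self):
        self.parts = []

    def select(self, cols):
        self.parts.append(f"SELECT {cols}")
        return self

    def frm(self, table):  # "from" is a keyword, hence "frm"
        self.parts.append(f"FROM {table}")
        return self

    def build(self):
        return " ".join(self.parts)

sql = QueryBuilder().select("*").frm("users").build()
print(sql)  # SELECT * FROM users
```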

Using Pickle

https://wiki.python.org/moin/UsingPickle
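
A minimal round trip with pickle:

```python
import pickle

data = {"x": [1, 2, 3], "y": "hello"}
blob = pickle.dumps(data)        # serialize to bytes
restored = pickle.loads(blob)    # deserialize back into objects
print(restored == data)  # True
```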

Dict

  • https://stackoverflow.com/questions/613183/how-do-i-sort-a-dictionary-by-value
  • https://stackoverflow.com/questions/3420122/filter-dict-to-contain-only-certain-keys
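
A quick sketch covering both links: sorting a dict by value, then filtering it down to certain keys:

```python
scores = {"a": 3, "b": 1, "c": 2}

# Sort by value; dicts preserve insertion order in Python 3.7+
by_value = dict(sorted(scores.items(), key=lambda kv: kv[1]))
print(by_value)   # {'b': 1, 'c': 2, 'a': 3}

# Keep only a chosen subset of keys
wanted = {"a", "c"}
filtered = {k: v for k, v in scores.items() if k in wanted}
print(filtered)   # {'a': 3, 'c': 2}
```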

Abstract Base Class

  • https://dev.to/dollardhingra/understanding-the-abstract-base-class-in-python-k7h
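
A minimal abstract base class; unlike the informal interface above, instantiating a class that hasn't implemented the abstract method fails immediately:

```python
from abc import ABC, abstractmethod

class Validator(ABC):
    @abstractmethod
    def validate(self, data):
        ...

class StringValidator(Validator):
    def validate(self, data):
        return isinstance(data, str)

print(StringValidator().validate("abc"))  # True
# Validator() raises TypeError: can't instantiate abstract class
```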

UML

  • http://www.cs.utsa.edu/~cs3443/uml/uml.html

Creating an Executable

  • https://datatofish.com/executable-pyinstaller/

Dataclasses

  • https://medium.com/@jkishan421/dataclasses-an-awesome-approach-for-oop-in-python-50bc8b973b09

Pipenv

https://realpython.com/pipenv-guide/

Flask

https://reqbin.com/req/python/c-dwjszac0/curl-post-json-example

https://stackabuse.com/how-to-get-and-parse-http-post-body-in-flask-json-and-form-data/

https://stackabuse.com/get-request-query-parameters-with-flask/

Jupyter with pipenv

In your project folder:

pipenv install ipykernel
pipenv shell

This will bring up a terminal in your virtualenv like this:

(my-virtualenv-name) bash-4.4$

In that shell do:

python -m ipykernel install --user --name=my-virtualenv-name

Launch jupyter notebook:

jupyter notebook

In your notebook, Kernel -> Change Kernel. Your kernel should now be an option.

Source: https://stackoverflow.com/questions/47295871/is-there-a-way-to-use-pipenv-with-jupyter-notebook

Software Debugging Course

  • https://www.udacity.com/course/software-debugging--cs259

Sort List of Tuples

  • https://pythonguides.com/python-sort-list-of-tuples/
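
sorted compares tuples element by element by default; pass a key function to sort on a different element:

```python
pairs = [("b", 2), ("a", 3), ("c", 1)]
by_first = sorted(pairs)                        # compares element by element
by_second = sorted(pairs, key=lambda t: t[1])   # sort on the second element
print(by_first)   # [('a', 3), ('b', 2), ('c', 1)]
print(by_second)  # [('c', 1), ('b', 2), ('a', 3)]
```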

Sorting dictionary using operator.itemgetter

  • https://stackoverflow.com/questions/4690416/sorting-dictionary-using-operator-itemgetter
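
operator.itemgetter(1) is a drop-in alternative to a lambda for the sort key:

```python
from operator import itemgetter

d = {"b": 2, "a": 3, "c": 1}
# itemgetter(1) pulls out the value from each (key, value) pair
print(sorted(d.items(), key=itemgetter(1)))  # [('c', 1), ('b', 2), ('a', 3)]
```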

How to Sort a Dictionary by Value in Python

  • https://stackabuse.com/how-to-sort-dictionary-by-value-in-python/

conda - point to correct pip

On macOS, running conda install pip inside the environment fixed pip resolving to the wrong interpreter.

WSL: in ~/.profile, move $PATH to the front of the line so conda’s directories don’t shadow the intended pip.

Hashing in Python

  • Always use an algorithm from hashlib to generate a hash. Never use Python’s built-in hash function: because of hash randomization it generates a different value on different servers (and even across runs).

https://docs.python.org/3/library/hashlib.html

https://stackoverflow.com/questions/27522626/hash-function-in-python-3-3-returns-different-results-between-sessions

List of all alphanumeric signs

Inside string module

  • string.ascii_lowercase
  • string.ascii_uppercase
  • string.digits as well as a few others

Each is given as a single string. If you want to convert them to a list, you can simply use list(string.ascii_lowercase + string.ascii_uppercase + string.digits)

  • https://stackoverflow.com/questions/51265716/is-there-a-list-of-all-alphanumeric-signs-in-python
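
Concatenating the three constants gives all 62 alphanumeric characters:

```python
import string

alphanumeric = list(string.ascii_lowercase + string.ascii_uppercase + string.digits)
print(len(alphanumeric))  # 62 (26 + 26 + 10)
```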

Pandas - rename columns

  • https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rename.html
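
The usual pattern is a columns mapping; rename returns a new DataFrame and leaves the original untouched unless inplace=True:

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2]})
renamed = df.rename(columns={"a": "alpha", "b": "beta"})
print(list(renamed.columns))  # ['alpha', 'beta']
print(list(df.columns))       # ['a', 'b'] -- original unchanged
```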

Python project source codes

https://thecleverprogrammer.com/2021/01/14/python-projects-with-source-code/

How To Filter Pandas Dataframe By Values of Column?

https://cmdlinetips.com/2018/02/how-to-subset-pandas-dataframe-based-on-values-of-a-column/
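
The basic pattern is boolean indexing, i.e. passing a boolean mask back into the DataFrame (data here is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"year": [2019, 2020, 2021], "v": [1, 2, 3]})
recent = df[df["year"] >= 2020]   # keep rows where the condition holds
print(recent["v"].tolist())  # [2, 3]
```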

Generating and using a Callgraph, in Python

https://cerfacs.fr/coop/pycallgraph

Python Tutorial: Understanding Python MRO - Class search path

  • https://makina-corpus.com/python/python-tutorial-understanding-python-mro-class-search-path
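
A quick look at the MRO for the classic diamond: C3 linearization visits D, then its bases left to right, then the shared ancestor:

```python
class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass

print([cls.__name__ for cls in D.mro()])  # ['D', 'B', 'C', 'A', 'object']
```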

Python Interface

https://realpython.com/python-interface/#python-interface-overview

Virtual Subclass

https://www.demo2s.com/python/python-virtual-subclasses.html
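
A minimal sketch: register makes isinstance/issubclass checks pass without any inheritance (class names are illustrative):

```python
from abc import ABC

class Serializer(ABC):
    pass

class JsonThing:
    def serialize(self):
        return "{}"

# Register JsonThing as a virtual subclass: isinstance/issubclass checks
# succeed even though JsonThing does not inherit from Serializer
Serializer.register(JsonThing)

print(isinstance(JsonThing(), Serializer))  # True
print(issubclass(JsonThing, Serializer))    # True
```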

Multithreading

https://jedyang.com/post/multithreading-in-python-pytorch-using-c++-extension/

Pypy vs CPython Multithreading

https://www.cs.cornell.edu/~asampson/blog/parallelpypy.html

Dataclasses

https://realpython.com/python-data-classes/

Why You Should Probably Never Use pandas inplace=True

https://towardsdatascience.com/why-you-should-probably-never-use-pandas-inplace-true-9f9f211849e4

How to Summarize Data with Pandas

https://medium.com/analytics-vidhya/how-to-summarize-data-with-pandas-2c9edffafbaf

40 Useful Pandas Snippets

https://medium.com/bitgrit-data-science-publication/40-useful-pandas-snippets-d7833472d12f

Black vs YAPF

https://news.ycombinator.com/item?id=17155048

Cruft

https://pypi.org/project/cruft/

Pandas dtypes

  • object
  • int64
  • float64
  • datetime64
  • bool
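
A quick check of which dtype each column infers to:

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [1.5], "c": ["x"], "d": [True]})
print(df.dtypes)
# a      int64
# b    float64
# c     object
# d       bool
```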

5 Pandas Group By Tricks You Should Know in Python

  • https://towardsdatascience.com/5-pandas-group-by-tricks-you-should-know-in-python-f53246c92c94
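
The basic split-apply-combine pattern, with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"team": ["x", "x", "y"], "score": [1, 2, 3]})
totals = df.groupby("team")["score"].sum()  # one row per group
print(totals.to_dict())  # {'x': 3, 'y': 3}
```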

Monday of week

from datetime import datetime, timedelta
now = datetime.now()
monday = now - timedelta(days = now.weekday())
print(monday)

  • https://stackoverflow.com/questions/59981999/find-monday-of-current-week-in-python