How to build my first Python project?

LeTurtelboy
6 min read · Apr 15, 2023


We all love Python, don’t we? So if we love Python, we’re definitely going to love this starter guide to building well-structured Python repos.

In this article, I want to cover the key aspects of a well-designed repository. Everything I write here comes from my experience in different organizations and the different approaches each of them took to common challenges. Let’s start with a short summary of what I consider a good repository. If you have any feedback, please let me know so we can build a better definition together!

So let’s start with some key concepts that (I believe) are the fundamental building blocks for a solid foundation.

  • Structured folders and well-named modules: This makes everyone happy.
  • Linter and Security: Check our style 💅 and our health ⛔!
  • Tests: The basics. We want solid code! ✨
  • Makefiles: a pretty cool tool to automate stuff inside our repo 💻.
  • Git: just… Git!

Step 0: The Beginning.

Let’s start at the beginning! Newbies often don’t use their IDEs properly. One of the most common problems, and one I’ve noticed nobody mentions, is that we need to open only our project in the IDE, not a big…

Step 1: Ignore what we don't want to track!

If we work with Python and Git, we need a proper .gitignore file. A .gitignore file tells our version control tool which files or folders it should not track.

GitHub has a great Python .gitignore template that we can use as a reference, BUT remember to read it and understand what we are excluding from our repos:

This is a pretty good example: GitHub’s Python reference .gitignore.
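Since the reference file itself isn’t reproduced here, here’s a rough Python sketch of what a few typical entries from GitHub’s Python template exclude. The pattern list is an illustrative subset, and the matcher is a simplification I’m assuming for demonstration: real .gitignore rules also support negation (!), anchoring and ** globs, so git check-ignore -v <path> remains the authoritative way to check a path.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Illustrative subset of entries like those in GitHub's Python .gitignore template.
IGNORE_PATTERNS = (
    "__pycache__/",  # bytecode cache folders
    "*.py[cod]",     # compiled files: .pyc, .pyo, .pyd
    "build/",
    "dist/",
    "*.egg-info/",   # packaging metadata
    ".env",          # local environment variables (often secrets!)
    ".venv",         # local virtual environment
    "htmlcov/",      # coverage HTML report
    ".coverage",     # coverage data file
)


def is_ignored(path: str, patterns: tuple[str, ...] = IGNORE_PATTERNS) -> bool:
    """Simplified check: does any component of `path` match an ignore pattern?

    Patterns ending in "/" only match a parent folder of the path;
    other patterns match any path component, file or folder.
    """
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            folder_pattern = pattern.rstrip("/")
            if any(fnmatch(part, folder_pattern) for part in parts[:-1]):
                return True
        elif any(fnmatch(part, pattern) for part in parts):
            return True
    return False


for p in ["etl/__pycache__/main.cpython-311.pyc", "etl/main.py", ".env", "data/sample.csv"]:
    print(f"{p}: {'ignored' if is_ignored(p) else 'tracked'}")
```

The point isn’t the matcher itself: it’s that reading the patterns tells us exactly which of our files will never reach the repo.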

Step 2: Structure!

I will focus on a simple Python/PySpark ETL process to build a general structure (I’m using this because I'm a data engineer and feel really comfortable with data engineering repositories 🐵). So, let's begin with some key files and why we want to include them in our project.

Base Project for Simple ETL

I'm going to start with a brief description of all the folders and files in my repo:

The folders and structure should be flexible enough to adapt to your needs; maybe you don't need the data folder or the scripts one… be pragmatic!
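The tree picture isn’t reproduced here, so as a stand-in, here’s a small Python sketch that creates the folders described below. The folder names come from this article; putting modules/ and utils/ inside a package called my_project (hypothetical name, with an underscore so it stays importable) and adding the __init__.py files are my assumptions.

```python
from pathlib import Path

# Folder names follow the article; the "my_project" package name is hypothetical.
FOLDERS = (
    "data",
    "my_project/modules",
    "my_project/utils",
    "scripts",
    "tests/unit/modules",  # unit tests mirror the code layout
    "tests/unit/utils",
    "tests/behavioral",
)

# Folders that hold .py files get an __init__.py so they are importable
# and so `python -m unittest discover` can recurse into them.
PACKAGES = (
    "my_project",
    "my_project/modules",
    "my_project/utils",
    "tests",
    "tests/unit",
    "tests/unit/modules",
    "tests/unit/utils",
)


def scaffold(root: Path) -> None:
    """Create the folder layout under `root` (safe to run more than once)."""
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    for package in PACKAGES:
        (root / package / "__init__.py").touch()
```

Calling scaffold(Path(".")) from an empty project folder gives you the skeleton; drop any entry you don’t need.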

data:

This folder contains any data that is useful for our repo. It should hold small files that we want to share with our partners, or simply important files that need to be shared. In this case, we have a sample CSV file.

Remember to share mock-up files only. We don’t want anything from our production process leaking through one of these files.
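One way to follow that advice is to generate the sample file instead of copying real data. A minimal sketch, assuming a hypothetical orders dataset (the column names and value ranges are made up):

```python
import csv
import random
from datetime import date, timedelta
from pathlib import Path


def write_mock_csv(path: Path, rows: int = 10, seed: int = 42) -> None:
    """Write a small, fully synthetic CSV so no production data is shared.

    The columns (order_id, order_date, amount) are hypothetical.
    """
    rng = random.Random(seed)  # fixed seed: the same file on every run
    start = date(2023, 1, 1)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["order_id", "order_date", "amount"])
        for order_id in range(1, rows + 1):
            order_date = start + timedelta(days=rng.randint(0, 89))
            amount = round(rng.uniform(5, 500), 2)
            writer.writerow([order_id, order_date.isoformat(), amount])
```

Running write_mock_csv(Path("data/sample.csv")) regenerates the sample deterministically, so the file in the repo never drifts.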

modules and utils:

These folders contain our helper modules and our utilities. For example, in utils we place logic that helps us deal with dates or files, or maybe a useful logging module, while modules should contain our product logic!
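To make that split concrete, here’s a minimal sketch of what a utils file might hold. The file name utils/helpers.py and both function names are hypothetical, not taken from the original repo.

```python
# utils/helpers.py (hypothetical file): small, reusable, business-agnostic helpers
import logging
from datetime import date, datetime


def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a logger with a single stream handler, even if called many times."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicated log lines on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


def parse_date(value: str, fmt: str = "%Y-%m-%d") -> date:
    """Parse a date string such as "2023-04-15" into a datetime.date."""
    return datetime.strptime(value, fmt).date()
```

Product logic in modules/ would then import these helpers instead of re-implementing date parsing or logging setup in every transformation.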

scripts: useful bash or cmd scripts to recreate things. For example, converting some CSV to a JSON schema, or even Parquet to CSV. This can make the data we want to share more readable.
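The article’s scripts are bash or cmd; as a Python alternative (a swap on my part, standard library only), a CSV-to-JSON converter might look like this. The path scripts/csv_to_json.py is hypothetical, and note that csv.DictReader yields every value as a string. Parquet conversion would need an extra library such as pyarrow, so it’s left out.

```python
# scripts/csv_to_json.py (hypothetical): turn a CSV file into a JSON array of records
import csv
import json
import sys
from pathlib import Path


def csv_to_json(src: Path, dst: Path) -> int:
    """Convert `src` (CSV with a header row) to `dst` (JSON list of objects)."""
    with src.open(newline="") as fh:
        records = list(csv.DictReader(fh))  # values stay strings
    dst.write_text(json.dumps(records, indent=2))
    return len(records)


def main(argv: list[str]) -> int:
    """Command-line entry point, e.g. main(["data/sample.csv", "data/sample.json"])."""
    if len(argv) != 2:
        print("usage: csv_to_json.py SRC.csv DST.json", file=sys.stderr)
        return 2
    count = csv_to_json(Path(argv[0]), Path(argv[1]))
    print(f"wrote {count} records to {argv[1]}")
    return 0
```

Adding `raise SystemExit(main(sys.argv[1:]))` under an `if __name__ == "__main__":` guard turns it into a command you can run from the repo root.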

tests: this folder is life, haha!
In the unit subfolder, we want a structure that mirrors our code: modules, utils, and in general every folder that contains “.py” files.

In behavioral, we may want some more advanced tests based on the behavior of the app (this by itself deserves a different article, tbh).
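As an example of the mirror idea, a unit test for a hypothetical my_project/utils/helpers.py would live in tests/unit/utils/test_helpers.py. The parse_date function under test is a made-up helper, inlined here so the sketch runs on its own; in the repo you’d import it instead. It uses the standard unittest module, the same runner the Makefile’s test target calls further down.

```python
# tests/unit/utils/test_helpers.py (hypothetical path mirroring my_project/utils/helpers.py)
import unittest
from datetime import date, datetime


# In the real repo: from my_project.utils.helpers import parse_date  (hypothetical import)
def parse_date(value: str, fmt: str = "%Y-%m-%d") -> date:
    return datetime.strptime(value, fmt).date()


class TestParseDate(unittest.TestCase):
    def test_parses_iso_dates(self):
        self.assertEqual(parse_date("2023-04-15"), date(2023, 4, 15))

    def test_accepts_custom_format(self):
        self.assertEqual(parse_date("15/04/2023", fmt="%d/%m/%Y"), date(2023, 4, 15))

    def test_rejects_invalid_input(self):
        with self.assertRaises(ValueError):
            parse_date("not-a-date")
```

Running python -m unittest discover -s tests/unit picks it up, as long as each folder on the way has an __init__.py.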

Step 3: The Files!

So let's dig into some of the files we have in our repo:

.pre-commit-config.yaml and .secrets.baseline:

Pre-commit is a tool that helps us maintain code quality by automating checks for common issues such as formatting, linting, and security vulnerabilities.

Pre-commit provides a framework for running these checks as hooks that run automatically before each commit. This helps ensure that code is consistently formatted, follows best practices, and has been checked for common security issues before it is committed to version control.

.secrets.baseline is a file used by detect-secrets, a tool that can run as a pre-commit hook to scan code for potential secrets before it is committed. The baseline file records findings that have already been reviewed (such as known false positives), so the tool only flags new ones.
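To picture what a secrets hook and its baseline do, here’s a toy Python scanner. It is NOT how detect-secrets is implemented: the two regex patterns and the set-of-hashes “baseline” are simplified assumptions, just enough to show the idea of flagging new findings while skipping ones already reviewed.

```python
import hashlib
import re

# Toy patterns (assumptions): real tools ship many more detectors.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "password_assignment": re.compile(r"(?i)password\s*=\s*['\"][^'\"]+['\"]"),
}


def fingerprint(filename: str, secret: str) -> str:
    """Hash a finding so the baseline never stores the secret itself."""
    return hashlib.sha256(f"{filename}:{secret}".encode()).hexdigest()


def scan(filename: str, text: str, baseline: set[str]) -> list[tuple[int, str]]:
    """Return (line number, detector name) for findings not in the baseline."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in SECRET_PATTERNS.items():
            for match in pattern.finditer(line):
                if fingerprint(filename, match.group(0)) not in baseline:
                    findings.append((lineno, kind))
    return findings


source = 'user = "etl"\npassword = "not-a-real-secret"\n'
print(scan("tests/unit/test_login.py", source, baseline=set()))  # new finding: block the commit
reviewed = {fingerprint("tests/unit/test_login.py", 'password = "not-a-real-secret"')}
print(scan("tests/unit/test_login.py", source, baseline=reviewed))  # reviewed: nothing to report
```

The second call shows why the baseline matters: a dummy password in a test fixture gets reviewed once and stops blocking every commit.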

BTW, I love using pre-commit with black, a Python code formatter that automatically formats code in a consistent, PEP 8-compliant style.

Makefiles:

A Makefile is a simple text file used to define tasks and their dependencies, and automate software building processes. It is used to create a standardized build process that can be easily reproduced, maintained and distributed.

The Makefile works by defining rules that specify the dependencies and the commands to be executed. When a target is invoked, Make first brings its dependencies up to date and then, if the target is missing or older than any of them, executes the commands specified in the rule.
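The core rule Make applies can be sketched in a few lines of Python. This is a simplified illustration, not GNU Make’s code: it ignores the recursive update of prerequisites that are themselves targets. It also shows why task-style targets such as test or install always run: no file with that name exists (and declaring them .PHONY keeps it that way even if one appears).

```python
import os


def needs_rebuild(target: str, prerequisites: list[str]) -> bool:
    """Simplified Make rule: rebuild when the target file is missing
    or older than any of its prerequisites (compared by modification time)."""
    if not os.path.exists(target):
        return True  # e.g. task names like "test" that never exist as files
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

For example, needs_rebuild("data/sample.json", ["data/sample.csv"]) stays True until the JSON is regenerated after the CSV changes.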

But let me show you what I'm talking about:

Simple example:

A simple “hi” in a Makefile!

Notice that we interact with the Makefile through make commands!

I define a target named hi and invoke it with $ make hi

Now for some more advanced, but useful, stuff:

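```makefile
# These targets are task names, not files, so make should always run them
.PHONY: clean-pyc install install-local test

# Remove Python bytecode, editor backups, and macOS metadata files
clean-pyc:
	find . -name '*.pyc' -exec rm -f {} +
	find . -name '.DS_Store' -exec rm -f {} +
	find . -name '*.pyo' -exec rm -f {} +
	find . -name '*~' -exec rm -f {} +
	find . -name '__pycache__' -exec rm -fr {} +

# Install runtime dependencies from the Pipfile
install:
	pipenv install

# Also install dev dependencies and register the git pre-commit hooks
install-local:
	pipenv install --dev
	pipenv run pre-commit install

# Run unit tests under coverage, print the report, and rebuild the HTML report
test:
	coverage erase
	coverage run --source=my-project/ --omit='*/main.py,**/__**.py,*/*/*settings*.py,*/*/arg_parser.py' -m unittest discover -s tests/unit/
	coverage report -m
	if [ -d htmlcov ]; then rm -r htmlcov; echo "removed old htmlcov"; fi
	coverage html
```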
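find isn’t available in a plain Windows cmd shell, so if part of the team works there, a Python version of clean-pyc is one option. A minimal sketch with pathlib and shutil; it mirrors the patterns of the clean-pyc target but is my own cross-platform swap, not part of the original repo.

```python
import shutil
from pathlib import Path

# Same patterns as the clean-pyc target
FILE_PATTERNS = ("*.pyc", "*.pyo", "*~", ".DS_Store")


def clean_pyc(root: Path = Path(".")) -> int:
    """Delete bytecode/backup files and __pycache__ folders under `root`.

    Returns how many files and folders were removed.
    """
    removed = 0
    for pattern in FILE_PATTERNS:
        for path in list(root.rglob(pattern)):  # materialize before deleting
            if path.is_file():
                path.unlink()
                removed += 1
    for cache_dir in list(root.rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed += 1
    return removed
```

The Makefile target could then call python -c "..." or a small script, so the same cleanup works on every OS.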

Pipfile:

A Pipfile is a configuration file used by Pipenv, a Python packaging tool built on top of pip and virtualenv, to manage a project's dependencies. It is a more modern alternative to the traditional “requirements.txt” file.

The Pipfile specifies the project's dependencies and their version constraints, keeps development-only packages in a separate section, and can record the required Python version. Together with the Pipfile.lock that Pipenv generates, which pins exact versions and hashes, it makes the environment easy to share and reproduce across different systems, and Pipenv uses it to create the project's virtual environment.

In summary, the Pipfile is a powerful tool for managing dependencies in Python projects: it simplifies installing and managing packages and, together with its lock file, provides a consistent and reproducible environment.

Step 4: Continue Iterating!!!

“Remember, every master was once a beginner. Keep practicing these good habits in Python, and soon enough, you’ll be a master too!”

Working in development is amazing. Let’s make the process of learning enjoyable for people who are just starting out, and help each other in our work. We are where we are today because of people who didn’t deny us the answers to our questions. Be the kind of senior that you wished you had when you were a junior.

Thanks for your time!
