Data Project from Scratch

Flow

This is an example project that demonstrates refactoring practices. Here you will find an in-depth description of the ETL pipeline, installation instructions, answers to frequently asked questions, and more. Whether you are a collaborator or simply someone interested in the project, we hope you find this documentation useful.

Additionally, this documentation can be integrated into Confluence or an internal intranet, facilitating access and collaboration for all team members.

The GitHub repository is available at https://github.com/guimarczewski/DataProject.

Introduction

The objective of this project is to demonstrate how refactoring techniques can be applied to improve code quality, optimize performance and make software more maintainable. Refactoring is essential for keeping code clean and understandable, allowing teams to maintain high development velocity over time.

Prerequisites:

Git for code versioning
Pyenv for installing and selecting the Python version
Poetry for dependency management
Pytest for unit and integration testing
MkDocs for documentation and GitHub Pages to host it
GitHub Actions for Continuous Integration
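
Before starting the installation, it can help to confirm that the command-line prerequisites are on your PATH. The snippet below is a minimal sketch, not part of the project: `check_tool` is a hypothetical helper name, and only Git, Pyenv and Poetry are checked, on the assumption that Pytest and MkDocs are installed later as project dependencies by `poetry install`.

```shell
#!/bin/sh
# Hypothetical helper (not part of the project): report whether a tool is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the command-line prerequisites listed above.
# "|| true" keeps the loop going so every missing tool is reported.
for tool in git pyenv poetry; do
    check_tool "$tool" || true
done
```

Any tool reported as missing should be installed before continuing with the installation guide.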

Installation Guide

The steps below install and configure the project in your local environment. Run them in order, since each step relies on the previous one.

`Clone the repository`

git clone https://github.com/guimarczewski/DataProject.git
cd DataProject

`Configure the correct Python version with pyenv`

pyenv install 3.11.3
pyenv local 3.11.3

`Install project dependencies`

poetry install

`Activate the virtual environment`

poetry shell

`Run the tests`

task test

`Run the documentation`

task doc

`Run the pipeline`

task run
