Data Project from Scratch

[Figure: Flow]

This is an example project dedicated to demonstrating refactoring practices. Here you will find an in-depth description of the ETL process, installation instructions, answers to frequently asked questions, and more. Whether you are a collaborator or simply interested in the project, we hope you find this documentation useful.

Additionally, this documentation can be integrated into Confluence or a company intranet, making it easier for all team members to access it and collaborate.

The source code is available in the GitHub repository: https://github.com/guimarczewski/DataProject

Sections

Introduction

The objective of this project is to demonstrate how refactoring techniques can be applied to improve code quality, optimize performance, and make software more maintainable. Refactoring is essential for keeping code clean and understandable, which allows teams to maintain a high development velocity over time.
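To make the idea concrete, here is a small, hypothetical refactoring in Python (the project's language); none of these functions come from the repository. Both versions compute the same total, but the refactored one splits parsing and validation into named helpers, so each step can be read and tested on its own.

```python
from __future__ import annotations


# Before (hypothetical): parsing, validation and summing are tangled in one loop.
def total_sales_before(rows):
    total = 0
    for r in rows:
        parts = r.split(",")
        if parts[1].strip() != "":
            if float(parts[1]) > 0:
                total = total + float(parts[1])
    return total


# After (hypothetical): each step has a descriptive name and one responsibility.
def parse_amount(row: str) -> float | None:
    """Return the amount in the second CSV field, or None when it is blank."""
    raw = row.split(",")[1].strip()
    return float(raw) if raw else None


def is_valid(amount: float | None) -> bool:
    """Keep only present, positive amounts."""
    return amount is not None and amount > 0


def total_sales_after(rows: list[str]) -> float:
    amounts = (parse_amount(row) for row in rows)
    return sum(a for a in amounts if is_valid(a))


# A refactoring must not change behavior: both versions agree on the same input.
rows = ["a, 10.5", "b, ", "c, -3", "d, 4"]
assert total_sales_before(rows) == total_sales_after(rows) == 14.5
```

A check like the final assert (or a Pytest test) is what makes refactoring safe: the structure changes while the observable result stays the same.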

Prerequisites:

  • Git for code versioning
  • Pyenv for managing Python versions
  • Poetry for dependency management
  • Pytest for unit and integration testing
  • MkDocs for documentation and GitHub Pages to host it
  • GitHub Actions for Continuous Integration
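With Pytest, a unit test is a plain function whose name starts with test_, placed in a file named test_*.py or *_test.py; Pytest collects it and reports any failing assert. The sketch below uses a hypothetical normalize_column_name transform defined inline so the example is self-contained; in the project, the function under test would be imported from the package instead.

```python
# tests/test_transform.py (hypothetical file name)
# Pytest collects functions named test_* from files named test_*.py or *_test.py.


def normalize_column_name(name: str) -> str:
    """Hypothetical transform: trim, lowercase, and join words with underscores."""
    return "_".join(name.strip().lower().split())


def test_normalize_column_name_replaces_spaces():
    assert normalize_column_name("  Order Date ") == "order_date"


def test_normalize_column_name_is_idempotent():
    # Applying the transform twice must give the same result as applying it once.
    once = normalize_column_name("Customer ID")
    assert normalize_column_name(once) == once
```

Running pytest from the project root would discover and run both test functions.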

Installation Guide

This section walks through installing and configuring the project in your local environment. Run the steps below in order.

Clone the repository

git clone https://github.com/guimarczewski/DataProject.git
cd DataProject

Configure the correct Python version with pyenv

pyenv install 3.11.3
pyenv local 3.11.3

Install project dependencies

poetry install
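To confirm that poetry install pulled in the tools listed under Prerequisites, you can query installed package metadata with the standard library's importlib.metadata. This is a minimal sketch: the package names checked (pytest, mkdocs) are assumed to be declared as project dependencies, and the helper name installed_versions is illustrative. Run it with poetry run python so it executes inside the project's environment.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages: list[str]) -> dict[str, str | None]:
    """Map each distribution name to its installed version, or None if it is missing."""
    result: dict[str, str | None] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Assumed dependency names; adjust to match the project's pyproject.toml.
    for name, ver in installed_versions(["pytest", "mkdocs"]).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```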

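Activate the virtual environment (on Poetry 2.0 or later, poetry shell is only available through the poetry-plugin-shell plugin; the built-in alternative is eval $(poetry env activate))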

poetry shell
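To check from Python that the activation worked, compare sys.prefix with sys.base_prefix: inside a virtual environment such as the one Poetry manages, sys.prefix points at the environment while sys.base_prefix still points at the base interpreter. A minimal sketch; the function name in_virtualenv is illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix is the environment directory and
    # sys.base_prefix is the installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active" if in_virtualenv() else "no virtual environment")
    print(f"sys.prefix = {sys.prefix}")
```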

Run the tests

task test

Run the documentation site (MkDocs)

task doc

Run the pipeline

task run
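The contents of the pipeline are not described in this section. As a rough illustration of the extract, transform, and load stages an ETL pipeline chains together, here is a self-contained sketch; the function names, the CSV input, and the SQLite target are all assumptions, not the repository's actual code.

```python
from __future__ import annotations

import csv
import io
import sqlite3


def extract(csv_text: str) -> list[dict[str, str]]:
    """Extract: read raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows: list[dict[str, str]]) -> list[tuple[str, float]]:
    """Transform: normalize product names, cast prices, drop rows without a price."""
    cleaned = []
    for row in rows:
        price = row["price"].strip()
        if not price:
            continue
        cleaned.append((row["product"].strip().lower(), float(price)))
    return cleaned


def load(records: list[tuple[str, float]], conn: sqlite3.Connection) -> int:
    """Load: write records into a SQLite table and return how many were inserted."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (product TEXT, price REAL)")
    conn.executemany("INSERT INTO sales (product, price) VALUES (?, ?)", records)
    conn.commit()
    return len(records)


def run_pipeline(csv_text: str, conn: sqlite3.Connection) -> int:
    """Chain the three stages; each one can also be tested on its own."""
    return load(transform(extract(csv_text)), conn)


if __name__ == "__main__":
    sample = "product,price\n Apple ,1.50\nBanana,\nCherry,3\n"
    conn = sqlite3.connect(":memory:")
    print(run_pipeline(sample, conn))  # → 2
    print(conn.execute("SELECT product, price FROM sales").fetchall())
    # → [('apple', 1.5), ('cherry', 3.0)]
    conn.close()
```

Splitting the stages into separate functions means extract and transform can be unit-tested without a database, while load can be tested against an in-memory SQLite connection as in the usage block.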