Building Reusable Open-Source Lego Pieces in Data Freelance Missions

Morgan Durand
Freelance @ Data Upsurge.

In data science consulting, efficiency and innovation are crucial. Freelancers working on data-focused projects often have to balance client-specific needs with the desire to build long-term value. One powerful way to address both is to develop open-source “Lego pieces”: modular, reusable components that can be easily customized and applied across multiple projects.

What Are Open-Source Lego Pieces?

Open-source Lego pieces are pre-built, modular tools or libraries designed to perform specific functions within a data project. Think of them as building blocks that can be assembled and reassembled to create tailored solutions for different clients. These pieces might range from data preprocessing scripts and machine learning models to visualization templates and pipeline automation tools.

Benefits for the End Client

  • Cost Reduction: By leveraging pre-built components, freelancers can reduce development time and, consequently, costs for the client. Instead of reinventing the wheel for each project, they can customize existing modules to fit the client’s unique needs.
  • Faster Delivery: Open-source Lego pieces streamline the development process, enabling freelancers to deliver solutions more quickly. This rapid turnaround can be a competitive advantage in today’s data-driven business world.
  • Quality Assurance: Since these components are reused across multiple projects, they are often tested and refined over time, leading to more reliable and higher-quality solutions.

Advantages for Freelancers

  • Reusability and Efficiency: Building reusable components reduces repetitive work, freeing up time to focus on more complex and high-value tasks.
  • Scalability: Modular pieces can be expanded or modified to meet evolving project demands. They also serve as a foundation for future projects, creating cumulative value over time.
  • Reputation Building: Contributing to the open-source community can enhance a freelancer’s reputation, showcasing their technical expertise and commitment to collaboration.
  • Community Collaboration: Open-source projects encourage collaboration with other data professionals, leading to improvements and innovations beyond what a single individual could achieve.

How to Develop and Share Open-Source Lego Pieces

  • Identify Common Pain Points: Start by analyzing common challenges faced across multiple projects. Build components that address these pain points efficiently.
  • Modular Design: Ensure that each piece is self-contained and easily integrable with other components.
  • Documentation: Clear, comprehensive documentation is essential for adoption and reusability. It should cover installation, usage, customization, and troubleshooting.
  • Version Control and Licensing: Use platforms like GitHub to manage versions and changes. Choose an open-source license that aligns with your goals and ensures proper usage.
  • Client Transparency: Be transparent with clients about the use of open-source components and the benefits they bring. This fosters trust and demonstrates a commitment to efficiency and innovation.

Real-World Example

One recent mission involved helping a client build their data analysis ecosystem in R. The client used the R package blogdown to publish data analysis reports on a private data blog, sharing insights across their organization.

During the project, we developed a set of BigQuery data query functions that automatically archived query results in both local and S3 storage. This allowed the team to modify or extend existing reports based on the exact same data at any time.

To ensure efficient data retrieval, the cache key was based partially on the hash of the query string. However, when a query was reformatted, its string changed and so did its hash, which caused a cache miss even though the query itself was logically the same.
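The caching behaviour described above can be illustrated with a minimal sketch. It is written in Python purely for illustration (the mission itself used R), it archives results locally only, and the names `cache_key`, `cached_query`, and `fake_backend` are hypothetical stand-ins, not the client's actual functions.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_key(query: str) -> str:
    """Hash the raw query text; any change to the string changes the key."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()[:16]


def cached_query(query, run_query, cache_dir):
    """Return archived rows if present, else run the query and archive them."""
    path = Path(cache_dir) / f"{cache_key(query)}.json"
    if path.exists():
        return json.loads(path.read_text())
    rows = run_query(query)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(rows))
    return rows


# Hypothetical backend standing in for a real warehouse call.
calls = []


def fake_backend(query):
    calls.append(query)
    return [{"n": 1}]


cache_dir = tempfile.mkdtemp()
original = "SELECT n FROM t WHERE n > 0"
reformatted = "SELECT n\nFROM t\nWHERE n > 0"

cached_query(original, fake_backend, cache_dir)     # runs the query
cached_query(original, fake_backend, cache_dir)     # served from the archive
cached_query(reformatted, fake_backend, cache_dir)  # cache miss: key differs
print(len(calls))  # 2: the reformatted query hit the backend again
```

The third call shows the problem: the query is unchanged in meaning, but the added line breaks produce a different hash, so the archived result is not found.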

To address this issue, we at Data Upsurge developed an R package called SQLFormatteR, which formats SQL code with customizable options, including indentation and case formatting, to produce clean, readable, and consistent queries. Formatting every query the same way before hashing keeps the cache key stable. The package is a wrapper around the Rust crate sqlformat.
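The idea of formatting before hashing can be sketched as follows. This is an illustrative Python example, not SQLFormatteR's API: `normalize_sql` is a hypothetical, deliberately simple stand-in for a real formatter that only collapses whitespace outside single-quoted string literals and drops a trailing semicolon. It does not handle SQL comments or other quoting styles.

```python
import hashlib
import re

# Single-quoted SQL string literals; '' inside a literal is an escaped quote.
_LITERAL = re.compile(r"('(?:[^']|'')*')")


def normalize_sql(query: str) -> str:
    """Toy canonical form: collapse whitespace outside string literals."""
    parts = _LITERAL.split(query.strip().rstrip(";"))
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Odd indices are the captured literals: keep them verbatim.
            out.append(part)
        else:
            out.append(re.sub(r"\s+", " ", part))
    return "".join(out).strip()


def cache_key(query: str) -> str:
    """Hash the normalized query so layout changes do not alter the key."""
    canonical = normalize_sql(query)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


a = "SELECT n FROM t   WHERE name = 'a  b';"
b = "SELECT n\nFROM t\nWHERE name = 'a  b'"

print(normalize_sql(b))
print(cache_key(a) == cache_key(b))
```

Both layouts reduce to the same canonical string, so they share one cache key, while a change inside a string literal still produces a different key. A full formatter serves the same purpose more robustly, since it also handles comments, keyword case, and indentation.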

Integrating SQLFormatteR into the client’s ecosystem was cost-effective, enabling the client to gain this valuable feature with minimal additional expense. This solution was part of the extra-mile service that freelancers can bring to a mission.

The result? The client benefited from a reliable caching system that reduced redundant data processing, and the R package was made available to the open-source community, promoting broader collaboration and innovation.

A Win-Win for Clients and Freelancers

Open-source Lego pieces offer a win-win solution for both freelancers and clients. They foster innovation, enhance efficiency, and create long-lasting value. By embracing this modular approach, data freelancers can position themselves as strategic partners who not only solve immediate challenges but also build the foundation for future growth. These components also let freelancers create lasting impact beyond a single mission while demonstrating their technical expertise.

Ready to level up your data freelance missions? Start building your open-source toolbox today!