Pandata Blog

AI design and development for high risk industries

We live in an era of unprecedented automation and rapidly expanding data. In 2019, the internet generated roughly 2.5 quintillion bytes of data every day, but only about 0.5% of that data was meaningfully analyzed! This data deluge, combined with the desire to derive more value ever faster, has many companies exploring new ways to tame the uncertainty of this rapidly growing digital world. These changes have driven unprecedented growth in high-level programming, with Python seeing a historic 7% increase in use during 2019 alone.

With this growth come opportunities, but also challenges. Although Python and other languages offer incredible flexibility in a package anyone can learn, their presence has led to the rise of the “incidental programmer,” the person who programs for their job, but for whom programming is not primarily their job. For these individuals, the incredible initial value of automation can easily turn sour, as the challenges of engineering a solution scale with its size. And so many companies get stuck somewhere in the middle, using adequate solutions to automate their most painful tasks, but struggling to scale past small scripts and solutions. But there is a way forward!

The struggles that many small and medium-sized businesses face in fully leveraging programming are the same struggles that software engineers have had to conquer over the last 50 years. And luckily for us, there is much documentation to work from – a set of “tips and tricks,” as it were – for engineering a solution that will not only solve the problem, but that can scale with your team as you grow. By learning from what software engineers use to develop complex applications, we can intelligently apply the tools and techniques most likely to simplify our lives. What follows is a list of some of the most common tools and principles to keep our programs simple as they grow: 

  • As the creator of Python, Guido van Rossum, said himself, “Code is read much more often than it is written.” This may not be true of all code, but for business functions you perform over and over again, you will spend far more time reading and debugging code than writing it. And for that, standards matter.
  • So how do we make sure we’re writing good code? I’d recommend reading some guides, but the easiest way is to install a linter! Basic linters check for simple readability issues like PEP 8 style violations, while more advanced ones can even identify bugs in your code before you ever run it. And given that you’ll likely spend more time debugging than programming, small reductions in bugs can have huge impacts on the hours needed for a project!
  • Although most of us will rigorously test our outputs before release, these tests take time, and can only catch bugs that we remember to look for! For code that is relatively stable, automated testing can ensure that changes to unrelated parts of the program don’t introduce errors into the output without you knowing it. 
  • As we all know, a lack of consistent version control can turn even the best project into a wasteland of unwieldy filenames and abandoned files, and can make it hard to track down feature-breaking bugs. A version control system (VCS) like Git, hosted on a remote service like Bitbucket, allows us to track, manage, and approve changes as they occur, before chaotic file structures and avoidable bugs infest the codebase.
  • Bonus feature: Continuous Integration and Deployment (CI/CD) tools can automate all of the above, so that a single command can set off automated linting and testing and then notify a colleague to review the changes. The time savings can be immense, and solutions are inexpensive and widely available.
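
To make the automated-testing idea above a little more concrete, here is a minimal sketch in Python. The function `monthly_totals` and its sample records are hypothetical, invented purely for illustration; the point is only that each expectation we would otherwise check by hand is written down once as a test.

```python
# Hypothetical business function: the name and data shape are
# assumptions made for this example, not taken from a real project.
def monthly_totals(records):
    """Sum amounts per month from (month, amount) pairs."""
    totals = {}
    for month, amount in records:
        totals[month] = totals.get(month, 0) + amount
    return totals


# Each test states one expectation about the output. If a later
# change breaks that expectation, the assertion fails.
def test_monthly_totals_groups_by_month():
    records = [("2019-01", 100), ("2019-01", 50), ("2019-02", 25)]
    assert monthly_totals(records) == {"2019-01": 150, "2019-02": 25}


def test_monthly_totals_empty_input():
    assert monthly_totals([]) == {}
```

A test runner such as pytest (one common choice, assumed here) collects functions whose names start with `test_` and reports any failed assertion, so the same checks can be repeated after every change.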

None of these practices merely improves the “purity” of your codebase; each translates into time, money, and sanity saved for everyone on your team. Coding standards make code easy to read, which saves time when debugging and onboarding new employees and makes collaboration easier. Linters nudge you to conform to those standards, making it easier to understand and debug code when you come back to it later. Automated tests speed up testing and ensure that our code acts the way we think it should, significantly reducing debugging time and downtime. And finally, version control systems provide not only the security of an offsite backup, but also many levels of verification and error-checking, which means fewer bugs for you and your team, a faster development process, and a demonstrably more robust engineered solution.
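
The “one simple command” idea behind CI/CD can be approximated locally with a short script. What follows is only a rough sketch under stated assumptions: the helper `run_checks` is hypothetical, and flake8 and pytest are assumed tool choices that would need to be installed. Hosted CI/CD services do much more (for example, notifying reviewers), which this sketch does not attempt.

```python
import subprocess
import sys


def run_checks(commands):
    """Run each (name, command) pair in order; stop at the first failure.

    Returns the name of the failing step, or None if every step passed.
    """
    for name, command in commands:
        print(f"Running {name}...")
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"{name} failed; fix this before asking for review.")
            return name
    print("All checks passed.")
    return None


# Assumed tool choices: flake8 for linting, pytest for testing.
# Both are invoked as modules of the current Python interpreter.
DEFAULT_CHECKS = [
    ("lint", [sys.executable, "-m", "flake8", "."]),
    ("tests", [sys.executable, "-m", "pytest"]),
]
```

Calling `run_checks(DEFAULT_CHECKS)` from the project root would lint first and run the tests only if linting passes, so the cheaper check gets the chance to fail early.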

At Pandata, we take these lessons to heart and are always working to better align data science and data engineering. This dual emphasis on efficient prototyping and scalable solutions ensures that our models stay clean and robust while our development stays agile.

Chris Brace is a Data Analyst at Pandata.