Garbage in, garbage out (GIGO) is a concept originating from the early days of computing: if you input faulty data, you get junk output. The quality of your input data determines the quality of the final product.
You can apply this principle to any system that relies on data, including management. Using flawed data to make managerial decisions will lead to poor — and perhaps even harmful — outcomes.
In this post, we’ll focus on garbage (junk) data and explain its main causes.
What is junk data?
Junk data, sometimes called “dark data” or “bad data,” is data that is unsuitable for further use or no longer serves any distinct purpose. Many organizations know they have junk data but aren’t sure what to do about it. Others don’t realize they have it, so they continue to accumulate more.
Garbage in: Where does junk data come from?
Some data starts out fine but becomes garbage because team members don’t update it. Other data is junk from the start because it’s incomplete, inaccurate, or duplicated. This can happen for a few reasons.
Data inaccuracy or incompleteness could be a result of poorly written code or incorrect formatting. Human error, auto-correction, and auto-filling features can spell trouble for your data.
For example, say you were to misdraw a dependency link between two tasks without noticing. That mistake could seriously impact the schedule of the dependent task. And apart from potentially triggering a series of events you didn’t intend, it would give your stakeholders an inaccurate view of your project timeline.
In an Agile setting, a team member could enter an incorrect number of story points completed on a given day. As a result, the burndown chart would show inaccurate progress of the team throughout the Sprint.
One-off data sets
Junk data also results when someone creates multiple one-off copies of the same file or dataset. They then work on each copy independently, entering new information and removing what they don’t need, without updating or integrating the changes with the primary source. And this can happen at scale, because every member of the organization can do it.
Imagine introducing changes to your project roadmap or work breakdown structure but keeping those updates only in your personal spreadsheet. Unless you also update the project management software you and your Agile (or non-Agile) teams use, you generate junk data. As a result, your project management software, or any other single source of truth your organization uses to manage its projects and portfolios, is out of date.
Another source of junk data is data hoarding. Organizations often collect loads of data without any plan to use it — and no plan to ever delete it, either. After a while, that data is forgotten and goes out of date.
Even when companies collect data for a specific reason, they often collect more than they actually need. Sometimes it’s just easier to collect more upfront and figure it out later. But in practice, this leads to more garbage and makes analysis much more difficult. (Collecting data willy-nilly might even violate privacy laws, so you really want to be intentional about what you gather and why.)
Project scope sometimes changes, even during the project execution phase. This can happen for many reasons. For example, stakeholders might request ad-hoc changes or you might encounter delays that affect other deadlines. In a situation like this, junk data is produced when those changes slip by without being noted by the person responsible.
In a project management setting, junk data can also stem from task over- or underestimation. In Agile planning and other methodologies, teams estimate tasks before they start working on them. But those estimates are often inaccurate: teams may estimate that tasks will take more effort than they actually do, or vice versa. Inaccurate estimates can also result from a lack of proper planning.
Consequently, teams, their managers, and other stakeholders rely on inaccurate (junk) estimates that can impact project development decisions.
Another problem occurs when data is presented in a way that leaves users confused and causes them to draw the wrong conclusions.
Data format also plays a significant role in data validity. If users can’t read the format, then the file becomes junk data.
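One practical way to keep unreadable values out of a system is to validate the format at the point of entry. Here is a minimal Python sketch of that idea; the field being checked and the list of accepted formats are hypothetical, and a real system would likely accept a different set.

```python
from datetime import datetime

# Hypothetical list of date formats this system can read.
ACCEPTED_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

def parse_date(value):
    """Try each accepted format; return a date, or None if unreadable."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None  # unreadable value: flag it instead of storing junk

print(parse_date("2024-03-01"))  # -> 2024-03-01
print(parse_date("March 1st"))   # -> None (would become junk data)
```

Rejecting (or flagging) values up front is far cheaper than discovering, months later, that a whole column of a dataset can’t be parsed.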
Multiple points of data entry
If data enters your system from several different sources, you can be misled by duplicate information. This holds especially true for organizations or teams working heavily with CRM systems.
In this scenario, one user enters new customer information without first checking whether that customer already exists in the database. Data duplication can also happen during bulk data imports, most commonly from CSV files.
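A simple guard against this is to de-duplicate records before they reach the database. The following Python sketch illustrates the idea for a CSV batch import; the `email` field used as the duplicate key and the `existing_emails` lookup are assumptions for the example, not part of any particular CRM’s API.

```python
import csv
import io

def dedupe_customers(csv_text, existing_emails):
    """Keep only rows whose normalized email is neither already in the
    database nor seen earlier in the same import batch."""
    seen = {e.strip().lower() for e in existing_emails}
    unique_rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row["email"].strip().lower()  # normalize before comparing
        if key in seen:
            continue  # skip the duplicate instead of importing junk
        seen.add(key)
        unique_rows.append(row)
    return unique_rows

# A batch containing one existing customer (Ada) and an internal
# duplicate (Bob, listed twice with different capitalization).
batch = "name,email\nAda,ada@example.com\nBob,bob@example.com\nBOB,BOB@example.com\n"
print(dedupe_customers(batch, ["ada@example.com"]))
# -> [{'name': 'Bob', 'email': 'bob@example.com'}]
```

Normalizing the key (trimming whitespace, lowercasing) matters: without it, `BOB@example.com` and `bob@example.com` would both slip through as "different" customers.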
Garbage out: Why is junk data a problem?
One of the most important characteristics of good data is accuracy. But even a small mistake can greatly affect data quality and reliability. And that can cause a series of problems.
Junk data in project requirements
If we create a product customers don’t need (no matter how pretty it is), the product is worthless.
Poorly identified project requirements (inputs) result in useless output (a worthless product). Some companies double down on that useless output by performing further operations on flawed datasets or products (e.g., running faulty reports).
But the opposite does not necessarily hold true: even if you use quality inputs, there’s no guarantee the output will turn out the way you hope.
That’s because you can make a mistake somewhere along the way. For example, you might program a machine incorrectly, or copy the wrong data column into a spreadsheet, even though the data in that column is correct (albeit irrelevant).
So there are lots of ways to spoil outputs, even with quality inputs. If you start with garbage, you’re done before you’ve begun. Once the garbage is in, it stays garbage. You can’t fix spoiled inputs to produce quality outputs. The garbage is just garbage.
Scaled garbage in scaled Agile
The situation looks slightly different when we consider scaling Agile within an organization.
Here, our input is a process. The process is not data per se, but you can think of it as a data-like input that, when flawed, generates more junk.
Consider a small scale first: a single Agile team follows inefficient Agile processes and doesn’t deliver iteratively. Here, the “garbage in” is an inefficient process that causes teams to deliver poor outcomes.
What happens if you scale up those ineffective processes and apply them to more teams, or even the entire organization? In line with the GIGO principle, you’ll be worse off than you were at the beginning, because you’ll have multiplied the garbage and junked up your organization.
That’s why, just as with junk data in project requirements, data analysis, and any other data-dependent situation, you need all your processes and tools ready at the beginning. Only then can you hope for good results.
Garbage causes privacy concerns
Ungoverned data — especially duplicative junk — often contains sensitive information that poses risks. In the event of a security breach, junk data could disclose customer lists, personally identifiable information, and proprietary product information.
On the other hand, management might not even be aware of such a security risk until a breach or data leak happens. And that is a real possibility, especially when data is stored outside of a company’s security operations.
Reduce junk data with BigPicture
BigPicture by Appfire is a comprehensive PPM tool that can supercharge your daily operations and help you and your teams produce less junk data.
It’s perfect for project planning and execution. It enables you and your teams to track tasks and update progress. This way, you empower your teams to be part of keeping your project data up-to-date and maintaining one source of truth for everyone.
The app also allows you to share your project views with your stakeholders, even if they don’t have access to Jira, eliminating data compatibility issues.