You may think that your product is your greatest asset. In fact, your data is probably your biggest asset, and it needs to be treated accordingly.

Let me explain. You are a scientist (a medicinal chemist, for example) working on a new potential drug. You obtain 10 mg of this precious new compound, the most valuable thing to you. But what about all the data you have generated in the process of getting these precious 10 mg? Are you treating it with as much care?

The physical bias

Ten mg of a new compound, whether it is a nice little white crystalline powder or, in my experience, a clear-ish sticky residue at the bottom of a round-bottom flask, is like gold dust to a scientist. It is likely the result of a lot of effort, blood, sweat and tears. But you can see it. It has a mass when you put it on the balance. It is a physical, tangible thing that you can touch. So you can care for it “with your hands”.

The same can’t be said for the data you generated on the way to obtaining this compound. When data was recorded solely on paper, you could hold on to the physical notebook and make sure it was kept in a safe place. But even that was not enough to capture and secure all the data: which instrument did you use for a particular assay? Where is the printout of the analysis? When was the instrument calibrated?

With digital data, it is even worse: it is not possible to “touch” the data. It is encoded as a series of 1s and 0s on a server, likely in the cloud, somewhere in a big data centre. There is nothing physical to touch, which makes it much harder to care for it in the traditional ways.

It is easier to care for something you can physically touch, something that has a physical presence. You can take physical measures to look after it, like labelling it or placing it in the fridge. You can see the physical evidence of caring for it, and others can too. Not so easy for digital data…

Why care for data?

Let’s go back to the example of the 10 precious mg of the compound you have just synthesised. After completing the structural confirmation analyses, you send it for biological testing. Good news: the initial tests show something promising. So now you are asked to produce more of this compound. No problem! Back to the bench.

Where do you start? Do you rely on your memory to reproduce the 27 steps needed, or do you go back to the data you captured in your Electronic Lab Notebook (ELN)? Yes, even if you are superhuman with an amazing memory, using the notes you made earlier in the process is a good idea… but is it quality data? And what if you need someone else to prepare more of this compound for you?

What about the quality of the data?

We often hear about the quality of a product. Measures are usually in place to ensure a product is of good quality, like the well-named quality control and quality assurance. Is it the same for data? Is it needed for data?

The answer is YES, of course: you need good quality data. In the earlier example, quality records of the steps used will enable you to reproduce the same results as before, in most cases at least, depending on the quality of the data. Beyond reproducing results, quality data is also key to learning, and opens the door to improvements and new discoveries.

Quality data doesn’t just happen by itself, and everyone needs to be involved, as mentioned in this article. You need to put measures in place to ensure data quality.

What does quality data look like?

Here are some aspects to consider when it comes to data quality (a non-exhaustive list!):

  • Accurate and complete capture of the data: all the parameters of experiments and assays should be captured as accurately as possible. Using automation and integrated instruments can help, as they capture measurements and parameters systematically.
  • Metadata: it is often forgotten, but it is an important source of information. It allows categorisation, further analysis and insights into the data. Metadata shouldn’t be random, though: it needs to be harmonised across an organisation to be useful. Ontologies and term catalogues need to be carefully designed to work for every stakeholder of the data (see the short sketch after this list).
  • Storage and access: a lot of scientific data is now stored in the cloud. This is convenient thanks to the redundant backups and security provided by the various platforms. Permissions, defining who is allowed to see the data and who isn’t, also need to be considered. Unfortunately, there are still many data silos where data is stuck in its own little world, i.e. an instrument or system that doesn’t “talk” to any other system. These need to be identified and opened up, connected to the world. Well, not quite the world: access permissions still need to be in place for confidential and restricted information.
  • Finding the data and being able to read and use it: when I was using paper notebooks, one of the biggest challenges I faced was finding the data I needed. Another was the readability of my own handwriting, or my colleagues’. With digitalisation, these problems are still present, but in a different form: is the data correctly indexed and findable? Is it recorded in a digital format that humans can read, or at least in an interoperable format that can be correctly understood by other systems, not just the one it was created with?
  • Negative results capture: this one may seem a bit odd, but hear me out. It is very tempting to capture what worked and not bother with what didn’t. But, as with learning from experience, you may learn more from what didn’t work than from what actually did. When you need to improve your product and its synthesis, the records of what has not worked will save you time, effort and resources, so you don’t spend them redoing work that won’t be productive. Negative results are also very important as training data for machine learning (ML) and Artificial Intelligence (AI).
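
To make the metadata point above a little more concrete, here is a minimal sketch, in Python, of what a harmonised metadata check could look like. The field names, vocabulary terms and the record_assay_metadata helper are all hypothetical, purely for illustration; a real ontology or term catalogue would be far richer and agreed with every stakeholder.

    # Minimal sketch of harmonised metadata capture (hypothetical fields and terms).
    # The idea: a record is only accepted when it uses the agreed controlled vocabulary,
    # so every entry can later be categorised, searched and compared consistently.

    CONTROLLED_VOCABULARY = {
        "assay_type": {"binding", "cytotoxicity", "solubility"},
        "instrument": {"HPLC-01", "LCMS-02", "plate-reader-03"},
        "result_flag": {"positive", "negative", "inconclusive"},  # negative results are kept too
    }

    REQUIRED_FIELDS = set(CONTROLLED_VOCABULARY) | {"operator", "date"}


    def record_assay_metadata(metadata: dict) -> dict:
        """Validate a metadata record against the agreed vocabulary before storing it."""
        missing = REQUIRED_FIELDS - metadata.keys()
        if missing:
            raise ValueError(f"Missing required metadata fields: {sorted(missing)}")

        for field, allowed in CONTROLLED_VOCABULARY.items():
            if metadata[field] not in allowed:
                raise ValueError(
                    f"'{metadata[field]}' is not an agreed term for '{field}'; "
                    f"use one of {sorted(allowed)}"
                )
        return metadata  # in a real system this would be written to the ELN or database


    # Example usage: a record that captures a negative result with harmonised terms.
    record = record_assay_metadata({
        "assay_type": "binding",
        "instrument": "LCMS-02",
        "result_flag": "negative",
        "operator": "J. Smith",
        "date": "2024-05-14",
    })

Even a simple check like this, placed where the data is captured, stops inconsistent terms from creeping in and makes the records usable by the whole organisation later on.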

I have listed here only a few pointers for taking care of your data. You can also find further help in the FAIR data principles and in the ALCOA+ principles, which are used by many industries.

What can you do to improve your data quality?

As mentioned above, data quality is everyone’s business. It is important to identify all the stakeholders involved in the creation and use of the data, and to understand their requirements as well as their challenges. You need to involve them and show them the value quality data can bring to the organisation and to them specifically. Any change must be carefully introduced and managed. Forcing a change without caring for the individuals is a sure way to fail.

Having a Data Steward can help move this along, establishing Data Governance and getting the most value out of your data. Read more about Data Stewardship in our previous blog.

Contact us to discuss your data needs and see how we can help you.

Data is probably your biggest asset. Ensure it is of good quality and treat it with care.

