So here is some news from the strange-but-true department: Microsoft Excel blamed for gene study errors [bbc.com].
Microsoft’s Excel has been blamed for errors in academic papers on genomics.
Researchers trying to raise awareness of the issue claim that the spreadsheet software automatically converts the names of certain genes into dates.
Gene symbols like SEPT2 (Septin 2) were found to be altered to “September 2”.
Aah, classic!
This is what happens when you spend countless hours learning genome sequencing and very little time learning about the software tools where your data goes. Maybe we need to bring Clippy back to warn people about such sticky situations.
All jokes aside, here is a public service announcement for you. Beware of helpful features in Excel like AutoCorrect, Flash Fill, AutoFill, automatic scientific notation, etc.
Here are a few tips for you if you find yourself coding genomes in Excel (or something similar):
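- Use TEXT format for data that contains possible dates, values that start with =, etc. To set TEXT format, select the data entry range and use Home > Number > Text. Do this before typing or pasting the data, because formatting a cell as Text afterwards does not bring back a value Excel has already converted.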
- This can deal with cells that contain possible dates, credit card numbers, very long numbers, leading zeros, fractions, and values that start with = (which Excel treats as formulas).
- When importing text files into Excel (like your genome sequence data or what have you), select Text as the data type for the columns that Excel could misinterpret.
- If a cell starts with = and should not be treated as a formula, prefix the cell with an apostrophe (').
- Disable features like Flash Fill, AutoComplete and automatic percentage entry if you must.
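If your data passes through a script before it lands in a spreadsheet, you can also scan it for values that Excel is likely to "help" with. Here is a minimal Python sketch of that idea. The function name excel_risks, the month list and the patterns are my own assumptions for illustration; they only approximate Excel's behavior and are not its actual parsing rules.

```python
import re
from typing import List

# Month names and abbreviations that Excel may read as the start of a date.
# Assumption: this list is a rough heuristic, not Excel's real rule set.
MONTH_PREFIXES = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "APRIL", "MAY", "JUN", "JUNE",
    "JUL", "JULY", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# A month-like prefix followed by one or two digits, e.g. SEPT2 or MARCH1.
_MONTH_LIKE = re.compile(
    r"^(%s)[-/ ]?\d{1,2}$" % "|".join(MONTH_PREFIXES), re.IGNORECASE
)

# Digits, the letter E, then digits, e.g. 2310009E13.
_SCI_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def excel_risks(value: str) -> List[str]:
    """Return the reasons a value might be altered by Excel (heuristic)."""
    text = value.strip()
    reasons = []
    if text.startswith("="):
        reasons.append("formula")
    if _MONTH_LIKE.match(text):
        reasons.append("date")
    if _SCI_LIKE.match(text):
        reasons.append("scientific notation")
    if text.isdigit() and len(text) > 1 and text.startswith("0"):
        reasons.append("leading zero")
    # Excel keeps 15 significant digits, so longer numbers lose precision.
    if text.isdigit() and len(text) > 15:
        reasons.append("long number")
    return reasons


if __name__ == "__main__":
    for symbol in ["SEPT2", "MARCH1", "TP53", "2310009E13", "=A1", "00123"]:
        print(symbol, excel_risks(symbol))
```

Any value that comes back with a non-empty list is a candidate for the TEXT format or the apostrophe prefix described in the tips above.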
Help the hapless, share your tips
Now it's your turn. Please share your tips for handling situations like these. Post them in the comments box.
More reading:
Before you embark on saving sensitive stuff in spreadsheets, soak up some survival skills: