Philosophy of Coding and Managing MD Simulations (Things I Wish I Had Known Earlier)

Shasha Feng
Dec 3, 2020

Over years of working on MD simulations and data analysis, I have started to think about best practices and the philosophy behind this kind of work. How can we make it more enjoyable and less tiring?

I also keep this list in my private GitHub repository of MD simulation and analysis scripts. It is an ongoing effort, and I will update it as I think more deeply.

  1. It is very beneficial to document what I did during the exploratory phase, which involves a lot of new code. The time it takes to write things down saves a lot of time later. It also lowers the resistance to picking something up again.
  2. Beyond point 1, beautiful, clear formatting and logic are also important for quick review and for picking work up again.
  3. Learn to manage data. Managing a large amount of data requires advance planning, strategic thinking, and technique. The more automated the processing, the better. Clear naming becomes a necessity. (It is very comfortable later to rely on the automated code, and super cool to be able to manage large trajectories.)
  4. It is good to have a workstation with enough power to handle large datasets. Get on the highway.
  5. Every week, try to clean up the data generated the previous week. It gives me more peace of mind in the long term.
  6. Good file naming is the mark of a well-educated, noble programmer. Name files in a way that makes them easy to retrieve.
  7. It is good to start an analysis, but it is equally important to follow through. Do detailed benchmarking and vary the hyperparameters. Do not leave things half done; finish them on the spot and finish strong. That way there will be fewer worries in life and more peace of mind.
  8. Keep up with the latest technologies and package developments in the field, for example machine learning (ML), deep learning (DL), and the MDAnalysis package.
  9. Start small, step by step, and take it easy.
  10. How to develop an analysis pipeline:
    - Do exploratory analysis
    - Understand the significance of results
    - Tune the parameters to demonstrate robustness and justify the choices
    - Find the best set of parameters for production/deployment
    - Move the other results to the Supplementary Information (SI) for reference
    - Find other findings in the literature that support the results
  11. When things feel tedious, it is usually because they are not automated enough. After the first few trials, we should aim to automate the whole process and audit each step.
  12. When things feel difficult, it is not an illusion: managing lots of data is genuinely difficult. Good documentation of what we did helps here too.
  13. Just as a painting needs to be framed to be viewed, analysis data need to be trimmed and organized in a publication-ready form to be viewed and appreciated. In practice, this means turning them into a publication-quality figure, putting it into a Word document, and stating what we can learn from the figure.
  14. It is very helpful to take some time once in a while (every week or two) to slow down and record the techniques or philosophy we have learned recently. It may even be good to share our stories.
  15. I learned how to organize the results of a project by writing ‘Main messages’ and ‘New & insights’ for each section. Each important point is written as a sentence, which can then be reformatted into a bullet point, and each bullet point is linked to a figure ID. (Something new I learned from Wonpil today.)
  16. This is my first time picking up an old project and redoing the analysis, so I want to run a pilot study of how difficult or easy it is to redo all the analyses. This is a good test of my code management. My Python parsers should hold up, since they are versatile enough to adapt to a different folder name.
  17. Consider joining community efforts, e.g., writing a package for VMD, MDAnalysis, ChimeraX, etc., to get to know people in the community, learn how to build one’s own package, contribute to the community, and improve one’s coding skills (Python, GPU programming, JavaScript, etc.).
  18. Know that we have the ability to master anything. One thing I learned during my Merck CSC internship is that we can learn virtually any computational technique. Doing research and getting a PhD is exactly about learning how to learn. Once we reach that stage, things become much easier, and we will have the capability and time to consider the next important question.
  19. The next important question: what is really difficult? Data analysis is clearly not that difficult, and coding is something we can always learn bit by bit. The most difficult thing, in my eyes right now, is to find the beauty and art in science. That is a wonderful and eternal topic, and it is why we need to engage with the arts and cultivate creativity.
  20. Recently I read a tweet about how one feels stupid writing unit tests, yet is so eager to have them when trying out someone else’s code. So building unit tests really is good practice. This repository does not have unit tests yet.
  21. My past code and Jupyter notebooks could be improved with more annotations explaining how the code works. Right now, some of the code is a bit difficult to figure out.
  22. Science is all about endurance and resilience. Writing a draft of a manuscript is not easy, and it is even harder to receive feedback with tons of places to revise. But we have overcome many difficulties along the way, technical or human. So let us hold on and keep perfecting and polishing it.
  23. One of the most beautiful feelings in doing science is getting help from others and helping others. Another is having in-depth conversations. They create such warm and fuzzy feelings, and that feels so good.
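
Points 3 and 6 (clear, retrievable file names) pay off most when a script can read the names back. Below is a minimal Python sketch assuming a hypothetical scheme `<system>_<condition>_rep<N>_<stage>.<ext>` (e.g. `popc_chol30_rep2_prod.xtc`); the scheme, the `RunInfo` fields, and the example file name are illustrative assumptions, not my actual convention.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Hypothetical naming scheme (an assumption for illustration):
#   <system>_<condition>_rep<N>_<stage>.<ext>, e.g. popc_chol30_rep2_prod.xtc
NAME_PATTERN = re.compile(
    r"^(?P<system>[a-z0-9]+)_(?P<condition>[a-z0-9]+)_rep(?P<replica>\d+)"
    r"_(?P<stage>[a-z]+)\.(?P<ext>[a-z]+)$"
)


@dataclass(frozen=True)
class RunInfo:
    system: str
    condition: str
    replica: int
    stage: str
    ext: str


def make_name(info: RunInfo) -> str:
    """Build a file name from metadata so every file follows one scheme."""
    return f"{info.system}_{info.condition}_rep{info.replica}_{info.stage}.{info.ext}"


def parse_name(filename: str) -> RunInfo:
    """Recover metadata from a file name; raise if the name breaks the scheme."""
    match = NAME_PATTERN.match(Path(filename).name)
    if match is None:
        raise ValueError(f"File name does not follow the scheme: {filename}")
    fields = match.groupdict()
    return RunInfo(
        system=fields["system"],
        condition=fields["condition"],
        replica=int(fields["replica"]),
        stage=fields["stage"],
        ext=fields["ext"],
    )


if __name__ == "__main__":
    info = parse_name("/data/popc_chol30_rep2_prod.xtc")
    print(info)
    print(make_name(info))
```

Because names are built and parsed by the same pair of functions, a file that breaks the convention fails loudly instead of silently dropping out of an analysis.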
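
For the weekly cleanup in point 5, a small helper that lists files not modified in the past week, largest first, can make the review quicker. This is only a sketch: `stale_files` and its defaults are hypothetical, and it deliberately reports candidates rather than deleting anything.

```python
import time
from pathlib import Path


def stale_files(root, days=7, pattern="*"):
    """List files under root not modified in the last `days` days, largest first.

    Only reports candidates; deciding what to delete, compress, or archive
    is left to a human.
    """
    cutoff = time.time() - days * 86400  # 86400 seconds per day
    candidates = [
        p for p in Path(root).rglob(pattern)
        if p.is_file() and p.stat().st_mtime < cutoff
    ]
    return sorted(candidates, key=lambda p: p.stat().st_size, reverse=True)


if __name__ == "__main__":
    files = stale_files(".", days=7)
    total_gb = sum(p.stat().st_size for p in files) / 1e9
    print(f"{len(files)} files ({total_gb:.2f} GB) untouched for 7+ days")
    for p in files[:20]:  # show the 20 largest candidates
        print(p)
```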
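
Points 7 and 10 call for varying parameters to show that a result is robust. As a minimal sketch, assume the observable is a contact fraction (the fraction of frames in which some distance, in nm, falls below a cutoff) and that the cutoff is the parameter being tuned; the function names, cutoff values, and Gaussian toy data are all placeholders, not results from any real simulation.

```python
import random
import statistics


def contact_fraction(distances, cutoff):
    """Fraction of frames in which the distance (nm) is below the cutoff (nm)."""
    return sum(d < cutoff for d in distances) / len(distances)


def sweep_cutoffs(replicas, cutoffs):
    """For each cutoff, return (mean, sample stdev) of the contact fraction across replicas."""
    results = {}
    for cutoff in cutoffs:
        values = [contact_fraction(dists, cutoff) for dists in replicas]
        results[cutoff] = (statistics.mean(values), statistics.stdev(values))
    return results


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the toy data are reproducible
    # Synthetic stand-in for per-frame distances from three replicas (not real data).
    replicas = [[rng.gauss(0.45, 0.08) for _ in range(1000)] for _ in range(3)]
    for cutoff, (mean, sd) in sweep_cutoffs(replicas, [0.40, 0.45, 0.50]).items():
        print(f"cutoff {cutoff:.2f} nm: contact fraction {mean:.3f} +/- {sd:.3f}")
```

If the trend between neighboring cutoffs is small compared with the spread across replicas, the conclusion is less likely to be an artifact of one particular parameter choice.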
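
One way to act on point 11 is to express the pipeline as an ordered list of named steps and let a small runner record what happened at each one. The sketch below assumes each step is a function that takes and returns a dictionary of intermediate results; `run_pipeline`, the two example steps, and the `pipeline_audit.json` log name are hypothetical.

```python
import json
import time
from pathlib import Path


def run_pipeline(steps, log_path):
    """Run (name, function) steps in order and write an audit log as JSON.

    Each step function takes the current dict of intermediate results and
    returns an updated dict. The runner stops at the first step that raises.
    """
    state = {}
    audit = []
    for name, func in steps:
        start = time.perf_counter()
        try:
            state = func(state)
            status = "ok"
        except Exception as exc:  # record the failure rather than losing it
            status = f"failed: {exc!r}"
        audit.append({
            "step": name,
            "status": status,
            "seconds": round(time.perf_counter() - start, 3),
        })
        if status != "ok":
            break
    Path(log_path).write_text(json.dumps(audit, indent=2))
    return state, audit


# Hypothetical example steps operating on toy numbers.
def load_distances(state):
    return {**state, "distances": [0.3, 0.5, 0.6]}


def mean_distance(state):
    d = state["distances"]
    return {**state, "mean_distance": sum(d) / len(d)}


if __name__ == "__main__":
    state, audit = run_pipeline(
        [("load", load_distances), ("analyze", mean_distance)],
        "pipeline_audit.json",
    )
    print(json.dumps(audit, indent=2))
```

Keeping the audit log next to the results means a later reader can see which steps ran, how long they took, and where a run stopped.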
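
Point 16 relies on analysis scripts that do not hard-code folder names. A minimal sketch using the standard-library `argparse` module is below; the script layout, the `--pattern` option, and the `*.xtc` default are assumptions for illustration.

```python
import argparse
from pathlib import Path


def build_parser():
    """Command-line parser; the project folder is an argument, never hard-coded."""
    parser = argparse.ArgumentParser(
        description="List trajectory files in a project folder for analysis."
    )
    parser.add_argument(
        "workdir", nargs="?", type=Path, default=".",
        help="project folder to search (default: current directory)",
    )
    parser.add_argument(
        "--pattern", default="*.xtc",
        help="glob pattern for trajectory files (default: *.xtc)",
    )
    return parser


def find_trajectories(workdir, pattern):
    """Return files under workdir (searched recursively) matching pattern, sorted."""
    return sorted(p for p in Path(workdir).rglob(pattern) if p.is_file())


if __name__ == "__main__":
    args = build_parser().parse_args()
    for traj in find_trajectories(args.workdir, args.pattern):
        print(traj)
```

Usage, with a hypothetical script and folder name: `python find_trajectories.py /path/to/old_project --pattern '*.dcd'`. Renaming or moving a project then only changes the command line, not the code.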
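
As a starting point for point 20, here is a small test module using Python's built-in `unittest`. The function under test, a minimum-image distance in an orthorhombic periodic box, is chosen as a typical MD helper for illustration; it is not taken from my repository.

```python
import unittest


def minimum_image_distance(a, b, box):
    """Distance between points a and b in an orthorhombic periodic box.

    a, b: (x, y, z) coordinates; box: (Lx, Ly, Lz) edge lengths, same units.
    Each displacement component is wrapped into [-L/2, L/2].
    """
    total = 0.0
    for ai, bi, length in zip(a, b, box):
        d = ai - bi
        d -= length * round(d / length)  # shift to the nearest periodic image
        total += d * d
    return total ** 0.5


class TestMinimumImageDistance(unittest.TestCase):
    def test_points_inside_box(self):
        # A 3-4-5 triangle well inside the box: no wrapping needed.
        self.assertAlmostEqual(
            minimum_image_distance((1, 1, 1), (4, 5, 1), (10, 10, 10)), 5.0)

    def test_wraps_across_boundary(self):
        # 0.5 and 9.5 in a 10-wide box are 1.0 apart through the boundary, not 9.0.
        self.assertAlmostEqual(
            minimum_image_distance((0.5, 0, 0), (9.5, 0, 0), (10, 10, 10)), 1.0)

    def test_symmetric(self):
        a, b, box = (1.2, 8.7, 3.3), (9.9, 0.4, 6.1), (10, 10, 10)
        self.assertAlmostEqual(
            minimum_image_distance(a, b, box), minimum_image_distance(b, a, box))
```

Saved as, for example, `test_geometry.py`, it runs with `python -m unittest test_geometry`.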
