To see the interactive versions of these charts, head over to our blog. If you’re inclined to share, here is the link: http://blog.plot.ly/post/104937857072/sharing-preserving-beautiful-graphs-with-your
We painstakingly craft beautiful, complex, important graphs. Then the data is lost on an old hard drive, desktop, spreadsheet, or email. Without the data, we can’t reproduce experiments or build on the research, and the loss may even cast doubt on the findings. Plotly solves this problem by uniting your data, graphs, and code online. Read on to learn more about lost data and how to preserve data with our free cloud-based product or with Plotly on-premise, on your own servers.
Data Loss, Citations, & Access
Two recent studies looked at how hard it is to track down data from published research. The authors of “The Dawn of Open Access to Phylogenetic Data” examined how the impact factor of the publishing journal influenced their ability to obtain partial datasets (top two plots) and complete datasets (bottom two), either from online archives (left two) or by asking the authors for the data (right two). The shaded sections show a 95% confidence interval. They concluded:
Generally, studies published in journals with a higher impact factor are more likely to both deposit the corresponding (partial or complete) datasets in online archives and to provide those data upon direct request.
In another study, researchers sought data from 516 studies between 2 and 22 years old. They concluded that:
- The odds of a data set being reported as extant fell by 17% per year.
- Broken e-mails and obsolete storage devices were the main obstacles to data sharing.
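To get a feel for what a 17% annual drop in odds means over the age range studied, here is a rough arithmetic sketch. It assumes the decline compounds, so the odds are multiplied by 0.83 each year; the function name is ours and is not taken from the study.

```python
# Assumption: the 17% yearly decline compounds, i.e. odds scale by 0.83 per year.
ANNUAL_ODDS_RATIO = 1 - 0.17


def relative_odds(years: int) -> float:
    """Odds that a data set is extant after `years`, relative to the odds at year 0."""
    return ANNUAL_ODDS_RATIO ** years


# The study covered papers between 2 and 22 years old.
for years in (2, 10, 22):
    print(f"{years:>2} years: odds multiplied by {relative_odds(years):.3f}")
```

Note that these are odds, not probabilities: turning them into a probability would also require the baseline odds, which this sketch leaves out.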
Thus age can serve as a predictor for data loss. Older papers also have more citations, since they have had more time to accumulate them. Newer papers are more likely to archive data.
The authors of the study plotted above also ran regressions for each publication year individually. They noted a bump in citations for papers that shared data.
The citation benefit was greatest for data published in 2004 and 2005, at about 30%.
Data behind published studies should be available, especially if government funding went into the research. The plot below from figshare shows a trend towards open access publication in the Web of Science, a scientific citation indexing service.
We’re excited to publish graphs, data, and code together, but publishing research and data together isn’t a new idea. The Journal des sçavans first published research and data together in 1665. A 1914 academic report published this figure advocating publishing data with figures.
Instead of emailing authors for data, we can jointly publish figures, data, and code to reproduce our work. As a recent blog post by Jure Triglav notes,
[T]he problem of the way we create scientific charts and figures should simply be recognized. The mistakes of flattening each figure and compressing it, mangling the data, converting vector illustrations into raster images — all of those should be recognized and addressed.
Uniting & Preserving a Figure
Tools at various levels of the analysis, visualization, and publication workflow are enabling collaboration and tapping into the potential of the web.
<iframe width="710" height="750" frameborder="0" seamless="seamless" scrolling="no" src="https://plot.ly/~cimar/250.embed?width=710&height=750"></iframe>
Proprietary, paid, or complex software that only runs on a particular OS or browser can bottleneck data and reproducibility. Plotly is free and easy to use. If someone wants to edit the plot or data in Plotly, they can do so without downloading, installing, or paying. It’s all online (or on-premise if you use Plotly on your servers). Here are a few screenshots showing how it works.
How Plotly Lets You Harness The Web
Plotly is bringing powerful scientific and technical computing tools to the web. Our interoperable APIs for Python, R, MATLAB, Julia, and Excel let you and your team collaborate, live-stream data, make plots with LaTeX, and craft 3D plots.
We can use LaTeX to add fractions, equations, and symbols to our plots by wrapping them in $ signs. Here is a tutorial.
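As a minimal sketch of the dollar-sign wrapping, the snippet below builds a figure description as a plain Python dictionary. It assumes a JSON-style figure with `data` and `layout` keys and `title` objects carrying a `text` field; the exact schema may differ between Plotly versions. The `latex` helper and the sample values are hypothetical, and no Plotly library is imported.

```python
import json


def latex(expression: str) -> str:
    """Hypothetical helper: wrap a LaTeX expression in $ signs."""
    return f"${expression}$"


# Raw strings keep the backslashes that LaTeX commands need.
figure = {
    "data": [
        {
            "type": "scatter",
            "x": [1, 2, 3, 4],      # made-up sample values
            "y": [1, 4, 9, 16],
            "name": latex(r"y = x^{2}"),
        }
    ],
    "layout": {
        "title": {"text": latex(r"\text{Growth of } y = x^{2}")},
        "xaxis": {"title": {"text": latex(r"x")}},
        "yaxis": {"title": {"text": latex(r"\frac{y}{1}")}},
    },
}

# The dictionary serializes to JSON, which is how a figure travels over the web.
print(json.dumps(figure, indent=2))
```

Keeping the wrapping in one small helper makes it harder to forget a closing $ sign, which would otherwise leave the label as plain text.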
Our next plot shows a streaming 3D plot. Made in an IPython Notebook, the plot simulates a chaotic solution to the Lorenz system, also known as the Lorenz attractor or butterfly. You can stream new data to your plots every day, every minute, or every 50 ms. You and your team can see the same updated stream of data and graphs.
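For readers curious about the data behind such a plot, here is a small sketch of a Lorenz system simulation written as a generator, so points arrive one at a time like a stream. It is not the code from the notebook: the forward Euler step, the step size, the starting point, and the classic parameter values (sigma = 10, rho = 28, beta = 8/3) are all assumptions, and sending the points to a plot is left out.

```python
from itertools import islice
from typing import Iterator, Tuple

Point = Tuple[float, float, float]


def lorenz_stream(
    x: float = 1.0,
    y: float = 1.0,
    z: float = 1.0,
    sigma: float = 10.0,      # assumed classic parameter values
    rho: float = 28.0,
    beta: float = 8.0 / 3.0,
    dt: float = 0.01,         # assumed time step
) -> Iterator[Point]:
    """Yield successive points of the Lorenz system using forward Euler steps."""
    while True:
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
        yield x, y, z


# Take the first 1000 points; a streaming client would instead send each
# point to the plot as it is produced.
points = list(islice(lorenz_stream(), 1000))
print(len(points), "points generated")
```

A generator suits streaming because the consumer decides the pace, whether that is one point every day, every minute, or every 50 ms.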
For more, see our collection of 3D plots.