R: Pipeline to create reusable formatted tables
The problem does not seem too complicated at first: I have data, and I want to present them in a nicely formatted table.
The real problems arise when I add just a few requirements.
The table should...
- be generated from raw data (CSV)
- be formattable with (conditional) cell styles, i.e. borders, background/foreground colors, fonts
- look the same across different applications
- be scalable without quality loss
I thought this was not too much to ask, but I was proven wrong.
It's not the first time I have stumbled across this problem, and I know I'm by far not the only one. I have not done this very often yet, but I'm very pleased with the result: I asked Twitter for help and received a couple of very cool answers, approaches and package names, pointing in various directions.
No table file format
Unfortunately, unlike for images, there is no table file format that combines data and layout and is rendered the same way in multiple applications. When I create a table, I usually need it to present scientific data. It will be reused in papers and presentations, on posters or on websites. It's incredibly frustrating if, for each medium, the table either has to be regenerated or is displayed differently without me having a chance to influence the result. So, if possible, I would like to use the same table in
- HTML
- LaTeX
- Scribus
- GIMP
- Inkscape
Why not just use a spreadsheet?
When you search the internet for ways to generate and export tables, the suggestions are: make a spreadsheet, then export to... or copy and paste to...
I must admit here that I am biased, as I hate spreadsheets (no matter if Excel or LibreOffice Calc). I hate receiving spreadsheets that are not well formatted, and it's way too easy to end up with a malformed spreadsheet. There are multiple reasons for this that would be worth another blog post, but to summarize:
- They mess around with my data when I don't notice that cells are automatically formatted
- It's ridiculously hard to get the displayed text values of date-formatted columns
- People working with spreadsheets often don't understand their own data
- It's easy to organize spreadsheets in a way that they cannot be exported to CSV (to be usable in external applications) without information loss, e.g.
  - information coded as background color
  - inconsistent data formats within columns
  - multiple tables and plots on one sheet (to explain the color codes)
- The created statistics and plots are horrible
Apart from my personal dislike of spreadsheets, the results I was able to produce with them are not very convincing: visually unappealing and fiddly to produce. The only reasonable export format is PDF, which was not rendered correctly in some applications.
The not perfect but good solution
CSS is a very convenient way to create reproducible, arbitrarily styled results. However, most applications cannot import or work with HTML files. SVG would be my preferred output format, but unfortunately I found only very crude support for creating tables directly as SVG, e.g. in Inkscape. So the compromise is to generate HTML output and convert it to SVG.
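To make the first half of that compromise concrete, here is a minimal sketch in base R of how a CSV could be turned into a CSS-styled HTML table with one conditional cell style. It is only an illustration under assumptions, not a finished pipeline: the example data, the threshold, the CSS class names `high` and `normal`, and the helper functions `make_cell`, `make_row` and `html_table` are all made up for this sketch. Cell values are not HTML-escaped, and the conversion from HTML to SVG is not covered.

```r
# Hypothetical raw data; in practice this would be read.csv("your_file.csv").
csv_text <- "sample,value\nA,0.5\nB,1.7\nC,2.9"
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# All styling lives in CSS (the class names "high" and "normal" are assumptions).
css <- paste(
  "table { border-collapse: collapse; font-family: sans-serif; }",
  "th, td { border: 1px solid #333333; padding: 4px 8px; }",
  "td.high { background-color: #ffdddd; font-weight: bold; }",
  sep = "\n"
)

# Conditional style: numeric values above the threshold get the class "high".
# Note: values are inserted as-is, without HTML escaping.
make_cell <- function(value, threshold) {
  cls <- if (is.numeric(value) && value > threshold) "high" else "normal"
  sprintf('<td class="%s">%s</td>', cls, format(value))
}

# Build one <tr> from row i of the data frame.
make_row <- function(i, data, threshold) {
  cells <- vapply(data[i, , drop = FALSE], make_cell, character(1),
                  threshold = threshold)
  paste0("<tr>", paste(cells, collapse = ""), "</tr>")
}

# Assemble a complete HTML document as a single string.
html_table <- function(data, threshold = 1) {
  header <- paste0("<tr>",
                   paste0("<th>", names(data), "</th>", collapse = ""),
                   "</tr>")
  rows <- vapply(seq_len(nrow(data)), make_row, character(1),
                 data = data, threshold = threshold)
  paste(c("<html><head><style>", css, "</style></head><body><table>",
          header, rows, "</table></body></html>"), collapse = "\n")
}

html <- html_table(df)
writeLines(html, file.path(tempdir(), "table.html"))
```

Keeping the data (the CSV), the layout rules (the CSS string) and the condition (the threshold) separate means the same table can be regenerated with a different style by changing only the CSS.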