Third conference of Research Software Engineers

This year saw the Third Conference of Research Software Engineers (RSEs). It was held at the University of Birmingham on the 3rd and 4th of September. The conference was sold out, with an attendance of 314 delegates from twelve countries. The majority of the delegates were from the UK, representing a mix of academic institutions and companies. As in previous years, the format was two days of keynote talks and parallel tracks of talks and workshops. This year there was also an exhibition hall where the sponsors had stands.

Day 1

The conference programme kicked off with two keynote talks. The first was from Eleanor Robson of UCL, titled "Nammu and Oracc: Digital Humanities Software in the Sustainable Development of Iraqi History and Heritage". This was interesting in that the humanities are often underrepresented at these sorts of events. Eleanor, who is an expert in cuneiform writing, described the evolution of their Nammu editor, the open corpus of cuneiform Oracc, and the benefits of working with the RSE group at UCL.

The next keynote was by Andrew Fitzgibbon of Microsoft Research, one of the sponsors, titled "Building Computer Vision Systems That Really Work". Andy is an entertaining speaker and has a long CV of interesting research that led to products such as the Kinect and HoloLens. Unfortunately he didn't have time to finish all of his talk. If the end was as good as the bits we did see, we would have been in for a treat (as he would say).

After lunch the conference split into parallel sessions. I went to the one with the theme of software engineering. Tobias Schlauch from DLR followed up a talk from last year with "Software Engineering Guidelines – From Theory to Practice". The guidelines (in English) are found at http://doi.org/10.5281/zenodo.1344612. These provide checklists for evaluating the maturity of a software project and guidance on which to apply depending on the scale and distribution of the software. One comment from the audience was that NASA's Reuse Readiness Levels are a similar, complementary set of guidelines. Benjamin Mort from Oxford's OERC spoke on "SIP: Prototyping the Science Data Processing for the World's Largest Radio Telescope". The telescope is of course the Square Kilometre Array. Benjamin's talk interested me from an architecture point of view, and when the slides are available I would like to take a closer look. For continuous integration (CI) they are using Travis. All the info can be found in the SKA book. Dominik Jochym from STFC talked about CASTEP in "A (Long) Tale of Academic Software Development". The name initially stood for Cambridge Serial Total Energy Package, but it is apparently "none of those things now". For CI, after six years of experience with BuildBot they have moved to an external (to the CASTEP group) service called ANVIL, built on Jenkins. The final talk in the session was Alex Meakins from UKAEA on "Modernising Data Access for the JET Tokamak". The JET data is in a proprietary in-house data format. This work is for an abstraction layer that looks a bit like HDF5 and allows them to change the underlying format without disturbing the consumers of the data.
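As a rough illustration of that last idea, and emphatically not the JET implementation, a minimal Python sketch of an HDF5-like access layer over swappable storage backends might look like the following. Every name here (`Backend`, `DictBackend`, `JsonBackend`, `DataStore`, `mean_signal`, and the `pulse/1/current` path) is hypothetical and chosen only for the example.

```python
from __future__ import annotations

import json
from abc import ABC, abstractmethod


class Backend(ABC):
    """Storage format hidden behind the access layer (hypothetical interface)."""

    @abstractmethod
    def read(self, path: str) -> list[float]:
        ...


class DictBackend(Backend):
    """Stand-in for a legacy in-house format: signals kept in a plain dict."""

    def __init__(self, signals: dict[str, list[float]]):
        self._signals = signals

    def read(self, path: str) -> list[float]:
        return self._signals[path]


class JsonBackend(Backend):
    """Stand-in for a replacement format: the same signals stored as JSON text."""

    def __init__(self, text: str):
        self._signals = json.loads(text)

    def read(self, path: str) -> list[float]:
        return [float(v) for v in self._signals[path]]


class DataStore:
    """HDF5-like facade: consumers index by a slash-separated path."""

    def __init__(self, backend: Backend):
        self._backend = backend

    def __getitem__(self, path: str) -> list[float]:
        # Normalise leading/trailing slashes so "/a/b" and "a/b" are equivalent.
        return self._backend.read(path.strip("/"))


def mean_signal(store: DataStore, path: str) -> float:
    """Consumer code: depends only on DataStore, never on the backend."""
    values = store[path]
    return sum(values) / len(values)


# The same consumer function works unchanged against either backend.
old = DataStore(DictBackend({"pulse/1/current": [1.0, 2.0, 3.0]}))
new = DataStore(JsonBackend('{"pulse/1/current": [1, 2, 3]}'))
```

The point of the design is that `mean_signal` only ever sees `DataStore`, so swapping `DictBackend` for `JsonBackend` requires no change on the consumer side.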

The next session provided a dilemma, with three of my Bristol RSE colleagues presenting workshops at the same time. Christopher Woods and Lester Hedges presented "Building and Deploying Custom JupyterHub Images Using Docker and Kubernetes to Run Workshops in the Cloud" and Matt Williams presented "Make Testing Easy with pytest". In order to avoid playing favourites I went to neither and instead attended the workshop by Jim Cownie on "Getting More Python Performance with Intel Optimized Distribution for Python". There were notebooks provided on the conference virtual machine image to work through. These showed that by simply replacing the standard Python with the free (as in beer) Intel one you can get a 47 times speedup on some problems.

🐳 #docker and #kubernetes for #jupyterhub instances workshop now at #RSE18... Or rather the "inception workshop" as @chryswoods prefers referring to it 🤯 pic.twitter.com/cmiB3OfvC4
— ✨Tania Allard 💀🇲🇽 🇬🇧 she/her (@ixek) September 3, 2018

Earlier in the day Christopher Woods had announced the work on the formation of the Society of Research Software Engineering, which will place the organisation on a better footing as a charitable organisation independent of any particular university's procurement policies. At the end of the second day we had the last of the informal AGMs under the association's old structure. This was followed by a feedback and requirements gathering exercise to find out what the members want from the new society.

Afterwards was the conference dinner.

The Bristol RSE Group about to enjoy dinner at #RSE18. All of our workshops today went really well and we are looking forward to Andrew’s talk tomorrow :-) pic.twitter.com/Z838WM7T4W
— BristolRSE (@BristolRSE) September 3, 2018

The University of Bristol research software engineers were well represented at the conference; here are most of us at the dinner. Not pictured: Chris Edsall. (Photo: Christopher Woods)

Day 2

The first of the morning's keynotes was from James Howison of the University of Texas, titled "Challenges and Pathways to Sustainability in Scientific Software Ecosystems". This dug into the definition of software maintenance, which is often thought of as one thing but in fact can be split into three distinct activities. He also examined where the value-creating activities take place and where resource flows occur in the context of FOSS, commercial and grant-funded software development.

The sponsor keynote was by Andreas Fidjeland of DeepMind (Google), titled "How Machine Learning and AI are Transforming Research". This covered their well-known research into deep learning approaches to the game of Go with AlphaGo and AlphaGo Zero. As I was one of the millions of people who watched the games against Lee Sedol live on YouTube, I was interested to hear the story from one of the people involved.

After morning tea I went to the session themed "Techniques and Technologies". First up was Pashmina Cameron from Microsoft Research, whose talk "Fast Code with Just Enough Effort" gave useful strategies for speeding up C++ code. This is another talk to get the slides from once they are out. Julia Damerow of Arizona State University's talk "Using Apache Kafka to Build a Modular Text Extraction Platform" described a Pub/Sub based architecture they had developed so that researchers who aren't au fait with the command line can extract embedded text from PDF documents and run OCR on them. This was very relevant to me as I have a user with a similar requirement, and I talked to Julia afterwards about it. Finally, Corentin Schreiber from the University of Oxford delivered "Easy, Fast, and Robust Data Analysis with Modern C++", which was essentially a sales pitch aimed at people who had tried C++ and been put off. His message was that the language is much better now and that the modern features make it much easier to use.

Day two of #RSE18 and plenty still to come. For starters, Catherine Jones @SciComp_STFC is talking about the unpopular task of shutting services down and is also taking part in a workshop exploring if Implicit Bias affects the careers of women RSEs. pic.twitter.com/wBWV30g2rO
— CoSeC (@CoSeC_community) September 4, 2018

Poster presentation of @biosimspace by Lester Hedges at #RSE18 pic.twitter.com/OsUC45duyB
— Antonia Mey (@ppxasjsm) September 4, 2018

After lunch I went to the workshop on Julia given by one of the main developers, Valentin Churavy. Valentin promises "the ease of use of Python with the speed of no less than half that of C". If you want to try it out without installing anything you can use https://juliabox.com/. Since we were targeting the just-released version 1.0.0 we used https://staging.juliabox.com/. I found a reproducible crashing bug, which I reported: https://github.com/JuliaLang/julia/issues/29064. We have some users of Julia at Bristol, and with the milestone of a 1.0 release I would expect more interest.

For the last session of the conference I went to the one themed "Scalable Computing". Andy Turner from EPCC highlighted the work of the RSEs and the challenges of managing a group distributed across many sites in "The DiRAC Distributed RSE Group: Software from the Smallest to the Largest Scales". Matthew Hartley from the plant and microbial research institute the John Innes Centre gave an entertaining talk titled "So, We Can Toss the Cluster in a Skip, Right?" describing their experiences with moving some of their computing and data to the Microsoft Azure Cloud. Matthew Archer from Cambridge gave a talk about large scale machine learning, but I thought the presentation at IXPUG on the same work was better. Finally, Ardita Shkurti from STFC described their experiences of using the AWS cloud.

Bonus Day 3: Tier 2 RSE Meeting

A couple of communities took the opportunity of having a large number of RSEs in the one place at the same time to hold add-on meetings. One of these was the Citation Format Hack Day; the other, which I attended, was the Tier 2 / Regional HPC Workshop. The programme was organised by James Grant, the RSE at the University of Bath associated with the GW4 Tier 2 system Isambard. (For those unfamiliar with the tier-n nomenclature, the Branscomb pyramid defines tier-0 as pan-national services, tier-1 as national resources like ARCHER, and tier-2 as regional level services.)

Andy Turner (associated with Cirrus as well as EPCC) described the effort to get reproducible benchmarks on HPC systems. The repository is at https://github.com/hpc-uk/archer-benchmarks (the build instructions aren't included but can be found at https://github.com/hpc-uk/build-instructions).

Jon Gibson from NAG told us about the POP Centre of Excellence, which provides a performance optimisation service for free. This was funded by the EU under the Horizon 2020 programme. While the original funding has ended, there will be a new funding round for three more years, 2018 to 2021.

Mark Dawson, who leads the RSE group at Swansea, gave an overview of Supercomputing Wales - Uwchgyfrifiadura Cymru (a follow-on to the HPC Wales project) with a focus on the RSEs spread throughout the country. They have now filled all thirteen posts.

James Grant gave a review of the Isambard hackathons and documentation that have taken place so far. Since I have been involved in all of these there wasn't much news for me, but the approach might be of interest to other tier-2 sites that want to develop closer links amongst their RSEs. Phil Ridley from ARM gave a talk with hints and tips on porting to the ARM architecture. This was good for me to see as I unavoidably had to miss it when he gave the same talk in Bath six weeks ago.

Alan Simpson, who ran the ARCHER Champions program, facilitated a session with a presentation about how that was run. Now that the funding has run out he is looking to see how the community would like to take the program forward, perhaps expanding it to "HPC Champions". He is not wedded to the name; we could call it anything. He wants it to be as inclusive as possible: if you have an interest in HPC you can join. During the session he made use of Slido, which he had seen used earlier in the conference, to solicit instant online feedback from the group.

For the rest of the afternoon we split into groups to discuss the proposed topics and feed back to a plenary session. I chaired a group discussing inclusivity.

A small number of us joined the other workshop for a "networking session" at the university staff club.

Summary

Research Software Engineering is a large and growing profession not just in the UK but also across the world.
The RSE conference is a valuable opportunity to build community, network and learn new techniques.
The University of Bristol should continue to support the involvement of its RSEs in organising, volunteering at, presenting at and attending the conference.

[ This blog post will be updated when the slides are available. ]