1 of 17

22nd February 2022

Open Source Software Vignettes

Cre

Vincent Fazio & Ling Bo Jiang

AuScope AVRE Senior Engineers

CSIRO Minerals

vincent.fazio@csiro.au

ARDC is enabled by NCRIS.

2 of 17

1. Background

We wanted to do an ARDC Tech Talk about the pros and cons of creating and managing open source software.

At CSIRO, we have some quite diverse examples within our AuScope projects. We hope to answer questions such as:

What kind of software are we talking about?

Why was it made? Who uses the software?

Were there any technical problems? Maintenance issues? Was the library originally part of another project? Was it difficult to split out?

What were the solutions to the technical problems?

What lessons did we learn?

3 of 17

1.1 NVCL_KIT: Key Business Use Case.

What is it?

NVCL_KIT

A Python module used to read Australian NVCL borehole data

NVCL_KIT started as a small independent library within the AuScope Geomodels Portal

Used to display boreholes and their hyperspectral datasets

We thought it would be a good idea for NVCL users to have a Python library to fetch and display NVCL datasets

This is becoming an increasingly common user requirement

NVCL_KIT has become more complex and broader in scope

NVCL_KIT

https://pypi.org/project/nvcl-kit/

AuScope NVCL

https://www.auscope.org.au/nvcl

Jupyter Notebooks

https://jupyter.org/

AuScope Geomodels Portal

http://geomodels.auscope.org.au

AURIN Transition

https://aurin.org.au/about-aurin/aurin-transition/

How did it arise?

Geomodels website, NVCL Reporting software

Researchers from CSIRO minerals, various geological surveys and industry

User base?

4 of 17

1.2 NVCL_KIT: Technical Implementation Challenges.

The potential users turned out to be quite diverse in coding ability

People did demand new features which I could not anticipate

Python isn’t complicated

The source code was quite independent from Geomodels

The standard Python module and package format is straightforward

Create a README, setup file, …

Upload to the PyPI package server …

But people are

There are many names for the various NVCL dataset types

Other complexities

5 of 17

1.3 NVCL_KIT: Solution Technology or Architecture.

All APIs return Python “SimpleNamespace” objects which are easier to manipulate and maintain than specialised classes
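As a minimal sketch of this idea (not NVCL_KIT's actual code), a plain dictionary of fields can be wrapped in a "SimpleNamespace" so that users get attribute access without a dedicated class. The function name and field names here are hypothetical, chosen only for illustration.

```python
from types import SimpleNamespace


def make_borehole(record: dict) -> SimpleNamespace:
    """Wrap a plain dict of borehole fields in a SimpleNamespace."""
    return SimpleNamespace(**record)


# Hypothetical record; the field names are illustrative only
borehole = make_borehole({"name": "BH001", "state": "TAS", "depth_m": 350.5})

# Fields are read as attributes rather than via dictionary keys
print(borehole.name, borehole.depth_m)

# New fields can be attached later without changing any class definition
borehole.logged = True
print(vars(borehole))
```

Because no class hierarchy is involved, adding a field to a returned object does not require a new release of specialised classes, which is one plausible reason such objects are easier to maintain.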

Be prepared to write code that makes life easier for your users, even if it does not add any additional functionality to your API

All things to all people

Write two APIs

  • one simple but more restricted one with helper functions and Python loop generators
  • a more low-level one which can retrieve everything
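The two-API approach could look something like the sketch below. It assumes a hypothetical in-memory dictionary standing in for the remote borehole services; the function names are illustrative and are not NVCL_KIT's real API.

```python
from types import SimpleNamespace
from typing import Iterator, List

# Hypothetical in-memory stand-in for a remote borehole service
_FAKE_SERVICE = {
    "BH001": [
        {"depth": 0.5, "mineral": "quartz"},
        {"depth": 1.5, "mineral": "kaolinite"},
    ],
    "BH002": [{"depth": 0.5, "mineral": "chlorite"}],
}


def fetch_raw(borehole_id: str) -> List[dict]:
    """Low-level API: return every raw record for one borehole."""
    return list(_FAKE_SERVICE[borehole_id])


def iter_minerals(borehole_ids=None) -> Iterator[SimpleNamespace]:
    """Simple API: a generator yielding one tidy object per record."""
    ids = borehole_ids if borehole_ids is not None else sorted(_FAKE_SERVICE)
    for borehole_id in ids:
        for record in fetch_raw(borehole_id):
            yield SimpleNamespace(borehole=borehole_id, **record)


# Typical use of the simple API: a single loop, no bookkeeping
for row in iter_minerals():
    print(row.borehole, row.depth, row.mineral)
```

The simple generator is built on top of the low-level call, so less experienced users get a single loop while advanced users can still reach the raw records.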

Write Python Jupyter notebooks with examples

Hold workshops

The names of the APIs have to be carefully thought through before implementation

Balance between being general to allow for future expansion and being specific enough to avoid ambiguity

Supporting links and references.

Keep it simple

6 of 17

2.1 NVCLAnalyticalServices: Key Business Use Case.

What is it?

An analytical services platform for AuScope NVCL

NVCLAnalyticalServices is a library for batch analysis of NVCL boreholes.

It is a part of the AuScope Portal project.

TSG (“The Spectral Geologist”) desktop software is not designed to analyse multiple boreholes

NVCLAnalyticalServices was created to do batch processing that produces new scalars, analysis results, and visualizations.

NVCLAnalyticalServices

https://github.com/AuScope/NVCLAnalyticalServices

AuScope Portal

http://portal.auscope.org.au/

TSG

https://research.csiro.au/thespectralgeologist/tsg/the-tsg-screens/

How did it arise?

Researchers from CSIRO minerals, state geological surveys and industry via AuScope and geological survey portals

User base?

7 of 17

2.2 NVCLAnalyticalServices: Technical Implementation Challenges.

Key Challenges

Big data sets dispersed all over Australia (>4500 boreholes of hyperspectral data). A single-threaded, full-scope query might take days.

Simultaneous requests for data from website

Need to incorporate TSGMod engine (Windows C library)

Optimisation (multi-threading jobs and cache)

Automation requirement from website interface

Maintaining backwards compatibility and interoperability is critically important because of the staggered upgrade schedule

8 of 17

2.3 NVCLAnalyticalServices: Solution Technology or Architecture.

Response

Scalability design (each state survey hosts their own raw data)

RESTful API for backend calls

Multithreading and cache for performance optimization.
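The general pattern of combining a worker pool with a result cache can be sketched as follows. This is a Python illustration of the technique only, not the service's actual implementation; the function names and the placeholder "analysis" are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

CALLS = []  # records which boreholes were actually processed


@lru_cache(maxsize=1024)
def analyse_borehole(borehole_id: str) -> int:
    """Hypothetical stand-in for a slow remote query plus analysis."""
    CALLS.append(borehole_id)
    return len(borehole_id)  # placeholder result


def analyse_batch(borehole_ids, max_workers=8):
    """Run the per-borehole analysis across a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(analyse_borehole, borehole_ids)
        return dict(zip(borehole_ids, results))


ids = ["BH001", "BH002", "BH003"]
first = analyse_batch(ids)   # every borehole is processed once
second = analyse_batch(ids)  # answered from the cache, no new work
print(first == second, len(CALLS))
```

Note that "lru_cache" only helps once a result has been stored; two threads asking for the same uncached borehole at the same moment may both do the work.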

Ported the Windows C library to Linux

Use JMS for Job Queue

JNI for bridging C library to Java

Learnings

A well-designed architecture is the key to project success

API designed with backward compatibility and interoperability in mind

9 of 17

3.1 portal-core: Key Business Use Case.

What is it?

A Java library that was shared between Geoscience Australia (GA) and AuScope

Implements various back end web services

Uses Maven to build and deploy the library

Historically, GA’s AUSGIN Geoscience Portal started as a fork of the AuScope Portal

AuScope and GA software developers

How did it arise?

User base?

10 of 17

3.2 portal-core: Technical Implementation Challenges.

Easy to split out?

Dependencies aside, Java built with Maven is not difficult to split out into a new library

Each time a new function was added, we had to consider carefully whether to put it in the main AuScope Portal library or in the shared portal-core library

We did not want to foist unwanted code on other users and create bloatware

No. But there was a cost to maintain effective communication between two distinct development groups with different purposes

It is less work when there is only one head chef in the kitchen

People problems?

When to share?

11 of 17

3.3 portal-core: Solution Technology or Architecture.

Communication

Required regular meetings with all parties present

  • Approve pull requests before they were merged
  • Negotiate any future changes that might affect others, e.g. deprecating APIs, augmenting APIs, making sure improvements are not duplicated

Annual workshop meetings also enabled us to discuss future plans and strategies

As a rule, if a new function was general in nature or the first of its kind then it was shared

Sharing

12 of 17

4.1 portal-core-ui-app: Key Business Use Case.

Angular npm TypeScript library used for various website user interface services

Necessary to share some code between AuScope web portals: VGL and AuScope Portal

Mostly AuScope developers; also published on npm

What is it?

How did it arise?

User base?

13 of 17

4.2 portal-core-ui-app: Technical Implementation Challenges.

Comparatively difficult

We found Angular (npm) libraries harder to create

Complex directory structure; some restructuring is involved when you move your source code out into a library

If you modify the library, you will have to rebuild both the library and the main application

File Structure

https://angular.io/guide/file-structure

More maintenance required

Hand curation of APIs

  • Need to update files each time for new APIs

e.g. Add APIs to module export file

14 of 17

4.3 portal-core-ui-app: Solution Technology or Architecture.

Angular helps out

Angular has commands to create the initial skeleton of the library

Angular supports automatic recompilation via the ‘--watch’ command-line flag: your library is automatically rebuilt and the main application rebuild is also triggered

15 of 17

5.1 Lessons Learned.

General

Open source development comes in many shapes and sizes; it is not “one size fits all”

If anything, successful open source software requires increased attention to design, development discipline and policy strategy

There will also be additional costs involved with maintenance if you are actively sharing with other groups

There is always a cost involved with splitting out your new library from existing software; be aware of internal dependencies

16 of 17

5.2 Lessons Learned.

Your solutions and strategies will depend on

  1. What kind of services the library aims to provide
  2. The kind of user base
  3. The underlying software technology

General

Pertinent issues

  • Naming your APIs
  • Deprecating and augmenting your APIs
  • What functions to include and exclude
  • Balancing between generality and specificity

Of course, adequate testing and documentation are important

Sunsetting and deprecation phases

When to deprecate? Advice from Java/Oracle:

  • It is insecure, buggy, or highly inefficient
  • It is going away in a future release
  • It encourages bad coding practices
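In a Python library, one common way to run a deprecation phase is to keep the old function as a thin wrapper that emits a "DeprecationWarning" and forwards to its replacement. The sketch below uses hypothetical function names and is not taken from any of the libraries discussed here.

```python
import warnings


def get_borehole_list():
    """Current API (hypothetical name)."""
    return ["BH001", "BH002"]


def get_boreholes():
    """Old API, kept as a thin wrapper during the deprecation phase."""
    warnings.warn(
        "get_boreholes() is deprecated; use get_borehole_list() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return get_borehole_list()
```

Existing callers keep working and are told what to switch to, and the wrapper can be removed in a later release once the sunset date has passed.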

Deprecation

Finally

17 of 17

Thank you!
