anticlust is again up to date on CRAN

Hooray! CRAN once again has the most up-to-date version of anticlust, and it should remain there without issues (at least for now, given the current state of technology). The change log for anticlust version 0.8.5 is found here.

This post documents my journey of keeping anticlust on CRAN while fighting a hard-to-reproduce error on a particular CRAN testing machine: the M1Mac. Errors on this machine are not shown on the standard results page, but instead appear as an “additional issue”.

Last year, I submitted version 0.8.0 to CRAN. I was quite proud of this release, honestly. It made anticlustering vastly more performant with large data sets, both by fixing issues in anticlustering() and by resurrecting fast_anticlustering() from the dead. Due to the nature of the problems this version dealt with (speed!), most of the new code was in the C code base rather than on the R side. This means potential trouble, because C code can break a lot of things.

So, a few days after submitting the improved anticlust version to CRAN, I received a message that the automated unit tests had produced an error on the M1Mac testing machine on CRAN. I was given two weeks' time to fix the issue, or anticlust would be removed from CRAN. The error that occurred was a segfault, which is kind of a worst-case scenario. It means that some code tried to access memory that it should not access. This usually cannot happen with R code but can easily happen with (faulty) C code. A problem for me was that the “additional issue” did not show which function actually produced the error, and therefore I could only guess.
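To illustrate the kind of C mistake that typically leads to a segfault, here is a minimal sketch. It is not code from anticlust, and the function name checked_get is made up for this example. In R, indexing past the end of a vector gives NA or an error. In C, reading data[i] with i >= n is undefined behavior: it may return garbage, or it may crash the whole process, and which of the two happens can differ between machines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of anticlust): read element i of an
 * array holding n doubles, but refuse indices outside the array.
 * Returns true and writes the value to *out on success; returns false
 * and leaves *out untouched if i is out of bounds. */
bool checked_get(const double *data, size_t n, size_t i, double *out)
{
    assert(data != NULL && out != NULL);
    if (i >= n) {
        /* Reading data[i] here would be undefined behavior,
         * possibly a segfault. */
        return false;
    }
    *out = data[i];
    return true;
}
```

Calling checked_get() with an index equal to the array length returns false instead of touching memory beyond the array. Plain C array access has no such check, which is why the same bug can go unnoticed on one platform and crash on another.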

My fixes to anticlustering(), which were a big part of the 0.8.0 update, were primarily concerned with changing how memory is allocated, so that larger chunks of memory can be allocated when processing very large data sets. So of course, I assumed these changes were the culprit. Because I did not really have time to fix the issue (I was on parental leave after the birth of our second child), I quickly generated a version of anticlust that reverted these changes. To my surprise, the issue persisted. Again due to a lack of time, I simply uploaded the previous version (0.7.0; now called 0.8.1 because CRAN requires that version numbers increase), which removed the M1Mac issue.

Up until last week, this was the status quo on CRAN. Hence, CRAN still had the slow R version of fast_anticlustering(), which was quite annoying to me, but I did not have much motivation for fixing stuff. That changed last week, when I “just did it” and submitted version 0.8.3.

To my dismay, the additional issue on M1Mac reappeared after I uploaded version 0.8.3, again without information regarding the source of the error. My hypothesis was that fast_anticlustering() had to be producing the error because it was the only C code that had changed between versions, even though fast_anticlustering() does not really mess with any memory allocation. But who knows what might happen with C code (on a Mac even)…

After this unsuccessful update, the responsible CRAN maintainer granted my request for debugging output, which was a huge relief for me. So I finally got to know which function caused the segfault. It turns out it was not my own C code that was responsible! Instead, the segfault was generated by the Rsymphony package, which I use as the default solver package for optimal anticlustering algorithms. I actually changed this in a recent version because Symphony is faster than the previous default solver, the GNU Linear Programming Kit. But: I think this change was already in version 0.7.0, and I still do not understand why that version did not generate the error. Well, I guess it doesn’t matter now since the issue is fixed. By now, the results page includes output that identifies the source of the error, which is a good thing and may help other developers (or myself) in the future.

Knowing which function produced the error, I could fix the CRAN problem as follows:

It is a relief to me that CRAN now has the most recent anticlust developments and that I can focus on the future instead of the past (when it comes to anticlust).

Last updated: 2024-05-07