Hooray! CRAN once again has the most up-to-date version of anticlust, and it should remain there without issues (at least for now, given the current state of technology). The change log for anticlust version 0.8.5 can be found here.
This post documents my journey of keeping anticlust on CRAN while fighting a hard-to-reproduce error on a particular CRAN testing machine: the M1Mac. Errors on this machine are not shown on the standard results page, but instead appear as an “additional issue”.
Last year I submitted version 0.8.0 to CRAN. I was quite proud of this release, honestly. It made anticlustering vastly more performant with large data sets by fixing issues with anticlustering() on such data and resurrecting fast_anticlustering() from the dead. Due to the nature of the problems this version dealt with (speed!), most new code was in the C code base rather than on the R side. This means potential trouble, because C code can break a lot of things.
So, a few days after submitting the improved anticlust version to CRAN, I received a message that the automated unit tests produced an error on the M1Mac testing machine on CRAN. I was given two weeks to fix the issue to ensure that anticlust would not be removed from CRAN. The error that occurred was a segfault, which is kind of a worst-case scenario. It means that some anticlust code tried to access memory that it should not. This usually cannot happen with R code but can easily happen with (faulty) C code. A problem for me was that the “additional issue” did not show which function actually produced the error, so I could only guess.
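To illustrate what "accessing memory that it should not" means, here is a minimal C sketch. It is not anticlust's actual code, and the helper names checked_get() and checked_sum() are made up for illustration. Reading past the end of an array is undefined behaviour in C: it may go unnoticed on one machine and segfault on another, which is one reason such errors can be so hard to reproduce. The sketch guards every read with an explicit bounds check.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of anticlust): reads element i of an
   array of length n and refuses indices outside the array. */
int checked_get(const double *data, size_t n, size_t i, double *out)
{
    if (data == NULL || out == NULL || i >= n) {
        return -1;  /* invalid access: report failure instead of reading */
    }
    *out = data[i];
    return 0;
}

/* Hypothetical helper: sums up to `count` elements but stops at the end
   of the array. Without the check, count > n would read past the
   allocation, which is undefined behaviour and may crash only on some
   platforms. */
double checked_sum(const double *data, size_t n, size_t count)
{
    double total = 0.0;
    double value = 0.0;
    for (size_t i = 0; i < count; i++) {
        if (checked_get(data, n, i, &value) != 0) {
            fprintf(stderr, "index %zu is out of bounds (n = %zu)\n", i, n);
            break;
        }
        total += value;
    }
    return total;
}
```

For example, calling checked_sum() on a three-element array with count = 5 adds up the three valid elements and then stops with a message instead of reading memory that does not belong to the array.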
My fixes to anticlustering(), which were a big part of the 0.8.0 update, were primarily concerned with changing its memory handling so that it can allocate larger chunks of memory when processing very large data sets. So of course, I assumed these changes were the culprit. Because I did not really have time to fix the issue (I was on parental leave due to the birth of our second child), I quickly generated a version of anticlust that reversed these changes. To my surprise, the issue persisted. Again, due to a lack of time, I simply uploaded the previous version (0.7.0; now called 0.8.1 because CRAN requires that version numbers increase), which removed the M1Mac issue.
Up until last week, this was the status quo on CRAN. Hence, CRAN still had the slow R version of fast_anticlustering(), which was quite annoying to me, but I did not have much motivation for fixing stuff until last week, when I “just did it” and submitted version 0.8.3.
To my dismay, the additional issue on M1Mac reappeared after I uploaded version 0.8.3, again without information regarding the source of the error. My hypothesis was that fast_anticlustering() had to be producing the error because it was the only C code that changed between versions, even though fast_anticlustering() does not really mess with any memory allocation. But who knows what might happen with C code (on a Mac even)…
After this unsuccessful update, the responsible CRAN maintainer agreed to my request for debugging output, which was a huge relief for me. So I finally got to know which function caused the segfault. It turns out it was not my own C code that was responsible! Instead, the segfault was generated by the Rsymphony package, which I use as the default solver package for optimal anticlustering algorithms. I actually made this change in a recent version because Symphony is faster than the previous default solver, the GNU Linear Programming Kit. But I think this was already the case in version 0.7.0, and I still do not understand why that version did not generate the error. Well, I guess it doesn’t matter now since the issue is fixed. By now, the results page includes output that identifies the source of the error, which is a good thing and may help other developers (or myself) in the future.
Knowing which function produced the error, I could fix the CRAN problem as follows:
optimal_dispersion(), optimal_anticlustering() and balanced_clustering()).
It is a relief to me that CRAN now has the most recent anticlust developments and I can now focus on the future instead of the past (when it comes to anticlust).
Last updated: 2024-05-07