raku to the .max

This post is inspired by a recent, fascinating discussion on the raku-beginner Discord channel starting with this message. Thanks to the other folks for helping me to get to a better understanding and apologies if my immediate responses were stupid // wrong. This post will hopefully give a bit more space for elaboration.

Before I wrote this post myself, I tried to get ChatGPT to do it for me …

Raku is a programming language that acts as if there was no distinction between numbers and strings, but it’s there behind the scenes internally. Number 2 and string “2” are 99% same, except when they aren’t. There are no “numbers” or “strings”. It’s just scalars, and they convert automatically. But it’s not so simple. 2 and 2.0 are the same thing (as numbers), but “2” and “2.0” are completely different (as strings).

I hope this helps!
ChatGPT – Please write a 2000 word blog post on how raku seamlessly combines numbers and strings

Well that’s a good start, eh? Although not quite 2000 words.

“HUH?”

Our #raku-beginner thread started with a “huh?” moment:

> my Int @i = [];              ... an empty Array of Ints
> say [max] @i;    # -Inf      ... how can a max be minus infinity?

> (-Inf).^name     # Num       ... specifically -Inf is a Num (aka a float)

> my Str @s = [];              ... an empty Array of Strs
> say [max] @s;    # -Inf      ... even worse a Str is now a number

I’ve adjusted this a little for clarity.

We need to explain wtf raku is doing here to incomers from less tolerant languages. If this has got you wondering, read on.

Untyped Context

One helpful clarification from our Discord chat: this post is mainly about raku in untyped context (there will be some words at the end about how this stuff can be gradually controlled with raku types).

Consider this…

> 1 + "2"            #3          ...add an Int to a Str
> 1 ~ "2"            #12         ...concatenate an Int with a Str
> 1 cmp "2"          #Less       ...cmp an Int with a Str
> 1 cmp "a"          #Less       
> (1, "2").sort      #(1 2)      ...sort a List containing Int & Str
> (1, "a").sort      #(1 a)

Raku does its best to do useful things even if you mix types such as numbers and strings. The whole point of untyped context is to do operations between different types.

A typical use case would be reading data in from a .csv file … where number and string format are not well defined and we want to do operations such as sorting on a column.

Some ideas in play here are:

# Arithmetic operations automatically convert strings to numbers...

> "2"            #Str        
> 1 + "2"        #3          ... e.g. +-/* math operators
> + "2"          #Int        ... prefix:<+> is shorthand for .Numeric

# ... and string operations convert numbers to strings

> 1              #Int
> 1 ~ "2"        #"12"       ... string concatenation
> ~ 1            #Str        ... prefix:<~> is shorthand for .Stringy
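
As a small sketch of these coercions (my own illustration, not taken from the Discord thread), here is what the prefix operators hand back for a few literal strings. The last line shows that a string which does not look like a number gives an undefined result rather than a number.

```raku
# Numeric context: prefix + (or .Numeric) picks a numeric type that fits the string
say (+"2").^name;        # Int
say (+"2.5").^name;      # Rat
say (+"2e0").^name;      # Num
say "2" + "3";           # 5

# String context: prefix ~ (or .Stringy) yields a Str
say (~1).^name;          # Str
say 1 ~ 2;               # 12

# A string that does not look like a number cannot be coerced
say (+"abc").defined;    # False
```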

Smart Building Blocks

In Raku, where possible, language features reuse lower level building blocks.

The smart comparison operator cmp behaves like either <=> or leg, depending on the types of its arguments

- leg forces string context for the comparison

- <=> forces numeric context for the comparison

cmp returns a value of the Order enum: Order::Less, Order::Same or Order::More

cmp will first try to compare operands as strings (via coercion to Stringy), and, failing that, will try to compare numerically via the <=> operator or any other type-appropriate comparison operator. 

Raku sort sorts the list, smallest element first. By default infix:<cmp> is used for comparing list elements.

In this spirit, sort is built on cmp, cmp is built on leg and <=> and these are built on type coercion with .Numeric and .Stringy methods. As we will see shortly, min and max also employ the same cmp logic.
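
To make the layering a bit more concrete, here is a minimal sketch (the arrays @n and @s are just illustrative names) that sorts the same values with the default cmp and then with leg or <=> passed in explicitly as the comparator.

```raku
my @n = 10, 9, 100;
say @n.sort;              # (9 10 100)   cmp on Ints is numeric
say @n.sort(&[leg]);      # (10 100 9)   leg compares the string forms

my @s = "10", "9", "100";
say @s.sort;              # (10 100 9)   cmp on Strs is lexicographic
say @s.sort(&[<=>]);      # (9 10 100)   <=> coerces to numbers

# max follows the same cmp logic as sort
say @n.max, ' ', @s.max;  # 100 9
```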

This modular design can have some quirks and corner cases – but the basic idea is DRY (Don't Repeat Yourself), a familiar principle of all coding.

Operator Identity

Another idea in play here is operator identity. In general, infix operators can be applied to a single or no element without yielding an error, generally in the context of a reduce operation. Again, Raku is trying its best to deliver a valid result.

say [+] ()        #0

The design documents specify that this should return an identity value, and that an identity value must be specified for every operator. In general, the identity element returned should be intuitive. However, here is a table that specifies how it is defined for operator classes in Raku, which corresponds to the table in the above definition in the types and operators defined by the language:

Operator class	Identity value
Equality	True
Arithmetic +	0
Arithmetic *	1
Comparison	True
Bitwise	0
Stringy	'' (empty string)
Sets	Empty set or equivalent
Or-like Bool	False
And-like Bool	True

https://docs.raku.org/language/operators#Identity

Some real examples bring this to life:

say [+] (2,3);   #5       2 + 3
say [+] (2);     #2       2 + 0
say [+] ();      #0       0 is the identity for '+'

say [*] (2,3);   #6       2 * 3
say [*] (2);     #2       2 * 1
say [*] ();      #1       1 is the identity for '*'

I think of the identity as “what’s the default argument that gives the right answer”

Identity for min & max

Now we can start to see what was going on at the start… from the docs:

max returns the largest of the arguments, as determined by cmp semantics.

say [max] (2,3);  #3       2 max 3 (cmp (<=>) return largest)
say [max] (2);    #2       2 max -Inf
say [max] ();     #-Inf    -Inf is the identity for 'max'

So -Inf (minus infinity) is the identity for the max operator. It is the Raku way to say “what is the smallest possible thing”. That way anything else compared to -Inf will be returned as the largest.

Similarly +Inf is the identity for the min operator.
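
One way I picture this is as the seed of a hand-written reduction. The fold sub below is a hypothetical helper of my own, not how Rakudo implements [max], but it shows why -Inf is the seed that never changes the answer for max, just as 0 is for +.

```raku
# Hypothetical helper: fold a list with an operator, starting from a seed value
sub fold(&op, $seed, @xs) {
    my $acc = $seed;
    $acc = op($acc, $_) for @xs;
    $acc
}

say fold(&[+],   0,    (2, 3));   # 5
say fold(&[+],   0,    ());       # 0
say fold(&[max], -Inf, (2, 3));   # 3
say fold(&[max], -Inf, ());       # -Inf

# the built-in reduction gives the mirror-image identity for min
say [min] ();                     # Inf
```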

The Smallest Possible Number

+/-Inf is the Raku way to represent the IEEE 754 floating point standard infinity value. You can also write the ∞ unicode symbol.

IEEE 754 requires infinities to be handled in a reasonable way, such as

(+∞) + (+7) = (+∞)

(+∞) × (−2) = (−∞)

(+∞) × 0 = NaN – there is no meaningful thing to do

https://en.wikipedia.org/wiki/IEEE_754#Infinities
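
The same rules can be checked directly in Raku. This is just a quick sketch of the three cases above plus a couple of sanity checks.

```raku
say Inf + 7;         # Inf
say Inf * -2;        # -Inf
say Inf * 0;         # NaN
say ∞ == Inf;        # True   the unicode symbol is the same value
say -Inf < -1e308;   # True   -Inf is below any finite Num
say Inf.^name;       # Num
```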

This is implemented by the Floating Point Unit (FPU) part of your CPU and, since it is a hardware concept, it is super fast and is the natural way for a computer to represent the largest possible number (+Inf) or the smallest possible number (-Inf).

I imagine that Larry Wall must have smiled when he realised that this was the perfect choice for the identity values of the min and max operators.

Also, for numbers, in untyped context, Raku already has an automatic and efficient way to walk up the set of built in number types from integers (Ints) to rationals (Rats) to floating point (Nums).

[21:22]librasteve: the idea afaik is that as you get beyond the range of Rats then the efficient way for your machine to handle bigger numbers is Nums so there is graceful degradation of precision, but not of accuracy

[21:23]librasteve: then, if you run out of Nums you get to Inf

So while it is tempting to ask “why don’t we have a special value for the smallest possible Int?” that is asking in principle to have two kinds of infinities – one for Ints and one for Nums. And then raku would need to invent special values and code that repeats what the FPU does anyway – not just for Ints, but for Rats and FatRats and so on. So I think that Larry made a good design choice here and that this mixing of Ints and Nums is one of the neat things you can do in untyped context.
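
Here is a rough sketch of that walk up the number types. The Rat line relies on my understanding that Rakudo degrades a Rat to a Num by default once its denominator no longer fits in 64 bits, so treat that comment as an assumption rather than a guarantee.

```raku
say (1).^name;            # Int
say (2 ** 100).^name;     # Int   Ints are arbitrary precision
say (1/3).^name;          # Rat
say (1 + 0.5).^name;      # Rat   Int plus Rat stays exact
say (0.5 + 1e0).^name;    # Num   mixing in a Num gives a Num
say 1e308 * 10;           # Inf   past the largest finite Num

# assumption: a denominator wider than 64 bits degrades to Num by default
say (1 / 3 ** 60).^name;
```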

“HUH!” 2.0

Here’s our Huh example again, first the numbers (min is similar to max, of course):

> my Int @i = [];
> say [max] @i;    # -Inf      ... how can a max be minus infinity?

> (-Inf).^name     # Num       ... specifically -Inf is a Num (aka a float)

So, we have a chain of reasonable behaviours:

max picks the largest value from an Array
The Array can be all the same type (e.g. Int) or it can be untyped
It iterates over neighbours using cmp semantics
cmp uses <=> on Numeric types
If I ask it to max an empty Array, it will return -Inf (the smallest possible number)
-Inf is a Num, so it is possible for max to return a different type

Two subtle aspects are worth noting. First, max returns a defined value (Num:D). I think that in general Raku operations should return values and try to avoid returning type objects such as (Int), otherwise every piece of code would have to handle type object arguments explicitly. Second, this design helps functional programming and recursion, like this simple example:

say (().max , "honeybee").max;    #"honeybee"

Legs and Strings

leg is the Raku string three-way comparator. The name is short for "less, equal or greater?". It coerces both arguments to Str and then does a lexicographic comparison.

say 'a' leg 'b';       # Less
say 'a' leg 'a';       # Same
say 'b' leg 'a';       # More

So sort works on Str values via cmp and then leg:

say <b c a>.sort;      # (a b c)

And, following the logic of our building blocks, max and min too:

say max <a b c>;        # c
say min <a b c>;        # a

leg is a very natural way to include a dictionary word sort in the Raku operation set.
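
The difference between the three comparators shows up most clearly when the arguments are numbers whose string forms sort differently. A short sketch:

```raku
say 10 <=> 9;       # More   numeric: 10 is greater than 9
say 10 leg 9;       # Less   string: "10" sorts before "9"
say 10 cmp 9;       # More   both are Int, so cmp is numeric
say "10" cmp "9";   # Less   both are Str, so cmp is lexicographic

# strings of different lengths: a prefix sorts before the longer string
say 'a' leg 'aa';   # Less
say '' leg 'a';     # Less
```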

What happens when you mix numbers and strings in untyped context:

say 1 cmp 'a';          # Less

# <=> on its own fails here, because 'a' cannot be converted to a number
say 1 <=> 'a';          # Cannot convert string to number ... 

# so for a number and a non-numeric string, cmp ends up doing the string comparison that leg performs
say 1 leg 'a';          # Less

# leg succeeds because it coerces both args to strings

And with max and min:

say 1 max 'a';          # a

So that’s neat … I can use untyped context to sort a mixed set of numbers and strings lexicographically and it will auto convert the numbers to Str as it goes.
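
A slightly bigger sketch of the same idea, using a made-up @mixed array. Note that the two Ints are still compared with each other numerically, so 3 comes before 10 even though "10" would sort before "3" as a string.

```raku
my @mixed = 3, "a", 10, "B";

say @mixed.max;    # a
say @mixed.min;    # 3
say @mixed.sort;   # (3 10 B a)
```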

The Smallest Possible Thing

When we were dealing only with numbers, the case was clear that -Inf is a good candidate for the smallest possible thing.

Now we have mixed numbers and strings, it is a bit odd to see -Inf come up in our HUH?

Nevertheless, I believe that -Inf is a good design choice for the smallest possible thing, because:

there is no good candidate for the smallest possible character – what is the letter before ‘a’ in the alphabet?
the smallest possible thing should be a valid Raku value of one of the types in our comparisons – just as -Inf is a Num, not an Int or a Rat
in untyped context it is better to have only one smallest possible value – as opposed to having a smallest possible number and a smallest possible string and then the caller have to handle both return values

I would agree with critics that say this outcome is “weird” … while it is a natural consequence of the Raku modular approach, it is an odd looking corner case that emerges from a consistent application of the building blocks. Hopefully this post is a start to clarifying, explaining and teaching newcomers.

“HUH!” 3.0

Here’s our Huh example again, now with the strings (min is similar to max, of course):

> my Str @s = [];              ... an empty Array of Strs
> say [max] @s;    # -Inf      ... even worse a Str is now a number

Finally, we have a very similar chain of reasonable behaviours:

max picks the largest value from an Array
The Array can be all the same type (e.g. Int, Str) or it can be untyped
It iterates over neighbours using cmp semantics
cmp uses <=> on Numeric types and leg on Stringy types
If I ask it to max an empty Array, it will return -Inf (the smallest possible number)
-Inf is a Num, so it is possible for max to return a different type

I think that this chain is logical and easy to learn and accepting that +/-Inf is a corner case is better overall than special casing largest / smallest values for each type.

Class Act

As mentioned at the beginning, Raku types can be gradually introduced to control the weirdness.

> my Str @s = []; 
> my Str $res = [max] @s;

# Type check failed in assignment to $res; expected Str but got Num (-Inf)

Each degree of string and number specialisation is represented in the raku class diagram – and so you can both gloss over the type differences in untyped context, or you can tighten the types progressively according to your problem domain.

So, you can use the IntStr allomorph here too that will catch just the empty list case:

> my @a = []; 
> my IntStr $res = [max] @a;

# Type check failed in assignment to $res; expected IntStr but got Num (-Inf)
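
If you would rather decide for yourself what an empty list should give, one option is to check for emptiness before calling max. The max-or sub below is a hypothetical helper of my own, not something built in to Raku.

```raku
# Hypothetical helper: use a caller-supplied default when the list is empty
sub max-or(@xs, $default) {
    @xs ?? @xs.max !! $default
}

my Str @s;
my Str $res = max-or(@s, '');        # no type check failure: '' is a Str
say $res.raku;                       # ""
say max-or(<pear apple>, '');        # pear
```

This keeps the typed variable happy while leaving the choice of default to the caller.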

As ever, comments are welcome!

~librasteve

4 Comments

Pingback: 2023.21 Maxcinating – Rakudo Weekly News
Pingback: Allomorphia – Raku::Journey
manny corpus says:

June 18, 2023 at 6:37 am

say [max] (); #0 -Inf is the identity for ‘max’

You mean #-Inf

“there is no good candidate for the smallest possible character – what is the letter before ‘a’ in the alphabet?”

Why the heck are you talking about characters and alphabets? The least possible string is “”.

But of course there is no greatest possible string, so there’s no reasonable identity for min that is a string … and since min and max are methods on Cool (from which Str and Int are derived), they need some identity that is a Cool, and -Inf/Inf fit the bill.


1. librasteve says:
  
  June 21, 2023 at 5:46 am
  
  Hi Manny
  
  I have fixed the error – thanks for pointing that out.
  
  Thanks for your feedback on the ‘smallest possible character’. You are right. I forgot that lexicographic comparison works on Strs of differing lengths, and so I suppose the min possible Str is '' in the sense that it is the only zero length Str and that, just as 'a' leg 'aa' is Less, so '' leg 'a' is Less.
  
  You then make the case for -Inf/Inf for Str very neatly. 😉
  
