
Exploiting Mistyped URLs

@Levi B.

“Those who are not familiar with the term ‘bit-squatting’ should look that up”

Are you sure you want to go down that rabbit hole?

It’s an instance of a general class of problems that are never going to go away.

And it’s why, in

“Web servers would usually have error-correcting (ECC) memory, in which case they’re unlikely to create such links themselves.”

the key word is “unlikely”, or more formally “low probability”.

Because it’s down to the fundamentals of the universe and the failings of logic and reason as we formally use them. Which in turn is why, since at least as early as the ancient Greeks through to the 20th century, some of those thinking about it in its various guises have gone mad and some committed suicide.

To understand why, you need to understand why things like “Error Correcting Codes”(ECC) will never be 100% effective, and why deterministic encryption systems, especially stream ciphers, will always be vulnerable.

And why it also relates to the post @slashed zero has just made over on the current Friday Squid Page,

https://www.schneier.com/blog/archives/2024/06/friday-squid-blogging-squid-catch-quotas-in-peru.html/#comment-438257

Which is about why something like 12% of easily preventable medical-related deaths actually have nothing whatsoever to do with medicine, but are due to fundamental “information issues” that appear to be easily solvable, thus avoidable, but are actually not.

And it is also in part why

“Computers can count but can not do mathematics, and never will be able to in a finite universe.”

And it was an issue sufficiently well known to Georg Cantor, Alan Turing, Kurt Gödel, Claude Shannon, John von Neumann, and others, which led up to the birth of information theory at the beginning of the 1960’s.

And it’s still causing issues today, in why LLM and ML AI systems hallucinate and why Roger Penrose has been given a lot of undeserved criticism.

Enough of a build up?

There is an old riddle that actually shows the problem, if tangentially:

“If a rooster lays an egg on a church steeple which way will it fall?”

The obvious and incorrect answer is “roosters don’t lay eggs”. It’s incorrect because you have to allow for the failing of Juvenal’s “black swan metaphor”. So the actual answer has to be based on reasoning about “if such an egg existed”, and the answer is “we can not know” because it’s “undecidable”.

To see why this applies, let’s start with a system even simpler than ECC systems: the “error detecting but not correcting” system called “parity checking”. When all is said and done, the parity check bit is

“The least significant bit of a binary count of the defined bit states in a finite data set.”

So if you count, say, all the set bits in your data set, the result will be either “odd” or “even”. And depending on how you set it up, an odd count could indicate an error has been found.

That is, we know at least one bit has been flipped. But what if two bits have been flipped? Then the count is even, indicating no error found, which is incorrect. So “parity” only detects some bit-flip errors and allows others to pass undetected.

The price you pay for this partial error detection is half the range of values your data set could potentially hold. So “there is a trade” in that error detection takes significant information bandwidth for an imperfect result.

But it actually gets worse: what if the bit-flip errors are in the error-detection bits rather than the data bits? That gives the opposite type of error, in that the actual data is correct but the check code is incorrect, so correct data is rejected rather than incorrect data accepted.

No matter what you do, all error-checking systems have both false-positive and false-negative results. All you can do is tailor the system to the more probable errors.
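As a rough illustration of both failure modes, here is a minimal Python sketch (the even-parity scheme and the particular bit patterns are my own, invented for the example): a single flipped data bit is caught, a double flip passes as “good”, and a flip in the check bit itself rejects data that is actually correct.

```python
def parity_bit(bits):
    # Even parity: the check bit is the least significant bit of the
    # count of set bits, so data plus check always sums to an even count.
    return sum(bits) & 1

data = [1, 0, 1, 1, 0, 1, 0, 0]
check = parity_bit(data)

def looks_ok(bits, check):
    # Accept the data if the recomputed parity matches the stored check bit.
    return parity_bit(bits) == check

# Single bit flip in the data: detected.
one_flip = data[:]; one_flip[2] ^= 1
print(looks_ok(one_flip, check))   # False -> error detected

# Two bit flips in the data: the count is even again, error missed.
two_flips = one_flip[:]; two_flips[5] ^= 1
print(looks_ok(two_flips, check))  # True -> undetected error

# Flip in the check bit instead: correct data gets rejected.
print(looks_ok(data, check ^ 1))   # False -> good data thrown away
```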

But there are other underlying issues: bit flips happen in memory by deterministic processes that apparently happen by chance. Back in the early 1970’s, when putting computers into space became a reality, it was known that computers were affected by radiation. Initially it was assumed the radiation had to be of sufficient energy to be ‘ionizing’, but it later turned out that with low-energy CMOS chips any EM radiation, such as that from the antenna of a hand-held two-way radio, would do.

This was due to metastability. In practice, the logic gates we use are very high-gain analog amplifiers that are designed to “crash into the rails”. Some logic, such as ECL, was actually kept linear to get speed advantages, but these days it’s all a bit murky.

The point is, as the level at a simple logic-gate input changes, it goes through a transition region where the relationship between the gate input and output is indeterminate. Thus an inverter, in effect, might or might not invert, or might even oscillate, while the input is in the transition zone.
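A crude way to see that is a toy model rather than real device physics: treat an inverter as a hard threshold with a little Gaussian noise on its input (the threshold voltage and noise figure below are made-up numbers), and watch the output become a coin toss once the input sits in the transition region.

```python
import random

def noisy_inverter(v_in, v_threshold=0.6, noise_sd=0.05):
    # Toy model: a hard-threshold inverter with Gaussian input noise.
    v = v_in + random.gauss(0.0, noise_sd)
    return 0 if v > v_threshold else 1

def ones_fraction(v_in, trials=10_000):
    return sum(noisy_inverter(v_in) for _ in range(trials)) / trials

for v_in in (0.0, 0.55, 0.6, 0.65, 1.2):
    # Well away from the threshold the output is solid;
    # inside the transition region it is effectively random.
    print(f"Vin={v_in:0.2f}  fraction of 1s={ones_fraction(v_in):0.3f}")
```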

I won’t go into the reasons behind it, but it’s down to two basic issues: firstly the universe is full of noise, secondly it’s full of quantum effects. The two can be difficult to differentiate even in very long-term measurements, and engineers tend to lump it all under a first approximation of a Gaussian distribution as “Additive White Gaussian Noise”(AWGN), which has nice properties such as averaging predictably to zero with time and “the root of the mean squared”. However, the universe tends not to play that way when you get up close, so instead “phase noise in a measurement window” is often used, with Allan Deviation,

https://www.phidgets.com/docs/Allan_Deviation_Guide

The important point to note is “measurement window”: it tells you there are things you can not know because they happen too fast (high-frequency noise), and likewise because they happen too slowly (low-frequency noise). But what it does not indicate is what the noise amplitude trend is at any given time, or whether it’s predictable, chaotic, or random. There are things we can not know because they are unpredictable or beyond our ability to measure.
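For the curious, here is a minimal sketch of the idea (synthetic data and the simple non-overlapping estimator, not anything taken from the guide linked above): compute the Allan deviation of a simulated noisy oscillator over different averaging windows tau, and note that each window size only reveals instability at roughly that timescale.

```python
import numpy as np

def allan_deviation(phase, tau0, m):
    # Non-overlapping Allan deviation from phase samples x[i] taken
    # every tau0 seconds, for an averaging window tau = m * tau0.
    x = phase[::m]                      # decimate to the chosen window
    d2 = x[2:] - 2 * x[1:-1] + x[:-2]   # second differences of phase
    tau = m * tau0
    avar = np.sum(d2 ** 2) / (2 * (len(x) - 2) * tau ** 2)
    return np.sqrt(avar)

# Simulated oscillator: white frequency noise plus a slow drift.
rng = np.random.default_rng(0)
tau0 = 1.0
freq = 1e-9 * rng.standard_normal(100_000) + 1e-12 * np.arange(100_000)
phase = np.cumsum(freq) * tau0          # phase is the integral of frequency

for m in (1, 10, 100, 1000):
    print(f"tau={m*tau0:6.0f} s  ADEV={allan_deviation(phase, tau0, m):.3e}")
```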

But also things beyond a deterministic system’s ability to calculate.

Computers only know “natural numbers” or “unsigned integers” within a finite range. Everything else is approximated, or as others would say “faked”. Between every pair of natural numbers there are other numbers; some can be found as ratios of natural numbers and others can not. What drove philosophers and mathematicians mad was the realisation, with the likes of “root two” and pi, that there is an infinity of such numbers we can never know. Another issue was the gaps between multiples of an integer: the smaller the integer, the smaller the gaps between its multiples. Eventually it was realised that there was an advantage to this, in that it scaled. The result in computers is floating-point numbers. They work well for many things, but not for addition and subtraction of small values with large values.
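That last point is easy to demonstrate; a couple of illustrative lines (the particular values are mine, chosen only to sit near the limits of IEEE 754 double precision):

```python
# A double carries about 15-16 significant decimal digits, so a small
# value added to a large one can simply vanish ("absorption"), and the
# order of operations changes the answer.
big = 1.0e16
small = 1.0

print(big + small == big)           # True: the 1.0 is absorbed entirely
print((big + small) - big)          # 0.0, not 1.0
print((small + small) + big - big)  # 2.0 survives only because it was summed first
```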

As has been mentioned, LLMs are in reality no different from “Digital Signal Processing”(DSP) systems in their fundamental algorithms. One of these is “Multiply and ADd”(MAD) using integers. These have issues in that values disappear or can not be calculated. With continuous signals the errors get integrated in with little distortion. In LLMs they can cause errors that are part of what has been called “hallucinations”. That is where something with meaning to a human, such as the name of a Pokemon trading-card character, “Solidgoldmagikarp”, gets mapped to an entirely unrelated word, “distribute”; thus mayhem resulted on GPT-3.5, and much hilarity once it became widely known.
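To make the multiply-and-add point concrete, here is a small sketch in reduced precision (float16 standing in for the narrow formats used in quantised models; the vector length and values are arbitrary): many tiny contributions that matter in aggregate simply never register in the running sum.

```python
import numpy as np

# A long dot product: one large term plus many tiny ones.
n = 10_000
a = np.ones(n + 1, dtype=np.float16)
b = np.concatenate(([2048.0], np.full(n, 0.125))).astype(np.float16)

exact = 2048.0 + n * 0.125              # 3298.0, what we "should" get

# Accumulating in float16: once the sum reaches 2048, adding 0.125 is
# below the spacing between adjacent float16 values, so it vanishes.
acc16 = np.float16(0.0)
for x, y in zip(a, b):
    acc16 = np.float16(acc16 + np.float16(x * y))

# Accumulating in float64 gives the expected answer.
acc64 = np.dot(a.astype(np.float64), b.astype(np.float64))

print(exact, float(acc16), acc64)       # 3298.0  2048.0  3298.0
```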

But as noted, these problems cause deaths in the medical setting, many of them easily avoidable. They happen because information gets hidden from view by more old-fashioned AI “Expert Systems”. The result is either wrong intervention choices or inaction, and death follows.

This is a problem I’ve had some involvement in, from back in the 1980’s through until more recently, and there is a whole department involved with it at Queen Mary’s Uni in East London. Sadly there is no one answer, and as such it’s an unsolvable problem with deterministic systems of the types we currently have. The issue starts with the “User Interface”(UI): equipment screens have limited area and no visible depth, unlike the old-fashioned “whiteboards” and “medical files”. So the medical systems can not display all information, so selection choices have to be made, and it’s those choices that kill, plain and simple.

One obvious one is “most recent first” as a selection criterion. But is “recent” when a test was requested, when it was started, or when the results came back? A critical result might not be displayed because a half-dozen simple observations or other tests have happened since, and the system displays them in preference. Even if it is flagged in some way, there are limits to what the UI can do; all basic selection criteria have this issue, and current technology is not going to change that. In fact any changes you make are likely to make the problem worse…
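A toy illustration of that failure mode (the record layout, timestamps and screen limit below are invented for the example, not taken from any real system): sort by “most recent first”, show only what fits on the screen, and a critical result that came back earlier silently drops off.

```python
from dataclasses import dataclass

@dataclass
class Result:
    name: str
    reported_at: int   # minutes since admission, for simplicity
    critical: bool

results = [
    Result("potassium (critical high)", reported_at=60, critical=True),
    Result("temperature", 75, False),
    Result("blood pressure", 90, False),
    Result("pulse", 105, False),
    Result("respiration rate", 120, False),
    Result("oxygen saturation", 135, False),
]

SCREEN_ROWS = 4   # the UI only has room for four rows

# "Most recent first" selection: sort by report time, keep what fits.
shown = sorted(results, key=lambda r: r.reported_at, reverse=True)[:SCREEN_ROWS]

for r in shown:
    print(r.name)

# The critical potassium result is not on the screen at all:
print(any(r.critical for r in shown))   # False
```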

But there are other selection processes. One is the electronic “British National Formulary”(BNF) and the “National Institute for Clinical Excellence”(NICE) guidelines; both are seen as “Bibles” to be obeyed without question. But which takes precedence, and why? The answer is complicated, and the results can kill people, such as with issues to do with iron supplements, anticoagulants, antibiotics, PPIs and NSAIDs.

Then there are the “Expert Systems”, which are rule-based systems built on “currently” presented conditions. They are never up to date, never complete, and once walking a path are difficult to get to change. Anyone who has experienced the newer Internet search engines, like those from Microsoft and Google, will have a feel for this issue.

Unofficially something like 12% of avoidable deaths in a medical setting are down to “Small UI” issues…