TechyMagThings


Thursday, 9 April 2026

April 09, 2026

USB, Abstracted

Modern technology builds on abstractions. Most application programmers today don’t know what a non-maskable interrupt is, nor should they have to. Even fewer understand register coloring or reservation stations for instruction scheduling, and fewer still can explain the physics behind the transistors in the CPU. Sometimes a technology starts out where you need to know everything (programming a bare-metal microprocessor, for example) and then evolves toward abstraction. That’s where [WerWolv] wants to take you for writing USB code, in the recent post USB for Software Developers.

Many USB tutorials assume you want to know the intricacies of protocol negotiation and the details of the hardware layer, and that you are willing to write a Linux kernel module to provide a driver. But thanks to abstraction, none of this has been strictly necessary for many use cases for a long time.

While the post focuses on Linux, libusb is also available for Windows. We presume the same principles would apply, more or less.

Interestingly, the target device for the tutorial is an Android phone in bootloader mode. We thought that was strange at first, until we read the rationale. You can easily get your hands on an Android phone if you don’t already have one. The device is simple. Plus, it is unlikely you already have drivers installed on your system that would interfere with your tutorial driver. Makes sense.

After that, it is pretty straightforward to use libusb to find the phone, determine what you can do with it, and communicate with it. Sure, the phone’s “fastboot” protocol is simple, but that’s just like using a TCP socket. You may implement a fancy protocol on top of it, but that doesn’t mean sockets are hard to use.
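The "simple protocol on top of a pipe" idea can be made concrete with a minimal Python sketch of fastboot's message framing. It assumes the convention described in AOSP's fastboot documentation: the host sends a short ASCII command, and the device replies with a four-byte status (OKAY, FAIL, INFO or DATA) followed by a payload. The helper names are hypothetical, and the USB side is deliberately left out. In a real tool these bytes would travel over libusb bulk transfers.

```python
from dataclasses import dataclass
from typing import Optional

# Four-byte status prefixes a fastboot device uses in its replies
STATUSES = {"OKAY", "FAIL", "INFO", "DATA"}
# Classic fastboot command limit; newer bootloaders may accept longer commands
MAX_COMMAND_LEN = 64


@dataclass
class Reply:
    status: str
    payload: str
    data_size: Optional[int] = None  # only set for DATA replies


def encode_command(command: str) -> bytes:
    """Encode a command such as 'getvar:product' for a bulk OUT transfer."""
    raw = command.encode("ascii")
    if len(raw) > MAX_COMMAND_LEN:
        raise ValueError(f"command is {len(raw)} bytes, limit is {MAX_COMMAND_LEN}")
    return raw


def parse_reply(packet: bytes) -> Reply:
    """Split a bulk IN packet into its status prefix and payload."""
    if len(packet) < 4:
        raise ValueError("reply is shorter than the 4-byte status prefix")
    status = packet[:4].decode("ascii")
    if status not in STATUSES:
        raise ValueError(f"unknown status {status!r}")
    payload = packet[4:].decode("ascii", errors="replace")
    if status == "DATA":
        # DATA announces how many bytes the host should send next, as 8 hex digits
        return Reply(status, payload, int(payload[:8], 16))
    return Reply(status, payload)
```

A host loop built on this would send `encode_command("getvar:product")`, then keep reading packets. It would print any INFO payloads and stop once `parse_reply` returns OKAY or FAIL.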

We’ve looked at simplified USB drivers before. Of course, for some applications, you can bend a USB serial port to handle something a bit more complex.



April 09, 2026

[Kerry Wong] Finds SMD Test Clips

One of the many problems you run into when you work with SMD parts is trying to probe the little tiny pins. While we usually watch [Kerry Wong’s] videos for the oscilloscopes, it makes sense that he’d also be looking for probes. The video below shows some cheap probes from China that can clamp onto tiny QFP pins.

The probes look a little like tiny needles, but the needle part isn’t conductive. When you push them, very tiny and rigid clamps come out. On the other end is a pin that will take a female header or, of course, you could connect another test lead to that pin.

As an example, he shows a decidedly dirty Arduino Due and probes the CPU with the tiny probes. Off camera, he put two probes on adjacent pins on the QFP, and it worked just fine. Definitely something we will add to our toolbox.

The probes appear to work with pitches as small as 0.5mm, which covers many common situations. We’ve looked at oddball probes before. Or try making your own solutions.



April 09, 2026

Upgrading a MacBook Neo Using a 1 TB iPhone NAND Flash

The nekkid Flash footprint with its perimeter of unused pads. (Credit: dosdude1, YouTube)

For some reason, the newly introduced MacBook Neo appears to be the subject of a lot of modding, and a recent mod by [dosdude1] leans into the fact that this laptop has been assembled using what are effectively iPhone 16 parts inside a laptop case. This means that there’s an overlap with certain iPhone 16 components, such as the NAND Flash. Incidentally, storage on the Neo is limited to 512 GB when you purchase it from Apple, which is odd since the same SoC in the iPhone 16 Pro happily uses 1 TB.

Even if it was just a price point that Apple was going for, there’s seemingly nothing stopping a determined Neo owner with a hot air gun from adding more. As long as you’re comfortable soldering a fine-pitch BGA NAND Flash package, natch.

Of course, there was always the possibility that Apple used a different NAND Flash package footprint, but the chip that comes installed in the 256 GB model matches the footprint of the replacement 1 TB K8A5 chip, as hoped. That left disassembly and preparing the PCB for the storage replacement. Removing the BGA underfill and desoldering the old chip without taking out surrounding SMD parts is definitely the hardest part, but it’s handled in the video with the equivalent of an IC spatula and the temporary removal of some capacitors.

Interestingly, the uncovered IC footprint shows a whole perimeter of unused pads that might target other NAND Flash packages. Regardless, the new chip installed fine, giving the Neo 1 TB of storage and slightly faster read/write performance.



April 09, 2026

Printed Sleeve Gives Keys Some Grip

[Enginerd]’s chonky key handle is a beautiful use of 3D printing that helps people help themselves. The large wings, indented faces, and beefed-up grip make a typical house key much easier for someone with arthritis or difficulty gripping those brass slivers. Bright filaments in different colors can also help someone with vision limitations. The thing that will not improve is the space in your pocket or purse.

The design only requires a tiny bit of plastic and prints without supports. What sets it apart from similar models is that you do not need any double-sided tape or bolts, only a keyring, though someone may have to assemble it for the user. The author is clever enough to use an uncut blank in the project photo so that no one will be decoding and copying their house key. We would wager they have read Hackaday if they are so prepared.

Some of the people who purchased early consumer 3D printers already need these kinds of builds, and there is no shortage of intelligent people creating remarkable open-source designs.



April 09, 2026

TurboQuant: Reducing LLM Memory Usage With Vector Quantization

Large language models (LLMs) aren’t actually giant computer brains. Instead, they are effectively massive vector spaces in which the probabilities of tokens occurring in a specific order are encoded. A model with B billion parameters stored at N bits per parameter needs B × N billion bits of storage. Since increasing the number of parameters makes the models appear smarter, most effort on reducing the storage they require has gone into reducing the size of the parameters themselves.
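The storage arithmetic is simple enough to sketch in a few lines of Python. The 7-billion-parameter model below is a hypothetical example, not a specific product, and the figures cover the weights alone.

```python
def model_storage_gb(params_billion: float, bits_per_param: float) -> float:
    """Storage for the weights alone, in gigabytes (10**9 bytes)."""
    total_bits = params_billion * 1e9 * bits_per_param
    return total_bits / 8 / 1e9


# Hypothetical 7B-parameter model at common weight precisions
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {model_storage_gb(7, bits):.1f} GB")
```

This prints 28.0, 14.0, 7.0 and 3.5 GB. Each halving of the bits per parameter halves the footprint, which is why shrinking parameters has been the go-to strategy.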

Vector quantization (VQ), a long-established compression technique, can be used to compress the vectors calculated during inference so that they take up less space without significant loss of data. Google’s recently published pre-print paper on TurboQuant covers an LLM-oriented VQ algorithm that’s claimed to provide up to a 6x compression level with no negative impact on inference times.

The tokens aren’t directly encoded in the vector space, but their associated keys and values are. Because inference produces a single token per pass, these keys and values would otherwise have to be recomputed for every earlier token at every step. This creates the need for a key-value (KV) cache, whose size scales with the size of the model and the length of the context. Compressing the KV cache using VQ therefore reduces its size and correspondingly speeds up look-ups, thanks to the smaller memory footprint. The catch is that, due to the nature of quantization, some accuracy will be lost. The trick is thus to apply VQ in such a way that this loss of accuracy is not noticeable.
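How large the KV cache gets can be estimated with a standard back-of-the-envelope formula: one key and one value vector per token, per attention head, per layer. The layer and head counts below are hypothetical. They were chosen to resemble a mid-sized model using grouped-query attention, which has fewer key/value heads than query heads.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bits: int = 16) -> int:
    """Approximate KV cache size in bytes, ignoring allocator overhead."""
    # Factor 2: every token stores both a key and a value vector
    elements = 2 * layers * kv_heads * head_dim * seq_len * batch
    return elements * bits // 8


GIB = 2 ** 30
# Hypothetical model: 32 layers, 8 KV heads of dimension 128, 32k-token context
for bits in (16, 8, 4):
    size = kv_cache_bytes(32, 8, 128, 32_768, bits=bits)
    print(f"{bits:>2}-bit KV cache: {size / GIB:.1f} GiB")
```

For this hypothetical configuration the cache is 4.0 GiB at 16 bits, 2.0 GiB at 8 bits and 1.0 GiB at 4 bits per value. It grows linearly with context length and batch size, on top of the model weights.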

Other aspects that had to be taken into account by the TurboQuant algorithm were fast computation to keep up with real-time requirements, along with compatibility with so-called ‘AI accelerator’ hardware.

Key-Value Cache

A basic way to look at the KV cache in LLMs is that it caches the results of previous inference cycles. An in-depth explanation can for example be found in this article by Sebastian Raschka. In the case of generating a phrase of three words starting with the word ‘Time’, we can see the following repeated computations:

Repeated computations in an LLM without KV cache. (Credit: Sebastian Raschka)

Considering that inference is rather expensive computation-wise, you really want to cache these calculated values. This provides a massive boost in performance and a much lower compute load, but because there’s no such thing as a free lunch, the catch is rapidly increasing memory usage.
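A toy, single-head version in plain Python shows what is being cached. With a cache, each decoding step computes the key and value only for the newest token and appends them, instead of recomputing them for the whole prefix. This is a hypothetical sketch with random weights, not any framework's implementation. Real models do this per layer and per head on GPU tensors.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    return [dot(row, x) for row in W]


def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query."""
    scores = [dot(q, k) / math.sqrt(len(q)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def step_without_cache(tokens, Wq, Wk, Wv):
    # Recomputes every key and value for the whole prefix on each step
    keys = [matvec(Wk, t) for t in tokens]
    values = [matvec(Wv, t) for t in tokens]
    return attend(matvec(Wq, tokens[-1]), keys, values)


class KVCache:
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, token, Wq, Wk, Wv):
        # Only the newest token's key and value are computed; the rest are reused
        self.keys.append(matvec(Wk, token))
        self.values.append(matvec(Wv, token))
        return attend(matvec(Wq, token), self.keys, self.values)


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 8
Wq, Wk, Wv = (random_matrix(dim, dim, rng) for _ in range(3))
tokens = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(5)]
cache = KVCache()
for i, token in enumerate(tokens):
    cached = cache.step(token, Wq, Wk, Wv)
    fresh = step_without_cache(tokens[: i + 1], Wq, Wk, Wv)
    assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(cached, fresh))
print("cache holds", len(cache.keys), "keys and", len(cache.values), "values")
```

Both paths produce the same outputs, but the cached path does far less work per step. The memory cost is visible too: the cache holds one key and one value per token processed, so it grows with every token generated.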

Correspondingly, we now have a big in-memory cache to manage, along with memory management routines to make sure that the KV cache doesn’t exceed its allocated memory pool:

KV cache schematic with memory pool management. (Credit: NVIDIA)

As covered in a December 2025 NVIDIA Developer article, KV cache optimization has been a topic for a while, with the article in question covering NVFP4. This is a quantization approach that reduces the precision of the KV cache from 16-bit floating point to 4-bit (FP4). Meanwhile, production systems already employ 8-bit quantization, also using a floating point format (FP8).

An additional cost is that FP4 values have to be dequantized back to FP8, which would seem to be an implementation detail of the current version. Compared to FP8 quantization, FP4 reduces latency by up to 3 times and halves the required memory, while accuracy drops by ‘less than’ 1% compared to FP8 due to quantization error.

Accuracy here is important as it factors into the next auto-complete step when the LLM’s probability vector space is once again rummaged through for the next statistically most likely follow-up token. KV cache VQ compression is thus always a trade-off between memory use and accuracy. In short, the same issues apply as with all implementations of quantization-based compression, including the tragic absence of any free lunch.

Turbo Quantization

So what magic did Google’s intrepid engineers pull off to improve on NVIDIA’s NVFP4 approach? The key is in how the quantization is performed, as it isn’t simply a matter of truncating or throwing away data by rounding to the nearest available value. Instead, a series of steps is applied that seeks to minimize the quantization error. In the case of TurboQuant, this is (confusingly) an algorithm called PolarQuant followed by the QJL (quantized Johnson-Lindenstrauss) algorithm.

Annoyingly for the non-mathematically gifted/educated among us, Google didn’t provide a straightforward visualization like the one for NVFP4, which is understandable even for us software developers and other casuals. NVIDIA’s format consists of a single sign bit, two exponent bits and one mantissa bit (E2M1), with a shared FP8 scale per block of 16 values.
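The E2M1 layout can be made concrete with a simplified Python sketch of block-scaled FP4 quantization in the spirit of NVFP4. Sixteen values share one scale, and each value is rounded to the nearest of the eight magnitudes E2M1 can represent. There are two simplifications: the scale is kept as a Python float rather than encoded as FP8 E4M3, and NVFP4's additional per-tensor scale is omitted. Treat this as an illustration rather than NVIDIA's implementation.

```python
# Non-negative magnitudes an FP4 E2M1 code can represent (sign bit stored separately)
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # values sharing one scale factor
SCALE_BITS = 8   # the shared per-block scale is stored as FP8 in the real format


def quantize_block(block):
    """Return (scale, codes) where each code is a signed E2M1 value."""
    amax = max(abs(x) for x in block)
    # Map the block's largest magnitude onto E2M1's largest value, 6.0
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for x in block:
        mag = min(E2M1_MAGNITUDES, key=lambda v: abs(v - abs(x) / scale))
        codes.append(mag if x >= 0 else -mag)
    return scale, codes


def dequantize_block(scale, codes):
    return [c * scale for c in codes]


def quantize(values):
    return [quantize_block(values[i:i + BLOCK_SIZE])
            for i in range(0, len(values), BLOCK_SIZE)]


def bits_per_value():
    # 4 bits per element plus the shared scale amortized over the block
    return 4 + SCALE_BITS / BLOCK_SIZE
```

The bookkeeping also shows why the format costs slightly more than four bits. One 8-bit scale per 16 values adds half a bit, for 4.5 bits per value. The worst-case rounding error within a block is one scale unit, half of the widest gap between 4.0 and 6.0.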

One step where TurboQuant appears to differ is the PolarQuant algorithm, which applies a polar coordinate transformation to the vectors, after which the usual normalization step can apparently be skipped.

Overview of recursive polar transformation procedure. (Credit: Insu Han et al., 2026)
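The figure's recursive idea can be sketched in a few lines of Python. Pair up coordinates, replace each pair with a radius and an angle, then repeat on the radii until a single radius (the vector's length) remains. This is a hypothetical illustration of the transform alone, assuming a power-of-two dimension. The angle quantization that PolarQuant actually performs, and its bit allocation, are left out.

```python
import math


def polar_transform(x):
    """Recursively convert pairs of coordinates to (radius, angle).

    Returns the final radius and one list of angles per level.
    Assumes len(x) is a power of two and at least 2.
    """
    if len(x) < 2 or len(x) & (len(x) - 1):
        raise ValueError("length must be a power of two, at least 2")
    radii, levels = list(x), []
    while len(radii) > 1:
        angles, next_radii = [], []
        for a, b in zip(radii[0::2], radii[1::2]):
            angles.append(math.atan2(b, a))
            next_radii.append(math.hypot(a, b))
        levels.append(angles)
        radii = next_radii
    return radii[0], levels


def inverse_polar_transform(radius, levels):
    """Undo polar_transform, rebuilding coordinates level by level."""
    radii = [radius]
    for angles in reversed(levels):
        radii = [c for r, t in zip(radii, angles)
                 for c in (r * math.cos(t), r * math.sin(t))]
    return radii
```

From the second level on, both inputs to each pair are non-negative radii. Those angles therefore always fall in [0, π/2], a bounded range that is convenient to quantize.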

This polar transformation is preceded by the application of a random projection matrix as a kind of preconditioning, which shapes the values seen by the later steps into a predictable, approximately normal distribution. Proofs and the full algorithm are provided in the PolarQuant arXiv paper for those who desire more detail.
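The paper's exact projection is not reproduced here. As a stand-in, this Python sketch uses a random orthogonal rotation, a common choice for this kind of preconditioning, built by Gram-Schmidt orthogonalization of Gaussian vectors. Such a rotation preserves lengths and dot products, so attention scores computed on rotated queries and keys are unchanged. At the same time it spreads an outlier coordinate across all dimensions.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def random_rotation(dim, rng):
    """Random orthogonal matrix via Gram-Schmidt on Gaussian vectors."""
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for u in rows:  # remove components along earlier rows
            p = dot(v, u)
            v = [vi - p * ui for vi, ui in zip(v, u)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-9:
            rows.append([vi / norm for vi in v])
    return rows


def rotate(Q, x):
    return [dot(row, x) for row in Q]


rng = random.Random(42)
Q = random_rotation(64, rng)
spike = [10.0] + [0.0] * 63  # all energy in one outlier coordinate
spread = rotate(Q, spike)
print("largest coordinate before:", max(abs(v) for v in spike))
print("largest coordinate after: ", round(max(abs(v) for v in spread), 2))
```

After rotation, the vector keeps its length, but no single coordinate dominates. This more even spread is what makes the later quantization steps better behaved.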

Of note is that PolarQuant employs the Johnson-Lindenstrauss lemma, which Google researchers used as the basis for a JL-based transform called QJL. From reading the blog post, it’s not immediately clear whether QJL is directly integrated into PolarQuant or is an additional step, due to muddled messaging on Google’s end. The benchmarking results suggest that QJL is an additional step.
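The JL idea can be seen in action in a minimal Python sketch of a 1-bit quantized JL sketch, assuming the estimator from the original QJL paper. Project the key with a random Gaussian matrix, keep only the signs plus the key's norm, and estimate dot products against the unquantized query. How TurboQuant combines this with PolarQuant is not clear from the blog post, so this shows the building block only. The sketch size `m` is an arbitrary choice.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gaussian_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def qjl_encode(key, S):
    """Keep one sign bit per projected coordinate, plus the key's norm."""
    signs = [1 if dot(row, key) >= 0 else -1 for row in S]
    return signs, math.sqrt(dot(key, key))


def qjl_inner_product(query, encoded, S):
    """Estimate <query, key> from the 1-bit sketch of the key."""
    signs, key_norm = encoded
    # For Gaussian s: E[<s, q> * sign(<s, k>)] = sqrt(2/pi) * <q, k> / |k|
    total = sum(sg * dot(row, query) for sg, row in zip(signs, S))
    return math.sqrt(math.pi / 2) * key_norm * total / len(S)


rng = random.Random(7)
dim, m = 32, 4000
S = gaussian_matrix(m, dim, rng)
query = [rng.gauss(0.0, 1.0) for _ in range(dim)]
key = [rng.gauss(0.0, 1.0) for _ in range(dim)]
estimate = qjl_inner_product(query, qjl_encode(key, S), S)
print(f"exact {dot(query, key):.3f}  estimate {estimate:.3f}")
```

Each key shrinks to m sign bits plus one norm. The estimate is unbiased, and its error shrinks roughly as one over the square root of m.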

What we do know is that the final format TurboQuant ends up with is a three-bit value, which would logically be 1 bit smaller than NVFP4, or an approximately 25% smaller KV cache for the same amount of data.
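The 25% figure comes from comparing raw code widths. Also counting NVFP4's shared FP8 scale (one per 16 values, as described earlier) widens the gap somewhat. Any per-vector metadata that TurboQuant itself stores is not specified in the material discussed here, so this hypothetical calculation ignores it.

```python
def savings(new_bits: float, old_bits: float) -> float:
    """Fractional reduction in storage when moving from old_bits to new_bits per value."""
    return 1 - new_bits / old_bits


NVFP4_BITS = 4 + 8 / 16  # 4-bit codes plus one FP8 scale per 16 values
print(f"3 vs 4 bits:   {savings(3, 4):.0%}")
print(f"3 vs 4.5 bits: {savings(3, NVFP4_BITS):.0%}")
```

This prints 25% and 33% respectively. The headline comparison therefore depends on whether the scale overhead is counted.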

Judging On Merits

Comparison and benchmark data in the Google blog post and associated papers do not provide direct comparisons with NVFP4, and the few numbers given are rather inconsistent or unspecified. Take the claim of ‘at least 6x smaller memory size’, for example. The blog text does not clearly specify what this is relative to, yet it then cites an 8x performance increase for 4-bit TurboQuant compared to FP32.
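The missing baseline matters because a compression ratio means very different things depending on what it is measured against. A quick calculation shows the spread. The two baselines are our assumptions, since the blog doesn't name one.

```python
def implied_bits(compression_ratio: float, baseline_bits: int) -> float:
    """Bits per value left after compressing a baseline format by the given ratio."""
    return baseline_bits / compression_ratio


for baseline in (32, 16):
    print(f"6x smaller than FP{baseline}: {implied_bits(6, baseline):.2f} bits per value")
```

Against FP32, a 6x reduction works out to about 5.33 bits per value, which is more than the 4-bit formats already in use. Against FP16 it would imply about 2.67 bits per value, so the choice of baseline changes the story considerably.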

Although with some more digging and poking of the available data it might be possible to glean some actual performance information from the provided files, it’s rather vexing how vague Google keeps its messaging, not to mention the lack of direct benchmarking against what would be the biggest competitors in the space.

It is definitely true that VQ is a thing for LLM KV cache compression, as we have seen, and NVIDIA ‘accelerator cards’ provide hardware acceleration for this feature, so this is the reality that TurboQuant has to compete with. Based on the few clear facts that we do have, it doesn’t appear to be quite the revolution that the hype machine has made it out to be. More likely it is just a bump over NVFP4, which NVIDIA will probably trump again with its next quantized format.

It will of course be most interesting to see how this will play out once TurboQuant makes its way out of the laboratory into the wider world and we start seeing independent benchmarking performed.



April 09, 2026

The Brits Made a Rocket. What Happened To It?

Like many long-established broadcasters, the BBC has put out a selection of its archive material for us all to enjoy online. Its most recent release may be of interest to Hackaday readers and holds more than a bit of personal interest for your scribe, as it visits the Spadeadam rocket test range on the occasion of its closure in 1973. This marked the final chapter in the story of Blue Streak, the British intermediate-range ballistic missile project that later became part of the first European space launcher.

It’s possible citizens of every country see their government as uniquely talented at throwing away taxpayers’ money, but the sad story here isn’t Blue Streak itself, which was obsolete as a missile by the time it was finished. Instead, it lies in the closure of the test range as part of the ill-advised destruction of a nascent and successful space industry, just as it had made the UK the sixth nation to place a satellite in orbit with its own launcher.

We normally write in the third person in our daily posts here at Hackaday, but for now there’s a rare switch into the first person. My dad spent a large part of the 1950s working as a technician for de Havilland Propellers, later part of Hawker Siddeley, and then British Aerospace. He was part of the team working on Blue Streak at Spadeadam and the other test site at RAF Westcott in Buckinghamshire, and we were brought up on hair-raising tales of near-disasters in the race to get British nukes flying. He’s not one of the guys in the video below, as by that time he was running his metalwork business in Oxfordshire, but I certainly recognise the feeling of lost potential they express. Chances are I’ll never visit what remains of the Spadeadam test stands in person, as the site is now the UK’s electronic warfare test range, so the BBC film represents a rare chance for a closer look.

In a related story, the trackers for the same program in Australia were saved from the scrapheap.



Wednesday, 8 April 2026

April 08, 2026

Nissan Shuts Down NissanConnect App for Older Leaf EVs

Back in late February, Nissan Leaf owners began to receive messages from Nissan informing them that the remote features in their cars would cease operation, as the NissanConnect app would drop support for Leaf EVs produced before 2020, as well as eNV200 vehicles produced until 2022. The indicated cut-off date was March 30, giving affected users about a month to come to terms with the fact that their vehicle would soon lose any and all remote control features.

What this highlights is an increasingly pertinent question about ‘connected cars’, which feature a built-in wireless modem to provide a range of additional features: what happens when the manufacturer stops supporting them? These cars require access to a remote server for even simple remote features like controlling the charging process or turning on the heating. This has left many Leaf users rather dissatisfied.

While for such basic remote features you could make the argument that they’re just silly convenience features that do not affect the car’s functionality, modern cars are increasingly becoming reliant on such remote features, including for things like navigation and checking subscriptions for features like heated seats.

Increasingly it would seem that we’re looking at the Car-as-a-Service (CaaS) model being implemented.