Author: Harvey Dam
-
A dumb trick for derailing non-answers in RLHF-aligned language models
Paper: [2505.23848] Derailing Non-Answers via Logit Suppression at Output Subspace Boundaries in RLHF-Aligned Language Models. DeepSeek-R1 was the first popular language model that had obvious RLHF alignment toward Chinese Communist Party ideologies, at least when it was released in January 2025. For example, if you asked it “What’s Taiwan?” it would give a long,…
-
Where pruned image classifiers are wrong
The paper is called “Understanding the Effect of the Long Tail on Neural Network Compression” (https://arxiv.org/abs/2306.06238). In this paper, I used a sharding-based method (https://arxiv.org/abs/2008.03703) to estimate the influence of training examples on validation examples, where influence is the expected accuracy gain on a validation example from training with a given training example versus training without it. We did such estimation…
-
Fast differentiable operations on arbitrary-dimensional floating-point arrays
The paper is called “What Operations can be Performed Directly on Compressed Arrays, and with What Error?” and you can read it here: https://dl.acm.org/doi/10.1145/3624062.3625122. The main idea is that we’ve designed a compression method called PyBlaz that lets you transform arbitrary-dimensional floating-point arrays (often called tensors) in certain ways without decompressing them. These transformations…