A generalist and a technologist. #Software is my trade and #ArtificialIntelligence is my #science. I live in #LasGabias, #Granada, #Spain. I post about #technology and #WorldNews. 40 years old. Pronouns: he/him. I am the admin of this tiny instance. #DeepLearning, #IndustrialAnomalyDetection, #MachineIntelligence, #AI, #Linux, #Kubernetes, #RetroComputing, #Commodore64, #cats, #polyamory, #panpsychism, #atheism, #anarchism, #leftist, #AnarchoCommunism, #robotics, #OpenSource, #fedi22
rukii.net
@tero@rukii.net
·
Feb 09, 2026
The AI field is still young, and there are no established best practices. Here are some guidelines of my own, though:
1. Understand LLM psychology. Know that models read even markup and code as loaded with meaning: to an LLM, a JSON record describing the sound of a bark timestamped 01:00 is a dog barking at night. Telling a model to draw a room without an elephant makes an elephant appear on a poster. Have empathy for the machine.
2. Use XML-like elements to structure your inputs and outputs. They handle multi-line strings far better than e.g. JSON strings and reduce the need for escaping. They also let you later parse your instruction templates out of the logs and refine that data further, make it easy to drop the preamble and postamble LLMs sometimes like to emit, and a missing closing tag in the output tells you the response was cut off.
3. Avoid the need to prevent subverting your LLM agents. Agents shouldn't have more rights than your users, and shouldn't work against the interests of your users. Only as a last resort and after having made the consequence of LLM subversion as negligible as possible, you should consider adding guardrails.
4. Use hybrid indexes for your data: LLM-managed metadata plus LLM-generated queries against both traditional and vector indices. When an LLM generates your index metadata, you know the semantics and heuristics behind it; if it's human-created, you don't necessarily have explicit documentation for what the index fields mean. For vector indices, embed reference queries or something similar rather than indexing raw chunks.
5. It's all about data. Forget the focus on AI architectures and even revenue. Everything else works out if you hold a position in the data flows in the grand scheme of things.
6. Data quality is not the same as data fidelity. Humans aren't the gold standard, and neither is the real world; photorealism isn't the goal for simulations. Data quality is the utility of the data for training models or for augmenting inference: the density of true, usable knowledge and of transferable skills it represents.
I am still open to new challenges. I am an #AI generalist with over 25 years of experience across a wide variety of domains, looking to work remotely from Spain. I am passionate about #PhysicalAI and #data.