Show HN: Swift-powered AI apps on iOS: Real-time multimodal semantic search

https://github.com/ashvardanian/SwiftSemanticSearch

Swift Semantic Search 🍏

Preview: https://github.com/ashvardanian/ashvardanian/raw/master/repositories/SwiftSemanticSearch.jpg

This Swift demo app shows how to build real-time, native, AI-powered apps for Apple devices using Unum's Swift libraries and quantized models. Under the hood, it uses UForm to understand and "embed" multimodal data, such as multilingual texts and images, processing them on the fly from a camera feed. Once the vector embeddings are computed, it uses USearch to provide real-time search over the semantic space. The same engine also enables geo-spatial search over the coordinates of the images and has been shown to scale to 100M+ entries even on an 🍏 iPhone.
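As a rough sketch of how these pieces fit together (modeled on the Swift examples in the USearch README, so exact signatures may differ between versions; `embedImage` and `embedText` below are hypothetical stand-ins for the UForm encoder calls, and `cameraFrames` is an assumed array of captured frames):

import USearch

// Index image embeddings, then query them with a text embedding.
let index = USearchIndex.make(
    metric: .cos,        // cosine distance over the embedding space
    dimensions: 256,     // must match the UForm model's embedding size
    connectivity: 16,    // HNSW graph connectivity
    quantization: .F16   // half-precision storage for a smaller on-device footprint
)

for (key, frame) in cameraFrames.enumerated() {
    let vector: [Float32] = embedImage(frame)               // hypothetical UForm wrapper
    index.add(key: UInt64(key), vector: vector[...])
}

let query: [Float32] = embedText("dog playing on a beach")  // hypothetical UForm wrapper
let results = index.search(vector: query[...], count: 10)   // nearest keys and distances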

The demo app is capable of text-to-image and image-to-image search, and it uses the vmanot/Media library to fetch the camera feed, embedding and searching frames on the fly. To test the demo:

# Clone the repo
git clone https://github.com/ashvardanian/SwiftSemanticSearch.git
# Change directory & decompress dataset.zip, which contains:
#   - `images.names.txt` with newline-separated image names
#   - `images.uform3-image-text-english-small.fbin` - precomputed embeddings
#   - `images.uform3-image-text-english-small.usearch` - precomputed index
#   - `images` - directory with images
cd SwiftSemanticSearch
unzip dataset.zip

After that, fire up the Xcode project and run the app on your fruity device!
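For reference, this is roughly how the app could load the precomputed dataset at startup. It is a sketch rather than the app's actual code: it assumes the files above are bundled with the app, that the embedding dimensionality is 256, and that the Swift binding exposes `load(path:)` the way USearch's other language bindings do; see the USearch Swift docs linked below for the exact API.

import USearch

// Load the image names and the prebuilt index shipped in dataset.zip.
func loadDataset() throws -> (names: [String], index: USearchIndex) {
    let bundle = Bundle.main

    // Newline-separated image names, in the same order as the index keys.
    let namesURL = bundle.url(forResource: "images.names", withExtension: "txt")!
    let names = try String(contentsOf: namesURL, encoding: .utf8)
        .split(separator: "\n")
        .map(String.init)

    // Deserialize the precomputed index; the parameters must match those used to build it.
    let index = USearchIndex.make(metric: .cos, dimensions: 256, connectivity: 16, quantization: .F32)
    let indexPath = bundle.path(forResource: "images.uform3-image-text-english-small", ofType: "usearch")!
    index.load(path: indexPath)  // assumed API, mirroring USearch's other bindings

    return (names, index)
}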


Links:

- Preprocessing datasets: https://github.com/ashvardanian/SwiftSemanticSearch/blob/main/images.ipynb
- USearch Swift docs: https://unum-cloud.github.io/usearch/swift
- UForm Swift docs: https://unum-cloud.github.io/uform/swift

Show HN submission text (by ashvardanian, May 3, 2024):

I was recently playing with Apple's CoreML and had several painful observations on tooling. It's not enough for a long read, but it should be enough for an HN post.

In short, you can take a simple BERT-like encoder model in PyTorch, convert it into an f32 CoreML checkpoint, and run it on the CPU or GPU, but not on the NPU. Let's unpack this.

Having a simple and extensible format for exchanging common ANN architectures is a big issue for anyone who uses more than one framework or programming language to run the same model. ONNX is the closest we have to such a standard, but it's hard to call anything Protobuf-related "simple." If you start with ONNX, you quickly realize that Apple has no tool for converting ONNX -> CoreML. And if you want to go ONNX -> PyTorch -> CoreML, be warned that PyTorch has no ONNX import functionality (issue #21683).

When you convert to CoreML, you can choose the `precision`. Evaluating modern models in full precision seems wasteful, so I've preferred half-precision variants over single precision. Unlike ONNX, with CoreML you can't get to data types under 16 bits in size. The 16-bit variants also didn't work for me, so UForm is currently stuck with 32 bits. This means our iOS-targeting checkpoints are heavier than the PyTorch, SafeTensors, and ONNX exports of the same model (bf16, bf16, and u8, respectively).

https://huggingface.co/unum-cloud/uform3-image-text-english-small/tree/main

CoreML tooling lets you specify a `RangeDim`, marking the variable-length sides of the input tensor. This is handy if you want to support different batch sizes. ONNX has that functionality and it works fine, while CoreML fails. So, for now, I stick to batch size one.

Last, Xcode provides a profiler to measure your models' latency and throughput. The profiler covers CPUs, GPUs, and NPUs, but no model I've tested could run on the NPU; I assume those are reserved for first-party models. Interestingly, Apple Silicon contains specialized AMX (Advanced Matrix eXtensions) clusters near the performance and efficiency cores of Macs. Those differ from Intel's AMX and Arm's SME (Scalable Matrix Extensions). They aren't publicly documented, but their whole purpose is AI acceleration. If I run inference on the CPU, it's 10x slower than on the GPU of an M2 Pro, so AMX is probably not being used. It would be great to get clarification from Apple on the purpose of all those specialized enclaves.
---

Model aside, there are a lot of other issues with the developer experience.

Let's address the Xcode in the room. I mostly write code in VS Code; when it gets too slow and buggy, I switch to the native Sublime Text. In the Apple ecosystem, you are lost without a good second option once Xcode fails.

The most common issues I've faced were adding, updating, and removing app dependencies. Another big one is running a build and wondering whether it's the latest or some internally cached version. When something breaks, you must navigate Plist (XML) files to clean up the mess, which is similar to manually editing `yarn.lock` or `poetry.lock` if you are coming from JS or Python.

One of VS Code's handiest features is "Format on Save"; keeping the code sane is necessary for popular open-source projects, but Xcode has no such feature. Apple has a first-party tool called `swift-format`, which uses a `.swift-format` config, yet Xcode doesn't respect that config. Moreover, I couldn't make the `swift-format` tool mimic the native Xcode style for empty lines.

Last, I couldn't find a way to use Sphinx for Swift and Objective-C documentation. Generating API references for projects with many language bindings is extremely hard, and Swift isn't making it more accessible.

---

Overall, Apple ships some fantastic hardware, but we are in the very early days of software adoption, and I hope these notes help the company patch some rough corners before WWDC.