{"id":22387,"date":"2024-02-16T06:08:41","date_gmt":"2024-02-16T11:08:41","guid":{"rendered":"https:\/\/me-en.kaspersky.com\/blog\/?p=22387"},"modified":"2024-02-16T16:04:27","modified_gmt":"2024-02-16T12:04:27","slug":"how-to-use-ai-locally-and-securely","status":"publish","type":"post","link":"https:\/\/me-en.kaspersky.com\/blog\/how-to-use-ai-locally-and-securely\/22387\/","title":{"rendered":"How to install and use an AI assistant on your computer"},"content":{"rendered":"<p>Many people are already experimenting with generative neural networks and finding regular use for them, including at work. For example, ChatGPT and its analogs are regularly used by almost <a href=\"https:\/\/www.business.com\/technology\/chatgpt-usage-workplace-study\/\" target=\"_blank\" rel=\"nofollow noopener\">60% of Americans<\/a> (and not always with permission from management). However, all the data involved in such operations \u2014 both user prompts and model responses \u2014 are stored on servers of OpenAI, Google, and the rest. For tasks where such information leakage is unacceptable, you don\u2019t need to abandon AI completely \u2014 you just need to invest a little effort (and perhaps money) to run the neural network locally on your own computer \u2013 even a laptop.\n<\/p>\n<h2>Cloud threats<\/h2>\n<p>\nThe most popular AI assistants run on the cloud infrastructure of large companies. It\u2019s efficient and fast, but your data processed by the model may be accessible to both the AI service provider and completely unrelated parties, <a href=\"https:\/\/www.bbc.com\/news\/technology-65047304\" target=\"_blank\" rel=\"nofollow noopener\">as happened last year with ChatGPT<\/a>.<\/p>\n<p>Such incidents present varying levels of threat depending on what these AI assistants are used for. 
If you\u2019re generating cute illustrations for some fairy tales you\u2019ve written, or asking ChatGPT to create an itinerary for your upcoming weekend city break, it\u2019s unlikely that a leak will lead to serious damage. However, if your conversation with a chatbot contains confidential information \u2014 personal data, passwords, or bank card numbers \u2014 the risk of a leak to the cloud becomes unacceptable. Thankfully, it\u2019s relatively easy to prevent by pre-filtering the data \u2014 we\u2019ve written a <a href=\"https:\/\/www.kaspersky.com\/blog\/how-to-use-chatgpt-ai-assistants-securely-2024\/50562\/\" target=\"_blank\" rel=\"noopener nofollow\">separate post<\/a> about that.<\/p>\n<p>In cases where either all the correspondence is confidential (for example, medical or financial information), or the reliability of pre-filtering is questionable (you need to process large volumes of data that no one will preview and filter), there\u2019s only one solution: move the processing from the cloud to a local computer. Of course, running your own version of ChatGPT or Midjourney offline is unlikely to be successful, but other neural networks that work locally provide comparable quality with less computational load.\n<\/p>\n<h2>What hardware do you need to run a neural network?<\/h2>\n<p>\nYou\u2019ve probably heard that working with neural networks requires super-powerful graphics cards, but in practice this isn\u2019t always the case. Depending on their specifics, different AI models place demands on components such as RAM, video memory, storage, and the CPU (where not only processing speed matters, but also support for certain vector instructions). The amount of RAM determines whether the model can be loaded at all, while the amount of video memory determines the size of the \u201ccontext window\u201d \u2014 that is, the memory of the previous conversation. 
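<\/p>\n<p>As a rough illustration of why the context window consumes video memory: for every token of context, a transformer model caches two vectors (a key and a value) per layer. The sketch below is a back-of-the-envelope estimate that assumes a hypothetical 7B-class model with 32 layers, a hidden size of 4096, and 16-bit values; real architectures vary, so treat the result only as an order of magnitude.<\/p>

```python
# Back-of-the-envelope KV-cache estimate for a hypothetical 7B-class model.
# Assumed figures (not taken from any specific model card): 32 transformer
# layers, hidden size 4096, 2 bytes per cached value (16-bit precision).
def kv_cache_bytes(context_tokens, n_layers=32, hidden=4096, bytes_per_value=2):
    # Two vectors (key and value) of size hidden are cached per layer per token
    return 2 * n_layers * hidden * bytes_per_value * context_tokens

per_token = kv_cache_bytes(1)       # 524288 bytes, i.e. 0.5 MiB per token
full_window = kv_cache_bytes(4096)  # 2147483648 bytes, i.e. 2 GiB for a 4096-token window
```

<p>Under these assumptions, a 4096-token conversation history alone occupies about 2GB of video memory on top of the model weights.<\/p>\n<p>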
Typically, with a weak graphics card and CPU, generation proceeds at a snail\u2019s pace (one to two words per second for text models), so a computer with such a minimal setup is only appropriate for getting acquainted with a particular model and evaluating its basic suitability. For full-fledged everyday use, you\u2019ll need to increase the RAM, upgrade the graphics card, or choose a faster AI model.<\/p>\n<p>As a starting point, you can try working with computers that were considered relatively powerful back in 2017: a Core i7 processor or better with support for AVX2 instructions, 16GB of RAM, and a graphics card with at least 4GB of memory. For Mac enthusiasts, machines with the Apple M1 chip or later will do, with the same memory requirements.<\/p>\n<p>When choosing an AI model, first familiarize yourself with its system requirements. A search query like \u201c<em>model_name<\/em> requirements\u201d will help you assess whether it\u2019s worth downloading the model given your available hardware. There are detailed studies of how memory size, CPU, and GPU affect the performance of different models; for example, <a href=\"https:\/\/blog.nomic.ai\/posts\/gpt4all-gpu-inference-with-vulkan\" target=\"_blank\" rel=\"nofollow noopener\">this one<\/a>.<\/p>\n<p>Good news for those without access to powerful hardware: there are simplified AI models that can perform practical tasks even on old machines. Even if your graphics card is very basic, it\u2019s possible to run models and launch environments using only the CPU. 
Depending on your tasks, these can even work acceptably well.<\/p>\n<div id=\"attachment_50579\" style=\"width: 1854px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/37\/2024\/02\/16151112\/how-to-use-AI-locally-01.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-50579\" class=\"size-full wp-image-50579\" src=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/37\/2024\/02\/16151112\/how-to-use-AI-locally-01.png\" alt=\"GPU throughput tests\" width=\"1844\" height=\"1140\"><\/a><p id=\"caption-attachment-50579\" class=\"wp-caption-text\">Examples of how various computer builds work with popular language models<\/p><\/div>\n<h2>Choosing an AI model and the magic of quantization<\/h2>\n<p>\nA wide range of language models are available today, but many of them have limited practical applications. Nevertheless, there are easy-to-use and publicly available AI tools that are well-suited for specific tasks, be they generating text (for example, Mistral 7B), or creating code snippets (for example, Code Llama 13B). Therefore, when selecting a model, narrow down the choice to a few suitable candidates, and then make sure that your computer has the necessary resources to run them.<\/p>\n<p>In any neural network, most of the memory strain is courtesy of weights \u2014 numerical coefficients describing the operation of each neuron in the network. Initially, when training the model, the weights are computed and stored as high-precision fractional numbers. However, it turns out that rounding the weights in the trained model allows the AI tool to be run on regular computers while only slightly decreasing the performance. 
This rounding process is called quantization, and with its help the model\u2019s size can be reduced considerably \u2014 instead of 16 bits, each weight might use eight, four, or even two bits.<\/p>\n<p>According to <a href=\"https:\/\/arxiv.org\/abs\/2305.17888\" target=\"_blank\" rel=\"nofollow noopener\">current research<\/a>, a larger model with more parameters and quantization can sometimes give better results than a model with precise weight storage but fewer parameters.<\/p>\n<p>Armed with this knowledge, you\u2019re now ready to explore the treasure trove of open-source language models, namely the top <a href=\"https:\/\/huggingface.co\/spaces\/HuggingFaceH4\/open_llm_leaderboard\" target=\"_blank\" rel=\"nofollow noopener\">Open LLM leaderboard<\/a>. In this list, AI tools are sorted by several generation quality metrics, and filters make it easy to exclude models that are too large, too small, or too accurate.<\/p>\n<div id=\"attachment_50581\" style=\"width: 1782px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/37\/2024\/02\/16151128\/how-to-use-AI-locally-02.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-50581\" class=\"size-full wp-image-50581\" src=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/37\/2024\/02\/16151128\/how-to-use-AI-locally-02.jpg\" alt=\"List of language models sorted by filter set\" width=\"1772\" height=\"846\"><\/a><p id=\"caption-attachment-50581\" class=\"wp-caption-text\">List of language models sorted by filter set<\/p><\/div>\n<p>After reading the model description and making sure it\u2019s potentially a fit for your needs, test its performance in the cloud using <a href=\"https:\/\/huggingface.co\/\" target=\"_blank\" rel=\"nofollow noopener\">Hugging Face<\/a> or <a href=\"https:\/\/colab.research.google.com\/\" target=\"_blank\" rel=\"nofollow noopener\">Google Colab<\/a> services. 
This way, you can avoid downloading models that produce unsatisfactory results, saving you time. Once you\u2019re satisfied with the initial test of the model, it\u2019s time to see how it works locally!\n<\/p>\n<h2>Required software<\/h2>\n<p>\nMost of the open-source models are published on <a href=\"https:\/\/huggingface.co\/\" target=\"_blank\" rel=\"nofollow noopener\">Hugging Face<\/a>, but simply downloading them to your computer isn\u2019t enough. To run them, you have to install specialized software, such as <a href=\"https:\/\/github.com\/ggerganov\/llama.cpp\" target=\"_blank\" rel=\"nofollow noopener\">llama.cpp<\/a>, or \u2014 even easier \u2014 its \u201cwrapper\u201d, <a href=\"https:\/\/lmstudio.ai\/\" target=\"_blank\" rel=\"nofollow noopener\">LM Studio<\/a>. The latter allows you to select your desired model directly from the application, download it, and run it in a dialog box.<\/p>\n<p>Another \u201cout-of-the-box\u201d way to use a chatbot locally is <a href=\"https:\/\/gpt4all.io\/index.html\" target=\"_blank\" rel=\"nofollow noopener\">GPT4All<\/a>. Here, the choice is limited to about a dozen language models, but most of them will run even on a computer with just 8GB of memory and a basic graphics card.<\/p>\n<p>If generation is too slow, you may need a model with coarser quantization (two bits instead of four). 
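<\/p>\n<p>To judge in advance which quantization level your hardware can hold, multiply the parameter count by the number of bits per weight. The sketch below is an approximation; real quantized model files are somewhat larger because of metadata and layers kept at higher precision.<\/p>

```python
# Approximate in-memory size of a model from its parameter count and quantization.
# Real files carry some overhead, so treat these figures as lower bounds.
def model_size_gib(params_billion, bits_per_weight):
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2 ** 30  # bytes to GiB

size_fp16 = model_size_gib(7, 16)  # roughly 13 GiB: too big for most consumer GPUs
size_q4 = model_size_gib(7, 4)     # roughly 3.3 GiB: within reach of a 4GB card
```

<p>In other words, a 7-billion-parameter model at 16 bits per weight needs around 13GB, while the same model quantized to four bits shrinks to roughly 3.3GB.<\/p>\n<p>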
If generation is interrupted or execution errors occur, the problem is often insufficient memory \u2014 it\u2019s worth looking for a model with fewer parameters or, again, with coarser quantization.<\/p>\n<p>Many models on Hugging Face have already been quantized to varying degrees of precision, but if no one has quantized the model you want with the desired precision, you can do it yourself using <a href=\"https:\/\/github.com\/IST-DASLab\/gptq\" target=\"_blank\" rel=\"nofollow noopener\">GPTQ<\/a>.<\/p>\n<p>This week, another promising tool was released to public beta: <a href=\"https:\/\/www.nvidia.com\/en-us\/ai-on-rtx\/chat-with-rtx-generative-ai\/\" target=\"_blank\" rel=\"nofollow noopener\">Chat With RTX<\/a> from NVIDIA. The manufacturer of the most sought-after AI chips has released a local chatbot capable of summarizing the content of YouTube videos, processing sets of documents, and much more \u2014 provided the user has a Windows PC with 16GB of memory and an NVIDIA RTX 30- or 40-series graphics card with 8GB or more of video memory. \u201cUnder the hood\u201d are the same varieties of Mistral and Llama 2 from <a href=\"https:\/\/huggingface.co\/\" target=\"_blank\" rel=\"nofollow noopener\">Hugging Face<\/a>. Of course, powerful graphics cards can improve generation performance, but according to the <a href=\"https:\/\/www.theverge.com\/2024\/2\/13\/24071645\/nvidia-ai-chatbot-chat-with-rtx-tech-demo-hands-on\" target=\"_blank\" rel=\"nofollow noopener\">feedback from the first testers<\/a>, the existing beta is quite cumbersome (about 40GB) and difficult to install. 
However, NVIDIA\u2019s Chat With RTX could become a very useful local AI assistant in the future.<\/p>\n<div id=\"attachment_50582\" style=\"width: 1369px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/37\/2024\/02\/16151142\/how-to-use-AI-locally-03.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-50582\" class=\"size-full wp-image-50582\" src=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/37\/2024\/02\/16151142\/how-to-use-AI-locally-03.png\" alt='The code for the game \"Snake\", written by the quantized language model TheBloke\/CodeLlama-7B-Instruct-GGUF' width=\"1359\" height=\"865\"><\/a><p id=\"caption-attachment-50582\" class=\"wp-caption-text\">The code for the game \u201cSnake\u201d, written by the quantized language model TheBloke\/CodeLlama-7B-Instruct-GGUF<\/p><\/div>\n<p>The applications listed above perform all computations locally, don\u2019t send data to servers, and can run offline so you can safely share confidential information with them. However, to fully protect yourself against leaks, you need to ensure not only the security of the language model but also that of your computer \u2013 and that\u2019s where our <a href=\"https:\/\/me-en.kaspersky.com\/premium?icid=me-en_bb2022-kdplacehd_acq_ona_smm__onl_b2c_kdaily_lnk_sm-team___kprem___\" target=\"_blank\" rel=\"noopener\">comprehensive security solution<\/a>\u00a0comes in. 
As confirmed in <a href=\"https:\/\/www.kaspersky.com\/top3\" target=\"_blank\" rel=\"noopener nofollow\">independent tests<\/a>, <a href=\"https:\/\/me-en.kaspersky.com\/premium?icid=me-en_bb2022-kdplacehd_acq_ona_smm__onl_b2c_kdaily_lnk_sm-team___kprem___\" target=\"_blank\" rel=\"noopener\">Kaspersky Premium<\/a>\u00a0has practically no impact on your computer\u2019s performance \u2014 an important advantage when working with local AI models.<\/p>\n<input type=\"hidden\" class=\"category_for_banner\" value=\"premium-geek\">\n","protected":false},"excerpt":{"rendered":"<p>Getting all the benefits of ChatGPT, Copilot, and Midjourney locally \u2014 without leaking your data to the internet.<\/p>\n","protected":false},"author":2722,"featured_media":22389,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[9],"tags":[1481,1583,1217,2611,282,1415],"class_list":{"0":"post-22387","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-tips","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-chatbots","11":"tag-chatgpt","12":"tag-cybersecurity","13":"tag-machine-learning"},"hreflang":[{"hreflang":"en-ae","url":"https:\/\/me-en.kaspersky.com\/blog\/how-to-use-ai-locally-and-securely\/22387\/"},{"hreflang":"en-in","url":"https:\/\/www.kaspersky.co.in\/blog\/how-to-use-ai-locally-and-securely\/27077\/"},{"hreflang":"ar","url":"https:\/\/me.kaspersky.com\/blog\/how-to-use-ai-locally-and-securely\/11436\/"},{"hreflang":"en-us","url":"https:\/\/usa.kaspersky.com\/blog\/how-to-use-ai-locally-and-securely\/29744\/"},{"hreflang":"en-gb","url":"https:\/\/www.kaspersky.co.uk\/blog\/how-to-use-ai-locally-and-securely\/27253\/"},{"hreflang":"es-mx","url":"https:\/\/latam.kaspersky.com\/blog\/how-to-use-ai-locally-and-securely\/27042\/"},{"hreflang":"es","url":"https:\/\/www.kaspersky.es\/blog\/how-to-use-ai-locally-and-secu
rely\/29662\/"},{"hreflang":"it","url":"https:\/\/www.kaspersky.it\/blog\/how-to-use-ai-locally-and-securely\/28540\/"},{"hreflang":"ru","url":"https:\/\/www.kaspersky.ru\/blog\/how-to-use-ai-locally-and-securely\/36986\/"},{"hreflang":"x-default","url":"https:\/\/www.kaspersky.com\/blog\/how-to-use-ai-locally-and-securely\/50576\/"},{"hreflang":"fr","url":"https:\/\/www.kaspersky.fr\/blog\/how-to-use-ai-locally-and-securely\/21543\/"},{"hreflang":"pt-br","url":"https:\/\/www.kaspersky.com.br\/blog\/how-to-use-ai-locally-and-securely\/22254\/"},{"hreflang":"de","url":"https:\/\/www.kaspersky.de\/blog\/how-to-use-ai-locally-and-securely\/30951\/"},{"hreflang":"ja","url":"https:\/\/blog.kaspersky.co.jp\/how-to-use-ai-locally-and-securely\/35896\/"},{"hreflang":"nl","url":"https:\/\/www.kaspersky.nl\/blog\/how-to-use-ai-locally-and-securely\/29029\/"},{"hreflang":"ru-kz","url":"https:\/\/blog.kaspersky.kz\/how-to-use-ai-locally-and-securely\/27452\/"},{"hreflang":"en-au","url":"https:\/\/www.kaspersky.com.au\/blog\/how-to-use-ai-locally-and-securely\/33259\/"},{"hreflang":"en-za","url":"https:\/\/www.kaspersky.co.za\/blog\/how-to-use-ai-locally-and-securely\/32882\/"}],"acf":[],"banners":"","maintag":{"url":"https:\/\/me-en.kaspersky.com\/blog\/tag\/ai\/","name":"AI"},"_links":{"self":[{"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts\/22387","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/users\/2722"}],"replies":[{"embeddable":true,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/comments?post=22387"}],"version-history":[{"count":4,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts\/22387\/revisions"}],"predecessor-version":[{"id":22392,"href":"https:\/\/me-en.kaspersky.com
\/blog\/wp-json\/wp\/v2\/posts\/22387\/revisions\/22392"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/media\/22389"}],"wp:attachment":[{"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/media?parent=22387"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/categories?post=22387"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/tags?post=22387"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}