Island Mountain AI server frequently asked questions

Frequently Asked Questions

Straight answers about local AI hardware, data sovereignty, and what it costs.

Is Local AI Right for Us?

What is local AI inference hardware and how is it different from cloud AI?

Local AI inference hardware is a physical server with GPUs that runs AI models on your premises instead of sending your data to someone else's data center. "Inference" means the system processes your prompts and generates responses. It does not train new models. It runs pre-trained models locally.

When you use a cloud AI service (ChatGPT, Claude, Gemini, Azure OpenAI), your prompts leave your network, get processed on the cloud provider's GPUs, and the response comes back over the internet. The provider sees your data, processes it on shared infrastructure, and may use it according to their terms of service.

With local hardware, the entire cycle happens inside your building. Prompt goes to your GPU. Response comes back from your GPU. Nothing leaves your network. You own the hardware, you own the models (MIT licensed, open weights), and you control every variable.

The tradeoff is straightforward: cloud AI gives you access to the latest proprietary models (GPT-4o, Claude Opus) without hardware investment. Local AI gives you data sovereignty, zero recurring costs after purchase, and models you control entirely, but you are limited to open-source models that typically lag cloud releases by weeks or months. For a detailed comparison, see our Why Local AI page.

Does local AI hardware work without an internet connection?

Yes. Every Island Mountain system is designed to operate completely air-gapped. The models, the inference engine (vLLM/Ollama), and the interface (OpenWebUI) all run locally. Once the system is set up and models are loaded, you can disconnect the ethernet cable entirely and everything still works.

Users access the system through a web browser on your local network. As long as the device and the server are on the same network (even an isolated, non-internet-connected network), inference runs normally. No phone-home. No license checks. No cloud dependencies.

The only things that require internet access are downloading new models and applying OS security updates. Both are optional and can be done on a schedule that fits your security posture. Some organizations in high-security environments load models via physical media and apply security patches through an isolated update process. The system supports that workflow. For more on the technology stack, see our Technology page.

Island Mountain applies a documented air-gap configuration before delivery, disabling all outbound features at the environment-variable level: offline mode is enabled, and telemetry, community sharing, and web search are all turned off, then verified during the 72-hour burn-in. Your IT team can independently verify every setting on the delivered system. See our Technology page for the full configuration reference.

Who should NOT buy local AI hardware?

Local AI hardware is the wrong purchase for several types of buyers, and we would rather tell you now than after you have spent $75,000.

If your organization needs the absolute latest proprietary models (GPT-4o, Claude Opus, Gemini Ultra) on the day they release, local hardware will not deliver that. Open-source models lag proprietary releases. They close the gap over time, but there is always a gap.

If you need to serve more than 50 concurrent heavy users, a dual-GPU system at this price point is not the right architecture. You need enterprise cloud infrastructure or multiple local systems.

If your organization has no IT capability at all and cannot manage a server, be aware that after 30 days of included support, your team owns the maintenance. OS updates, network configuration, and troubleshooting fall to you. We offer ongoing support retainers, but the system lives in your server room.

If your data sensitivity does not require local processing and cloud AI is working fine for your team at $200/month, spending $75,000 on hardware does not make financial sense. Cloud is cheaper for low-volume, non-sensitive use cases. We say this on our Why Local AI page because it is true.

Compliance & Regulations

Does running AI locally comply with HIPAA?

Running AI on local hardware eliminates the data transmission vector that creates HIPAA exposure in the first place. When you use a cloud AI service, Protected Health Information (PHI) leaves your network and is processed on shared infrastructure controlled by a third party. Under 45 CFR §164.402, that transmission can constitute a reportable disclosure event. With local inference hardware, PHI never leaves your premises. There is no cloud provider to execute a Business Associate Agreement with because no business associate is involved. The prompts stay on your server. The responses stay on your server. The model weights sit on drives you physically control.

That said, local hardware does not automatically make your entire AI workflow HIPAA-compliant. You still need proper access controls, audit logging, encryption at rest, and staff training. The hardware eliminates the cloud transmission risk; your organization still owns the operational compliance. Island Mountain is not a compliance attorney, and we do not certify HIPAA compliance. What we build is hardware that removes the third-party processing variable from the equation entirely. For a full breakdown of how local infrastructure fits into a compliance strategy, see our Medical Practices page.

Can a law firm use local AI without violating attorney-client privilege?

The core issue with cloud AI for law firms is this: client data sent to a third-party API is data disclosed to a third party. Under ABA Model Rule 1.6, attorneys have a duty to make reasonable efforts to prevent unauthorized disclosure of client information. Courts have found privilege waived when confidential information is processed through systems outside the firm's control.

Local AI hardware keeps every prompt, every document, and every response on a server inside your office. No data travels to an external API. No third party processes your client's information. The model runs on hardware you own, connected to your network, behind your firewall.

This does not mean local AI automatically preserves privilege in every scenario. Your firm still needs internal policies governing who accesses the system, what data can be input, and how outputs are treated in the work product. But the fundamental architectural risk of third-party disclosure is eliminated. The system sits in your server room, not someone else's data center. For more on how different organization types benefit from local infrastructure, see our Law Firms page.

Can a tribal government run AI without cloud providers?

Yes. That is one of the primary use cases Island Mountain hardware is built for. Tribal nations exercise inherent sovereignty over constituent data, enrollment records, health information, and internal governance documents. Routing that data through cloud AI providers means processing sovereign data through jurisdictions and infrastructure outside tribal authority.

Local AI hardware keeps everything on tribal premises, under tribal law, processed by hardware the nation owns outright. No data leaves the reservation. No cloud provider terms of service apply to your data. No federal or state jurisdiction touches the processing.

The models pre-installed on every system (DeepSeek V4-Flash quantized, Llama 3.1 70B, Mixtral 8x22B) are MIT licensed, meaning the tribe owns the models with no usage restrictions. The system works fully air-gapped if your security posture requires it. Disconnect the ethernet cable and it still runs.

Tribal IT infrastructure varies widely. If you have a server room or even a climate-controlled data closet with the right electrical circuit (208V/30A), the system fits. If you don't, we can discuss what site preparation looks like. See our Tribal Nations page for governance, emergency management, and health services workflows, or visit our contact page to start a conversation about your specific setup.

Does local AI hardware meet ITAR requirements for defense contractors?

ITAR restricts the processing and storage of controlled technical data to environments that prevent foreign access. Cloud infrastructure operated by providers with multinational operations, overseas data centers, or foreign-national employees creates structural compliance risk for ITAR-regulated data.

Local AI hardware that you own, operate on US soil, and control with US-person-only access eliminates the cloud processing vector entirely. The data stays on your premises, processed by hardware under your physical control, with no third-party infrastructure involved.

Island Mountain systems run open-source models (MIT licensed, publicly available weights), which are not themselves controlled under ITAR. The controlled element is your data, and local hardware ensures that data never transits infrastructure you do not control.

However, Island Mountain does not provide ITAR compliance certification, documentation, or legal guidance. ITAR compliance is an organizational responsibility that covers physical security, personnel screening, access controls, and documentation well beyond the hardware layer. What we provide is infrastructure that removes the cloud processing variable from your compliance posture. Consult your ITAR compliance officer or legal counsel for guidance specific to your program. For CUI, CMMC, and ITAR-specific workflows, see our Defense Contractors page.

Is DeepSeek V4-Flash safe to use with sensitive organizational data?

The safety question about DeepSeek V4-Flash usually comes from two concerns: the model's Chinese origin and data handling. Here are the facts.

DeepSeek V4-Flash is an open-weights model released under the MIT license. When you run it on Island Mountain hardware, the model executes entirely on your local GPUs. No data is sent to DeepSeek, to any Chinese server, or to any external endpoint. The model weights are publicly available and have been independently audited by the open-source community. The model does not phone home. It cannot phone home. It is a file sitting on your hard drive being processed by your GPUs.

The risk with DeepSeek exists only when you use it through DeepSeek's cloud API, which routes your data through their servers. That is not how Island Mountain systems work. Our systems run the model locally, air-gapped if you want, with zero external communication.

The model itself does not contain backdoors or data exfiltration code. It is a set of numerical weights processed through standard inference engines (vLLM/Ollama). Treat it like any other software asset: verify the download hash, review your organization's acceptable use policy, and run it on infrastructure you control. For more on DeepSeek and the full model stack, see our Technology page.
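Verifying a download hash is straightforward to script. A minimal sketch, assuming you have the SHA-256 checksum published with the weights release; the file path and checksum shown are placeholders:

```python
import hashlib
from pathlib import Path

# Placeholder path and checksum; substitute the real weights file and the
# SHA-256 value published alongside the model release.
WEIGHTS = Path("/models/deepseek-v4-flash-q4.gguf")
PUBLISHED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"

digest = hashlib.sha256()
with WEIGHTS.open("rb") as f:
    # Read in chunks so multi-gigabyte weight files never load into RAM at once.
    for chunk in iter(lambda: f.read(1024 * 1024), b""):
        digest.update(chunk)

if digest.hexdigest() == PUBLISHED_SHA256:
    print("Checksum matches the published value.")
else:
    print("MISMATCH: do not load this file.")
```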

How the Hardware Works

What VRAM do I need to run DeepSeek V4-Flash?

DeepSeek V4-Flash (284B parameters) requires approximately 160GB of VRAM to run quantized or 282GB to run at full FP16 precision. Running it across two GPUs using tensor parallelism delivers faster inference because the model's weights are split across both cards and every token is processed by both GPUs in parallel.

All Island Mountain systems ship with at least 160GB of combined VRAM (two GPUs). On our Summit Base tier (2x H100 80GB refurbished, pre-made), you can run DeepSeek V4-Flash in quantized form with approximately 60-90 tokens per second for single-user inference thanks to the H100's 3.35 TB/s memory bandwidth. On the Summit Ridge tier (2x H100 80GB, build-to-order), performance is similar with the same quantized version.

If you need to run DeepSeek V4-Flash at full FP16 quality with maximum fidelity, you will need our Summit Pinnacle tier with dual H200 141GB GPUs (282GB VRAM), coming Q3 2026. The Summit Base and Summit Ridge tiers can run quantized versions, but not full FP16. See our full GPU comparison on the Products page.
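For reference, splitting a model across both GPUs is a one-parameter change in vLLM, the inference engine our systems ship with. This is a minimal sketch rather than our exact deployment configuration; the model identifier is a placeholder, and large quantized models may need additional settings:

```python
from vllm import LLM, SamplingParams

# tensor_parallel_size=2 splits the model weights across both GPUs, which is how
# a model too large for one 80GB card can run on a dual-GPU system.
llm = LLM(
    model="path-or-hub-id-of-your-quantized-model",  # placeholder identifier
    tensor_parallel_size=2,
)

outputs = llm.generate(
    ["Summarize the attached contract clause in plain English."],
    SamplingParams(max_tokens=256, temperature=0.2),
)
print(outputs[0].outputs[0].text)
```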

What models come pre-installed on Island Mountain hardware?

Every Summit Base and Summit Ridge system ships with three models installed, configured, and burn-tested:

DeepSeek V4-Flash (quantized) is a reasoning and general-purpose model built for complex analysis, multi-step problem solving, code generation, and structured thinking. The quantized version uses approximately 160GB of VRAM on the Summit Base tier.

Llama 3.1 70B is Meta's general-purpose model, strong across writing, summarization, question answering, and conversational tasks. It uses approximately 40-48GB of VRAM.

Mixtral 8x22B is Mistral's mixture-of-experts model with strong multilingual capability and efficient multi-task inference. It uses approximately 80GB of VRAM.

All three are MIT licensed with no usage restrictions. You own them outright. You can switch between them instantly through the OpenWebUI dropdown menu. Additional open-source models can be downloaded and installed after delivery through Ollama or the OpenWebUI interface. For the first 30 days, we walk you through model management directly. See our Technology page for detailed specifications on each model.
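If your IT team wants to confirm what is installed without opening the browser, Ollama's local REST API lists the models it manages. A minimal sketch, assuming Ollama is running on its default port; the hostname is a placeholder for your server's local address:

```python
import json
from urllib.request import urlopen

# Ollama's default local API port is 11434; replace the hostname with your
# server's address on your local network.
OLLAMA_URL = "http://ai-server.local:11434/api/tags"

with urlopen(OLLAMA_URL) as resp:
    installed = json.load(resp)

# Print each installed model and its on-disk size.
for model in installed.get("models", []):
    size_gb = model.get("size", 0) / 1e9
    print(f"{model['name']:40} {size_gb:6.1f} GB")
```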

What is OpenWebUI and how does it work?

OpenWebUI is a free, open-source, browser-based interface for interacting with local AI models. Think of it as a private version of ChatGPT that runs entirely on your network. No account with any cloud provider. No external data transmission.

You open a web browser on any device connected to your network (Chrome, Firefox, Safari, Edge), navigate to the server's local address, and start prompting. The interface includes a dropdown menu to switch between installed models (DeepSeek V4-Flash quantized, Llama 3.1 70B, Mixtral 8x22B), full conversation history stored locally on the server, and an admin panel for user management and model access controls.

Your administrator can create accounts for each team member, control which models each user can access, and monitor usage. Conversations are stored on the server's local drive, not in any cloud. Search, organize, and reference past conversations just like you would in any chat application.

OpenWebUI requires no command-line knowledge to use. If someone can use ChatGPT, they can use OpenWebUI. The interface is pre-configured on every Island Mountain system before it ships. For more on OpenWebUI and the full software stack, see our Technology page.
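Everyday users stay in the browser, but the same stack can be scripted. vLLM can expose an OpenAI-compatible endpoint on your local network, so in-house tools can call the server without any cloud dependency. A minimal sketch, assuming that endpoint is enabled; the address and model name are placeholders:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local server instead of any cloud API.
# Nothing leaves your network; the API key is unused but required by the client.
client = OpenAI(base_url="http://ai-server.local:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama-3.1-70b",  # placeholder; use the name your server registers
    messages=[{"role": "user", "content": "Draft a two-sentence meeting summary."}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```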

What is the difference between H100 and H200 GPUs for AI inference?

The two GPUs differ in VRAM capacity, memory bandwidth, and compute performance. Here is the comparison:

The NVIDIA H100 80GB (PCIe, refurbished on Summit Base tier) uses HBM3 memory with 3.35 TB/s bandwidth and delivers 989 TFLOPS of FP16 compute. Two H100s give you 160GB of total VRAM. The Summit Base tier (pre-made) costs $75,000-$85,000. The Summit Ridge tier (build-to-order H100) costs $150,000-$160,000 and offers comparable performance with new units and custom configuration.

The NVIDIA H200 141GB uses HBM3e memory with 4.8 TB/s bandwidth. Two H200s give you 282GB of total VRAM, which is the minimum required to run DeepSeek V4-Flash at full FP16 quality with its 1M token context window. This is our Summit Pinnacle tier ($350,000-$400,000), coming Q3 2026.

The H100 Summit Base tier is the best value for most organizations needing good performance at lower cost. The H100 Summit Ridge tier (build-to-order) offers custom configuration and new units. The H200 Summit Pinnacle tier is for organizations that need the largest models at full quality. See our Products page for the complete specification table.

How many users can simultaneously access an Island Mountain system?

A dual-GPU Island Mountain system comfortably serves a team of 5-15 simultaneous users for typical business inference tasks like document drafting, research queries, contract review, and summarization.

The exact number depends on the model being used, prompt complexity, and how long responses need to be. Simple queries return fast and queue efficiently. Long-form document generation with high token counts takes more time per request, which means fewer concurrent users before response times stretch.

Both H100 tiers handle concurrent workloads well. The Summit Ridge tier (build-to-order, all-new hardware) and the Summit Base tier (refurbished, pre-made) both benefit from the H100's 3.35 TB/s memory bandwidth, which lets vLLM's tensor parallelism work through queued requests quickly.

This system is not designed to serve 50 or 100 simultaneous heavy users. If your organization has that kind of demand, you either need multiple systems, or enterprise cloud infrastructure is the better fit. Island Mountain is built for the team of 5-15 who need data sovereignty and personal service, not for mass-scale deployment. See our Products page for a full comparison of how each tier handles concurrency.

What power requirements does local AI inference hardware need?

All Island Mountain systems require a dedicated 208V/30A power circuit with a NEMA L6-30R outlet. This is the same kind of circuit found in server rooms, data closets, and commercial facilities. The power supply operates at 200-240V only. It will not run on a standard 120V wall outlet.

If your facility does not already have this circuit, a licensed electrician can install one. Typical installation cost ranges from $500-$2,000 depending on your building's electrical infrastructure and how far the new circuit needs to run from the panel.

Average power draw under typical inference loads is 1.5-2.5 kW. At $0.12/kWh, that translates to roughly $100-$200 per month in electricity. The system runs at standard server room temperatures (64-80°F / 18-27°C) and does not require specialized cooling beyond normal HVAC. The chassis includes redundant 2000W power supplies for reliability. See our Products page for complete hardware specifications.
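The electricity estimate is simple arithmetic you can rerun with your own utility rate. A quick sketch using the figures above:

```python
# Monthly electricity estimate from average draw and utility rate.
avg_draw_kw = 2.0           # typical inference load, per the 1.5-2.5 kW range above
hours_per_month = 24 * 30.4
rate_per_kwh = 0.12         # US national average; substitute your local rate

monthly_kwh = avg_draw_kw * hours_per_month
monthly_cost = monthly_kwh * rate_per_kwh
print(f"{monthly_kwh:.0f} kWh/month  ->  ${monthly_cost:.0f}/month")
# 2.0 kW around the clock works out to roughly 1,459 kWh and ~$175/month,
# inside the $100-$200 range quoted above.
```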

Can I add models after the system ships?

Yes. The system is not locked to the three pre-installed models. You can download and install any compatible open-source model through the Ollama interface or directly through OpenWebUI's model management panel.

The constraint is VRAM. Your system has a fixed amount of GPU memory, and each model consumes VRAM when loaded. On the Summit Base and Summit Ridge tiers (160GB total VRAM), you can run any model that fits within that memory. Most 70B-parameter models need roughly 40-48GB of VRAM in the quantized builds most teams run; full FP16 weights take considerably more. Smaller models (7B, 13B, 30B) use less and can run alongside larger models if you want multiple loaded simultaneously.

You cannot run models that exceed your total VRAM. For example, DeepSeek V4-Flash (284B parameters) at full FP16 precision requires approximately 282GB of VRAM, which only fits on the Summit Pinnacle tier (282GB total). Quantized versions of larger models can sometimes fit on lower tiers, but with reduced quality.
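A rough rule of thumb for whether a model fits: the weights alone take roughly parameter count times bytes per parameter, plus headroom for the KV cache and runtime overhead. A back-of-the-envelope sketch, not a guarantee for any specific model:

```python
def weights_vram_gb(params_billions: float, bits_per_param: float) -> float:
    """Estimate VRAM for model weights alone, in GB (excludes KV cache/overhead)."""
    return params_billions * 1e9 * (bits_per_param / 8) / 1e9

# A 70B model: full FP16 versus typical 4-5 bit quantization.
for bits in (16, 5, 4):
    print(f"70B at {bits:>2}-bit: ~{weights_vram_gb(70, bits):.0f} GB of weights")
# FP16 weights are ~140 GB; 4-5 bit lands around 35-44 GB, consistent with the
# 40-48 GB figure above once cache and runtime overhead are included.
```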

For the first 30 days after delivery, we walk you through the model installation process directly. After that, it is a download-and-configure process your IT team can handle. We publish tested compatibility lists for each hardware tier. See our Technology page for current model details.

How can OpenWebUI be air-gapped if it is a web application?

The "web" in OpenWebUI refers to the browser-based interface, not internet dependency.

OpenWebUI is a self-hosted application that runs entirely on the server hardware. Users access it through a web browser pointed at the server's local network address, the same way you access a router's admin panel. No cloud account required. No external API calls for inference.

Out of the box, OpenWebUI does include features that make outbound network connections: model downloads, version update checks, HuggingFace Hub access for embedding models, optional community sharing, web search integration for RAG, and anonymized telemetry. None of these are required for AI inference.

Island Mountain disables every outbound feature before shipping. The key environment variables (OFFLINE_MODE, HF_HUB_OFFLINE, ENABLE_COMMUNITY_SHARING, ANONYMIZED_TELEMETRY, ENABLE_RAG_WEB_SEARCH, and SAFE_MODE) are all set to their air-gapped states during the build process. The system is tested in this configuration during the 72-hour burn-in.

Your IT team can verify it independently. Inspect the container environment variables on the delivered system. Or connect it to a monitored network segment and confirm zero outbound connections during operation. If your security policy requires belt-and-suspenders, configure host-level firewall rules to block all outbound traffic. The system continues to operate normally. See our Technology page for the full air-gap configuration reference.
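If OpenWebUI is deployed as a container, which is the common setup, the environment check is scriptable. A minimal sketch assuming a Docker deployment; the container name and the expected values are placeholders to confirm against your delivery manifest:

```python
import json
import subprocess

CONTAINER = "open-webui"  # placeholder; use the container name on your system

# Air-gapped settings described above; confirm exact values against the
# delivery manifest for your unit.
EXPECTED = {
    "OFFLINE_MODE": "true",
    "HF_HUB_OFFLINE": "1",
    "ENABLE_COMMUNITY_SHARING": "false",
    "ANONYMIZED_TELEMETRY": "false",
    "ENABLE_RAG_WEB_SEARCH": "false",
}

# Read the container's environment from `docker inspect`.
raw = subprocess.check_output(["docker", "inspect", CONTAINER])
env_pairs = json.loads(raw)[0]["Config"]["Env"]          # ["KEY=value", ...]
env = dict(pair.split("=", 1) for pair in env_pairs if "=" in pair)

for key, want in EXPECTED.items():
    have = env.get(key, "<unset>")
    flag = "OK   " if have.lower() == want.lower() else "CHECK"
    print(f"{flag} {key} = {have} (expected {want})")
```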

Cost & Pricing

How much does it cost to set up a private AI server for a small organization?

Island Mountain systems range from $75,000 to $400,000 depending on GPU configuration. The Summit Base tier with 2x NVIDIA H100 80GB PCIe (refurbished, pre-made) costs $75,000-$85,000. The Summit Ridge tier with 2x H100 80GB (build-to-order) costs $150,000-$160,000. The Summit Pinnacle tier with 2x H200 141GB GPUs costs $350,000-$400,000, coming Q3 2026.

Every system is a one-time purchase. There are no subscription fees, no per-token charges, and no recurring software licensing costs. The models are MIT licensed and free to use. The interface (OpenWebUI) is open-source.

Beyond the hardware purchase, plan for a dedicated 208V/30A electrical circuit if you don't already have one in your server room. A licensed electrician typically installs this for $500-$2,000. Ongoing electricity costs run approximately $100-$200 per month depending on usage and your local power rates. That's it. No hidden fees. For a detailed breakdown of total cost of ownership versus cloud alternatives, see our Pricing page.

What is the total cost of cloud AI versus local AI hardware over five years?

The math depends on your team size and usage, but here is a representative comparison for a 10-user organization:

Cloud AI subscriptions cost $12,000-$120,000 over five years for a 10-user team. An Island Mountain Summit Base system costs $75,000-$85,000 once, with a five-year total cost of ownership of approximately $81,000-$97,000 including electricity.

Cloud options (ChatGPT Enterprise, Azure OpenAI, Anthropic) run $20-$200 per user per month, plus per-token overages. For 10 users, that's $2,400-$24,000 per year, with prices that historically only increase. The Island Mountain Summit Base system's annual electricity runs $1,200-$2,400, and that is your only recurring cost.

The crossover point where local hardware becomes cheaper than cloud depends on your cloud spend. At $500/month in cloud AI costs (modest usage), local hardware pays for itself in roughly 13 years. At $2,000/month, the payback period is under 4 years. At $5,000/month or above, you break even in under two years.
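The payback arithmetic is easy to rerun with your own numbers. A simple sketch using the figures above; it divides hardware cost by current cloud spend, and subtracting the $100-$200 monthly electricity cost stretches the low-spend cases somewhat:

```python
hardware_cost = 80_000          # Summit Base midpoint, one-time purchase

for monthly_cloud_spend in (500, 2_000, 5_000):
    months = hardware_cost / monthly_cloud_spend
    print(f"${monthly_cloud_spend:>5}/month cloud spend -> "
          f"payback in ~{months:.0f} months (~{months / 12:.1f} years)")
# $500/month   -> ~160 months (~13 years)
# $2,000/month -> ~40 months  (~3.3 years)
# $5,000/month -> ~16 months  (~1.3 years)
```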

The cost advantage of local hardware grows every month it runs, because the marginal cost of each additional inference is just electricity. Cloud costs compound. Hardware costs don't. See the full comparison table on our Pricing page.

What are the ongoing costs after purchasing local AI hardware?

After the one-time hardware purchase, your recurring costs are electricity ($100-$200/month) and optional support. There are no software licensing fees.

Electricity runs approximately $100-$200 per month, or $1,200-$2,400 per year. This estimate assumes 1.5-2.5 kW average power draw at $0.12/kWh, which is the US national average. Your actual cost depends on local utility rates and how heavily the system is used.

There are no software licensing fees. The models (DeepSeek V4-Flash quantized, Llama 3.1 70B, Mixtral 8x22B) are MIT licensed. OpenWebUI is open-source. The operating system is Ubuntu Server LTS, also free.

After the included 30-day support period, ongoing support is available on a per-incident basis or through an annual retainer. This covers model configuration, performance tuning, troubleshooting, and remote diagnostics.

The only additional cost you might encounter is if you want a GPU upgrade in the future. GPU upgrades are a hardware swap where your existing cards are credited at secondary market value toward the replacement. This is not an annual cost; it is a one-time decision you make if and when you need more capability. See our Pricing page for the full cost breakdown.

Why should I pay $75,000+ when I can build a similar AI server for $5,000-$10,000?

You can. The guides exist, and the software stack is the same: Ollama, vLLM, OpenWebUI, and open-weight models like DeepSeek V4-Flash and Llama 3.1 70B. If you have the technical staff to build, configure, test, and maintain it, a DIY system running consumer RTX 4090s will run inference.

Here is what that build does not include:

Enterprise GPU provenance documentation. Every A100 and H100 in an Island Mountain system has a documented procurement chain: purchase receipts from authorized NVIDIA channels, serial number registry, and RMA history. If your compliance officer, auditor, or contracting officer asks where your GPU hardware came from and who has touched it, you need that paper trail. Consumer GPUs purchased from Amazon or Newegg do not carry it.

72-hour continuous burn-in testing. DIY guides test whether the system boots. Island Mountain tests whether it sustains full-load inference for 72 continuous hours without thermal throttling, memory errors, or performance degradation. That is the difference between a personal project and production infrastructure that your organization depends on.

NVIDIA enterprise RMA chains. When an A100 or H100 fails, the replacement GPU enters the same provenance documentation chain. Consumer GPU RMAs do not maintain that continuity, which matters for HIPAA, ITAR/DFARS, and CMMC audit trails.

Direct builder support. You talk to the person who built your system. Not a tier-1 support ticket queue, not a chatbot, not a forum post. The person who assembled it and ran the burn-in.

The honest answer: if you are a developer running experiments, build your own. If you are a regulated organization putting AI into production workflows where compliance documentation and hardware provenance matter, that is what the price delta covers. See our Products page for full system specifications.

Purchase & Setup Process

How long does it take to receive and set up an Island Mountain system?

From deposit to delivery, expect 3-5 weeks for the Summit Base tier (pre-made). The process has four phases: component sourcing and verification (GPUs sourced from verified enterprise resellers with documented provenance), assembly and configuration (15-phase build process), 72-hour continuous burn-in testing, and full benchmarking with delivery manifest.

The system arrives pre-configured. Models are installed, OpenWebUI is set up, and the server is ready to run. Setup on your end means racking the server, connecting power (208V/30A circuit), connecting to your network, and opening a browser. Most organizations are running their first prompts within hours of receiving the hardware.

The Summit Ridge tier may take 4-6 weeks because H100 GPUs are sourced on a build-to-order basis with custom configuration. The Summit Pinnacle tier with H200 GPUs is coming Q3 2026. We include 30 days of hands-on setup support with every purchase, so if your IT team hits a snag during network configuration or user setup, you have direct access to the person who built the system. See our Pricing page for the full purchase process.

How does the 50% deposit and payment structure work?

The purchase process starts with a conversation about your workload, followed by a custom quote. Every quote includes a 14-day price lock, meaning the quoted price holds for 14 days from acceptance. If GPU market prices spike more than 10% during that window, you can cancel with a full deposit refund. A 50% deposit initiates the build; the remaining 50% is due upon delivery.

Once you accept the quote, a 50% deposit initiates component sourcing and the build process. We do not build speculatively. Your deposit triggers the purchase of your specific GPUs and components.

The remaining 50% is due upon delivery. You do not pay the balance until the system arrives, burn-tested and benchmarked, with a complete delivery manifest documenting every component serial number, test result, and configuration detail.

Payment methods and financing options can be discussed during the quote process. Some buyers have successfully financed hardware purchases through equipment financing lenders who treat AI servers like any other capital equipment purchase. Island Mountain does not currently offer direct financing, but we can point you toward lenders who work with technology equipment purchases. See our Pricing page for the complete purchase timeline.

What does the 72-hour burn-in test verify?

Every Island Mountain system runs 72 hours of continuous stress testing before it ships. This is not a quick benchmark or a 10-minute smoke test. It is three straight days of sustained GPU compute at high load with automated monitoring.

The test verifies thermal stability under continuous load (GPUs maintain safe operating temperatures without throttling), memory integrity (no VRAM errors across billions of operations), inference consistency (model outputs remain stable and correct across thousands of sequential prompts), power supply reliability under sustained draw, and storage performance under continuous read/write activity.

Automated monitoring tracks temperature, clock speeds, error rates, and performance metrics throughout the entire 72-hour window. Any anomaly triggers an alert. Systems that show instability, thermal throttling, VRAM errors, or any deviation from expected performance do not ship. Components are swapped and the test restarts.
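The monitoring relies on standard NVIDIA tooling. Here is a minimal sketch of the kind of polling loop involved, not our internal test harness; the alert threshold shown is illustrative:

```python
import csv
import subprocess
import time

# Poll GPU temperature, clocks, utilization, and memory usage via nvidia-smi.
QUERY = "temperature.gpu,clocks.sm,utilization.gpu,memory.used"
TEMP_ALERT_C = 85  # illustrative threshold, not the production value

with open("burnin_log.csv", "a", newline="") as log:
    writer = csv.writer(log)
    while True:
        out = subprocess.check_output(
            ["nvidia-smi", f"--query-gpu={QUERY}",
             "--format=csv,noheader,nounits"],
            text=True,
        )
        # One line per GPU, in index order.
        for gpu_index, line in enumerate(out.strip().splitlines()):
            temp, clock, util, mem = [v.strip() for v in line.split(",")]
            writer.writerow([time.time(), gpu_index, temp, clock, util, mem])
            if int(temp) >= TEMP_ALERT_C:
                print(f"ALERT: GPU {gpu_index} at {temp} C")
        time.sleep(10)
```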

You receive a complete benchmark report and delivery manifest with your system, documenting every test result, component serial number, and configuration detail. This is not a certificate of perfection; it is proof that the system ran hard for three days without failing. See our Technology page for more on the build and test process.

What happens if a component fails during the warranty period?

Every Island Mountain system ships with a 1-year hardware warranty covering all components, including GPUs, CPU, RAM, storage drives, and power supplies. If a component fails within the first year, we handle the replacement.

GPU failures are managed through supplier RMA (Return Merchandise Authorization) agreements with documented replacement timelines. We maintain a 20% warranty reserve per unit specifically to ensure we can cover replacements without delay. You are not waiting for us to find budget.

The process works like this: you contact us (direct phone at 1-801-609-1130 or email to the builder), we diagnose the issue remotely, and if a physical component needs replacement, we ship the part or arrange a swap. For GPU replacements, turnaround depends on the specific card and current availability, but our supplier agreements include prioritized RMA processing.

Extended warranty options beyond the first year are available at the time of purchase. After warranty, ongoing support is available on a per-incident or annual retainer basis. For full warranty and support details, see our Pricing page.

Risks & Limitations

Island Mountain is a new company with no shipped units. Why should I trust a first purchase?

This is a fair question and you should ask it. Here is the honest answer.

Island Mountain is a Colorado company co-founded by John Dougherty (hardware engineer, 25-year technology veteran) and Basho Parks (marketing and sales). The company is new. The team is not. Every system is built, tested, and delivered by people with decades of combined experience deploying technology infrastructure in demanding environments.

The specific protections built into the purchase process exist because we know trust has to be earned: the 14-day price lock lets you cancel if GPU prices spike above 10%. The 50% deposit / 50% on delivery structure means you do not pay in full until you have a working, tested system in your hands. The 1-year hardware warranty is backed by a 20% warranty reserve per unit. The 30-day setup support gives you direct access to the builder, not a call center.

The hardware is standard enterprise components (NVIDIA GPUs, AMD EPYC CPUs, Supermicro chassis) with established supplier RMA agreements. The software is open-source (OpenWebUI, vLLM, Ubuntu Server LTS). Nothing is proprietary, nothing is locked, and if Island Mountain disappeared tomorrow, your system would keep running because every piece of it is built on publicly available, open-source infrastructure. See our Why Local AI page for our honest positioning on what we are and what we are not.

What happens if Island Mountain goes out of business after I buy?

Your system keeps running. This is by design, not by accident.

Every component of the system is built on open-source software and standard enterprise hardware. The operating system is Ubuntu Server LTS (free, community-supported). The inference engines are vLLM and Ollama (open-source). The interface is OpenWebUI (open-source). The models are MIT licensed with no usage restrictions. None of it depends on Island Mountain's continued existence.

If Island Mountain were to close, you would lose access to our specific support, warranty coverage, and GPU upgrade program. But the system itself requires no license, no activation, and no connection to any Island Mountain server. It is a standalone server running open-source software on standard hardware.

OS updates come from Ubuntu's repositories. CUDA driver updates come from NVIDIA. Model updates come from Hugging Face and Ollama's public model library. Your IT team or any competent Linux administrator can maintain the system independently.

We built it this way because vendor lock-in is exactly the problem we are solving. If we built a system that depended on us to function, we would be no different from the cloud vendors we are replacing. See our Technology page for details on the full open-source stack.

Can local AI keep up with cloud AI as models improve?

Not on day one. But within weeks or months, yes.

Cloud providers release proprietary models (GPT-4o, Claude Opus, Gemini) immediately because they control the infrastructure. You get access the day they launch. Open-source models lag that release cycle. When OpenAI ships a new flagship, you will not have an equivalent open-source model on your local hardware that afternoon.

What has consistently happened over the past two years is that open-source models close the gap rapidly. DeepSeek V4-Flash, Llama 3.1, and Mixtral all reached or exceeded the performance of the proprietary models they followed within weeks to months of those models' releases. The open-source AI development community is massive, well-funded, and accelerating.

Your Island Mountain system can run any new open-source model that fits within its VRAM. When a new model releases, you download it through Ollama or the OpenWebUI interface, and it is available to your team. No hardware swap needed unless the new model exceeds your VRAM capacity.
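Pulling a new model is a single request to Ollama's local API (or a click in the OpenWebUI model panel). A minimal sketch, assuming temporary internet access and Ollama on its default port; the hostname and model tag are placeholders:

```python
import json
from urllib.request import Request, urlopen

# Ask the local Ollama service to download a newly released open-source model.
# Placeholder hostname and model tag; air-gapped sites load from physical media
# instead, as noted earlier.
req = Request(
    "http://ai-server.local:11434/api/pull",
    data=json.dumps({"model": "some-new-model:latest"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:
    for line in resp:  # progress is streamed as JSON lines
        print(json.loads(line).get("status", ""))
```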

The risk is that a future model generation requires significantly more VRAM than your system has. That is why we offer a GPU upgrade path. Your existing GPUs are credited at secondary market value toward next-generation cards. See our Technology page for roadmap details, including V4-Flash and the Summit Pinnacle tier timeline.

Summary: Island Mountain builds on-premises AI inference servers with NVIDIA H100 and H200 GPUs, priced from $75K to $400K as a one-time purchase. The hardware ships pre-configured with open-source models and requires no cloud connection, making it suitable for organizations bound by HIPAA, ITAR, attorney-client privilege, tribal data sovereignty (OCAP), and FERPA requirements.

Still Have Questions?

One conversation, no sales pitch. Tell us about your organization and what you are trying to accomplish with local AI. We will give you a straight answer.

Or call directly: 1-801-609-1130