Samsung 960 PRO/EVO/SM961 M.2 NVMe SSD disappearance-after-reboot-or-bios-change workaround is to power cycle, bug fixed in PRO/EVO firmware 2B6QCXP7/2B7QCXE7
This article has had several significant additions since it was first published, incorporating new findings from amazingly helpful readers who commented below. These include the new 960 EVO firmware 2B7QCXE7, which seems to have solved the issue for me on Jan 29 2017, and firmware 2B6QCXP7, which fixed the PRO several months later, once I was finally able to download it.
I'm apparently starting out 2017 by smoothing out bumps in the road that a few home lab enthusiasts ran into during their normally fun holiday season of home lab rebuilds. 'Twas the season for many IT workers to be off from work, myself included. This is my second such article this year, but at least we've put these issues to rest quickly, together. It's times like these that I'm so glad I have a web site with comments, so we can all spend more time working and playing than fixing.
Backstory
Turns out Supermicro is making progress with M.2 NVMe SSD support for Xeon D by posting links to their Tested M.2 List on their Xeon D Product Pages, screenshot samples above. Even though the list only includes tested SSDs from Intel, Micron, and Toshiba, it's still a baby step forward. Supermicro support folks have actually told me that they didn't support M.2 NVMe drives at all, only the much slower AHCI M.2 drives or SATA3 SSDs. Yikes! Of course, their scripted words were understandably horrifying for me personally, given how many turn-key Bundle 2 systems had already shipped. So I politely pointed out how wonderfully well the Samsung 950 PRO had been working with my SYS-5028D-TN4T for many months, with any OS I threw at it. I even shared some of the fastest benchmarks out there that featured their products:
- World's fastest consumer SSD - Samsung 950 PRO M.2 NVMe benchmark results
and my surprisingly popular related article from way back in November 2015, which is oddly the very top Google result for "how to boot from NVMe":
- How to boot Windows 10 from NVMe based PCIe storage, featuring Samsung 950 PRO M.2 SSD in a Supermicro SYS-5028D-TN4T
Despite all that, and the year that has gone by since, Supermicro apparently still has an inexplicable reluctance to test Samsung M.2 NVMe drives, which may well have been the most popular M.2 NVMe drives on Amazon for over a year now. It would seem they could sure use a friend at Samsung, and a budget that gives them time to expand well beyond their current list of only SATA3 M.2 devices and half-height PCIe devices. I would propose they divert most of those testing funds toward the forward-looking M.2 NVMe form factor instead, since this strange issue cropping up shows it's needed now. It still makes me nervous that no M.2 NVMe devices have made their list yet.
First sign of a problem, along with a power-cycle fix (phew!)
Robert from Austria reported the strange Samsung 960 PRO disappearance behavior, in one of the very first reports from anybody getting their hands on one of these world's fastest "gumstick" storage devices, anywhere. Well, anybody who wasn't a lucky blogger receiving an early review unit, anyway. It was just one data point and something for me to watch, but with my own early pre-order messed up and delayed, all I could really do was wait and see.
More reports came in
wdmia reported his 960 EVO experiencing the same issue, with the same fix working. Now this really had my attention.
TheGrayGhost reported his Supermicro case #SM1701012591: a 960 EVO in a Flex ATX Xeon D motherboard with the latest BIOS 1.0b, same issue, same fix. Dang, these are all Xeon D systems, Mini-ITX and Flex ATX models alike, aka all SuperServers with X10SDV in the motherboard model number.
The problem
A Supermicro SuperServer owner goes into the BIOS and saves changes. The OS then boots, but doesn't "see" the 960 PRO or EVO drives at all.
The workaround
After saving any BIOS changes, don't just power down, but also remove the power cord for a few seconds. When plugging back in and powering up, the drives are visible to whatever OS you're running again, with no data lost.
This is more like a workaround, while we wait for a proper fix that will likely come in the form of a BIOS update at some future date.
This is one of the stranger issues I've seen, and I don't have any technical explanation for it; that would be pure conjecture. It's a relatively minor inconvenience, but I'd say it could be pretty darn serious if you don't have a way to remove all power from a remote system!
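On a headless or remote box, it can help to check right after boot whether the drive actually showed up, so you know a full power-cycle is needed before you go digging elsewhere. Here's a minimal Python sketch of that check. It assumes a Linux environment (not the ESXi or Windows setups used in this article) where the kernel NVMe driver exposes each controller's model string at /sys/class/nvme/nvme*/model; the "960 EVO" match string is just an example, so adjust it for your own drive's model text.

```python
from pathlib import Path


def visible_nvme_models(sysfs_root="/sys/class/nvme"):
    """Return the model strings of NVMe controllers the kernel currently exposes.

    Assumes the Linux sysfs layout /sys/class/nvme/<controller>/model.
    Returns an empty list if the directory doesn't exist.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    models = []
    for ctrl in sorted(root.iterdir()):
        model_file = ctrl / "model"
        if model_file.is_file():
            # The kernel pads model strings with trailing spaces, so strip them.
            models.append(model_file.read_text().strip())
    return models


def expected_drive_missing(expected_substring, sysfs_root="/sys/class/nvme"):
    """True if no visible controller's model string contains expected_substring."""
    return not any(expected_substring in m for m in visible_nvme_models(sysfs_root))


if __name__ == "__main__":
    if expected_drive_missing("960 EVO"):
        print("960 EVO not visible: shut down, pull the power cord for a few seconds, then power up.")
    else:
        print("960 EVO visible.")
```

A check like this only tells you the drive vanished; the actual recovery is still the physical power removal described above, which is why remote systems without a switchable power source are the worrying case.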
Of course, I've reported this list of reports, and the below video, to Supermicro.
The fix
There is currently no known fix. A proper fix would likely involve installing a future BIOS update, whatever comes after BIOS 1.1c for Mini-ITX models, or after BIOS 1.0b for Flex ATX models.
Video
I was able to easily replicate the issue today using my Supermicro SuperServer SYS-5028D-TN4T system, as seen in the video below. I'm currently using BIOS 1.1c and IPMI 3.46. No data loss, only time and a bit of product confidence lost.
Jan 07 2017 Update
I am so glad I published this article, because comments are coming in suggesting this problem is more serious than I initially thought. I wanted to be sure folks see what blogthis wrote at 8:06pm ET today:
I have additional serious information on this M2 Samsung 960 PRO NVMe issue (1TB).
Important: This is WITHOUT ever entering the BIOS setup screen whatsoever.
Reproduced on: X10SDV-TLN4F (SYS-5028D-TN4T-12C). BIOS: 1.1c. IPMI: 3.46.
--Scenario-1:--
Configure internal SATA drive to boot any OS, in my case Win10x64; graceful shutdown.
Insert a bootable USB stick, in my situation one that has ESXi 6 installed.
BIOS auto-inserts the new boot device in its list, and changes its boot order to accommodate.
System hangs, reboots, then exhibits the problem discussed in this article (NVMe vanishes until unplug-replug physical power).
--Scenario-2:--
Configure IPMI to boot from external ISO, in my case IPMI 3.46 uses HTML5 Net Path.
BIOS auto-changes the boot device name and settings in its boot device list.
System then exhibits the problem discussed in this article (NVMe vanishes until unplug-replug physical power).
--Summary--
You can reproduce this issue multiple ways, WITHOUT ever even entering the BIOS setup. I feel this makes this issue worthy of escalation from a mere inconvenience to a seriously high priority issue. [edit: IPMI typo v1.46 corrected to v3.46]
Jan 08 2017 Update
Gert commented:
Thanks to reporting what you have discovered. I can also report, that it is not possible to run vSAN with Samsung SM961 MZVKW512HMJP. It is possible to configure vSAN and everything is reported okay, but when you try to created a VM and install OS, the connection to the NVMe drive is unstable/lost, so the installation or operation is hanging/stalled. When you use the NVMe drive as a a VMFS partition every works fine until you shut down the server and the NVMe drive disappear. I don't have the time to go through the log files, but I agree, that Supermicro has serious issues with NVMe drives from Samsung, so I warns against buying the drives for Supermicro servers. Supermicro has confirmed in an email to me that the drive is supported :-(
This is obviously a serious bug. While it's unlikely that full vSAN support (see VMware vSAN HCL) will arrive for consumer drives like the 960 series, given their generally lower TBW ratings compared to enterprise drives and their lack of supercapacitors, the sort of nasty behavior Gert reports makes even simple experimentation with an unsupported vSAN quite challenging.
There's more from blogthis, who also wrote:
Thank you for that, hopefully this helps too..
Did you catch another reader's post that m.2 NVMe SSD's are indeed supported as per this official SuperMicro link(?):
https://www.supermicro.nl/products/nfo/M.2.cfm
Duplicate info on their USA/Global site:
https://www.supermicro.com/products/nfo/M.2.cfm
And yes, SuperMicro is now listing official support for Samsung-brand m.2 NVMe drives:
https://www.supermicro.com/products/nfo/M.2.cfm?pg=Vendors&show=SELECT&type=Samsung#Vendor
(You may want to pass along to your SuperMicro contact, that it appears SuperMicro is now specifically ADVERTISING Samsung m.2 NVMe compatibility.... so hopefully that lights a fire and motivates them to push this issue up their support-chain for the greater good).
Edit: The direct-link to Samsung m.2 NVMe drives has an "&" in their URL so it won't save here properly, but if you click the link, then the "Samsung" link for "Show Qualified M.2-SSD SKUs" on that page, you'll see the list..
AND: On that same page, note Supermicro has listed in the "..Systems which support M.2-NVMe-SSD: (Mini-Tower Systems.. SYS-5028D-TN4T).." I think that says pretty clearly this is intended to work.
I've changed the article title accordingly, from:
How to fix Samsung 960 PRO/EVO M.2 NVMe drive disappearance after any Supermicro Xeon D SuperServer BIOS change is saved
to:
How to workaround Samsung 960 PRO/EVO/SM961 M.2 NVMe drive disappearance after Supermicro Xeon D SuperServer BIOS 1.1c change or other boot changes
and added an appropriate warning sentence to the related articles.
Let's dive in here. First, check out the table of supported Samsung drive results:
https://www.supermicro.com/products/nfo/M.2.cfm?pg=Vendors&show=SELECT&type=Samsung#Vendor
Below is a cut-and-paste of the first column of that table. I did a Google search of samsung.com for you, which lands you on this general page:
http://www.samsung.com/semiconductor/products/flash-storage/client-ssd/
I've gone ahead and created hyperlinks to each product page, and added the key detail that Supermicro doesn't mention anywhere, which is that these are SM951 and SM953 drives:
- Samsung SM951 OEM M.2 NVMe SSD
  Supermicro Part Numbers:
  HDS-M2M-MZVPV128HDGM000
  HDS-M2M-MZVPV256HDGL000
  HDS-M2M-MZVPV512HDGL000
- Samsung SM953 OEM M.2 NVMe SSD
  Supermicro Part Numbers:
  HDS-M2M-MZ1WV480HCGL003
  HDS-M2V-MZ1LV480HCHP0003
  HDS-M2V-MZ1LV960HCJH0003
The point here is that Supermicro does actually have some 2015 vintage Samsung M.2 NVMe drives on their supported SSD lists. This sets a precedent that some Samsung M.2 NVMe drives have been tested by Supermicro in the past. The hope here is that Supermicro would test and support newer Samsung M.2 NVMe drives such as the Samsung 960 series of products (960 PRO/960 EVO/SM961/PM961).
Jan 15 2017 Update
This fascinating and sad saga is roaring along, with no end yet in sight. I think this particular comment by Aaron sums things up very nicely:
I've been trawling forums all day and the problem is reported pretty much being reported everywhere with the new Samsung 960s, might not be anything to do with Supermicro at all. My money is on either a bad firmware bug on the 960 end or problems related to NVMe 1.2 protocol implementation on most chipsets (could explain why the 950 with NVMe 1.1 works so well).
A lot of forum replies to posters with the problem are just putting it down to the usual newbie problems with NVMe booting/UEFI and stuff but it seems pretty obvious there is a common cause only affecting the 960s.
I tested the bejesus out of my TN4T systems (bios 1.1b) last night and have simplified the observed symptoms to these:
- SSD cannot be booted from or chosen as a boot device
- any shutdown of the system will result in the SSD being missing next boot until you physically remove power (OS shutdown or IPMI behave exactly the same)
- SSD seems to work fine writing and reading data otherwise
- any reboot of the device (OS, IPMI, physical button) retains detecting the SSD next boot
Having been burned by Samsung firmware issues on the 850 Evo my bet is they've messed something up rushing the release of this series. I fear this is going to be one of those problems that lurks around for a good 6-12 months before it gets fixed :(
damnit
Note that Aaron opened Supermicro case #SM1701142769 as well.
I've changed the article title accordingly, from:
How to workaround Samsung 960 PRO/EVO/SM961/PM961 M.2 NVMe drive disappearance after Supermicro Xeon D SuperServer BIOS 1.1c change or other boot changes
to:
Samsung 960 PRO/EVO/SM961/PM961 M.2 NVMe SSD disappearance workaround is to power cycle, boot from NVMe problems also reported
The PM961 isn't in the title of this article, since I've not yet found reports of problems with that drive. I would expect it to show the same issue though, since its closely related consumer twin, the 960 EVO, is afflicted with the drive invisibility problem.
Jan 17 2017 Update
It does not appear that a proper resolution to this is coming near term. This naturally has folks asking about return options. I found the following info about my Amazon order's return window:
Samsung 960 EVO Series - 1TB PCIe NVMe - M.2 Internal SSD (MZ-V6E1T0BW)
Sold by: Amazon.com LLC
Return eligible through Jan 31, 2017
Jan 26 2017 Update
Samsung 960 EVO firmware 2B7QCXE7 arrives
Word of a new firmware first arrived earlier today from Peter M. Weidich in Copenhagen, who was kind enough to leave his comment below. This gives 960 EVO owners hope that we might have a fix. No word yet on the 960 PRO or SM961 though, and I'm not sure the OEM SM961 can be updated, at least not as easily. All of Samsung's Consumer SSD downloads start here:
http://www.samsung.com/semiconductor/minisite/ssd/download/consumer.html
I reached Samsung and explained that I hadn't heard back from support in days, and that I had just passed my 30-day RMA request window. They took note of my SR# and said I should have an authorization in 3-5 days. I then finally got a tip from them that allowed me to reach them by phone; it goes like this:
How to reach Samsung Consumer SSD Technical Support Representatives
- Dial 800-SAM-SUNG (800.726.7864)
- wait for automated attendant to stop talking
- say "Solid State Drive"
- wait for automated attendant to stop talking
- say "Representative"
- wait for automated attendant to stop talking
- say "Representative"
I found the fake keyboard noises and long "thinking" intervals kind of funny, then waited patiently after a ring was heard, with eventual success.
Here's what one Samsung Consumer SSD Technical Support representative said
- the polite person I spoke with told me I was the first 960 owner reporting this issue that he had heard of
- the Samsung 960 EVO (any size) did get a new firmware called 2B7QCXE7 this week; he wasn't sure which day, and doesn't know whether it will fix this
- there won't likely be an ISO bootable DOS media firmware updater for a few more weeks
- there was a boot ROM in the 950 PRO products for legacy systems that isn't included with the 960 line
- this 2B7QCXE7 is not available as a separate download at the usual site:
  http://www.samsung.com/semiconductor/minisite/ssd/download/tools.html
- this 2B7QCXE7 is available from within the new Samsung Magician, currently at 5.0.0.790, which pulls it down automatically from the web; direct download URL:
  http://www.samsung.com/semiconductor/minisite/ssd/downloads/software/Samsung_Magician_Installer.zip
- Samsung doesn't supply Release Notes for consumer drive firmware
I hope to try 2B7QCXE7 (typo 2B7QCE7 corrected on 1/29/2017) on one of my two 1TB 960 EVO SSDs tonight, and report back here with how it goes. My focus is primarily on whether the vanishing-drive-after-BIOS-change (or graceful power-down) issue goes away. If it does, I would think it's highly likely that the other 960 EVO owners reporting similar issues on various brands of motherboards will also enjoy the same easy fix. But I'm getting ahead of myself...
Jan 27 2017 Update
I have not been able to reproduce the loss of drive visibility in the slightly different drive configuration my system was in. This has meant it's taking me more time to get this test/upgrade fw/retest done than I had thought it would. I will have an update to this article later today.
Jan 28 2017 Update
Still not able to reproduce the issue reliably, so I haven't flashed my firmware yet. But preliminarily, there appears to be good news reported by Dan below:
Updated Samsung 960s firmware with magician software today, now 960 nvme persisted through non-power cycling reboot. Still some issues with use as boot device though.
No word on which Samsung 960 he's referring to yet, 960 EVO or 960 PRO? I asked, awaiting response...
As for folks with home vSAN aspirations, I never claimed using consumer drives for such things would be a good idea; see Dispelling myths about VSAN and flash for details about Power Loss Protection.
Samsung 960s should be fantastic for straight-up ultra-fast VMFS datastores, assuming firmware fixes arrive for both drives and resolve these early adopter pains that only some folks are experiencing. I'm basing this on my own Samsung 950 PRO 512GB, which has been performing admirably, holding up to a solid year of vSphere 6.0/6.5 abuse.
Jan 29 2017 Update
The original article above erroneously stated the new firmware was 2B7QCE7, that was a typo, it's actually 2B7QCXE7, with all instances above now fixed.
Good news (preliminarily)
As seen on video below!
Today, I was able to replicate the drive disappearance after a simple reboot, using ESXi 6.5 booted from USB. I then recorded a video of the following steps, uncut; the only edit was a 5X speed-up of reboot segments that had no voice-over anyway:
- shut the Xeon D / ESXi 6.5 system down gracefully
- removed all wall power for about 25 seconds
- restored power
- powered up
- showed 960 EVO was now visible again in ESXi 6.5
- rebooted over to a fresh copy of Windows 10
- downloaded and installed Samsung Magician 5
- updated the firmware on one 960 EVO 1TB from 1B7QCXE7 to 2B7QCXE7
- when prompted to shut down afterward, I did so
- powered up, verified 2B7QCXE7 in Magician (it didn't show right away)
- rebooted to ESXi 6.5
- my VMFS datastore on my Samsung 960 EVO 1TB was visible
- rebooted, drive still visible, success!
- rebooted, made BIOS change, drive still visible, success!
Not as good news
I still have not heard whether a firmware update is available for Samsung 960 PRO owners, and the outlook for afflicted SM961 owners may be glum, since historically Magician won't update the firmware on OEM drives.
Next steps
I'm on a plane early tomorrow and away all week, so I won't be deciding on processing those 2 RMAs for my 960 EVOs anytime all that soon. I'll test a few more reboots tonight to continue to build confidence that this issue is resolved, updating the article if something new crops up.
Ideally, I'll sneak in some boot from NVMe tests as well, after first evacuating all the data off that VMFS datastore that's on there now. Luckily, that's easy with vSphere's Storage vMotion.
I'll also leave my other 960 at the original 1B7QCXE7 firmware for now, in case I need to test with Angelbird Wings or (new/just received) Amfeltec 2-way and 1-way riser cards later on. Even if those do work around the issue though, I'd much prefer a firmware fix, of course.
Video #2
Jan 30 2017 Update
I used Veeam Agent for Microsoft Windows (beta) to restore my fresh Windows 10 installation onto the 960 EVO 1TB, and yes, boot from NVMe is working fine on this SuperServer, this is good! Since the firmware upgrade, no further incidents of drive disappearance have been seen.
With many reboots and BIOS changes tested without incident, including a full test of boot from NVMe functionality, I felt it was safe to make the following additional change to the article title, from:
- Samsung 960 PRO/EVO/SM961 M.2 NVMe SSD disappearance workaround is to power cycle, boot from NVMe problems also reported
to:
- Samsung 960 PRO/EVO/SM961 M.2 NVMe SSD disappearance workaround is to power cycle, boot from NVMe problems also reported, firmware 2B7QCXE7 appears to fix EVO
Feb 10 2017 Update
The ISO way to save your day is here! Yes, Samsung has now published the 960 EVO's firmware update in ISO form, so there's no need to boot your VMware ESXi system over to Windows just to get your 960 EVO firmware updated. Instead, you can get Samsung_SSD_960_EVO_2B7QCXE7.iso from the usual samsung.com/semiconductor/minisite/ssd/download/tools.html site.
Alternatively, you can use these direct download links:
- 960 EVO Firmware
  Enhanced compatibility and stability for systems supporting PCIe.
- 960 EVO Firmware Installation Guide
  Firmware Update Utility Installation Guide
I have not yet had a chance to test this ISO, but I'm glad I kept my 2nd 1TB 960 EVO at the older firmware so I can give this method a try!
Still no info on 960 PRO firmware.
Feb 14 2017 Update
I have ordered an AOC-SLG3-2M2 to add to my home lab testbed:
I also have this very kind comment that I shared on Twitter:
"Thanks to Paul and http://Tinkertry.com for being the only place on the internet that was talking about this!" (960 EVO firmware fixes)
(Dan Emmelhainz, commenting here at TinkerTry on Jan 30 2017)
Jul 24 2017 Update
I changed the article's title from
Samsung 960 PRO/EVO/SM961 M.2 NVMe SSD disappearance workaround is to power cycle, boot from NVMe problems also reported, firmware 2B7QCXE7 appears to fix EVO
to
Samsung 960 PRO/EVO/SM961 M.2 NVMe SSD disappearance-after-reboot-or-bios-change workaround is to power cycle, bug fixed in PRO/EVO firmware 2B6QCXP7/2B7QCXE7
since months of testing seem to confirm that the firmware fixes really worked.
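If you're keeping track of several drives, a tiny lookup can flag which ones are on the firmware this article reports as fixing the disappearance bug. The sketch below is a hypothetical helper, not a Samsung tool, and it only knows the revisions mentioned here: 2B7QCXE7 for the 960 EVO, 2B6QCXP7 for the 960 PRO, and 1B7QCXE7 as the original EVO firmware my drives shipped with. Any other revision comes back as "unverified" rather than guessed at, and drives it has no data for (the SM961, for example) come back as "unknown". The model strings used in the example are assumptions about how the drives report themselves.

```python
# Firmware revisions mentioned in this article. Only these are classified;
# anything else is reported as "unverified" instead of being guessed at.
FIXED_FIRMWARE = {
    "960 EVO": "2B7QCXE7",
    "960 PRO": "2B6QCXP7",
}
KNOWN_AFFECTED = {
    "960 EVO": {"1B7QCXE7"},  # the EVO firmware in use before the update
}


def firmware_status(model: str, firmware_rev: str) -> str:
    """Classify a drive as 'fixed', 'affected', 'unverified', or 'unknown'.

    model is a free-form model string such as "Samsung SSD 960 EVO 1TB";
    firmware_rev is the revision string reported by Magician or the OS.
    """
    rev = firmware_rev.strip().upper()
    for family, fixed_rev in FIXED_FIRMWARE.items():
        if family in model:
            if rev == fixed_rev:
                return "fixed"
            if rev in KNOWN_AFFECTED.get(family, set()):
                return "affected"
            return "unverified"
    return "unknown"  # not a 960 EVO/PRO, so this table says nothing about it


if __name__ == "__main__":
    for model, rev in [
        ("Samsung SSD 960 EVO 1TB", "1B7QCXE7"),
        ("Samsung SSD 960 EVO 1TB", "2B7QCXE7"),
        ("Samsung SSD 960 PRO 1TB", "2B6QCXP7"),
    ]:
        print(f"{model} @ {rev}: {firmware_status(model, rev)}")
```

Exact string matching is deliberate here: it keeps the helper from claiming that some later, untested revision is fixed.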
See also at TinkerTry
- Samsung 960 EVO vs 950 PRO M.2 NVMe SSD - FLIR thermal video of VMware vSphere 6.5 Windows 10 VM cloning
  Jan 03 2017
- Where to buy your Samsung 960 EVO or PRO M.2 NVMe SSDs, featuring the latest ordering and availability info
  Nov 30 2016
- Supermicro SuperServer Xeon D / X10SDV IPMI Release Notes Changelog
  (includes known issues)
  Nov 07 2016
- Supermicro SuperServer Xeon D / X10SDV BIOS Release Notes Changelog
  (includes known issues)
- New Supermicro BIOS 1.1c and IPMI 3.46 for Xeon D SuperServers features HTML5 iKVM
- First look at Ubiquiti mPower Pro power strip, home lab pricing, enterprise features, uncertain future
  Aug 17 2016
- Samsung announces 960 PRO, 960 EVO, SM961, and PM961 NVMe M.2 SSDs in up to 1TB capacity
  Jun 22 2016
- How to boot Windows 10 from NVMe based PCIe storage, featuring Samsung 950 PRO M.2 SSD in a Supermicro SYS-5028D-TN4T
  Jan 15 2017