Rowhammer and Microarchitectural Attacks
Northeastern University
Department of Electrical and Computer Engineering
2017 PHD QUALIFYING EXAMINATION in Computer Engineering
Assigned on: March 3, 2017 Due on: March 13, 2017
Ludovico Ferranti
March 12th, 2017
Problem 1: Hardware Oriented Security and Trust
Problem 2: Wireless Networking
A report submitted in partial fulfillment of the requirements for the 2017 Qualifying Exam in Computer Engineering at Northeastern University.
Problem 1
- Introduction
The analyzed paper [1] deals with side-channel attacks on mobile devices and provides a thorough categorization based on several factors. Side-channel attacks aim to extract sensitive information by exploiting apparently harmless information leakage from computing devices, at both the software and the hardware level. They are first categorized as active or passive, depending on how much the attacker influences the system. Software and hardware attacks are then distinguished: they exploit, respectively, logical and physical properties of a device. The distance of the attacker is also relevant: the authors distinguish among local, vicinity and remote side-channel attacks, depending on how close the attacker is to the attacked device. A comprehensive list of examples for every type of attack is given, along with a constructive discussion of possible countermeasures.
In this report, we focus on Rowhammer and microarchitectural attacks, which are discussed in the following paragraphs.
1.a) Rowhammer Attack
As hardware miniaturization is pushed further, increasing DRAM density drives a dramatic reduction in the size of memory cells. Because of the intrinsic properties of DRAM, this reduces the charge held by a single cell and can cause electromagnetic coupling effects between cells. The Rowhammer attack takes advantage of this hardware vulnerability.
1.a.i) Principle
The Rowhammer glitch occurs in densely populated DRAM and allows an attacker to modify memory cells without directly accessing them. The aforementioned vulnerability can be exploited by repeatedly accessing a physical memory location until a bit flips in an adjacent cell. A well-orchestrated Rowhammer attack can be devastating, up to obtaining root privileges. Rowhammer exploits build on a principle called Flip Feng Shui [2], in which the attacker abuses the physical memory allocator to strike precise hardware locations and cause bits to flip in attacker-chosen sensitive data. Rowhammer attacks can be either probabilistic [3] or deterministic [4]. The latter have greater impact, since the lack of control in the former may corrupt unintended data. The most effective variant is double-sided Rowhammer [5], which hammers the two rows adjacent to the victim row and induces more flips in less time than other approaches.
1.a.ii) Architecture
The target of the Rowhammer attack is the DRAM. DRAM stores electric charge in an array of cells, each typically implemented with a capacitor and an access transistor. Cells are organized in rows. Because capacitors leak charge, memory cells inherently have a limited retention time and must be refreshed regularly to keep their data. From the OS point of view, a page frame is the smallest fixed-length contiguous block of physical memory onto which an OS memory page is mapped. From the DRAM point of view, a page frame is just a contiguous collection of memory cells with a fixed page size (usually 4 KB). With this in mind, triggering bit flips through Rowhammer is essentially a race against the DRAM refresh scheme: the attacker must perform enough memory accesses to cause sufficient disturbance in adjacent rows before they are refreshed.
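The race against refresh can be illustrated with a toy model. The sketch below is not a DRAM simulator: the class ToyDramBank, the function hammer and the flip threshold of 1000 activations are hypothetical, chosen only to show why alternating between the two neighbours of a victim row (double-sided hammering) can reach the disturbance threshold before a refresh, while a single aggressor next to the victim, given the same access budget, does not.

```python
class ToyDramBank:
    """Toy model: every row activation disturbs the two neighbouring rows."""

    def __init__(self, num_rows, flip_threshold):
        self.num_rows = num_rows
        self.flip_threshold = flip_threshold  # hypothetical disturbance budget
        self.disturbance = [0] * num_rows     # accumulated since last refresh
        self.flipped = set()                  # rows that suffered a bit flip

    def activate(self, row):
        for neighbour in (row - 1, row + 1):
            if 0 <= neighbour < self.num_rows:
                self.disturbance[neighbour] += 1
                if self.disturbance[neighbour] >= self.flip_threshold:
                    self.flipped.add(neighbour)

    def refresh(self):
        # A refresh restores the charge, so accumulated disturbance is lost.
        self.disturbance = [0] * self.num_rows


def hammer(bank, aggressor_rows, accesses_before_refresh):
    """Alternate over the aggressor rows until the refresh fires."""
    for i in range(accesses_before_refresh):
        bank.activate(aggressor_rows[i % len(aggressor_rows)])
    flipped = set(bank.flipped)
    bank.refresh()
    return flipped


if __name__ == "__main__":
    # Single-sided: row 4 is next to victim row 5, row 20 is far away.
    print(hammer(ToyDramBank(32, 1000), [4, 20], 1500))
    # Double-sided: rows 4 and 6 both disturb victim row 5.
    print(hammer(ToyDramBank(32, 1000), [4, 6], 1500))
```

With the same budget of 1500 accesses per refresh window, only the double-sided pattern pushes the victim row over the assumed threshold in this model.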
1.a.iii) Instruction Set Architecture
The Instruction Set Architecture (ISA) is the functional specification of a processor's programming interface. It abstracts over microarchitectural implementation details (e.g. pipelines, issue slots and caches) that are functionally irrelevant to a programmer. Even though the microarchitecture is functionally transparent, it holds hidden state that can be observed in several ways. To test whether Rowhammer can be exploited, precise knowledge of the dimensions of the memory cells is crucial. In mobile devices, ARM is the most widespread processor architecture. In [4] the authors determine the minimum memory access time that still results in bit flips by hammering 5 MB of physical memory while increasing the time between two read operations by inserting NOP instructions. The rows are all initialized to a known value, so every change is due to Rowhammer. Results show that up to 150 bit flips occur per minute with a read time of around 150 ns.
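As a back-of-the-envelope check of the read time quoted above, the following sketch counts how many reads fit into one refresh window. The 64 ms refresh window is an assumption of this sketch (a common value for DRAM), not a figure taken from [4]; the 150 ns read time is the one reported above, and the function name is hypothetical.

```python
def accesses_per_refresh_window(refresh_window_ns, access_time_ns):
    """Number of back-to-back reads that fit into one refresh window."""
    return refresh_window_ns // access_time_ns


REFRESH_WINDOW_NS = 64_000_000  # assumed 64 ms refresh window
ACCESS_TIME_NS = 150            # read time quoted in the text

total = accesses_per_refresh_window(REFRESH_WINDOW_NS, ACCESS_TIME_NS)
print(total)       # reads available before the victim row is refreshed
print(total // 2)  # reads per aggressor row when hammering double-sided
```

Under these assumptions the attacker has roughly 4 x 10^5 reads per window; slowing each read down shrinks this budget proportionally, which is why the access time is the critical parameter.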
1.a.iv) Procedure
The Rowhammer attack procedure is a combination of three main system primitives:
- P1. Fast Uncached Memory Access: enables the attacker to activate alternating rows in each bank fast enough to trigger the Rowhammer bug.
- P2. Physical Memory Massaging: the attacker tricks the victim component into storing security-sensitive data (e.g., a page table) in an attacker-chosen, vulnerable physical memory page.
- P3. Physical Memory Addressing: to perform double-sided Rowhammer, the attacker needs to repeatedly access specific physical memory pages.
Mobile devices have Direct Memory Access (DMA) mechanisms that facilitate the implementation of P1 and P3. In particular, Android devices run ION, a memory allocator that gives unprivileged user apps access to uncached, physically contiguous memory. To enforce P2, the attacker tricks the physical memory allocator built into Linux (the buddy allocator) into partitioning memory in a predictable way. By carefully choosing the sizes of the memory chunks to allocate, the attacker can exhaust memory in a controlled fashion; this technique is called Phys Feng Shui. Once the positions of the Page Table Pages (PTPs) and Page Table Entries (PTEs) are indirectly known, double-sided Rowhammer is performed. Once the desired flip is triggered, the attacker gains write access to the page table by having it mapped into the attacker's address space. By modifying one of its own PTPs, the attacker can then access any page in physical memory, including kernel memory.
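The predictability that P2 relies on can be seen in a heavily simplified buddy allocator. The sketch below is an illustration only: ToyBuddyAllocator is a hypothetical class, it has no free or coalescing logic, and it is not the Linux implementation. It shows that once the attacker knows which blocks are free, the placement of the next allocation is fully determined by the requested size.

```python
class ToyBuddyAllocator:
    """Simplified buddy allocator over 2**max_order pages (no free/merge)."""

    def __init__(self, max_order):
        self.max_order = max_order
        # free_lists[k] holds start pages of free blocks of 2**k pages.
        self.free_lists = {order: [] for order in range(max_order + 1)}
        self.free_lists[max_order].append(0)

    def alloc(self, order):
        """Return the first page of a block of 2**order pages."""
        # Take the smallest free block that is large enough.
        for current in range(order, self.max_order + 1):
            if self.free_lists[current]:
                break
        else:
            raise MemoryError("no free block large enough")
        start = self.free_lists[current].pop(0)
        # Split it down, returning each upper half (the buddy) to a free list.
        while current > order:
            current -= 1
            self.free_lists[current].insert(0, start + (1 << current))
        return start


if __name__ == "__main__":
    allocator = ToyBuddyAllocator(max_order=4)   # 16 pages in total
    attacker_block = allocator.alloc(2)          # pages 0..3
    next_small_page = allocator.alloc(0)         # lands right after the block
    print(attacker_block, next_small_page)
```

In this model a 4-page block followed by a single-page request always yields pages 0..3 and page 4, so a small allocation made by the victim right after the attacker's one ends up physically adjacent to attacker-controlled memory.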
1.b) Microarchitectural attack
The evolution of hardware architectures led to the widespread use of cache memories. Having several levels of cache between the CPU and the main memory helps to optimize memory access time with respect to the clock frequency. Microarchitectural attacks take advantage of the timing behavior of caches (e.g. execution times, memory accesses) to obtain sensitive information.
A comprehensive survey of microarchitectural attacks is given in [6].
1.b.i) Principle
Microarchitectural attacks are based on different ways of exploiting the cache. Among them, three main methods are identified:
- Prime + Probe: the attacker fills one or more sets of the cache with its own lines. Once the victim has executed, the attacker accesses its previously loaded lines to probe whether any were evicted, which shows that the victim has accessed an address mapping to the same set.
- Flush + Reload: the inverse of Prime + Probe. The attacker first flushes a shared line of interest. Once the victim has executed, the attacker reloads the line by touching it and measures the time taken. A fast reload indicates that the victim touched this line (reloading it), while a slow reload indicates that it did not.
- Evict + Time: the attacker first causes the victim to run, which preloads its working set, and establishes a baseline execution time. In a second step the attacker evicts a line of interest and runs the victim again. A difference in execution time indicates that the analyzed line was accessed.
All microarchitectural attacks combine these principles. Another noteworthy approach is to cause a Denial of Service (DoS) by saturating the lower-level cache bus [7].
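The first two methods listed above can be sketched on a toy cache model. Everything in the block below is hypothetical (the ToyCache class, its 4 sets and 2 ways, the LRU policy and the addresses); a real attacker cannot query hits and misses directly and infers them from access latency instead.

```python
from collections import OrderedDict


class ToyCache:
    """Set-associative cache with LRU replacement; models hits/misses only."""

    def __init__(self, num_sets=4, ways=2, line_size=64):
        self.num_sets, self.ways, self.line_size = num_sets, ways, line_size
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _locate(self, addr):
        line = addr // self.line_size
        return line % self.num_sets, line

    def access(self, addr):
        """Return True on a hit; on a miss, load the line and return False."""
        index, line = self._locate(addr)
        cache_set = self.sets[index]
        if line in cache_set:
            cache_set.move_to_end(line)      # mark as most recently used
            return True
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)    # evict least recently used line
        cache_set[line] = True
        return False

    def flush(self, addr):
        index, line = self._locate(addr)
        self.sets[index].pop(line, None)


def prime_probe(cache, target_set, victim):
    """Return True if the victim touched target_set between prime and probe."""
    stride = cache.num_sets * cache.line_size
    base = 64 * stride                        # attacker buffer, maps to set 0
    lines = [base + target_set * cache.line_size + way * stride
             for way in range(cache.ways)]
    for addr in lines:                        # prime: fill the whole set
        cache.access(addr)
    victim()
    misses = sum(not cache.access(addr) for addr in lines)   # probe
    return misses > 0


def flush_reload(cache, shared_addr, victim):
    """Return True if the victim touched shared_addr after the flush."""
    cache.flush(shared_addr)
    victim()
    return cache.access(shared_addr)          # hit means the victim reloaded it
```

Prime + Probe only needs the attacker and the victim to contend for the same cache set, whereas Flush + Reload requires a line that is actually shared between them, which is why the latter gives finer-grained information.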
1.b.ii) Architecture
As mentioned before, the target of microarchitectural attacks is the cache. Caches are organized into lines; a cache line holds a block of adjacent bytes taken from memory. Caches are further organized in levels. Each level has a different size, chosen to balance its service time against that of the next higher level (smaller and therefore faster). Caches can use either virtual or physical addressing. With virtual addressing, the L1 cache is looked up using the virtual address.
1.b.iii) Instruction Set Architecture
Inferring the internal state of the cache is the key step in mounting devastating microarchitectural attacks. Analyzing what the hardware exposes through the ISA can provide an attacker with useful information about the hardware structure. Several kinds of shared state can be exploited; they are briefly summarized here:
- Thread-shared state: the cache stores information that is shared between threads. Accessing it can degrade the performance of the threads involved.
- Core-shared state: by analyzing L1 and L2 cache contention between competing threads, it is possible to infer the encryption keys of algorithms used in internal communication (e.g. RSA, AES).
- Package-shared state: running a program concurrently on different cores residing in the same package can saturate that package's last-level cache (LLC). The saturation affects all the lower levels, exposing sensitive data.
- NUMA-shared state: memory controllers in multi-core systems are exploited to mount DoS attacks.
1.b.iv) Procedure
A plethora of attacks is presented in [6]; as a representative example, the procedure of Flush + Reload on Android systems using ARM processors [8] is discussed here.
The most powerful way to perform Flush + Reload is the clflush instruction. It is available to unprivileged code on x86 systems, but ARM-based mobile devices do not offer it. A less powerful alternative is the clearcache system interface, which is used in [8].
When the attack starts, the service component inside the attacker app creates a new thread, which calls into its native component to conduct Flush-Reload operations in the background:
- Flush: the attacker invokes clearcache to flush a function in a shared code section from the cache.
- Flush-Reload interval: the attacker waits for a fixed time for the victim to execute the function.
- Reload: the attacker executes the function and measures the execution time. A short execution time indicates that the function was served from the L2 cache, i.e. that it has been executed by some other app (possibly the victim's).
In [8] the authors show that this method is capable of detecting hardware events (touchscreen interrupts, credit card scanning) and of tracing software execution paths.
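The three steps above, repeated over consecutive intervals, produce a trace of victim activity. The following sketch simulates that loop; the latencies (50 ns for a cached reload, 300 ns for an uncached one), the 150 ns decision threshold and the function name are hypothetical values chosen for illustration and are not measurements from [8].

```python
def flush_reload_trace(victim_active_slots, num_slots,
                       hit_ns=50, miss_ns=300, threshold_ns=150):
    """Simulate repeated Flush-Reload rounds, one per time slot.

    victim_active_slots: set of slot indices in which the victim runs the
    monitored function. Returns one boolean per slot: True when the reload
    was fast enough to conclude that the function had been executed.
    """
    trace = []
    for slot in range(num_slots):
        line_cached = False                  # Flush: function leaves the cache
        if slot in victim_active_slots:      # Interval: victim may execute it
            line_cached = True
        reload_ns = hit_ns if line_cached else miss_ns   # Reload, timed
        trace.append(reload_ns < threshold_ns)
    return trace


if __name__ == "__main__":
    print(flush_reload_trace({1, 3}, 5))
```

The length of the Flush-Reload interval sets the time resolution of the trace: a shorter interval localizes events more precisely but increases the chance of missing an execution that overlaps with the attacker's own flush or reload.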
1.c) Rowhammer vs Microarchitectural attack
Following the categorization used in [1], both Rowhammer and microarchitectural attacks are active software attacks that exploit physical properties of the victim device. In particular, Rowhammer uses the coupling effect between DRAM cells, while microarchitectural attacks gather sensitive information by analyzing cache timing. The two attacks act at different levels: Rowhammer needs fast, uncached access to DRAM, whereas microarchitectural attacks target cache memories, which are usually SRAM. Both can be applied to desktop and mobile OSes [4][8], as well as to cloud environments.
1.d) Mobile vs Desktop attacks
Mobile devices are inherently more exposed than desktop computers. Their portability and close integration with everyday life make them more accessible to attackers. Moreover, apps are much easier to install on mobile devices, and general carelessness helps attackers get malicious software installed. Also, compared with desktop computers, mobile phones have several sensors that can be exploited to gather information about users' behavior. From an OS point of view, however, mobile OSes are far more limited than desktop OSes, which constrains the attacker as well. Specifically, Rowhammer on mobile suffers from the lack of features that are available in desktop environments (e.g. no support for huge pages, memory deduplication, or MMU paravirtualization). Similar limitations apply to microarchitectural attacks on ARM, where the clflush instruction used to perform Flush + Reload is not available.
2) NAND Mirroring
NAND mirroring is categorized in [1] as an active local side-channel attack that exploits physical properties outside the device chip. In particular, in [13] a NAND mirroring attack is performed on an iPhone 5c. The security of the Apple iPhone 5c became an object of study after the FBI recovered such a device from a terrorist suspect in December 2015. As the FBI was unable to retrieve the data, NAND mirroring was suggested by Apple technology specialists as a way to gain unlimited passcode attempts and thereby brute-force the passcode. Since the encryption key is hardcoded in the CPU and not accessible from runtime code, it is impossible to brute-force the passcode key without working at the hardware level. The attack therefore targets the device's non-volatile storage, which in iPhones is a NAND flash memory. In NAND memories the cells are connected in series, which reduces the cell size but increases the number of faulty cells. For this reason, external error correction is required, and NAND memory allocates additional space for error correction data. In [13] the author desoldered the NAND memory and mirrored it to a backup file. Although this method seems promising, several challenges were encountered: the author had to compensate for some electrical anomalies with additional circuitry and mechanically plug in a PCB at every brute-force attempt on the iPhone passcode. Such a method could be applied to desktop computers, but the complexity of their NAND memories would be much higher, and the attack may be infeasible in terms of time and complexity.
3) Countermeasures
Side-channel attacks are discovered and presented to the scientific world on a daily basis and suitable defense mechanisms are often not yet implemented or cannot be simply deployed.
Even though countermeasures are being studied, it looks like a race between attackers and system engineers trying to make systems more secure and reliable.
3.a) Rowhammer Attack
Countermeasures against Rowhammer have already been thoroughly explored, but few are applicable in the mobile context. Powerful primitives such as CLFLUSH [9] and pagemap [10] have been disabled for user apps, but Rowhammer can still be performed through JavaScript [3]. Furthermore, monitoring cache hits and misses could raise an alarm, but methods such as [4] do not cause any cache misses. Error-correcting codes are not fully effective at correcting bit flips either. Most hardware vendors doubled the DRAM refresh rate, but results in [11] show that the refresh rate would need to be increased eightfold. Moreover, power consumption would increase, making this solution unsuitable for mobile devices. For Rowhammer attacks on Android devices, the biggest threat is still that user apps can access ION. Google is developing mechanisms to prevent malicious use of it. One solution could be to isolate ION regions controlled by user apps from kernel memory, so that the two are never adjacent. But even in the absence of ION, an attacker could force the buddy allocator to reserve memory in kernel memory zones by occupying all the memory available to user apps. Prevention of memory exhaustion therefore needs to be considered, to avoid workarounds of Rowhammer countermeasures.
3.b) Microarchitectural Attack
As the final goal of a microarchitectural attack is typically to recover cryptographic keys (e.g. for AES), a straightforward way to protect them is to avoid secret-dependent behavior: the sequence of cache line accesses and of branches must not depend on private data. If it does, the program is bound to leak information through the cache. The “constant-time” implementation of modular exponentiation [12] is a good way to remove such data dependencies. These are general rules; more targeted measures exist against specific attacks such as Flush + Reload on ARM mobile devices [8]. By disabling the system interfaces that flush the instruction cache, Flush-Reload side channels can be removed entirely from ARM-based devices, but the feasibility and security of this method have not been studied yet. Also, removing the system calls that provide accurate time on Android could mitigate all timing side channels. Another way to fight Flush + Reload would be to prevent physical memory sharing between apps, but that would enlarge the memory footprint and thereby expose the system to other side-channel attacks.
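To make the idea of removing data dependencies concrete, the sketch below shows modular exponentiation with a Montgomery ladder. This is a well-known technique swapped in for illustration and is not claimed to be the method of [12]; the function names are hypothetical. It only illustrates the structure: every exponent bit triggers the same multiply and square, and the bit selects operands through a branch-free conditional swap. Python integers are not constant-time, so the code shows the control flow rather than providing a hardened implementation.

```python
def cswap(a, b, bit):
    """Swap a and b if bit == 1, without branching on the secret bit."""
    mask = -bit                  # 0 stays 0, 1 becomes -1 (all ones)
    t = mask & (a ^ b)
    return a ^ t, b ^ t


def modexp_ladder(base, exponent, modulus, exponent_bits):
    """Compute base**exponent % modulus with a Montgomery ladder.

    exponent_bits is fixed and public, so the loop length does not depend
    on the secret exponent; the invariant r1 == r0 * base (mod modulus)
    holds after every iteration.
    """
    r0, r1 = 1 % modulus, base % modulus
    for i in reversed(range(exponent_bits)):
        bit = (exponent >> i) & 1
        r0, r1 = cswap(r0, r1, bit)
        r1 = (r0 * r1) % modulus     # same multiply for both bit values
        r0 = (r0 * r0) % modulus     # same square for both bit values
        r0, r1 = cswap(r0, r1, bit)
    return r0


if __name__ == "__main__":
    print(modexp_ladder(4, 13, 497, 8) == pow(4, 13, 497))
```

In contrast, a plain square-and-multiply loop performs the multiplication only when the exponent bit is 1, so the sequence of executed code (and of touched cache lines) directly reveals the key bits to a Flush + Reload or Prime + Probe observer.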
Problem 2
- Protocol Design
The proposed solution for Problem 1 is represented in Figure 1. To solve this problem, four moments in the execution of the path-centric channel assignment algorithm from [14] are identified:
- Moment 1: B receives a packet on its Channel 1 but, since an interferer is active on Channel 1 at node A, B cannot transmit on that channel. B1 is the active subnode; B2 and B3 are inactive subnodes.
- Moment 2: B switches from Channel 1 to Channel 2 (total cost: 3) and forwards the packet to A through Channel 2 (total cost: 3+6=9). B2 is the active subnode; B1 and B3 are inactive subnodes. A2 is the active subnode; A1 and A3 are inactive subnodes.
- Moment 3: A can transmit on either Channel 2 or Channel 3, but transmitting on Channel 2 is more expensive, so it switches to Channel 3 (total cost: 9+3=12). A3 is the active subnode; A1 and A2 are inactive subnodes.
- Moment 4: A sends the packet to C through Channel 3 (total cost: 12+2=14).
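The cost bookkeeping above can be reproduced as a shortest-path search over the subnode graph, where channel switches and transmissions are weighted edges. The sketch below uses Dijkstra's algorithm; the edge weights 3, 6, 3 and 2 are the ones given in the steps, while the cost of 10 for transmitting from A to C on Channel 2 is a hypothetical value (the text only states that it is more expensive), and the function and node names are assumptions of this sketch.

```python
import heapq


def cheapest_path(graph, source, target):
    """Dijkstra over a dict {node: [(neighbour, cost), ...]}."""
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None


# Subnode Xn is node X tuned to Channel n.
subnode_graph = {
    "B1": [("B2", 3)],             # B switches Channel 1 -> Channel 2
    "B2": [("A2", 6)],             # B transmits to A on Channel 2
    "A2": [("A3", 3), ("C", 10)],  # switch to Channel 3, or send on Channel 2
                                   # (cost 10 is an assumed placeholder)
    "A3": [("C", 2)],              # A transmits to C on Channel 3
}

if __name__ == "__main__":
    print(cheapest_path(subnode_graph, "B1", "C"))
```

With these weights the search returns the path B1, B2, A2, A3, C with total cost 14, matching the four steps listed above.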
- Network Applications
In our K-out-of-N system, we are interested in the probability of getting sensing errors from N sensors, where K represents the threshold for accepting a reliable measurement. This reasoning follows the binomial distribution:
In our case, at each node an error can be induced either by a false measurement or by the channel flipping a bit during over-the-air transmission, each with its own probability. Therefore, for our K-out-of-N system we have:
Assuming that the two error sources are independent, the final probability of an erroneous detection is a linear combination of the two:
For completeness, the probability of a successful measurement and transmission is the complement of this error probability.
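A minimal numerical sketch of this reasoning follows. The notation is an assumption of the sketch, not the report's own: p_measure and p_channel denote the per-node probabilities of a false measurement and of a channel bit flip, the per-node error probability is taken as 1 - (1 - p_measure)(1 - p_channel), which is one standard way to combine two independent error sources and may differ from the combination intended above, and K is read as the number of erroneous nodes. The binomial tail is then P(at least K errors) = sum over i from K to N of C(N, i) p^i (1 - p)^(N - i).

```python
from math import comb


def node_error_probability(p_measure, p_channel):
    """Probability that a node reports a wrong value (assumed combination)."""
    return 1.0 - (1.0 - p_measure) * (1.0 - p_channel)


def prob_at_least_k_errors(n, k, p_node):
    """Binomial tail: probability that at least k of n nodes are in error."""
    return sum(comb(n, i) * p_node**i * (1.0 - p_node)**(n - i)
               for i in range(k, n + 1))


def prob_success(n, k, p_node):
    """Complement: fewer than k of the n nodes are in error."""
    return 1.0 - prob_at_least_k_errors(n, k, p_node)


if __name__ == "__main__":
    p = node_error_probability(p_measure=0.05, p_channel=0.01)
    print(prob_at_least_k_errors(n=10, k=3, p_node=p))
```

The example values (10 nodes, threshold 3, 5% measurement error, 1% channel error) are arbitrary and only show how the quantities fit together.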
- Network Standards
Spectrum scarcity is a widely known problem in the world of wireless communications. The explosive wireless traffic growth pushes academia and industry to research novel solutions to this problem. Deploying LTE in unlicensed spectrum brings up the conflict problem of LTE-WiFi coexistence. This conflict can be analyzed with a close look at 802.11 MAC level. In Figure 2, a comparison between WLAN MAC layer and what is “casually” called MAC in LTE is depicted [19].
WiFi 802.11 uses CSMA/CA to regulate access at the MAC layer. In CSMA, a node senses the channel before transmitting. If a carrier signal is sensed, the node waits until the channel is free. In particular, in CSMA/CA the node then waits for a random backoff time, drawn from a contention window that grows exponentially after each failed transmission attempt.
In LTE, multiple access is scheduled rather than contention-based: the base station allocates time-frequency resources to users (OFDMA in the downlink, SC-FDMA in the uplink), so all accesses to the channel are scheduled.
Historically, LTE has been developed for environments with little interference, while WiFi combats interference in the ISM bands with CSMA. Using them in the same spectrum would see LTE dominating over WiFi, causing severe performance degradation in both cases. Several solutions have been proposed and implemented in recent years. Qualcomm [15] and Huawei [16] proposed a separation in the time and frequency domains. In [17] a Technology Independent Multiple-Output (TIMO) antenna approach is presented to clean interfered 802.11 signals. This method was made more robust in [18], but both approaches rely on at least one of the two technologies' signals having a clear reference. Traffic demand analysis could help mitigate the performance drop due to interference, but even with an accurate demand estimate, only one technology can be active at a given time and frequency, limiting the overall throughput.
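The exponential backoff of CSMA/CA can be sketched as follows. The contention window bounds of 15 and 1023 slots are the values commonly used by 802.11 OFDM PHYs and are stated here as an assumption of this sketch; the function names are hypothetical.

```python
import random

CW_MIN = 15    # assumed minimum contention window, in slots
CW_MAX = 1023  # assumed maximum contention window, in slots


def contention_window(retries, cw_min=CW_MIN, cw_max=CW_MAX):
    """Contention window after a number of failed attempts.

    The window roughly doubles with every retry until it saturates.
    """
    return min((cw_min + 1) * (2 ** retries) - 1, cw_max)


def backoff_slots(retries, rng=random):
    """Draw a uniform random backoff (in slots) from the current window."""
    return rng.randint(0, contention_window(retries))


if __name__ == "__main__":
    for retries in range(8):
        print(retries, contention_window(retries), backoff_slots(retries))
```

A station counts the drawn number of idle slots down before transmitting and freezes the counter while the medium is busy; this is the mechanism that makes WiFi defer to a transmitter that does not sense the channel before sending.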
When interference is high, packet transmission is corrupted and error correction strategies are needed.
In WiFi, standard Forward Error Correction (FEC) is used. In FEC, redundancy is added to the transmitted packet so that the receiver can detect and, where possible, correct wrongly received bits.
LTE, on the other hand, uses HARQ (Hybrid Automatic Repeat reQuest), which is a combination of FEC and ARQ. In the standard implementation of ARQ, redundancy bits are embedded in the packets for error detection. When a corrupted packet is received, the receiver requests a retransmission from the transmitter. In HARQ, FEC codes are also encoded in the packet, so that the receiver can directly correct wrong bits when the errors fall within the correctable set. If an uncorrectable error occurs, the ARQ mechanism is used to request a retransmission. Hybrid ARQ performs better than plain ARQ in poor signal conditions, but leads to lower throughput when the signal is good, because the FEC overhead is paid on every packet.
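The interplay between FEC and ARQ can be illustrated with a toy hybrid scheme. The sketch below is a deliberate simplification and not the LTE implementation: a 3x repetition code stands in for the FEC (LTE uses turbo codes and combines retransmissions), CRC-32 stands in for the error-detection bits, and the channel is modelled by an explicit set of flipped bit positions per attempt. All function names are hypothetical.

```python
import zlib


def to_bits(data):
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]


def from_bits(bits):
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[j:j + 8]))
        for j in range(0, len(bits), 8)
    )


def encode(payload):
    """Append a CRC-32 (detection), then repeat every bit 3 times (FEC)."""
    frame = payload + zlib.crc32(payload).to_bytes(4, "big")
    return [bit for bit in to_bits(frame) for _ in range(3)]


def decode(coded_bits):
    """Majority-vote each triplet, then verify the CRC."""
    voted = [int(sum(coded_bits[i:i + 3]) >= 2)
             for i in range(0, len(coded_bits), 3)]
    frame = from_bits(voted)
    payload, checksum = frame[:-4], frame[-4:]
    return payload, zlib.crc32(payload).to_bytes(4, "big") == checksum


def harq_receive(payload, flips_per_attempt):
    """Try each transmission attempt until one decodes with a valid CRC.

    flips_per_attempt: list with, for each attempt, the set of coded-bit
    positions the channel flips. Returns (payload, attempts_used) or
    (None, attempts_used) if every attempt failed.
    """
    for attempt, flips in enumerate(flips_per_attempt, start=1):
        received = [bit ^ (pos in flips)
                    for pos, bit in enumerate(encode(payload))]
        decoded, ok = decode(received)
        if ok:
            return decoded, attempt          # FEC was enough
    return None, len(flips_per_attempt)      # ARQ budget exhausted


if __name__ == "__main__":
    print(harq_receive(b"hi", [{0}]))             # one flip, fixed by FEC
    print(harq_receive(b"hi", [{0, 1}, set()]))   # needs a retransmission
```

A single flipped bit is absorbed by the majority vote, so no retransmission is needed; two flips in the same triplet defeat the FEC, the CRC check fails, and the ARQ part requests the packet again.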
To better observe this interference behavior, a small simulation was performed using ns-3, in particular the LAA-WiFi-coexistence library [20]. The scenario consists of two cells whose radio coverage overlaps. The technologies used are LTE Licensed-Assisted Access (LAA) operating on EARFCN 255444 (5.180 GHz) and Wi-Fi 802.11n operating on channel 36 (5.180 GHz). The two base stations (BSs) are positioned 20 m from each other, and each has one user connected to it at a distance of 10 m. Both BSs are connected to a “backhaul” client node that originates UDP traffic in the downlink direction, from the client to the UE(s). Figure 3(a) and Figure 3(b) show how the throughput and the number of packets received by the WiFi BS vary when the two BSs are isolated (e.g. their distance is 10 km) and when their coverage areas overlap, respectively. Other scenarios were tested: Figure 4(a) represents the scenario with two WiFi BSs and Figure 4(b) the one with two LTE BSs. In these it is possible to see how each technology behaves when it shares the channel with a cell of the same kind.
Table 1
| Scenario | Throughput A | Throughput B | Packet loss A | Packet loss B |
|---|---|---|---|---|
| Distant BSs, Figure 3(a) | 73.78 Mbps | 77.55 Mbps | 4.6% | 0% |
| Interfering BSs, Figure 3(b) | 73.62 Mbps | 4.95 Mbps | 4.8% | 93% |
| Two WiFi BSs | 53.45 Mbps | 54.41 Mbps | 27% | 25% |
| Two LTE BSs | 30.88 Mbps | 30.4 Mbps | 60% | 61% |
Figure 4(a) shows how the channel is split between the two BSs: Carrier Sense Multiple Access keeps the throughput relatively high and the packet loss relatively low.
Figure 4(b) shows how the interference between the two LTE cells reduces the throughput and causes a high packet loss.
The simulation results are summarized in Table 1.
References
[1] R. Spreitzer, V. Moonsamy, T. Korak, and S. Mangard. “Systematic Classification of Side-Channel Attacks on Mobile Devices”. arXiv, 2016.
[2] K. Razavi, B. Gras, E. Bosman, B. Preneel, C. Giuffrida, and H. Bos. “Flip Feng Shui: Hammering a Needle in the Software Stack”. In Proceedings of the 25th USENIX Security Symposium, 2016.
[3] D. Gruss, C. Maurice, and S. Mangard. “Rowhammer.js: A Remote Software-Induced Fault Attack in JavaScript”. In Proceedings of the 13th Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), 2016.
[4] V. van der Veen, Y. Fratantonio, M. Lindorfer, D. Gruss, C. Maurice, G. Vigna, H. Bos, K. Razavi, and C. Giuffrida. “Drammer: Deterministic Rowhammer Attacks on Mobile Platforms”. In Conference on Computer and Communications Security – CCS 2016. ACM, 2016, pp. 1675-1689.
[5] Z. B. Aweke, S. F. Yitbarek, R. Qiao, R. Das, M. Hicks, Y. Oren, and T. Austin. “ANVIL: Software-Based Protection Against Next-Generation Rowhammer Attacks”. In Proceedings of the 21st ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2016.
[6] Q. Ge, Y. Yarom, D. Cock, and G. Heiser. “A survey of microarchitectural timing attacks and countermeasures on contemporary hardware”. Journal of Cryptographic Engineering, 2016.
[7] D. H. Woo and H.-H. S. Lee. “Analyzing performance vulnerability due to resource denial of service attack on chip multiprocessors”. In Workshop on Chip Multiprocessor Memory Systems and Interconnects, Phoenix, AZ, US, 2007.
[8] X. Zhang, Y. Xiao, and Y. Zhang. “Return-Oriented Flush-Reload Side Channels on ARM and Their Implications for Android Devices”. In Conference on Computer and Communications Security – CCS 2016. ACM, 2016, pp. 858-870.
[9] M. Seaborn and T. Dullien. “Exploiting the DRAM Rowhammer Bug to Gain Kernel Privileges”. In Black Hat USA (BH-US), 2015.
[10] M. Salyzyn. AOSP Commit 0549ddb9: “UPSTREAM: pagemap: do not leak physical addresses to non-privileged userspace”. http://goo.gl/Qye2MN, November 2015.
[11] Y. Kim, R. Daly, J. Kim, C. Fallin, J. H. Lee, D. Lee, C. Wilkerson, K. Lai, and O. Mutlu. “Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors”. In Proceedings of the 41st International Symposium on Computer Architecture (ISCA), 2014.
[12] E. Brickell. “Technologies to improve platform security”. Workshop on Cryptographic Hardware and Embedded Systems ’11, Invited Talk, September 2011.
[13] S. Skorobogatov. “The Bumpy Road Towards iPhone 5c NAND Mirroring”. arXiv ePrint Archive, Report 1609.04327, 2016.
[14] C. Xin, L. Ma, and C.-C. Shen. “A path-centric channel assignment framework for cognitive radio wireless networks”. Mobile Networks and Applications 13.5 (2008): 463-476.
[15] “Qualcomm wants LTE deployed in unlicensed spectrum”. http://www.fiercewireless.com/story/qualcomm-wants-lte-deployed-unlicensed-spectrum/2013-11-21
[16] “Huawei U-LTE solution creates new market opportunities for mobile operators”. http://www.huawei.com/ilink/en/about-huawei/newsroom/ press-release/HW 3%27768.
[17] S. Gollakota, F. Adib, D. Katabi, and S. Seshan. “Clearing the RF smog: making 802.11 robust to cross-technology interference”. In Proc. of ACM SIGCOMM, 2011.
[18] Y. Yubo, Y. Panlong, L. Xiangyang, T. Yue, Z. Lan, and Y. Lizhao. “ZIMO: building cross-technology MIMO to harmonize Zigbee smog with WiFi flash without intervention”. In Proc. of MobiCom, 2013.
[19] T. Godfrey. “Long-Term Evolution Protocol: How the Standard Impacts Media Access Control”. WMSG Advanced Technology, http://www.nxp.com/files-static/training_presentation/TP_LTE_PHY_MAC.pdf
[20] https://www.nsnam.org/wiki/LAA-WiFi-Coexistence