> We send it in 2 parts: first comes GET / HTTP/1.0 \n Host: www.you and second sends as tube.com \n .... In this example, ISP cannot find blocked word YouTube in packets and you can bypass it!
If you told anyone from China that this is how you bypass (HTTP) "deep packet inspection", it would sound incredibly naive. I'm not criticizing here; thanks for developing an anti-censorship tool. But my point is, any DPI that can be bypassed this way is simply outdated, far from the state-of-the-art threats we are facing worldwide.
What China does today is what your ISP/government will do tomorrow, once they upgrade their equipment. A history lesson from China can give developers in other countries insight into where this cat-and-mouse game is heading...
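For reference, the quoted splitting trick amounts to cutting the request inside the keyword, so that no single segment contains it whole. A minimal sketch (the request and split point simply mirror the quoted example):

```python
# Split an HTTP request so the blocked keyword never appears whole
# in any single segment. This only defeats DPI that inspects packets
# one at a time, without stream reassembly.
REQUEST = "GET / HTTP/1.0\r\nHost: www.youtube.com\r\n\r\n"

def split_around(request: str, keyword: str) -> list:
    """Cut the request in the middle of `keyword`."""
    i = request.index(keyword) + len(keyword) // 2
    return [request[:i], request[i:]]

segments = split_around(REQUEST, "youtube")
assert all("youtube" not in seg for seg in segments)
assert "".join(segments) == REQUEST  # the server still sees the full request
```

Each piece would then go out in its own send() call, in the hope that the kernel puts them in separate packets; any DPI that reassembles the TCP stream sees through it.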
> paulryanrogers: So basically it just does two things: carefully chunking HTTP header packets and encrypted DNS? Not sure this will work for very long.
Of course it will not. I'll explain why.
---
Literally the same technique was used in China during the early days of the Great Firewall, around 2007. At that time, the "censorship stack" was simple. Basically, it had...
* A brute-force IP blocking list
This was a constantly updated list of IP addresses of "unwanted" web servers, such as YouTube or Facebook, distributed via the BGP protocol, just as normal routing information is. Once your server entered this blacklist, nothing could be done. Not all unwanted websites entered the list, due to its computational/storage costs.
* A DNS poisoning infrastructure
A list of "unwanted" domain names was maintained. These domain names were entered into the national DNS root servers as records with bogus IP addresses. This was used more widely than the IP blocklist, since it cost almost nothing to operate, but it could only block websites on the list, and it took time for the censor to become aware of a target's existence.
* A naive keyword filtering system.
All outgoing international traffic was mirrored for inspection. A keyword inspection system attempted to match the URLs in HTTP requests against a blacklist of unwanted keywords. Rumor had it that the string matching was performed in hardware (ASIC/FPGA), allowing enormous throughput.
* A TCP reset attack system
Once an unwanted TCP connection was identified by the keyword inspection system, the TCP reset attack system fired a bogus RST packet at your computer. Fooled by the packet, your operating system would voluntarily work against you and terminate the connection, saving the censors' CPU time. The keyword filtering system paired with the reset attack was the preferred way to carry out censorship.
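The forged reset itself is trivially cheap to produce. A sketch of the 20-byte TCP header such a system would emit (the ports and sequence number here are made-up placeholders, and the checksum is left unfilled):

```python
import struct

def tcp_rst_header(sport: int, dport: int, seq: int) -> bytes:
    """Build a bare 20-byte TCP header with only the RST flag set.
    Checksum is left at 0 for brevity; a real injector must fill it
    in, and must also pick a sequence number inside the victim's
    receive window, both of which it can copy from the mirrored flow."""
    offset_flags = (5 << 12) | 0x004      # data offset = 5 words, RST bit
    return struct.pack("!HHIIHHHH",
                       sport, dport,      # ports copied from the real flow
                       seq,               # must fall in the victim's window
                       0,                 # ack number (unused, no ACK flag)
                       offset_flags,
                       0,                 # window size
                       0,                 # checksum (placeholder)
                       0)                 # urgent pointer

hdr = tcp_rst_header(80, 51234, 0x1000)
assert len(hdr) == 20
assert hdr[13] & 0x04                     # RST flag is set
```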
That's all. The principle of operation was simple and easy to understand. So what were the options for bypassing it? There were a lot. To begin with, nothing could be done about the blocked IP addresses themselves. But in the earliest days, accessing them was as simple as finding a random HTTP proxy server. Later, the inspection system was upgraded to match HTTP proxy requests. Then you could simply play some magic tricks with your HTTP requests, like the example at the beginning, so that your request wouldn't trigger a match. Around the same time, in-browser web proxy tools became popular: PHP scripts running on a web server that fetched pages on your behalf. However, they became useless when the keyword matching system was upgraded to match the content of the entire page, not just the requests (remember, few sites had HTTPS). At this point, all plaintext proxy and HTTP request "massaging" techniques were officially dead.
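It is easy to see why the upgraded matcher killed the request-splitting tricks: a stream matcher only needs to carry a few bytes of state per connection to catch a keyword spanning a packet boundary. A toy sketch:

```python
class StreamMatcher:
    """Toy per-connection keyword matcher that survives packet
    boundaries by carrying a small tail buffer between segments."""
    def __init__(self, keyword: str):
        self.keyword = keyword
        self.tail = ""

    def feed(self, segment: str) -> bool:
        data = self.tail + segment
        hit = self.keyword in data
        # keep just enough bytes to detect a keyword spanning the cut
        self.tail = data[-(len(self.keyword) - 1):]
        return hit

m = StreamMatcher("youtube")
assert not m.feed("GET / HTTP/1.0\r\nHost: www.you")  # per-packet DPI sees nothing
assert m.feed("tube.com\r\n\r\n")                     # a stateful DPI still matches
```

Real systems reportedly used hardware multi-pattern matchers, but the per-connection state needed is this small either way.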
Some naive rot13-like obfuscation was later implemented in some web proxies, and HTTPS web proxies were also a thing, but both saw limited use.
* New: A complete keyword filtering system - Inspect all HTML pages (Was: A naive keyword filtering system)
Another target to attack was the DNS poisoning system: sometimes all you needed was a correct IP address, since not all IPs were included in the blocklist due to the costs. Initially, all one needed to do was change one's nameserver to 8.8.8.8. However, countermeasures were quickly deployed. A simple one was rerouting 8.8.8.8 to the ISP's nameserver, which kept feeding you the same bogus records. Nevertheless, there were always alternative resolvers to use. So the system was upgraded into a DNS spoofing infrastructure: the instant an outgoing DNS packet was detected, the spoofing system would immediately answer with a bogus packet. The real packet would arrive a hundred milliseconds later, but too late; your OS had already accepted the bogus result.
And ironically, even if DNSSEC had been widely supported (it was not), it couldn't have done anything but return a SERVFAIL: DNSSEC can only check whether a result is authentic; dropping the bogus packet and waiting for the true one was outside the capabilities of a standard DNSSEC implementation.
* New: A Real-time DNS Spoofing System
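The race described above can be simulated in a few lines (the timings and addresses are purely illustrative):

```python
# Simulate the race: the on-path spoofer answers almost instantly,
# the genuine resolver only after a full international round trip.
# A stock resolver accepts whichever UDP answer arrives first,
# so the bogus one always wins.
answers = [
    (5,   "203.0.113.66"),    # spoofed reply, ~5 ms (injected en route)
    (120, "198.51.100.10"),   # genuine reply, ~120 ms round trip
]

def naive_resolve(answers):
    """First answer by arrival time wins; the rest are discarded."""
    return min(answers)[1]

def suspicious(answers):
    """Two different answers for one query is a spoofing tell - the
    signal the anti-poisoning tools of the era keyed on."""
    return len({ip for _, ip in answers}) > 1

assert naive_resolve(answers) == "203.0.113.66"
assert suspicious(answers)
```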
Better tools were developed later that acted as a transparent resolver between the upstream resolver and your computer, identifying the bogus results and dropping them, but their use was limited. Also, by this point, the IP blocklist had been greatly expanded: even with a correct IP, the site was still inaccessible. Around 2008 or so, a special open source project was launched by developers in China, an /etc/hosts list: whenever someone found a Facebook IP address that was not yet in the blocklist, they sent a patch to the project. There were also shell scripts to keep your list up to date.
However, the usefulness of an /etc/hosts list was limited. First, it was only a matter of time before a new IP address was blocked. Also, a working IP address was still subject to the same keyword filtering system.
* New: Expanded IP Blocklist.
Some people also realized that the firewall could only terminate a connection by fooling the operating system. Soon, iptables rules for blocking RST packets appeared in technical blogs. By ignoring all RST packets, one essentially gained immunity, at the expense of network stability, since legitimate RSTs were ignored too. The censors soon responded by upgrading the reset attack system so that RST packets were sent in both directions: even if you ignored the RST, the server on the other side would still terminate the connection. Also, the RST was now "latched on" for a limited time; once the first RST was triggered, the target remained inaccessible for several minutes.
* New: Bidirectional TCP Reset Attack
* New: "Latched-On" Reset Attack
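The rules circulating on those blogs were essentially one-liners like the following (shown as a sketch; as noted above, it drops legitimate resets too, so dead connections linger until they time out):

```shell
# Drop every inbound TCP packet that has the RST flag set.
# Useless once the firewall started resetting the server side as well.
iptables -A INPUT -p tcp --tcp-flags RST RST -j DROP
```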
Once HTTPS was enabled, keyword inspection of HTML pages became impossible. At this time, the censor sometimes still wished to allow partial access, triggering the block only on a detected match; this strategy cannot be applied to HTTPS, since the content is all encrypted. Some people realized that some popular websites, such as Wikipedia, supported HTTPS but did not enable it by default. The Great Firewall responded by implementing an HTTPS certificate matching subsystem in the keyword matching system: when a particular certificate was matched, you were greeted by a TCP RST packet (this subsystem was removed later, when HTTPS saw widespread use).
* New: Certificate-Based HTTPS Blocking System
At this point, around 2010, the only reliable way to browse the web was a fully encrypted proxy, such as SSH dynamic port forwarding or a VPN, which required purchasing a VPS from a hosting provider. SSH was more popular due to its ease of use: all one needed was to find an SSH server and run "ssh -D 1337", and local port 1337 became a SOCKS5 proxy provided by OpenSSH. OpenVPN was reserved for heavy web users, since it was more difficult to set up but had better performance.
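The "-D" flag just makes OpenSSH answer SOCKS5 on a local port; per RFC 1928, each browser connection then starts with a tiny binary preamble. A sketch of the client's CONNECT message (host and port are example values). Note that address type 0x03 carries the raw domain name, which is why the proxy end, not your poisoned local resolver, performs the DNS lookup:

```python
import struct

def socks5_connect(host: str, port: int) -> bytes:
    """SOCKS5 CONNECT request (RFC 1928) using address type 0x03,
    a domain name, so name resolution happens at the proxy side,
    inside the encrypted tunnel."""
    name = host.encode()
    return (b"\x05\x01\x00"              # VER=5, CMD=CONNECT, RSV=0
            + b"\x03"                    # ATYP=3: domain name follows
            + bytes([len(name)]) + name  # length-prefixed name
            + struct.pack("!H", port))   # destination port, big-endian

greeting = b"\x05\x01\x00"               # VER=5, one auth method: no-auth
req = socks5_connect("www.youtube.com", 443)
assert req[3] == 0x03 and req[5:20] == b"www.youtube.com"
```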
Into the early 2010s, anyone using a VPN or SSH could enjoy reliable web browsing (disturbed only from time to time by the overloaded international bandwidth). However, the good days came to an end when the Great Firewall implemented a real-time traffic classifier, first applied to SSH. It observed SSH packets in real time and attempted to identify whether overlay proxy traffic was being carried on top. The blocking mechanism was enhanced as well: it could now dynamically insert null-route entries once it decided that communication with a server was unwanted. The IP blocking system was also improved, collecting unwanted IP addresses at a faster rate with the help of the traffic classifier. If you used SSH as a proxy, after a while the connection would be identified and all packets dropped; repeated offenses would earn you a permanent IP block. For VPNs, the firewall implemented a real-time classifier to detect OpenVPN's TLS handshakes. When a handshake was detected, an RST packet was sent (or, for UDP, all packets were dropped). Repeated offenses would earn you a permanent IP block as well.
* New: Real-Time Traffic Classifier
* New: Real-Time IP Blocking
* New: Actively Updated IP Blocklist Using Classifiers as Feedback
Traffic classifiers were later expanded to cover HTTPS-in-HTTPS as well, so a naive HTTPS proxy wouldn't work, and possibly gained other features; it's a mystery.
BTW, after Google exited from China, the HTTPS version was immediately blocked, and for HTTP, a ridiculous keyword blocklist was enforced that generated a huge number of false-positive RSTs on harmless words, apparently a deliberate decision preferring false positives over false negatives. Eventually, all Google services were permanently blocked. The IP blocking became extensive; major websites were completely blocked, and the unblocked sites were the exceptions. For most people, the arrival of widely used HTTPS came too late to help, since the IPs were already blocked. And as mentioned, SSH and VPNs were classified and blocked as well.
This was when a new generation of proxy tools started to gain popularity, Shadowsocks being the best-known example. From a cryptographic perspective, it was a big step backwards. Since Diffie-Hellman handshakes were subject to traffic classifiers, these tools used only symmetric encryption with fixed keys. Their encryption protocols were ad hoc and not cryptographically robust. While it was a matter of fact that nobody could break simple AES-CBC encryption, nobody would trust these tools with confidential data either (for example, AEAD was unsupported for many years). But since the goal was bypassing censorship, not secrecy, they became extremely popular. This was not seen as a major issue, since the widespread use of HTTPS offered robust secrecy. DNS handling was still essential (usually these tools provided a SOCKS5 interface; SOCKS5 can be configured to pass the original domain name to the proxy, which resolves it inside the encrypted connection), but encrypted DNS became less useful on its own, since the IP blocklist was huge by then.
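The design trade-off is easy to illustrate. With a pre-shared key there is no handshake for a classifier to latch onto; the stream is keyed pseudorandom bytes from the very first byte. The following is a deliberately toy sketch using a SHA-256 counter keystream (this is not Shadowsocks' actual cipher, and must not be used for real secrecy):

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Toy counter-mode keystream derived from SHA-256. Illustrative only."""
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def xor(data: bytes, key: bytes) -> bytes:
    ks = keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

key = b"pre-shared password, same on both ends"
ct = xor(b"GET / HTTP/1.1\r\n", key)   # no handshake precedes this byte stream
assert xor(ct, key) == b"GET / HTTP/1.1\r\n"
```

The sketch also makes the complaints concrete: nothing authenticates the ciphertext, and anyone who recorded the traffic and later obtains the fixed key can decrypt all of it, i.e. no forward secrecy.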
The landscape of the Internet has also changed dramatically since 2013. The universal adoption of HTTPS eventually rendered all keyword-based inspection useless. A few sites were considered too big to block, including Amazon AWS and GitHub. One side of the battle started becoming a mutually-assured-destruction game: either allow people to exploit a large platform to publish uncensored material, or block the platform altogether and create economic damage. I am confident that the MAD game will continue to play out; however, Russia's response to AWS domain fronting showed this strategy can fail if major platforms don't want to cooperate, which was a bit worrying, at least. In any case, encrypting SNIs should be the next step.
But I digress. Back to Shadowsocks et al.: since the state was eliminated (pun intended), all one could see was encrypted raw TCP packets, and for many years there was no reliable way for the firewall to classify Shadowsocks-like tools (until recently, possibly by exploiting cryptography-related issues, but we are not sure how successful that is). But the censorship system started getting weirder and weirder: sometimes connections broke for no apparent reason at all, sometimes the data rate was extremely low, sometimes a few IPs were mysteriously blocked, and so on, but life went on. There were several hypotheses. One was that the traffic classifiers kept gaining functionality and occasionally hit something. Another was that TCP RSTs were sent probabilistically to suspected endpoints to degrade reliability. The only thing that could be confirmed was the significantly increased use of QoS by the ISPs, classifying all unknown protocols as "low priority" and degrading the reliability of all anti-censorship tools. At this point, bad connectivity and censorship were indistinguishable.
It's safe to say that, at this point, nobody understands how the Great Firewall of China works anymore. This is the end of our story.
For simplicity, I skipped many less-used techniques, such as Tor's domain fronting, CDN-based circumvention, or obfs4, which features Diffie-Hellman public keys indistinguishable from random strings, and possibly others. I'm well aware of them. But it's expected that, unless everything is encrypted and every infoleak is plugged (at which point we start playing the mutually-assured-destruction game), all these tools are playing an endless cat-and-mouse game.
Developers of anti-censorship tools need to design countermeasures based on what China is doing right now, so that when the same techniques are deployed by their own ISPs in the future, they are prepared to act.
Fantastic breakdown on the recent history of censorship in China, thanks for sharing it.
You mentioned that for many of these efforts bypassing censorship trumped secrecy concerns. Is this still the case?
If I were a citizen regularly bypassing censorship of an authoritarian government, I’d be concerned for my safety if it was well documented that I regularly accessed censored material.
From what I gather, the regime doesn't really intend to arrest anybody who simply accesses Western websites regularly. Some big corporations also have special VPN channels to access foreign websites so they can do business normally. Hell, even the foreign ministry spokesperson posts regularly on Twitter. What they want is to stop this floodgate of information from being opened to the common masses; that's when things could get problematic.
People are arrested for producing material deemed potentially destabilizing for the regime/country, but nobody, as far as I know, has ever been arrested merely for accessing blocked material.
Of course, if you are also actively producing content, it would be much wiser to camouflage your identity, if you can. That's when secrecy becomes a major concern.
> You mentioned that for many of these efforts bypassing censorship trumped secrecy concerns. Is this still the case?
Yes, it's still the case, but how bad it is remains a matter of debate.
To make it specific, we can use two criteria to evaluate anti-censorship circumvention tools: (a) how cryptographically robust is it? (secrecy) and (b) how well can it avoid detection? (visibility) The situation is complicated, since the two are related but independent.
First, OpenVPN has good secrecy but high visibility, since its handshake is obvious, and that even led to a complete block. Second, anything that exploits a bug in the DPI system has circumvention capabilities but bad secrecy and high visibility: ultimately, the fact that a TCP connection has been created cannot be hidden, so the fact that you are bypassing censorship will be clear; on the other hand, high visibility doesn't necessarily mean it can be blocked (fixing such a bug can be difficult). [0] Third, a protocol with cryptographic flaws (such as weak protection against ciphertext modification) can otherwise have low (or high) visibility, but allows attackers to compromise infosec in some way. Finally, Tor has circumvention capabilities and excellent secrecy, but high visibility: its anonymity depends on its large anonymity set, not on hiding the fact that someone is using it (which is impractical), and its network is completely open.
Primarily, my personal concern is whether the circumvention tools are cryptographically robust, so that my secrecy won't be compromised when I browse an HTTP website (the NSA can always wiretap at the exit node, but at least I should not be vulnerable at the entry point). I don't trust these tools: if cryptographers keep discovering implementation flaws in established protocols, why should I trust a tool with ad-hoc crypto? For example, Shadowsocks did not have any form of forward secrecy: if someone records all my outgoing traffic and later takes control of my computer, the single fixed key allows decryption of everything. On the other hand, some people argue that flaws may exist but exploitable ones are rare. Still, I think it's bad practice to lower the standard of secrecy. If I have to use these tools, I run an additional layer of TLS on top of them, so that my connection is always at least as secure as TLS while the outer layer provides circumvention. Fortunately, most people are protected by HTTPS anyway.
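The layering I describe needs nothing exotic; it is just the stdlib ssl module wrapping whatever socket the circumvention tool hands you. A sketch (the tunnel address is a placeholder for the tool's local listening port, and the caller is assumed to have configured the tunnel to reach the real host):

```python
import socket
import ssl

# Standard, certificate-verifying TLS settings.
ctx = ssl.create_default_context()

def tls_over_tunnel(tunnel_host: str, tunnel_port: int, real_host: str):
    """Open TLS to `real_host` through a local tunnel endpoint.
    The tunnel (and anyone classifying it) sees only TLS ciphertext;
    the inner connection is as secure as TLS regardless of the
    tunnel's own ad-hoc crypto."""
    raw = socket.create_connection((tunnel_host, tunnel_port))
    return ctx.wrap_socket(raw, server_hostname=real_host)
```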
> If I were a citizen regularly bypassing censorship of an authoritarian government, I’d be concerned for my safety if it was well documented that I regularly accessed censored material.
If your goal is to avoid detection of using any circumvention tool at all, it's going to be much harder. Many privacy tools are developed to exercise one's right to privacy, but they are not designed to avoid detection. On the other hand, the same tools are often promoted to citizens of oppressive regimes. This can be dangerous. For example, a full-disk encryption program that includes clickable links to its official website, with automatic updates: what can possibly go wrong? If the regime is authoritarian enough, it can simply make a list of everyone who has ever connected to those servers and hunt them down.
A huge amount of work needs to be done to fix this problem. However, if you are in China, it's not that dangerous. On the authoritarianism spectrum there are Kazakhstan, Iran, China, Russia, and others, and China is nowhere close to the extreme. Being economically open at large, the censorship of information in China cannot, and was never meant to, prevent all forms of access. The purpose is merely to increase the cost of access. In fact, criticism of the government on domestic social media is sometimes tolerated; often the censorship only kicks in once it becomes popular.
With an Internet censorship system installed in China, the consequences are: (a) Most people are not interested in accessing blocked websites, at least not on a regular basis, even when methods are available. (b) Accessing information doesn't necessarily change one's point of view, especially when the opinions run completely against one's education, personal experience, or worldview. (c) Foreign platforms cannot gain significant influence, even if they are accessible to many. For example, the Chinese Twitter community is an interesting place (if one digs below the political flamewars at the surface); you can see people from the entire political spectrum. There are even jokes, such as "Twitter - the future of governance in China", but they are irrelevant in the big picture. (d) IT workers are still required to use Google and other blocked sites to do their jobs.
Against this background, regularly bypassing censorship in China just for web browsing is perfectly safe [1]. If you want the best invisibility, I recommend using the VPN service with the largest number of users and running your own encrypted tunnel inside it. The downside is that these services are too popular to be stable; most IT workers still prefer a personal hosting service.
[0] Due to the increasing centralization of the web, changes are expected. With SNI encryption, if all the censor can see is a connection to an unknown website on a Cloudflare server, it's less of a threat. But opinions differ. One says the pressure of censoring everything versus censoring nothing can lead to a decrease in censorship, or a faster overthrow of the censorship system; others say the censorship and anti-censorship forces are in a dynamic balance, and the introduction of centralized services with SNI encryption could actually break that balance: what used to be a slow censorship progression taking 5 years could speed up to 2, creating accelerated and more aggressive censorship and ending up a net negative for everyone. Whether that is the case remains to be seen.
[1] Unless you are in a region like Xinjiang, where separatist conflicts are seen as a threat and the censorship has extra objectives.
Thanks for this summary. The firewall has been a lot stricter recently, and it's been a real pain in the ass, even for legitimate things. I can only speculate that they are using deep-learning-type tools now to do their blocking.
> I can only speculate they are using deep learning type tools now to do their blocking
Such a statement needs careful justification: the censorship system has a serious constraint on computational cost. It needs to operate on the stream of the entire outgoing international traffic and make decisions in real time (or for later analysis). We are talking about terabits per second of traffic; any censorship tool with high computational cost cannot be deployed for such a purpose, even if it runs fine on a single PC. Also, a high false-positive rate is unacceptable, as it would create massive service disruption and be practically useless.
Unlike SOCKS5-over-SSH, HTTPS-over-HTTPS, or VPN handshakes, which can be detected by relatively simple rules, most deep learning tools require excessive CPU time, so it's unlikely that complex deep learning algorithms are being used, at least not the category with the highest CPU cost (anything marketed as "AI").
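A back-of-envelope calculation makes the constraint concrete (the 10 Tbit/s figure, packet size, and core count are illustrative assumptions, not measurements):

```python
# Per-packet CPU budget for an inline classifier at national scale.
link_bits_per_s = 10e12                  # assume 10 Tbit/s aggregate traffic
packet_bytes = 1000                      # assume ~1000-byte average packets
pkts_per_s = link_bits_per_s / (packet_bytes * 8)   # 1.25e9 packets/s

cores = 1000                             # assume a thousand inspection cores
budget_us = cores / pkts_per_s * 1e6     # CPU time available per packet
assert round(budget_us, 2) == 0.8        # under a microsecond per packet
```

Even under these generous assumptions, the budget is well below a microsecond of core time per packet, which rules out anything resembling a neural-network forward pass per packet and favors the simple rule-based matching described above.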
Given these constraints, the algorithms available to the censorship system seem rather limited. What types of algorithms are being used, then? Unfortunately, nobody can answer this question. This is the fundamental problem people face today: 10 years ago, every sysadmin in China knew how the censorship system worked, but today the system has become completely opaque.
Were they doing that full-page text matching in an ASIC too?! Doesn't that basically involve writing a simple parser as well? Otherwise, what prevents things like usage of Google Analytics/fonts etc. from triggering a match and a block?
> Else what prevents things like usage of Google analytics/fonts etc from triggering a match and blocking?
The blocking was/is complementary. Usually, domain names themselves were blocked by DNS poisoning (or IP blocking, if things escalated); the domains (or the names of the websites) did not appear in the keyword blocklist. A link to Google Analytics or a Facebook button could keep a webpage from loading properly until a timeout, but merely mentioning or linking a domain name would not trigger a keyword match on the page itself.
The intention of keyword matching was to allow partial access while still blocking unwanted content. Usually, only the most politically unwanted keywords entered the list. For example, Wikipedia could be accessed normally, but as soon as "a word that should not be named" appeared in a page, the connection was reset immediately. An interesting phenomenon: sometimes the page would partially load and stop exactly before the forbidden word. And since the censorship system worked on mirrored traffic, a slight processing delay sometimes allowed the full page to load before the RST arrived, an "I'm feeling lucky" moment.
Anyway, that was how the system worked before 2010. The extensive use of HTTPS rendered it useless, and it appears that some forms of keyword filtering have already been lifted, since it's a pointless exercise by now.
For quite some time after keyword matching became ineffective, DNS poisoning remained the only form of censorship for many unwanted-but-not-significant websites, Hacker News for example. But recently, SNI matching was implemented.