Linguistic — Linghao‘s Utopia

I am an enthusiastic graphic designer who has been fortunate to explore various design practices and perspectives, from intercultural to decolonized design, during my transition from China to America.

Currently, I am pursuing my second master's degree in the Design and Environmental Analysis program at Cornell University. I invite you to join me in my world of design and photography utopia.

Linghao Li ｜李凌昊

PhD in Design
MA Design in D+EA ‘24
MA Graphic Design and Visual Experience ‘22
BFA Visual Communication Design ‘16
NCSU | Cornell University｜SCAD｜TAFA
+ 1 912-391-7213 | ll933@cornell.edu

Throughout my past academic and professional journey, I have been exposed to various aspects of research and design theory. However, my practical design work primarily relied on existing research findings, and I had not actively engaged in in-depth research to analyze and comprehend design challenges from diverse perspectives. My practice-based education likely contributed to this approach. Nevertheless, a pivotal shift occurred when I embarked on my research-based education in the United States.

Building upon my foundation in design theory and art education, I have significantly broadened my research interests. During my time at Cornell University's Human-Centered Design Department, I had the privilege of systematically exploring Pluriversal Design under the guidance of Dr. Renata Leitão. Concurrently, I pursued a minor in Anthropology, mentored by Dr. Viranjini Munasinghe. This diverse academic experience, encompassing Cultural Anthropology, Visual Studies, History of Photography, and Design for Interaction, propelled me into the realm of interdisciplinary research and helped me define my unique research trajectory.

A pivotal moment in solidifying my academic direction came through my studies with Professor Andrew Moisey. His deep insights into visual studies and photography profoundly influenced my approach to understanding how visual culture shapes and reflects societal narratives. His mentorship has been instrumental in affirming my commitment to pursuing visual studies as the core of my future academic career.

Presently, my research focuses on the intersection of visual culture, design, art history, and anthropology. I am particularly fascinated by the history of photography and visual media in early 20th-century Northeast Asia, especially in relation to how these mediums influenced socio-political narratives during periods of colonization.

My passion for these topics drives my commitment to advancing knowledge in these areas and fuels my pursuit of meaningful contributions to the understanding of visual culture and design in East Asia. I invite you to explore my work and research interests on the following pages. Welcome to my academic journey.

The Impact of Artificial Intelligence and National Policies on Linguistic Diversity: A Case Study of the Decline in Chinese Internet Content

23SP
ANTHR6475 Class Final Paper
“Culture, Language, and Thought,” Department of Anthropology
Prof. V. Santiago-Irizarry
Cornell University

Keywords

Artificial Intelligence; Language Models; Linguistic Anthropology; Internet Culture; English Dominance; Chinese Internet Content; Technological Factors; National Policies; Minority Language and Culture; Information Exchange; Communication Limitations; Case Study.

Abstract

The rapid development of artificial intelligence (AI), particularly large language models (LLMs), has the potential to revolutionize linguistic anthropology. However, this progress also raises concerns about the impact of AI on linguistic diversity. Specifically, there is a growing bias towards English content in AI training data and broader internet content. This bias presents challenges for non-English languages, such as Chinese. The decline of Chinese internet content has a number of negative implications for both Chinese companies and AI training.

This paper examines the influence of AI and national policies on linguistic diversity, with a focus on the decline of Chinese internet content. The paper identifies a number of factors contributing to this decline, including the global dominance of American tech companies, China's promotion of Chinese-language education in ethnic minority areas, and its stringent control over news, publication, and internet usage. The paper concludes by discussing the future implications of these trends for linguistic anthropology, AI, and internet culture.

Introduction

Artificial intelligence (AI) has transformed various sectors, including linguistic anthropology. Large language models (LLMs), a type of AI, have shown incredible prowess in understanding, generating, and translating languages. However, a significant issue arises from the fact that the data used to train these models predominantly comes from English sources. This issue extends beyond AI training data to the broader landscape of the Internet, where English content significantly outweighs that of other languages.

A particularly poignant example of this disparity is the case of Chinese internet content. Despite China's large population and significant global influence, Chinese content only makes up a fraction of total internet content. This decline in Chinese internet culture is a worrying trend that poses challenges for Chinese companies and AI training. From the perspective of linguistic anthropology, the extinction of the Chinese language in the Internet world would be a significant loss. Language is a key part of culture, and it is through language that we express our thoughts, feelings, and experiences. The loss of the Chinese language would mean the loss of a unique and valuable cultural heritage.

There are a number of factors that have contributed to the decline of Chinese internet content. One factor is the global dominance of American tech companies. These companies, such as Google, Facebook, and Twitter, have a dominant presence on the global internet and operate primarily in English. This gives English content an advantage over Chinese content, as it is more easily accessible to users.

Another factor that has contributed to the decline of Chinese internet content is national policy. The Chinese government maintains strict control over news, publication, and internet usage. This control limits the exchange of information and communication, potentially contributing to the decrease in Chinese internet content.

The decline of Chinese internet content has a number of negative consequences. One consequence is that it limits the ability of Chinese people to access information and communicate with each other. This can have a negative impact on their social and political participation.

Another consequence of the decline of Chinese internet content is that it makes it more difficult for Chinese companies to compete in the global market. This is because English content is more widely available and accessible to users around the world.

The decline of Chinese language content on the Internet is a worrying trend that can have many negative consequences. The iterative development of English-based Large Language Model (LLM) technologies is accelerating the rate of Chinese language extinction on the Internet. In order to preserve linguistic and cultural diversity, it is important to analyse and discuss the reasons for this decline and to find solutions to the factors contributing to it.

AI, Language Models, and Linguistic Anthropology

Evolution of AI and Language Models

Artificial Intelligence (AI) has made significant advances over the years, with language models being one of the most impactful applications. Early AI language models were rule-based systems, relying on hand-coded rules and grammatical structures that were often limited in scope and flexibility. With the advent of machine learning, these models evolved into statistical systems, using algorithms to learn from large amounts of data. More recently, the introduction of deep learning and neural networks has further revolutionized AI language models. These models, such as OpenAI's GPT-3 and GPT-4, use a network of artificial neurons to learn complex patterns from large datasets, significantly improving their ability to understand, generate and translate language.

Role of AI in Linguistic Anthropology

AI, particularly large language models (LLMs), is playing an increasingly important role in linguistic anthropology, the study of how language influences social life. These models have the ability to analyze vast amounts of linguistic data, revealing intricate patterns and trends that would be difficult for humans to detect manually. They offer insights into how language is used in different cultures, how it evolves over time, and its role in expressing social status and group identity. In addition, AI models have the ability to translate under-documented languages or dialects, helping to preserve and understand minority languages.

Language Variation Across Cultures: LLMs like GPT-3 can be used to analyze extensive amounts of linguistic data, revealing patterns and trends that might be challenging for humans to identify manually. For instance, they can analyze how language usage varies across different cultures, providing linguistic anthropologists with valuable insights. As an example, computational language models have been used to analyze Twitter data to understand linguistic variations and cultural differences across various regions (Eisenstein et al., 2014).
Language Change Over Time: LLMs can also track how language changes over time, an area of interest for many linguistic anthropologists. For example, Grieve, Nini, and Guo (2017) used machine learning models to analyze a corpus of 3.5 million books, tracking changes in American English over the past century.
Language and Social Status: LLMs can be used to examine how language conveys social status or group identity. Nguyen, Doğruöz, Rosé, and de Jong (2016) used machine learning models to analyze Dutch social media data, revealing how language use can indicate users' ethnicity, age, and gender.
Preservation of Minority Languages: AI models can help preserve and understand minority languages by translating under-documented languages or dialects. The Universal Dependencies project is an example of an initiative that uses AI to annotate and parse text from a wide range of languages, including minority and under-resourced languages (Shimazu et al., 2020).

These technologies have revolutionized the study of language, opening up new avenues of research and offering the potential to gain a better understanding of the world around us.

Cultural Bias in AI Training Data

AI language models are only as good as the data they are trained on. If this training data is biased, the models' outputs will likely reflect these biases. This is a significant issue in the field of AI, where training data often predominantly comes from English sources. This English bias can lead to a number of problems. Firstly, it can limit the models' ability to understand and generate non-English languages, limiting their usefulness in linguistic anthropology. Secondly, it can lead to the models reflecting and perpetuating cultural biases inherent in English language content. For instance, if English content online contains stereotypes or prejudices, these biases could be learned and reproduced by the AI models. Lastly, the dominance of English content could contribute to the decline of non-English languages online, as we see in the case of Chinese internet content.

For example, OpenAI's previous GPT-3 language model had 92.65% of its training data in English, while Chinese accounted for a meager 0.1% (Brown et al., 2020). This could potentially be attributed to American companies' cultural biases, although a broader look at internet data reveals a similar disparity, with English content making up 58.8% and Chinese content only 1.7% (Petrosyan, 2023).

More generally, AI language models are more likely to generate text that is biased against women and minorities if they are trained on data that is biased against these groups. It is therefore important to consider carefully the biases that may be present in AI training data in order to mitigate their negative effects.
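The two sets of percentages cited in this section can be compared directly. The short Python sketch below is only an illustration, using nothing beyond the four figures quoted above; the function name `representation_ratio` is a hypothetical helper introduced here. It divides each language's share of GPT-3 training data by its share of internet content.

```python
# Shares (in percent) quoted in the text; no other data is assumed.
TRAINING_SHARE = {"English": 92.65, "Chinese": 0.1}  # GPT-3 training data (Brown et al., 2020)
WEB_SHARE = {"English": 58.8, "Chinese": 1.7}        # internet content (Petrosyan, 2023)


def representation_ratio(language: str) -> float:
    """Return training-data share divided by web-content share.

    A value above 1 means the language is over-represented in the training
    data relative to its share of web content; a value below 1 means it is
    under-represented.
    """
    return TRAINING_SHARE[language] / WEB_SHARE[language]


for lang in TRAINING_SHARE:
    print(f"{lang}: {representation_ratio(lang):.2f}")
```

On these figures the ratio for English is roughly 1.6, while for Chinese it is roughly 0.06, or about one seventeenth. This suggests that Chinese is under-represented in the training data even relative to its already small share of web content.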

The Dominance of English on the Internet

Historical Overview of English Dominance

The dominance of English on the internet has its roots in the historical spread of the language and its use as a global lingua franca. The internet was initially developed in the United States, and as it expanded globally, the majority of its content was in English (Leiner et al., 2009). In addition, the early adopters and major contributors of Internet content were predominantly English-speaking individuals and companies. Even as internet access has expanded to non-English speaking regions, the amount of English content online has remained disproportionately high, affecting the representation of other languages (Brown et al., 2020).

Impact on Non-English Languages and Cultures

The dominance of English online has significant impacts on non-English languages and cultures. On a practical level, it can make it difficult for non-English speakers to access information and services online. On a cultural level, it can contribute to the erosion of linguistic diversity, as languages with less online presence may be perceived as less valuable or relevant (Pimienta et al., 2009). The dominance of English can also shape global discourses, as English-language narratives and perspectives may be overrepresented (Warschauer et al., 2002).

Despite having one of the largest populations of internet users, Chinese content represents only a fraction of total internet content. This is partly due to the dominance of English online, but also due to other factors such as national policies and technological infrastructure. The decline of Chinese internet culture to the point of nearly eliminating the Chinese language online is concerning. This issue makes it difficult for Chinese companies to train AI using native language data, as most available data comes from English sources. If this trend continues, it is worth considering whether the next language evolution will be driven by Internet technology.

This phenomenon can be attributed to both technological factors and, more significantly, the influence of national policies. The Chinese government's promotion of Chinese education in ethnic minority areas has created a monolingual environment for younger generations, eroding the language and cultural identity of minority groups in China. Additionally, the government's strict control over news and publication, together with its internet firewall, limits information exchange and communication. The decline of Chinese internet culture poses a challenge for Chinese companies and AI training, as most available data comes from English sources. If this trend continues, it may contribute to the further marginalization of the Chinese language online (Brown et al., 2020).

The Decline of Chinese Content Online

Statistical Analysis of the Decline

In recent years, rapid advances in artificial intelligence (AI) technology, such as large-scale language models, have brought great convenience and appeal. However, such advances are not without their drawbacks, as evidenced by the potential for cultural bias in AI training data. For example, OpenAI's previous GPT-3 language model had 92.65% of its training data in English, while Chinese accounted for a meager 0.1% (Brown et al., 2020). This may be due to cultural bias in US companies, but it also has to do with the closed ecosystem of the Chinese internet world. As of 2023, Chinese content accounts for only 1.7% of all internet content, lagging behind languages like Italian, Persian, Portuguese, Turkish, Japanese, German, French, Spanish, and Russian (Petrosyan, 2023). Even Vietnamese has surpassed Chinese in the number of web pages, and at this rate, Chinese internet culture may soon be able to claim superiority only over that of countries like Nepal or Kenya. While the Chinese language is not at risk of extinction, considering China's vast population and size, the decline in internet information density is becoming an increasingly irreversible reality.

No thoughtful and conscientious Chinese reader would be optimistic about the quality of this 1.7% of Chinese web pages. The US has Wikipedia, and China has a similar product, Baidu Encyclopedia; the US has Reddit, and China has a similar product, Baidu Tieba. Because of China's internet censorship and well-developed habits of self-censorship, these products keep adding their own blocked words, filtering out more and more information (Liang & Lu, 2010). A growing share of Chinese internet information is also closed off from the search ecosystem, so that other companies' AI cannot easily search or use it. Finally, a very small amount of excellent Chinese internet data remains, but it is first filtered by administrators and then left mixed with contaminated data of a quality too poor to give to an AI to learn from.

The decline of Chinese internet culture to the point where Chinese is almost eliminated from the internet is worrying. This problem makes it difficult for Chinese companies to use native-language data to train AI, as most of the available data comes from English-language sources. If this trend continues, it is worth considering whether the next language evolution will be driven by internet technology.

Impact on Chinese Companies and AI Training

The decline of Chinese internet content affects not only the preservation of Chinese culture and language online, but also practical matters such as AI training. Chinese companies find it difficult to train AI using native language data, as most available data comes from English sources. This imbalance in training data can lead to AI models that are less effective at understanding and generating Chinese content, which could affect applications ranging from translation services to chatbots and content recommendation systems (Brown et al., 2020). Although Chinese internet companies have begun training large-scale language models, and some have even developed Chinese AI products similar to ChatGPT, these products are mired in controversy over their data sources.

Baidu, a Chinese tech giant, unveiled its new AI chatbot, ERNIE Bot, in March 2023. The bot was designed to be a comprehensive and informative conversational AI, capable of answering a wide range of questions.

However, shortly after its launch, ERNIE Bot was accused of censorship and misinformation. For example, when asked about sensitive topics such as the Tiananmen Square protests, ERNIE Bot would either refuse to answer or give evasive answers. Additionally, ERNIE Bot was found to be spreading misinformation on a variety of topics, including the COVID-19 pandemic.

Baidu has defended ERNIE Bot, saying that it is still under development and that it is constantly being improved. The company has also said that it is working to address the concerns about censorship and misinformation. However, the controversy has raised questions about the reliability of ERNIE Bot and the potential for AI chatbots to be used for censorship and misinformation.

ERNIE Bot is not the only AI chatbot to be accused of censorship and misinformation. In recent years, there have been a number of reports of AI chatbots being used to spread propaganda, promote hate speech, and suppress dissent. This raises concerns about the potential for AI chatbots to be used to manipulate public opinion and undermine democracy (Yang, 2023).

Examples such as these are not isolated. Two earlier chatbots, Microsoft's XiaoIce and BabyQ, the latter created by Beijing-based AI company Turing Robot, were taken down to be brought in line with China's censorship rules (Li & Jourdan, 2017).

Future Implications: Language Evolution and Internet Technology

The decline of Chinese internet content and the dominance of English have profound implications for the future of language evolution and internet technology. If this trend continues, it is conceivable that the next significant language evolution will be driven by internet technology, with English becoming even more dominant. This could lead to a further erosion of linguistic diversity on the Internet, which is an essential component of cultural diversity. It could also exacerbate the bias in AI language models, making them less useful for non-English speakers and less able to preserve and understand non-English languages and cultures (Pimienta, Prado, & Blanco, 2009).

Dominance of English and Language Evolution: As the internet becomes an increasingly integral part of our lives, it is likely that the languages we use online will influence the languages we use offline. For example, if English continues to dominate the internet, it might become a more prevalent second language worldwide, even in regions where English is not currently widely spoken. This could eventually lead to a situation where English becomes the default language for global communication, both online and offline (Graddol, 2006).

Erosion of Linguistic Diversity: Linguistic diversity is essential to cultural diversity and human creativity, but this diversity is at risk if the dominance of English online continues. Languages without a strong online presence could be perceived as less relevant, leading to a decline in their use. This is particularly concerning for languages that are already vulnerable or endangered (Moseley & Nicolas, 2010).

Bias in AI Language Models: The underrepresentation of non-English languages online also affects the training data for AI language models. As these models learn from the data they are trained on, a dominance of English in the training data could lead to models that are less effective at understanding and generating non-English content. This could limit the usefulness of AI technologies for non-English speakers, and hinder the preservation and understanding of non-English languages and cultures (Caliskan et al., 2017).

Technological Factors Influencing Language Distribution Online

Internet Infrastructure and Language Distribution

The availability and quality of internet infrastructure can significantly affect language distribution online. For example, countries with widespread, high-speed internet access tend to produce more online content, increasing the representation of their language online. A 2011 study by the International Telecommunication Union found that developed countries, where English is often a primary or secondary language, had a higher percentage of individuals using the internet than developing countries (International Telecommunication Union, 2011). The internet population in China was estimated at 1.02 billion as of January 2023, meaning that about 75.6 percent of the Chinese population had used the internet. (The penetration rate denotes the share of the population that has access to a certain communication medium.) For comparison, the global average internet penetration rate stood at about 64.4 percent as of January 2023 (Thomala, 2023). That so many internet users produce only about 1.7 percent of the information on the internet is a phenomenon worth reflecting on and observing.
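To make the scale of this mismatch concrete, the figures in this paragraph can be combined with the 1.7 percent content share cited earlier in the paper. The Python sketch below is a back-of-the-envelope illustration only: it uses just the numbers quoted in the text, and the variable and function names are hypothetical.

```python
# Figures quoted in the text (January 2023 unless noted otherwise).
USERS_CHINA = 1.02e9            # internet users in China
PENETRATION_CHINA = 0.756       # share of China's population that is online
PENETRATION_WORLD = 0.644       # global average penetration rate
CONTENT_SHARE_CHINESE = 1.7     # Chinese share of web content, in percent


def implied_population(users: float, penetration: float) -> float:
    """Population implied by a user count and a penetration rate."""
    return users / penetration


population = implied_population(USERS_CHINA, PENETRATION_CHINA)
gap_points = (PENETRATION_CHINA - PENETRATION_WORLD) * 100
share_per_100m_users = CONTENT_SHARE_CHINESE / (USERS_CHINA / 1e8)

print(f"Implied population: {population / 1e9:.2f} billion")
print(f"Penetration gap vs. world average: {gap_points:.1f} percentage points")
print(f"Content share per 100 million users: {share_per_100m_users:.2f} points")
```

Under these figures, China's penetration rate is about 11 percentage points above the global average, yet each 100 million users correspond to less than 0.2 percentage points of global web content.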

Technology's Role in Language Decline or Growth

Technology can both contribute to language decline and stimulate language growth. On the one hand, the dominance of certain languages in technology, especially English, can contribute to language decline. When software, websites, and online services are predominantly in English, users may be incentivized to learn and use English rather than their native language. On the other hand, technology can also promote language growth. For instance, social media platforms can offer spaces for minority language communities to use and promote their language (Jones & Uribe-Jongbloed, 2013).

The Impact on Chinese Language Online

In China, both internet infrastructure and technology have played a role in shaping the online language landscape. Despite China having one of the largest internet user bases globally, the country's content is underrepresented online. One reason is the popularity of closed, domestic platforms. Chinese users often favor domestic platforms like Weibo or WeChat, where content is primarily in Chinese and stays within these platforms, contributing little to the global pool of Chinese content on the open web. Moreover, government regulation and censorship can limit the exchange of information, resulting in less Chinese content being produced and accessible online (King et al., 2013).

The Influence of China's National Policies on Language and Culture

Chinese Policies: Education and Internet Control

National policies have a significant impact on language and cultural dynamics. In China, two main policies can be seen as contributing to the decline of Chinese content online: education policy and internet control.

The Chinese government has promoted Mandarin Chinese as the standard language in schools, even in ethnic minority regions. While this helps to unify the country, it may also result in younger generations being less fluent in their ethnic languages, thereby reducing the creation of digital content in these languages (King et al., 2013).

Internet control in China, often referred to as the 'Great Firewall', also affects the creation of online content. The government's strict regulation of news, publishing, and internet use limits the exchange of information, resulting in less Chinese content being produced and accessible online. The Great Firewall (GFW) is a combination of legislative actions and technologies implemented by the Chinese government to regulate the internet domestically (Clayton et al., 2006). It involves blocking access to selected foreign websites and slowing down cross-border internet traffic. By scanning Transmission Control Protocol (TCP) packets for keywords or sensitive words, the Great Firewall determines whether to block a connection. Its effects include limiting access to foreign information sources, blocking foreign internet tools and mobile apps (e.g. Google Search, Facebook, Twitter, Wikipedia, and others), and requiring foreign companies to comply with domestic regulations (Mozur & Goel, 2014; Branigan, 2012).
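The keyword-scanning idea described above can be illustrated with a toy Python sketch. This is not how the Great Firewall is actually implemented, which the sources cited here do not specify at code level. The keyword list, the function name `should_block`, and the sample payloads are all hypothetical placeholders, chosen only to show what "scanning packet contents for keywords" means in principle.

```python
from typing import Iterable

# Hypothetical placeholder terms; real keyword lists are not public.
BLOCKED_KEYWORDS = (b"example-blocked-term", b"another-blocked-term")


def should_block(payload: bytes, keywords: Iterable[bytes] = BLOCKED_KEYWORDS) -> bool:
    """Return True if the payload contains any blocked keyword.

    Matching is case-insensitive and works on the raw bytes of a single
    payload, which is a strong simplification of real traffic inspection.
    """
    lowered = payload.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


# Two made-up HTTP requests standing in for TCP payloads.
harmless = b"GET /weather HTTP/1.1\r\nHost: example.org\r\n\r\n"
flagged = b"GET /search?q=Example-Blocked-Term HTTP/1.1\r\nHost: example.org\r\n\r\n"

print(should_block(harmless))  # False
print(should_block(flagged))   # True
```

Even in this simplified form, the sketch shows why such filtering affects content creation: any text containing a listed term is stopped regardless of its context or intent.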

In addition to censorship, the Great Firewall has influenced China's internal internet economy by favoring domestic companies and reducing the effectiveness of foreign internet products (Denyer, 2023). Beyond the Great Firewall itself, the Chinese government employs various techniques to control online content, including modifying search results and pressuring global conglomerates to remove content, as exemplified by Apple's removal of the Quartz news app from its Chinese App Store after Quartz reported on the Hong Kong protests (Statt, 2019).

Impact on Minority Language and Cultural Identity

These policies have a significant impact on minority languages and cultural identities in China. As Mandarin is promoted over ethnic languages, the latter may face erosion. This can lead to a loss of cultural identity among ethnic minorities, as language is a vital part of cultural identity. The limited online representation of minority languages further marginalizes these communities, as their languages and cultures become less visible on a global platform. This lack of visibility can lead to further neglect and marginalization of these languages, potentially accelerating their decline.

For example, the Chinese government’s education policy changed during the year, and the entire education system began to shift from public to market-based. China’s higher education exploded in size, while investment in basic education did not increase much. At a time when society offered few opportunities, Mongolians living on the margins of China, like other Chinese, began to value investment in basic education for the next generation, expecting that their children would move up through higher education or at least maintain their current social class. This outlook, together with the form of the Gaokao examination, meant that Mongolians completely abandoned the learning of their own Mongolian language and script. This is a kind of opportunism, but one in line with social development. Every young person wants more opportunities: learning Chinese and English offers them more possibilities in the future, while learning the Mongolian language appears to offer none.

“Sattar was the former director of the Xinjiang Education Department. He was arrested in 2017. Later, he was regarded by the Chinese Communist Party (CCP) as a “two-faced person” who supported the independence movement by editing textbooks.

Sattar Sawut was sentenced to death for his role in the publication of school textbooks said to incite interethnic hatred. Five other Uyghurs were convicted in the same case. The former head of the local justice department was also sentenced to death for conspiring with Muslim separatists” (Hoshur, 2018).

It is worth discussing the education issues in Xinjiang and Hong Kong in this period. The Chinese government reviewed its lack of necessary regulation of education in 2021, because it believes that the unrest in Xinjiang is related to the Xinjiang basic education textbook incident and to religious education in Xinjiang. I grew up in this era, and all of the relatives in my parents’ generation, except my parents themselves, worked in the Chinese basic education system. Many of them were taught in Mongolian schools. But the only difference between these schools and other Chinese basic education schools is the name of the school. None of the schools offers a single Mongolian language class, simply because the Gaokao has no test component for the ethnic language. This is the reason why I can’t speak Mongolian or read Mongolian script. These decades of development have made the new generation of Mongolians in China completely forget their ethnic identity.

In 2021, the Chinese government issued an executive order emphasizing Chinese and Mandarin language education in basic education for all ethnic groups across the country, in order to shape an overall Chinese identity (Su, 2020). This order sparked protests in many places, but these protests almost always faded away. At present, the once-supportive Gaokao policies for Chinese minorities are also beginning to be withdrawn.

Comparative Analysis with Other Countries' Policies

Different countries have different approaches to language policy. For instance, in India, the government officially recognizes 22 languages and promotes multilingualism in schools and in public life, which has resulted in a diverse online language presence (Meganathan, 2011). Although China also has a diverse ethnic and linguistic composition, the government's ideological influence has stifled linguistic diversity on the internet.

In contrast, in many African countries, colonial languages like English or French are often the languages of education, government, and online content, even though they are not the first languages of the majority of the population. This can result in a lower online presence for local languages. I am still lacking sufficient observations in this section and I hope to have the opportunity to add to this section later on.

Conclusion

This study explores the dynamic interplay between artificial intelligence, language distribution on the internet, and national policies, with a particular focus on the decline of Chinese internet content. We found that the dominance of English in AI language models reflects its prevalence on the internet. The lack of Chinese content online, despite China having the largest internet user population in the world, is a complex issue with roots in both technological factors and national policies. This decline has significant implications not only for the Chinese language, but also for the diversity of languages and cultures represented online.

From a linguistic anthropological perspective, the declining presence of the Chinese language online may indicate a potential cultural shift. If this trend continues, it is plausible to imagine a future where Chinese, one of the most widely spoken languages in the world, may gradually lose its relevance in the digital landscape. This could lead to an erosion of linguistic diversity, which is an essential part of cultural diversity. Furthermore, the biases in AI language models may exacerbate this problem, making them less useful to non-English speakers and less able to preserve and understand non-English languages and cultures.

The implications for artificial intelligence are no less profound. As AI and language models increasingly shape our digital experiences, the underrepresentation of certain languages could lead to biased and less inclusive AI systems. This could perpetuate cultural biases and create further inequalities in the digital world.

Future research should continue to explore the complex factors that drive language distribution online and their impact on different languages and cultures. More detailed studies could examine the impact of specific national policies on language use and cultural representation online. In addition, research could explore potential strategies to counter current trends and promote a more diverse and inclusive digital landscape. For example, exploring how AI can be used to support and promote minority languages and cultures could be an exciting avenue of research.

The potential disappearance of Chinese content online is a major concern. However, this phenomenon is not inevitable; it is constructed through various technological, cultural, and political processes. By understanding these processes, we can begin to imagine and work towards alternative futures where all languages and cultures have a place in our increasingly digital world.

References

Branigan, T. (2012, June 28). New York Times launches website in Chinese language. The Guardian. https://www.theguardian.com/media/2012/jun/28/new-york-times-launches-chinese-website
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language Models are Few-Shot Learners (arXiv:2005.14165). arXiv. https://doi.org/10.48550/arXiv.2005.14165
Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183–186. https://doi.org/10.1126/science.aal4230
Clayton, R., Murdoch, S. J., & Watson, R. N. M. (2006). Ignoring the Great Firewall of China. In G. Danezis & P. Golle (Eds.), Privacy Enhancing Technologies (Vol. 4258, pp. 20–35). Springer Berlin Heidelberg. https://doi.org/10.1007/11957454_2
Denyer, S. (2023, April 12). China’s scary lesson to the world: Censoring the Internet works. Washington Post. https://www.washingtonpost.com/world/asia_pacific/chinas-scary-lesson-to-the-world-censoring-the-internet-works/2016/05/23/413afe78-fff3-11e5-8bb1-f124a43f84dc_story.html
Eisenstein, J., O’Connor, B., Smith, N. A., & Xing, E. P. (2014). Diffusion of Lexical Change in Social Media. PLOS ONE, 9(11), e113114. https://doi.org/10.1371/journal.pone.0113114
Graddol, D. (2006). English next. British Council. https://www.teachingenglish.org.uk/publications/case-studies-insights-and-research/english-next
Grieve, J., Nini, A., & Guo, D. (2017). Analyzing lexical emergence in Modern American English online. English Language & Linguistics, 21(1), 99–127. https://doi.org/10.1017/S1360674316000113
Hoshur, S. (2018, October 10). Three Uyghur Intellectuals Jailed for Separatism, Political Study Film Reveals. Radio Free Asia. https://www.rfa.org/english/news/uyghur/intellectuals-jailed-10102018172605.html
Jones, E. H. G., & Uribe Jongbloed, E. (2013). Social Media and Minority Languages. https://www.multilingual-matters.com/page/detail/Social-Media-and-Minority-Languages/?k=9781847699046
King, G., Pan, J., & Roberts, M. E. (2013). How Censorship in China Allows Government Criticism but Silences Collective Expression. American Political Science Review, 107(2), 326–343. https://doi.org/10.1017/S0003055413000014
Leiner, B. M., Cerf, V. G., Clark, D. D., Kahn, R. E., Kleinrock, L., Lynch, D. C., Postel, J., Roberts, L. G., & Wolff, S. (2009). A brief history of the internet. ACM SIGCOMM Computer Communication Review, 39(5), 22–31. https://doi.org/10.1145/1629607.1629613
Li, P., & Jourdan, A. (2017, August 4). Chinese chatbots apparently re-educated after political faux pas. Reuters. https://www.reuters.com/article/us-china-robots-idUSKBN1AK0G1
Liang, B., & Lu, H. (2010). Internet Development, Censorship, and Cyber Crimes in China. Journal of Contemporary Criminal Justice, 26(1), 103–120. https://doi.org/10.1177/1043986209350437
Meganathan, R. (2011). Language Policy in Education and the Role of English in India: From Library Language to Language of Empowerment. In Online Submission. https://eric.ed.gov/?id=ED530679
Moseley, C., & Nicolas, A. (2010). Atlas of the world’s languages in danger. UNESDOC. https://unesdoc.unesco.org/ark:/48223/pf0000187026
Mozur, P., & Goel, V. (2014, October 6). To Reach China, LinkedIn Plays by Local Rules. The New York Times. https://www.nytimes.com/2014/10/06/technology/to-reach-china-linkedin-plays-by-local-rules.html
Nguyen, D., Doğruöz, A. S., Rosé, C. P., & de Jong, F. (2016). Computational Sociolinguistics: A Survey. Computational Linguistics, 42(3), 537–593. https://doi.org/10.1162/COLI_a_00258
Petrosyan, A. (2023, February 24). Most used languages online by share of websites 2023. Statista. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/
Pimienta, D., Blanco, A., & Prado, D. (2009). Twelve years of measuring linguistic diversity in the Internet: Balance and perspectives. https://unesdoc.unesco.org/ark:/48223/pf0000187016
Shimazu, S., Takase, S., Nakazawa, T., & Okazaki, N. (2020). Evaluation Dataset for Zero Pronoun in Japanese to English Translation. Proceedings of the Twelfth Language Resources and Evaluation Conference, 3630–3634. https://aclanthology.org/2020.lrec-1.447
Statt, N. (2019, October 9). Apple reportedly removes Quartz news app in China over Hong Kong coverage. The Verge. https://www.theverge.com/2019/10/9/20907228/apple-quartz-app-store-china-removal-hong-kong-protests-censorship
Su, A. (2020, September 3). China cracks down on Inner Mongolian minority fighting for its mother tongue. Los Angeles Times. https://www.latimes.com/world-nation/story/2020-09-03/china-inner-mongolia-bilingual-education-assimilation-xinjiang-resistance-crackdown
Thomala, L. L. (2023). China: Internet penetration rate 2022. Statista. https://www.statista.com/statistics/236963/penetration-rate-of-internet-users-in-china/
Trifonas, P. P., & Aravossitas, T. (2018). Heritage and Language: Cultural Diversity and Education. In P. P. Trifonas & T. Aravossitas (Eds.), Handbook of Research and Practice in Heritage Language Education (pp. 3–25). Springer International Publishing. https://doi.org/10.1007/978-3-319-44694-3_53
Warschauer, M., Said, G. R. E., & Zohry, A. G. (2002). Language Choice Online: Globalization and Identity in Egypt. Journal of Computer-Mediated Communication, 7(4), JCMC744. https://doi.org/10.1111/j.1083-6101.2002.tb00157.x
Yang, F. (2023, March 24). Baidu’s ChatGPT rival ERNIE Bot an AI letdown. Asia Times. https://asiatimes.com/2023/03/baidus-chatgpt-rival-ernie-bot-an-ai-letdown/