


CONSTITUTION OF THE ACL SPECIAL INTEREST GROUP ON SOUTHEAST ASIAN NATURAL LANGUAGE PROCESSING (SIGSEA)

I. STATEMENT OF PURPOSE

Southeast Asia faces unique challenges due to its vast linguistic and cultural diversity, encompassing over 1,300 indigenous languages (18% of the world's languages), each with distinct scripts, dialects, and complexities. With a population of 671 million, Southeast Asia is a rich mosaic of cultures and languages. However, Southeast Asian NLP, along with vision-language and speech processing for the region, remains significantly underrepresented in research. Advancing Southeast Asian NLP holds great potential for enhancing research quality, promoting technological inclusion, and preserving cultural heritage, benefiting both the region and the global community.

The purpose of the Association for Computational Linguistics (ACL) Special Interest Group on Southeast Asian NLP (the SIG) shall be: (1) To promote interest in Southeast Asian NLP; (2) To foster coordination and collaboration between academic and industry organizations worldwide engaged in research or applications related to Southeast Asian NLP; (3) To provide ACL members who have a special interest in Southeast Asian NLP with a means of exchanging news of recent research developments and other matters of interest in the field; (4) To sponsor timely and worthwhile meetings, shared tasks, and workshops in Southeast Asian NLP, operating within the framework of the ACL's general guidelines for SIGs.

  1. SIGSEA's Role in Minimizing Potential Negative Impacts of Language Technology in Southeast Asia

We identify four potential negative impacts of language technology development in Southeast Asia: cultural misrepresentation, social bias and inequality, privacy and security concerns, and the marginalization of lesser-resourced languages.

  1. Cultural Misrepresentation: Incomplete or biased data could lead to the misrepresentation or distortion of Southeast Asian cultures and languages. This could perpetuate stereotypes or overlook the rich nuances of these cultures.
  2. Social Bias and Inequality: NLP systems can inherit biases from their training data, which could reinforce inequalities in Southeast Asian societies, such as gender, ethnic, or socioeconomic biases in language usage.
  3. Privacy and Security Issues: As NLP systems often deal with personal data (e.g., text and speech), there is a risk of privacy violations, especially in regions where digital infrastructure and data protection laws are still developing.
  4. Marginalization of Lesser-Resourced Languages: NLP technology could disproportionately benefit dominant languages like Thai, Vietnamese, or Indonesian, leading to the neglect of lesser-resourced languages. With over 1,300 languages in Southeast Asia, many smaller languages could be overlooked in data collection, algorithm development, and model training. This marginalization could lead to the loss of linguistic diversity and cultural heritage, as these smaller languages become digitally invisible or underrepresented in technological solutions.

As an ACL special interest group devoted to the Southeast Asian region, SIGSEA could take an active role in raising awareness of these potential negative impacts and in promoting best practices by:

  1. Organizing Workshops and Panels: SIGSEA could organize workshops or panels focused on ethical NLP practices, with a particular emphasis on Southeast Asian languages. These could address topics such as reducing language biases, protecting linguistic diversity, and ensuring privacy in NLP systems.
  2. Collaborating with Local Communities: SIGSEA can facilitate collaborations between NLP researchers and local linguistic communities to ensure that technological developments reflect the needs and perspectives of the people most affected by them. This might include organizing consultations or co-designing solutions with speakers of underrepresented languages.
  3. Developing Guidelines for Responsible AI: The SIG could develop guidelines that outline responsible practices in NLP research, including best practices for collecting, processing, and using data, with a focus on ensuring inclusivity, reducing bias, protecting user privacy, and ensuring fair compensation for annotation.
  4. Raising Awareness: SIGSEA could engage in outreach activities to inform the broader Southeast Asian community, through social media, blogs, or public forums, about the importance of preserving linguistic diversity and the potential risks associated with NLP technologies.
  5. Inclusive Shared Tasks: SIGSEA could host shared tasks that encourage work on less-resourced Southeast Asian languages. This would help promote the creation of tools that cater to underrepresented groups and languages.

  2. Ethics and Regulations

While certain research topics may align with the goals of SIGSEA, they must be thoroughly evaluated to ensure they do not violate ethical or legal boundaries. This requires prioritizing community-centered research, safeguarding vulnerable populations, and adhering to data privacy regulations. It is crucial that the research serves a meaningful purpose for those involved, particularly when working with marginalized or vulnerable groups.

In cases where research involves minority or indigenous communities and the collection of multimodal data (such as for image captioning or speech recognition systems), the dataset creation process must be driven by the actual needs of the local community. Research should not exploit indigenous language resources without addressing the community's specific needs or providing tangible benefits. Instead, the focus should be on creating value and respecting the cultural and linguistic heritage of these communities.

SIGSEA is dedicated to promoting responsible language technology research in the ASEAN region by adhering to the following key regulations and principles:

  1. Ethical Dataset Creation: When building datasets in the ASEAN region, SIGSEA members must prioritize the privacy and rights of research participants. Participants should be fully informed about how their data will be used and must provide explicit consent when necessary. Special care is required when working with minority or indigenous communities, ensuring that protocols are followed to respect their culture, language, and intellectual property. This includes obtaining permission to use their language data and ensuring the research benefits the communities involved.
  2. Fair Compensation for Labor and Annotation: When research requires labor, such as annotation from local communities, fair compensation must be provided. This should accurately reflect the time, expertise, and effort required.
  3. Bias and Discrimination Prevention: Language technology development must not perpetuate bias or discrimination, especially toward marginalized groups in the region. SIGSEA members are expected to ensure that all communities, including minorities, are treated fairly and equitably.
  4. Avoiding Harm in Society: Research and language technologies should be developed with caution to avoid harmful societal impacts, such as the spread of misinformation, exploitation of communities, or reinforcement of stereotypes. SIGSEA aims to foster language technologies that contribute positively to society.
  5. Compliance with Local Regulations: Given that each ASEAN country may have specific regulations regarding language, data collection, intellectual property, and community engagement, SIGSEA members are expected to comply with the local laws and guidelines governing their research.

To support these goals, SIGSEA will provide clear guidelines on (1) obtaining informed consent and best practices for dataset collection in Southeast Asia, and (2) conducting responsible research overall. These guidelines will be distributed via our mailing list and made available on the SIGSEA website. Furthermore, by implementing regular ethical audits of research projects, SIGSEA will ensure that all research complies with both legal and ethical standards.

II. ELECTED OFFICERS

The elected officers of the SIG shall consist of a President and a Secretary. The President and the Secretary shall be members in good standing of the ACL.

The term of all elected officers of the SIG shall be 3 years, or shorter if the elected officers organize elections earlier. The term begins 2 months after the officers are elected.

The duties of the President shall be: (1) To have primary executive authority over actions and activities of the SIG. (2) To prepare a written report on the activities of the SIG for the Executive Committee of the ACL, for presentation to the ACL at its Annual Business Meeting. (3) To designate a Liaison Representative for the SIG, who shall be primarily responsible for communication with members of the SIG, answering inquiries about the SIG, and communication with the Executive Committee of the ACL.

The duties of the Secretary shall be: (1) To maintain a membership roster of the SIG. (2) To be responsible for any moneys awarded to the SIG by the ACL; to collect and manage any dues that may be required by the organization; and to present a written annual report on the SIG finances to the Executive Committee of the ACL; (3) To conduct elections as mandated in Section IV; (4) To act as a Liaison Representative, who shall be primarily responsible for communication with members of the SIG, answering inquiries about the SIG, managing relations with other SIGs or organizations, communication with the Executive Committee of the ACL, and overseeing the management of public-facing online materials.

III. OTHER REPRESENTATIVES

SIGSEA will be supported by an Advisory Board of leading researchers of Southeast Asian NLP from across the world.

The duties of the Advisory Board shall be: (1) To advise on planning and coordination of SIGSEA activities with other ACL SIGs and non-ACL organizations; (2) To support the election processes of the SIG by volunteering to serve on the Election Committee.

IV. ELECTION OF OFFICERS

All elected officers of SIGSEA shall be elected by a vote of the current SIGSEA members. The vote will be conducted using an electronic form sent via electronic mail. The timeline of the elections is based on the expiration of the current terms of the officers (time n):

The election process shall be managed by a three-member Election Committee formed by the SIG Secretary and two volunteer members from the current SIGSEA members. Members of the Election Committee cannot run for elected positions in the elections they manage. If the SIG Secretary is running for election, a third volunteer from the Advisory Board will take the place of the SIG Secretary.

All SIG members will be given notice to submit nominations for SIG officers at the time nominations open and again 2 weeks before they close. Candidates must be nominated by at least two members and must accept the nomination. Candidates must be current members in good standing of both SIGSEA and the ACL. Candidates are required to submit a CV, a short biography, and a statement of their vision as officers, which will be shared with all voting members through the electronic nomination form.

All SIG members will be given notice to vote on SIG officers at the time voting opens and again 2 weeks before it closes. Votes will be entered by electronic means. Votes received by the notified closing date will be counted. The Election Committee will use Single Transferable Vote (also known as multi-winner ranked-choice voting) to determine the winners.

Active members shall be notified of the results within 2 weeks after voting closes. The previous officers will remain active for 2 months into the next term to ease the transition.

If a vacancy occurs among the officers of the SIG, the remaining elected officer will nominate a replacement officer from among the Advisory Board to serve for up to the remainder of the previous officer's term.

V. MEMBERSHIP & DUES

Subject to the approval of the Executive Committee of the ACL, the SIG may levy annual membership dues if needed, at a rate determined by the elected officers (the President and the Secretary).

Anyone who professes an interest in Southeast Asian NLP may join; ACL membership is not required. Membership can be obtained either through the provided electronic means or by filling out a membership form at a SIG meeting. Membership will be terminated upon request; termination requests should be sent by email to the SIGSEA Secretary.

VI. ETHICAL CONSIDERATIONS

Southeast Asia's linguistic and cultural diversity offers immense potential for language technology, yet poses ethical challenges such as cultural misrepresentation, social bias and inequality, privacy concerns, and the risk of marginalizing lesser-resourced languages. SIGSEA acknowledges these risks and is committed to promoting ethical, inclusive, and community-centered language technology development in the region. Potential negative impacts include the misrepresentation of Southeast Asian languages due to incomplete or biased data, the reinforcement of societal biases embedded in NLP systems, privacy risks associated with handling personal data in regions with developing data protection laws, and the neglect of lesser-resourced languages among the region's more than 1,300 languages, which threatens cultural heritage and linguistic diversity.

To address these challenges, SIGSEA will organize workshops and panels on ethical NLP practices, focusing on Southeast Asian languages. We will facilitate collaborations with local communities to ensure technology development reflects the perspectives of those most affected, and we will create guidelines for responsible AI in NLP, emphasizing inclusivity, bias reduction, and privacy. Through outreach and shared tasks, SIGSEA will raise awareness and encourage development for lesser-resourced Southeast Asian languages, supporting tools that cater to underrepresented groups.

SIGSEA members will adhere to ethical practices in dataset creation, respecting privacy, obtaining explicit consent, and ensuring community benefits, especially when working with minority or indigenous groups. Fair compensation will be provided for tasks such as annotation, and efforts will be made to prevent bias and discrimination within language technology. Research practices will also comply with local regulations across ASEAN countries, protecting cultural heritage and community rights. SIGSEA will distribute guidelines on consent, dataset collection, and responsible research via its mailing list and website, with regular ethical audits of research projects to maintain compliance. These efforts underscore SIGSEA's commitment to advancing language technology responsibly, respecting Southeast Asia’s cultural diversity, and fostering positive societal impact.