LESSON 12: Agents + Python

LESSON 11: Agents + Data

Mapping of the video evaluation process

[Flowchart from the original slide, roughly: Google Forms → XLSX file (column F, second row: video links) → evaluation criteria → evaluations → report → XLSX file → SITE page]

Data acquisition

Encodings

Chunks
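
The chunking details live in the deck's figures; as a minimal sketch, a fixed-size character chunker with overlap might look like this (chunk_size and overlap are arbitrary assumed values, not taken from the slides):

# Sketch: fixed-size character chunking with overlap, a common way to
# split documents before embedding. Parameter values are assumptions.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

# A 1200-character document yields chunks starting at 0, 450, and 900.
print(len(chunk_text("x" * 1200)))  # 3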

Word embeddings are numerical representations of a word's meaning. They are formed based on the assumption that meaning is contextual; that is, a word's meaning is dependent on its neighbors.

Creating vector embeddings

Embeddings translate the complexities of human language into a format that computers can understand. An embedding model uses neural networks to assign numerical values to the input data, in such a way that similar data gets similar values.
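
As a minimal sketch of that idea (assuming the sentence-transformers package and the all-MiniLM-L6-v2 model, neither of which is named in the slides):

# Sketch: turning sentences into vectors with a pretrained embedding
# model. Library and model name are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The cat sat on the mat.",
    "A kitten rested on the rug.",
    "Stock prices fell sharply.",
]
embeddings = model.encode(sentences)  # one 384-dimensional vector each

# Similar sentences get similar vectors:
print(util.cos_sim(embeddings[0], embeddings[1]))  # relatively high
print(util.cos_sim(embeddings[0], embeddings[2]))  # relatively low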

What is a Word Embedding?

A word embedding is a language modeling technique for mapping words to vectors of real numbers. It represents words or phrases in a vector space with several dimensions. Word embeddings can be generated using various methods, like neural networks, co-occurrence matrices, probabilistic models, etc. Word2Vec consists of models for generating word embeddings. These models are shallow neural networks with one input layer, one hidden layer, and one output layer.

Word2Vec is a widely used method in natural language processing (NLP) that represents words as vectors in a continuous vector space. Developed by researchers at Google, Word2Vec maps words to high-dimensional vectors in order to capture the semantic relationships between them; its main principle is that words with similar meanings should have similar vector representations. Word2Vec utilizes two architectures: CBOW (Continuous Bag of Words), which predicts a word from its surrounding context, and Skip-gram, which predicts the surrounding context from a word.
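
A minimal training sketch, assuming the gensim library (which the slides do not name); the toy corpus and hyperparameters are illustrative:

# Sketch: training a tiny Word2Vec model with gensim (an assumed library).
from gensim.models import Word2Vec

corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "man", "walks", "in", "the", "city"],
    ["the", "woman", "walks", "in", "the", "city"],
]

# sg=0 selects the CBOW architecture; sg=1 would select Skip-gram.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)

vector = model.wv["king"]                    # the 50-dimensional vector
print(model.wv.most_similar("king", topn=2))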

Word embeddings are represented as mathematical vectors. This representation makes it possible to perform standard mathematical operations on words, like addition and subtraction.
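
The classic illustration is king - man + woman ≈ queen. A sketch with a pretrained model, assuming gensim's downloader API (not something the slides show):

# Sketch: word arithmetic on pretrained vectors. The gensim downloader
# and the model name are assumptions; the download is large on first use.
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")

# king - man + woman ~ queen
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# expected top result: ('queen', ...)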

Embeddings

What are Vector Embeddings?

Embeddings are numerical machine learning representations of the semantics of the input data. They capture the meaning of complex, high-dimensional data, like text, images, or audio, in vectors, enabling algorithms to process and analyze the data more efficiently.

How do embeddings work?

The quality of the vector representations drives downstream performance; the embedding model that works best for you depends on your use case.

Word Embeddings: Example

Neural Networks

Word2Vec: the nuances of the word "right" are blended into a single vector, regardless of context. BERT: the embedding captures the use of a word in its surroundings, so it can understand the context.

A Dummy’s Guide to Word2Vec

FAISS

Scenario

Two file names are defined: denso_vectors_file, to store the dense vectors, and faiss_index_file, to store the FAISS index.
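
A minimal sketch of that setup (the concrete file names, array shapes, and the use of numpy to hold the dense vectors are assumptions; faiss.write_index is FAISS's standard persistence call):

# Sketch: persisting dense vectors and a FAISS index to the two files
# named on the slide. Data and shapes are illustrative assumptions.
import numpy as np
import faiss

denso_vectors_file = "denso_vectors.npy"
faiss_index_file = "faiss.index"

d = 384                                        # embedding dimension (assumed)
vectors = np.random.rand(1000, d).astype("float32")  # stand-in embeddings

index = faiss.IndexFlatL2(d)
index.add(vectors)

np.save(denso_vectors_file, vectors)           # store the dense vectors
faiss.write_index(index, faiss_index_file)     # store the FAISS index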

IndexFlatL2

It measures the L2 (or Euclidean) distance between our query vector and every vector loaded into the index.

It is simple and very accurate, but not very fast.
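
A self-contained sketch of an exhaustive IndexFlatL2 search (data and dimensions are illustrative):

# Sketch: brute-force L2 search. Every stored vector is compared to the
# query, which is exact but costs O(n) per query.
import numpy as np
import faiss

d = 384
vectors = np.random.rand(1000, d).astype("float32")

index = faiss.IndexFlatL2(d)
index.add(vectors)

query = np.random.rand(1, d).astype("float32")
k = 4
distances, ids = index.search(query, k)
print(ids[0])        # positions of the 4 nearest vectors
print(distances[0])  # their squared L2 distances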

More methods: speed or accuracy?

Even when one method is faster than another, we should also take note of the slightly different results returned. Earlier, with our exhaustive L2 search, we were getting back 7460, 10940, 3781, and 5747. Now we see a slightly different order of results, and two different IDs: 5013 and 5370.
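
The faster method's code isn't reproduced in this transcript, but a common FAISS speed-for-accuracy trade is an IVF index, sketched below (nlist and nprobe are illustrative values):

# Sketch: an approximate IVF index. It clusters the vectors and searches
# only the nprobe nearest clusters, which is why its results can differ
# slightly from the exhaustive search above.
import numpy as np
import faiss

d = 384
vectors = np.random.rand(1000, d).astype("float32")
query = np.random.rand(1, d).astype("float32")

nlist = 20                                     # number of clusters (assumed)
quantizer = faiss.IndexFlatL2(d)               # coarse quantizer
ivf = faiss.IndexIVFFlat(quantizer, d, nlist)

ivf.train(vectors)                             # learn the cluster centroids
ivf.add(vectors)

ivf.nprobe = 4                                 # clusters visited per query
distances, ids = ivf.search(query, 4)          # approximate neighbors
print(ids[0])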

ChromaDB
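
The ChromaDB slides are figures in the original deck; as a minimal sketch of the library's basic flow (the collection name, documents, and query text are illustrative):

# Sketch: the basic ChromaDB flow: create a collection, add documents,
# query by text. ChromaDB embeds the documents with its default
# embedding function. All names and data here are assumptions.
import chromadb

client = chromadb.Client()                     # in-memory client
collection = client.create_collection(name="videos")

collection.add(
    documents=["a video about word embeddings", "a video about FAISS"],
    ids=["vid1", "vid2"],
)

results = collection.query(query_texts=["vector search index"], n_results=1)
print(results["ids"])                          # e.g. [['vid2']]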

LESSON 13: Chatbots + content