| Proposer | Dataset name | Training items (approx.) | Test items (approx.) | # of classes | Feature types | URL/reference | Brief description | Readiness | Votes |
|---|---|---|---|---|---|---|---|---|---|
| Ondrej Bojar | ob-SampleData | 120k | 1k | 4 | real, integer, categorical | I will bring the dataset | This is just a fake entry. The goal is to predict the color of the teddy bear based on its measurements and properties (cuddliness, etc.). |  | 74 |
| Aydin Ahmadli | Scene recognition | 2000 | 400 | 2 | real, integer | https://www.openml.org/d/312 | The dataset contains characteristics of images and their classes, with 6 different labels: {Beach, Sunset, FallFoliage, Field, Mountain, Urban}. The problem is binary classification: decide whether an image is 'Urban' or not. | R | 0 |
| Aydin Ahmadli | Robot Navigation | 5400 |  | 4 | real, integer | https://www.openml.org/d/1526 | Given features such as 24 different sensor readings, decide which action the robot will take. 4 output classes: {Move-Forward, Slight-Right-Turn, Sharp-Right-Turn, Slight-Left-Turn}. | R | 0 |
| Aydin Ahmadli | ID recognition from Walking | 59k |  | 22 | real, integer | https://archive.ics.uci.edu/ml/datasets/User+Identification+From+Walking+Activity | Data collected from an Android smartphone positioned in the chest pocket of 22 participants. Input features: {Time-step, x acceleration, y acceleration, z acceleration}. Output classes: 22 user IDs. |  | 0 |
| Tomáš Kremel | Financial well-being survey | 6k |  | 5 | integer | https://www.consumerfinance.gov/data-research/financial-well-being-survey-data/ | A person's financial well-being comes from their sense of financial security and freedom of choice, both in the present and when considering the future. The survey dataset includes respondents' scores as well as measures of individual and household characteristics. | C | 6 |
| Tomáš Kremel | 3 million Russian troll tweets | 3M |  |  | text, time, enum, integer | https://github.com/fivethirtyeight/russian-troll-tweets/ | Data on nearly 3 million tweets sent from Twitter handles connected to the Internet Research Agency, a Russian "troll factory" and a defendant in an indictment filed by the Justice Department in February 2018 as part of special counsel Robert Mueller's Russia investigation. | C | 2 |
| Tomáš Kremel | Airbags | 26k |  | 2 | enum, integer, time | https://maths-people.anu.edu.au/~johnm/datasets/airbags/ | Did airbags reduce accident risk in the US over 1997-2002? |  | 0 |
| Tomáš Karella | Grants | 6k per year (2006-2019) |  | 2(3) | text, enum, integer, real | https://data.gov.cz/datov%C3%A1-sada?iri=https%3A%2F%2Fdata.gov.cz%2Fzdroj%2Fdatov%C3%A1-sada%2Fhttp---opendata.praha.eu-api-3-action-package_show-id-mhmp-granty-2006 | Prague City Hall shares data about grant requests. Every request contains information about the applicant, the project, the verdict and the amount of money assigned. | C | 0 |
| Tomáš Karella | Car accidents | 100k per year (2007-2019) |  | 4 | text, enum, integer, real | https://www.policie.cz/clanek/statistika-nehodovosti-900835.aspx?q=Y2hudW09Mg%3d%3d | The Police of the Czech Republic share information about car accidents, including the severity of injuries. It would be possible to merge this dataset with the CHMI weather statistics. | C | 4 |
| Tomáš Karella | DOHMH New York City Restaurant Inspection Results | 385k |  | 2 | text, enum, integer, real | https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j/data | Data describing restaurant inspections in New York. The target value could be the closure of the restaurant; the columns contain information about the location, type of restaurant, etc. | C | 0 |
| Věra Kumová | College Scorecard | 7k |  | 2 | enum, int, real | https://collegescorecard.ed.gov/data/ | Data about US schools. The target value could be one of the flag variables, e.g. "Flag for women-only college". | P | 0 |
| Věra Kumová | Women, Business and the Law | 187 |  | 4 | 0-1 | https://datacatalog.worldbank.org/dataset/women-business-and-law | The study examines 35 questions in 8 areas (about equal opportunities for women) with yes/no answers. The data are originally time series, but a single year could be used for classification: each row is one country and the target value could be the "Income group" of the country. | R | 0 |
| Věra Kumová | University students behaviour | 2k |  | 2 | enum, int, text | https://data.brno.cz/dataset/?id=sociologicky-vyzkum-chovani-studentu-vs | Survey among university students in Brno. The target value could be whether or not a student has additional income while studying. | C | 6 |
| Jan Špaček | Dancing | ~36k Latin, ~33k Standard |  | (see desc.) | (see desc.) | https://www.dancerank.cz | Results of Czech dancesport competitions; can be used to predict the results of future competitions and/or national championships. To coerce the task into the simple ML formulation, we can generate a fixed number of features for each couple (number of competitions, number of finals, average time between competitions, etc., before time t) and predict a discrete result (what class will the couple achieve after time t? how long will the couple continue dancing together after time t?). Alternatively, similar data are available for international dance competitions (under WDSF). |  | 1 |
| Jan Špaček | Maturita | ~100 |  | (see desc.) | real | Personal communication | Anonymized dataset of students from my high school. Given their results in the admission test (Scio), predict their maturita (school-leaving exam) scores. To pose this as a classification task, we can discretize the scores in various ways (passed? average above 2?). |  | 3 |
| Jan Špaček | Lítačka | arbitrary | arbitrary | (see desc.) | 5 real | http://opendata.praha.eu/dataset/jizdni-rady-pid | Given the coordinates of the start and target positions in Prague and the departure time, predict how long the trip takes using public transport. The dataset is generated artificially by sampling pairs of stops and times and computing the shortest route in the real Prague timetables. This can also be posed as a classification task in various ways (is the route faster than 30 minutes? is there a route with no transfer? does the fastest route use the metro? which types of vehicles does the fastest route use?). |  | 1 |
| Abhishek Agrawal | Formspring data labelled for cyberbullying |  |  | 2 | text | http://www.chatcoder.com/Data/DataReleaseDec2011.rar | The data represent 50 IDs from Formspring.me that were crawled in summer 2010. For each ID, the profile information and each post (question and answer) were extracted. Each post was loaded into Amazon's Mechanical Turk and labeled for cyberbullying content by three workers. |  | 2 |
| Abhishek Agrawal | Myspace Group Data labelled for Cyberbullying |  |  | 2 | text | http://www.chatcoder.com/Data/BayzickBullyingData.rar | The folder contains a small subset of data from a crawl of Myspace groups. The data have been manually labelled for bullying content by 3 independent coders. Each input file was split into windows of 10 posts each, and each window was judged to determine whether there was cyberbullying content anywhere in it. The labels are contained in separate files. For a window to be labelled as containing cyberbullying, at least 2 of the 3 coders had to label it as such. |  | 0 |
| Jan Vainer | trashnet | 0 | 500 | 6 | int | https://github.com/garythung/trashnet | The dataset can be used to train a classifier that recognizes various kinds of trash, such as glass or plastic bottles. Its usefulness lies in the ability to sort trash automatically without human intervention. | R | 2 |
| Jan Vainer | Sound20 | 20000 | 4000 | 19 | real | https://github.com/ivclab/Sound20 | The dataset contains sample sounds of various animals (insects) and musical instruments. It could be used to classify animals based on the sounds they make, or to distinguish between animal sounds and instruments. The data are in the form of spectrograms of the sounds. | R | 0 |
| Jan Vainer | Savee | 1000 | 200 | 6 | real | http://kahlan.eps.surrey.ac.uk/savee/ | Database of audiovisual emotional speech; the task is emotion classification. The Surrey Audio-Visual Expressed Emotion (SAVEE) database has been recorded as a prerequisite for the development of an automatic emotion recognition system. The database consists of recordings from 4 male actors in 7 different emotions, 480 British English utterances in total. The sentences were chosen from the standard TIMIT corpus and phonetically balanced for each emotion. The data were recorded in a visual media lab with high-quality audio-visual equipment, processed and labeled. To check the quality of performance, the … | P | 2 |
| Alexis Teste | Parkinson's Disease Classification Data Set | 750 | 750 |  | integer, real | https://archive.ics.uci.edu/ml/datasets/Parkinson%27s+Disease+Classification | The data used in this study were gathered from 188 patients with PD (107 men and 81 women) aged 33 to 87 (65.1±10.9) at the Department of Neurology in Cerrahpaşa Faculty of Medicine, Istanbul University. The control group consists of 64 healthy individuals (23 men and 41 women) aged 41 to 82 (61.1±8.9). During data collection, the microphone was set to 44.1 kHz and, following the physician's examination, the sustained phonation of the vowel /a/ was collected from each subject with three repetitions. |  | 2 |
| Alexis Teste | mfeat-factors | 2000 |  | 10 |  | https://www.openml.org/d/12 | One of a set of 6 datasets describing features of handwritten numerals (0-9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same original character. 200 instances per class (2,000 instances in total) have been digitized in binary images. |  | 0 |
| Alexis Teste | nursery | 10000 | 3000 | 5 | nominal | https://www.openml.org/d/26 | The Nursery Database was derived from a hierarchical decision model originally developed to rank applications for nursery schools. It was used for several years in the 1980s, when there was excessive enrollment to these schools in Ljubljana, Slovenia, and rejected applications frequently needed an objective explanation. The final decision depended on three subproblems: occupation of the parents and the child's nursery, family structure and financial standing, and the social and health picture of the family. The model was developed within the expert system shell for decision making DEX. |  | 0 |
| Martin Šerý | Banknote authentication | 1000 | 300 | 2 | real | https://archive.ics.uci.edu/ml/datasets/banknote+authentication | Data were extracted from images that were taken for the evaluation of an authentication procedure for banknotes. |  | 2 |
| Martin Šerý | DOTA 2 game results | 7000 | 300 | 2 | categorical | https://archive.ics.uci.edu/ml/datasets/Dota2+Games+Results | Predict the result of a match based on the initial draft of heroes in DOTA 2. |  | 1 |
| Martin Šerý | Student performance | 500 | 150 | 20 | categorical | https://archive.ics.uci.edu/ml/datasets/Student+Performance | Student performance analysis: predict the final grade based on student grades and demographic, social and school-related features. |  | 1 |
| Iryna Tryhubyshyn | Cyber-Trolls detection | 20k |  | 2 | text | https://dataturks.com/projects/abhishek.narayanan/Dataset%20for%20Detection%20of%20Cyber-Trolls | The dataset contains tweets labeled as aggressive or not. | C | 3 |
| Iryna Tryhubyshyn | Programmer's salary survey | 9k |  |  | real, integer, categorical | I will translate the dataset into English: https://github.com/devua/csv/tree/master/salaries | The dataset contains the results of a survey on the salaries of Ukrainian programmers. Features include job position, programming language, age, city, work experience, company size and type (outsource, product, outstaff), education, English level and so on. Can be either a classification or a regression task. | C | 0 |
| Iryna Tryhubyshyn | Survey of programmers who emigrated from Ukraine | 1700 |  | 5 | real, integer, categorical | I will translate the dataset into English: https://github.com/devua/csv/tree/master/relocation | The dataset contains information about life satisfaction, current salary, country, job position and reasons for leaving Ukraine. More than 20 questions, mostly multiple-choice. We can try to predict changes in quality of life. | C | 0 |
| Marek Vandas | sentence-language | 120k |  | 12 | integer, categorical | https://tatoeba.org/eng/downloads | The dataset contains a list of sentences with language labels. The task is to classify sentences by language. | C | 0 |
| Marek Vandas | section-category | 70k |  | 2-16 | real, integer | I will bring the dataset | The dataset contains images of cross sections (intersections of a 3D rigid beam with a 2D plane); the task is to assign each cross section to one of the categories shown at https://www.dlubal.com/-/media/Images/website/pages/solutions/online-tools/glossary/000014/01-en.png | P | 0 |
| Marek Vandas | spoken-command | 1500 × # of classes |  | 2-5000 | real, integer | https://github.com/JohannesBuchner/spoken-command-recognition | The dataset contains a list of spoken commands and is artificially created by software (variations are based on noise and other sound filters). The task is to recognize the command from the spoken word. | R | 0 |
| Chembrolu Surya | Crowdsourced mapping dataset | 10545 | 300 | 6 | real | https://archive.ics.uci.edu/ml/datasets/Crowdsourced+Mapping | The dataset contains NDVI (vegetation index) values obtained over a period of 17 months, specifically on 27 different days. The data were collected using satellite imagery and OpenStreetMap, and the task is to classify the land cover based on the NDVI values. There are six classes: Farm, Forest, Grass, Orchard, Water, Impervious. |  | 0 |
| Chembrolu Surya | SkyTrax Review Dataset | over 17k | split from training samples | 2 | real, text, categorical | https://raw.githubusercontent.com/quankiquanki/skytrax-reviews-dataset/master/data/airport.csv | The dataset was obtained by scraping reviews from the Skytrax airline review website; the reviews describe travellers' experiences at different airports. It has a "recommended" column with value 0 or 1, indicating whether the particular airport is recommended. |  | 2 |
| Chembrolu Surya | Thyroid Disease Dataset | 7200 | split from training samples | 3 | real, integer | https://sci2s.ugr.es/keel/dataset_smja.php?cod=1179#sub1 | The task is to detect the thyroid condition of a patient: 1 for normal, 2 for hyperthyroid, 3 for hypothyroid. |  | 0 |
| Štěpán Procházka | underlying-distribution | 10k / arbitrarily many | 2k / arbitrarily many | 5-10 | real | artificially generated | The examples will be real-valued vectors generated by some distribution (e.g., normal, uniform), and the goal is to classify which distribution generated an example (this can be thought of as identifying the distribution of a random variable from a fixed-length sequence of independent trials). | P/C | 0 |
| Štěpán Procházka | fps-cheater-detection | 1K | 100 | 2 | real, integer | https://github.com/Nexosis/sampledata/blob/master/csgo-small.csv + some new data if someone knows where to get them | The goal of this task is to distinguish between cheaters and non-cheaters in an FPS game based on their in-game statistics (accuracy, K/D ratio, etc.). The exact game title may change if better data are available. | P | 2 |
| Štěpán Procházka | fake-news-classification | 1K+ | 100+ | 2 | categorical, integer, real | may be scraped from pages similar to this one: https://www.politifact.com/personalities/donald-trump/statements/by/ | The goal is to tell whether a text (FB post, tweet) presents objective information, based on extracted features: length of the text, number of emojis, uppercase-to-lowercase character ratio, etc. The data may be collected from various sources (public social media accounts, etc.). | P | 4 |
| Michal Pospěch | czech-presidental-election | 1086 | taken from training sample | 10 | real, integer, categorical | Taken from CVVM | Based on various demographic and socio-economic indicators, try to predict whom people voted for in the first round of the 2018 Czech presidential election. | C, processing needed though | 14 |
| Michal Pospěch | star-wars | ~1200 | taken from training sample | 3 | categorical | https://github.com/fivethirtyeight/data/blob/master/star-wars-survey/StarWars.csv | The goal is to predict who the respondents think shot first, Han or Greedo. | C | 1 |
| Michal Pospěch | mlb | ~170k | taken from training sample | 2 | real | https://github.com/fivethirtyeight/data/tree/master/mlb-elo | Predict the winner of an MLB match based on the ratings of the teams and their pitchers. | C, some processing needed | 0 |
| Memduh Gokirmak | music genre identification |  |  | 5-10 | real |  | Identify the genre of a musical audio file. |  | 1 |
| Memduh Gokirmak | document word segmentation |  |  | 2 | real |  | Find the boundaries of words in images of text. |  | 0 |
| Memduh Gokirmak | text difficulty evaluation |  |  | ~5 | real | MICUSP or something else | Assign a reading difficulty level to a natural-language text. | P | 0 |
| Petr Houška | fitts-nail097-houska | 2000 | taken from training sample | 4 | real | https://github.com/petrroll/NAIL087-fitts-exp/tree/master/data | The goal is to predict the participant ID based on Fitts' experiment results (click speed, length and size). |  | 1 |
| Petr Houška | Audit Data Data Set | 777 | taken from training sample | 2 | real, integer, categorical | https://archive.ics.uci.edu/ml/datasets/Audit+Data# | The goal of the research is to help auditors by building a classification model that can predict fraudulent firms on the basis of present and historical risk factors. |  | 0 |
| Petr Houška | Militarized Interstate Disputes v4.2 | ~2000 | taken from training sample | 20 | categorical, real, dates | http://www.correlatesofwar.org/data-sets/MIDs | The goal is to predict the outcome of a battle/war/skirmish based on predictors from 1993-2010. |  | 2 |
| Petra Doubravová | General mortality | ~300000 | taken from training sample |  | real, integer, categorical | https://ec.europa.eu/eurostat/web/health/data/database | May be used to obtain information about mortality in different countries and its causes depending on age, sex and other features; the real-world use is in prevention. | C | 0 |
| Petra Doubravová | Sentiment analysis | ~400 (maybe more; can be extended) | taken from training sample | 3 | text in Czech | my data | Classification of Facebook posts, mostly about a big bank, as negative, positive or neutral, especially in Czech. | C | 2 |
| Petra Doubravová |  |  |  |  |  |  |  |  | 0 |
| Abhishek Agrawal | Musk dataset | 6598 | taken from training sample | 2 | integer | https://archive.ics.uci.edu/ml/datasets/Musk+(Version+2) | This dataset describes a set of 102 molecules, of which 39 are judged by human experts to be musks and the remaining 63 to be non-musks. The goal is to learn to predict whether new molecules will be musks or non-musks. However, the 166 features that describe these molecules depend upon the exact shape, or conformation, of the molecule. Because bonds can rotate, a single molecule can adopt many different shapes. To generate this dataset, all the low-energy conformations of the molecules were generated, producing 6,598 conformations, and a feature vector describing each conformation was extracted. A classifier learned on this data should classify a molecule as "musk" if ANY of its conformations is classified as a musk, and as "non-musk" if NONE of its conformations is classified as a musk. |  | 0 |
| Jonáš Kratochvíl | Movie recommendation | 90000 | 10000 | 32 | real, categorical | https://ufal.mff.cuni.cz/courses/npfl054/materials | Dataset from the IMDb movie database. | C | 0 |
| Jonáš Kratochvíl | Human machine dialogue prediction | 1000 | 100 | n^3 | real, categorical |  | Predict the price range, location and type of food based on a human-machine dialogue. | P | 3 |
| Jonáš Kratochvíl | Signal/noise classification | 800000 | 20000 | 35 | real, categorical | http://opendata.cern.ch/record/328 | Predict whether a CERN detector senses noise or signal based on various measurements. | C | 0 |
| Ondrej adds other possible sources below. If you like a dataset and volunteer to review it, please enter your name in the Proposer column and then add all the other details in the row. Highlighted ones seem very interesting. |  |  |  |  |  |  |  |  | 0 |
|  |  |  |  |  |  | https://linked.opendata.cz/dataset/czso-deaths-by-selected-causes-of-death | Time series of causes of death; some conversion would be needed for classification. To see the data, select "Prejit na datovy zdroj" ("Go to data source") from the last "Prozkoumat" ("Explore") drop-down menu; the data will be downloaded. |  | 0 |
|  |  |  |  |  |  | https://linked.opendata.cz/dataset/czso-job-applicants | Counts of registered unemployed people by region; perhaps hard to use for classification. |  | 0 |
|  | Various datasets |  |  |  |  | https://data.gov.cz/datov%C3%A9-sady | Many sources from the Czech Republic. |  | 0 |
|  |  |  |  |  |  | http://opendata.praha.eu/dataset | Prague sources. |  | 0 |
| Chembrolu Surya |  |  |  |  |  | https://www.netmetr.cz/open-data.html | Results of internet speed tests over a longer period of time; the goal could be to predict the internet connection type (LAN, 4G, ...). |  | 0 |
|  |  |  |  |  |  | https://data.gov.cz/datov%C3%A1-sada?iri=https%3A%2F%2Fdata.gov.cz%2Fzdroj%2Fdatov%C3%A1-sada%2Fhttp---opendata.praha.eu-api-3-action-package_show-id-ipr-bonita_klimatu_z_hlediska_prirozene_ventilace_uzemi | This dataset alone is just a map indicating air bonity (not exactly air quality, but the speed of air exchange, which perhaps makes immissions less harmful). It would be fabulous to link it with some categorical description of the area (family houses, skyscrapers, ...) and predict air bonity based on "a picture" (i.e. local observations; not that you would be processing pictures) and altitude. |  | 0 |
|  |  |  |  |  |  | http://www.geoportalpraha.cz/cs/fulltext_geoportal | Search for 'bonita' to see other possible maps/data, or enter no keyword and just select e.g. 'budovy' (buildings). |  | 0 |
|  |  |  |  |  |  | https://data.gov.cz/datov%C3%A1-sada?iri=https%3A%2F%2Fdata.gov.cz%2Fzdroj%2Fdatov%C3%A1-sada%2F3751165 | Hundreds of thousands of public tenders (verejne zakazky), including evaluation criteria, texts, etc. This could be turned into various tasks, e.g. predicting the price range from keywords in the description, or predicting relevant evaluation criteria from keywords in the description or from who is proposing the tender (among other things, to find municipalities known for obscure practices). |  | 0 |
|  |  |  |  |  |  | https://golemio.cz/cs/oblasti | Various datasets on Prague. |  | 0 |
| Tomáš Souček | Votes from Czech Parliament | approx. 10-60k (×200) |  | 2 |  | https://www.psp.cz/sqw/hp.sqw?k=1300 | Votes of all deputies of the Czech Parliament from 1993 to the present; can be used to predict how one or more deputies would vote given the votes of other deputies. | C | 4 |